Simple Linux Utility for Resource Management (SLURM)

SLURM is the native scheduler software that runs on the COARE's HPC cluster. Users request allocation of compute resources through SLURM, which arbitrates contention for resources by managing a queue of pending work.

SLURM Entities

SLURM entities are the key terms used in SLURM, which include the following:

  • Nodes – Compute resources managed by SLURM.
  • Partitions – Logical set of nodes with the same queue parameters (job size limit, job time limit, users permitted to use it, etc.)
  • Jobs – Allocations of resources assigned to a user for a specified amount of time.
  • Job Steps – Sets of (possibly parallel) tasks within a job.

Types of Jobs

The following are the types of jobs that users can run in the HPC (example SBATCH directives for each type follow the list):

  • Multi-node parallel jobs

    Multi-node parallel jobs use more than one node and require a message passing interface (MPI) to communicate between nodes. These jobs usually require more computing resources (cores) than a single node can offer.

  • Single-node parallel jobs

    Single-node parallel jobs use only one node, but multiple cores on that node. These include pthreads, OpenMP, and shared memory MPI.

  • Truly-serial jobs

    Truly-serial jobs require only one core on one node.
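
As a rough guide, the three job types map to different SBATCH resource directives in a job script. The node, task, and core counts and the program names below are illustrative placeholders, not COARE defaults:

# Multi-node parallel (MPI) job: needs more cores than one node can offer
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
srun ./my_mpi_program

# Single-node parallel job (e.g. OpenMP/pthreads): one node, several cores
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
./my_threaded_program

# Truly-serial job: one core on one node
#SBATCH --nodes=1
#SBATCH --ntasks=1
./my_serial_program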

SLURM Partitions

The COARE's SLURM currently has two (2) partitions, batch and debug, which are detailed below (a partition-selection sketch follows the list):

  1.  batch
    • Suitable for jobs that take a long time to finish (<= 7 days)
    • Six (6) nodes may be allocated to any single job
    • Each job can allocate up to 4GB of memory per CPU core
    • Default partition when the partition directive is not specified in a job request
  2.  debug
    • Queue for small/short jobs
    • Maximum run time limit per job is 180 minutes (3 hours)
    • Best for interactive usage (e.g. compiling, debugging)
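
For example, a short compile-and-debug session fits the debug partition, while a long production run fits batch. The time and memory values below are placeholders; --mem-per-cpu is a standard SLURM option (the job script table further down lists only --mem), and the 4G value simply mirrors the per-core limit above:

# Short interactive/debugging job on the debug partition (180-minute limit)
#SBATCH --partition=debug
#SBATCH --time=01:00:00

# Long-running job on the batch partition (up to 7 days), within 4GB per CPU core
#SBATCH --partition=batch
#SBATCH --time=5-00:00:00
#SBATCH --mem-per-cpu=4G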

Saliksik Cluster

As part of the efforts to upgrade the COARE's current infrastructure, the COARE Team has started to implement the saliksik cluster, which comprises the next generation of the COARE's HPC CPUs and GPUs.

The saliksik cluster is divided into four (4) partitions (see the sinfo example after the list):

  1.  Debug – maximum runtime of 3 hours
  2.  Batch – maximum runtime of 3 days
  3.  Serial – maximum runtime of 7 days
  4.  GPU – maximum runtime of 3 days
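
To confirm the partition names and time limits on the cluster you are logged in to, SLURM's standard summary view can be used (the exact output depends on the cluster):

# List each partition with its availability, time limit, and node count
sinfo --summarize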

SLURM Job Limit

SLURM Job Limits are imposed to ensure fair usage of the COARE's resources and to prevent any single user from hogging them. Every job submitted by users to SLURM is subject to the COARE's SLURM Job Limits.

The COARE’s policy on SLURM Job Limits is as follows:

  • Users can request up to 168 hours (7 days) for a single job.
  • Users can request up to 240 CPU cores (this can be just one job or allocated to multiple jobs).
  • Users can have a total of 30 simultaneous running jobs.

Job limits implemented for the COARE's saliksik cluster are summarized below; an example request that stays within these limits follows.

  • Batch – One (1) hour default; automatically extended by one (1) hour by the HPC job scheduler, to a maximum of three (3) days only
  • GPU – One (1) hour default; automatically extended by one (1) hour by the HPC job scheduler, to a maximum of three (3) days only
  • Debug – One (1) hour default; automatically extended by one (1) hour by the HPC job scheduler, to a maximum of three (3) hours only
  • Serial – One (1) day default; automatically extended by one (1) day by the HPC job scheduler, to a maximum of seven (7) days only
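
For illustration, the request below stays within the limits above, assuming the original cluster's batch partition (7-day maximum; on saliksik the batch maximum is 3 days). The node and task counts are placeholders well under the 240-core cap:

# Request the maximum wall time and a modest core count
#SBATCH --partition=batch
#SBATCH --time=7-00:00:00
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=12

# Count your currently running jobs against the 30-job limit
squeue -u $USER --noheader --states=RUNNING | wc -l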

Job Script

A job script is a script that contains the parameters needed to run the user's specific job. Users should specify the requirements of the job before submitting it to the scheduler.

SCRIPT                  MEANING
#!/bin/bash             Allows the script to run as a bash script
--partition             Partition to submit to (batch/debug)
--time                  Time limit in DD-HH:MM:SS format
--nodes                 Number of nodes
--ntasks-per-node       Number of tasks per node
--mem                   Memory (RAM) to allocate
--job-name              Job name
--output                File for standard output (STDOUT)
--mail-user             Email address for notifications
--mail-type             When to send email notifications (END/FAIL/ALL)
-w <compute-node>       Request a specific node or nodes
--gres=gpu:1            Number of GPU devices to allocate

For more information, see the SBATCH Documentation.

Job Script Example

#!/bin/bash
#SBATCH --partition=debug
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=6
#SBATCH --mem=24000
#SBATCH --job-name="Job1"
#SBATCH --output=job.out
#SBATCH --mail-user=gridops@asti.dost.gov.ph
#SBATCH --mail-type=ALL
#SBATCH --requeue

echo "SLURM_JOBID="$SLURM_JOBID
echo "SLURM_JOB_NODELIST="$SLURM_JOB_NODELIST
echo "SLURM_NNODES="$SLURM_NNODES
echo "SLURMTMPDIR="$SLURMTMPDIR
echo "working directory = "$SLURM_SUBMIT_DIR

# Place commands to load environment modules here
module load <module>

# Set stack size to unlimited
ulimit -s unlimited

# MAIN
srun /path/to/binary

IMPORTANT NOTES:

  • It is important to set accurate resources and parameters. Doing so helps the scheduler place your jobs effectively, prevents your program from crashing, and avoids wasting resources. Before you submit your job, you also need to determine which partition to submit it to, batch or debug.
  • Running jobs in /home is not allowed.
  • Active files should be transferred to /scratch1 and/or /scratch2 (a sketch of this workflow follows this list).
  • Neither /scratch1 nor /scratch2 should be used as long-term storage for your files. If you wish to store your files for a longer time, please use your /home directory.
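
A minimal sketch of that workflow, assuming a per-user directory under /scratch1 (the exact path layout is an assumption; adjust it to your own account):

# Stage input files from /home to scratch before running
mkdir -p /scratch1/$USER/myjob
cp -r $HOME/myjob/inputs /scratch1/$USER/myjob/
cd /scratch1/$USER/myjob

# Submit the job from the scratch directory
sbatch job-script.sh

# After the job finishes, copy results worth keeping back to /home
cp -r /scratch1/$USER/myjob/results $HOME/myjob/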

SLURM Job Submission

  • To submit a job script to SLURM:
    sbatch <bash-script>
  • For more information:
    sbatch --help

Job Management

Users can manage their jobs by checking the status of nodes, submitting a job script to the queue, checking a job's status, or cancelling a job.

The following commands will be helpful in managing your jobs:

  • To check the status of nodes in the cluster:
    sinfo
  • To submit the job script to the queue:
    sbatch <slurm-script.sh>
  • To check the status of the job:
    squeue -u <user>
    scontrol show job <jobid>
  • To cancel a job:
    scancel <jobid>

Transferring files to HPC

Users can transfer relevant files to the HPC. For more information, visit the wiki on Transferring Files.
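
As a quick illustration, files are usually copied over SSH with scp or rsync. The hostname and paths below are placeholders only; use the actual COARE login node and directories described in the Transferring Files wiki:

# Copy a local archive to your HPC home directory
scp ./data.tar.gz <username>@<coare-login-node>:/home/<username>/

# Synchronize a local folder to your scratch directory
rsync -avz ./project/ <username>@<coare-login-node>:/scratch1/<username>/project/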
