What is Slurm?
Slurm (Simple Linux Utility for Resource Management) is an open-source job scheduler and workload manager used on both the Innovator and Discovery HPC clusters at SDSU. Slurm performs several important functions:
- Allocates time and resources on worker nodes to perform a job
- Allows users to start, monitor, and manage jobs running on worker nodes
- Queues and balances job submissions fairly across all users on the cluster
All jobs on both Innovator and Discovery must be submitted through Slurm. You should never run computationally intensive tasks directly on the login node.
Partitions (Node Types)
A partition, also known as a queue, is a subset of the cluster nodes that share the same characteristics. Users can specify which partition to run a job on. If no partition is specified, the job will run on the default compute partition.
Innovator Partitions
| Partition | Time Limit | Memory per Node | CPUs per Node | Nodes | Best For |
|-----------|------------|-----------------|---------------|-------|----------|
| compute | 14 days | 256 GB | 48 | 46 | General purpose jobs (default) |
| bigmem | 14 days | 2 TB | 48 | 4 | Memory intensive jobs |
| gpu | 14 days | 512 GB | 48 | 14 | GPU and machine learning jobs (2x NVIDIA A100 80GB per node) |
| quickq | 12 hours | 256 GB | 48 | 46 | Short jobs and testing |
In plain terms for Innovator:
- Use compute for most general research jobs
- Use bigmem if your job needs more than 256 GB of memory
- Use gpu if your job requires GPU acceleration such as deep learning or machine learning
- Use quickq for short test runs under 12 hours — jobs start faster here
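The partition is selected at submission time with Slurm's standard `-p`/`--partition` flag; a quick sketch (the script name `myjob.slurm` here is hypothetical):

```shell
# Submit a hypothetical batch script to the bigmem partition instead of the default
sbatch -p bigmem myjob.slurm

# Start a short interactive session on quickq; -p overrides the default compute partition
srun -p quickq --time=0:30:00 --pty bash
```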
Discovery Partitions
| Partition | Time Limit | Memory per Node | CPUs per Node | Nodes | Best For |
|-----------|------------|-----------------|---------------|-------|----------|
| compute | 14 days | 256 GB | 48 | 10 (includes 2 big memory) | General purpose jobs (default) |
| gpu | 14 days | 512 GB | 48 | 5 | GPU jobs (2x GPU per node) |
| all-gpu | 14 days | 512 GB–1 TB | 48 | 7 | All GPU nodes, including large GPU nodes (lg001, lg002) with 4x GPUs and 1 TB RAM |
In plain terms for Discovery:
- Use compute for general research jobs
- Use gpu for standard GPU jobs — each node has 2 GPUs and 512 GB RAM
- Use all-gpu for jobs needing maximum GPU resources — includes large GPU nodes with 4 GPUs and 1 TB RAM each
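As a sketch, a job targeting one of the large GPU nodes on Discovery would use the all-gpu partition and request all four GPUs. The `--mem=900G` figure below is an assumption (leaving headroom under the 1 TB node total), not a site requirement:

```shell
# Interactive session on a large GPU node (lg001/lg002 class): 4 GPUs, most of the node's 1 TB RAM
# --mem=900G is an assumed value; adjust to what your job actually needs
srun -p all-gpu --gres=gpu:4 --mem=900G --time=2:00:00 --pty bash
```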
Viewing Partition and Job Status
These commands work the same on both Innovator and Discovery.
To view the current state of all partitions and nodes:
[john.doe@jacks.local@cllogin002 ~]$ sinfo
To view only your own jobs:
[john.doe@jacks.local@cllogin002 ~]$ squeue -u $USER
To monitor your jobs and refresh every 30 seconds:
[john.doe@jacks.local@cllogin002 ~]$ watch -n 30 squeue -u $USER
Press Ctrl+C to stop the watch display.
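The columns squeue prints can also be customized with its standard `-o` format option; this is a generic sketch, not a site-specific requirement:

```shell
# Show job ID, partition, job name, state, elapsed time, and node list (or pending reason)
# for your own jobs only
squeue -u $USER -o "%.10i %.9P %.20j %.8T %.10M %R"
```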
Job Types
There are two main types of jobs you can run on both Innovator and Discovery:
Interactive Jobs — the user requests a node via Slurm and runs commands directly on the command line. Interactive jobs end if the user logs off the cluster. Best for testing, debugging, and short tasks.
Batch Jobs — jobs designed to run one or more scripts without user interaction. The job is submitted to the scheduler using a job submission file (sbatch file). These jobs continue running even if the user logs off. Output goes to a log file instead of the terminal. Best for long running research jobs.
Running an Interactive Job
Interactive jobs are started with the srun command. These examples work on both Innovator and Discovery.
To request one node on the default compute partition:
[john.doe@jacks.local@cllogin002 ~]$ srun --pty bash
[john.doe@jacks.local@node040 ~]$
To request a big memory node:
[john.doe@jacks.local@cllogin002 ~]$ srun --pty -p bigmem bash
[john.doe@jacks.local@bigmem003 ~]$
To request a GPU node with 1 GPU for 1 hour:
[john.doe@jacks.local@cllogin002 ~]$ srun -N 1 -n 40 --time=1:00:00 --partition=gpu --gres=gpu:1 --pty bash
[john.doe@jacks.local@gpu001 ~]$
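Interactive requests can also be sized explicitly with Slurm's standard `-c` (CPUs per task), `--mem`, and `--time` flags; the specific values below are illustrative assumptions:

```shell
# Request 8 CPUs and 32 GB of memory on the default compute partition for 2 hours
srun -c 8 --mem=32G --time=2:00:00 --pty bash

# When finished, type 'exit' to end the session and release the node back to the scheduler
```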
Running a Batch Job
To run a batch job, write a job submission script containing lines prefixed with #SBATCH that tell Slurm what resources to allocate. This works the same on both Innovator and Discovery — just specify the correct partition for the cluster you are using.
Example batch job script:
#!/bin/bash
#SBATCH --job-name=myjob # Job name
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node=4 # Tasks per node (nodes have 48 CPUs)
#SBATCH --output=log.log # Output log file name
#SBATCH --partition=compute # Partition: see partition tables above
#SBATCH --time=1-00:00:00 # Time limit: days-hours:minutes:seconds
module load <module name>
## Add any additional modules above this line
## Your job commands go below this line
Save the file with a .slurm extension and submit it using:
[john.doe@jacks.local@cllogin002 ~]$ sbatch myjob.slurm
Submitted batch job 334
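For GPU work, the same script pattern applies with a GPU partition and a `--gres` request. This is a hedged sketch: the module name is the same placeholder used above, and `train.py` is a hypothetical job command:

```shell
#!/bin/bash
#SBATCH --job-name=gpu-job        # Job name
#SBATCH --nodes=1                 # Single node
#SBATCH --ntasks-per-node=8       # Tasks on that node
#SBATCH --partition=gpu           # GPU partition (available on both clusters)
#SBATCH --gres=gpu:1              # Request 1 GPU on the node
#SBATCH --time=0-04:00:00         # 4 hour limit (days-hours:minutes:seconds)
#SBATCH --output=gpu-job.log      # Output log file name

module load <module name>         # Placeholder: load the software your job needs
python train.py                   # Hypothetical job command
```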
Cancelling a Job
To cancel a specific job using its job ID:
[john.doe@jacks.local@cllogin002 ~]$ scancel 12243
To cancel all your jobs at once:
[john.doe@jacks.local@cllogin002 ~]$ scancel -u $USER
Common Slurm Commands Reference
For a full list of Slurm commands, refer to the dedicated article: Slurm Cluster Resource Manager Commands
Questions or Problems
If you have any questions or need assistance with job submissions on either Innovator or Discovery, contact the SDSU RCi team: