General HPC Questions
Q1: What HPC clusters are available at SDSU?
SDSU has two HPC clusters: Innovator and Discovery. Innovator is available to all Board of Regents (BOR) institutions including SDSU, USD, SDSMT, DSU, and BHSU. Discovery is focused on SDSU faculty, staff, and students who have priority access.
Q2: What is the difference between Innovator and Discovery?
Innovator is a BOR-wide resource available to all South Dakota Board of Regents institutions. It has 46 compute nodes, 4 big memory nodes, 14 GPU nodes with NVIDIA A100 80GB GPUs, and 3 PB of storage. Discovery is focused on SDSU users and has a stronger GPU focus with NVIDIA H100 80GB GPUs, including large GPU nodes with 4 GPUs and 1 TB RAM each. Discovery has 200 Gbps Infiniband while Innovator has 100 Gbps.
Q3: How do I request access to the HPC clusters?
Complete the RCi HPC onboarding form at: https://help.sdstate.edu/TDClient/2744/Portal/Requests/TicketRequests/NewForm?ID=BJcNDsievG4_&RequestorType=Service. Once submitted, the RCi team will reach out and schedule a quick onboarding meeting if needed.
Q4: Who can use Innovator?
Innovator is available to all Board of Regents institutions: South Dakota State University (SDSU), University of South Dakota (USD), South Dakota School of Mines and Technology (SDSMT), Dakota State University (DSU), and Black Hills State University (BHSU).
Q5: Who can use Discovery?
Discovery is focused on SDSU faculty, staff, and students who have priority access. While other BOR users may have access, SDSU users are given priority on this resource.
Innovator Cluster
Q6: What are the hardware specifications of Innovator?
Innovator consists of: 46 Compute Nodes (Dell PowerEdge R650, 2x Intel Xeon Gold 6342 @ 2.80GHz, 48 cores, 256 GB RAM), 4 Big Memory Nodes (Dell PowerEdge R750, 48 cores, 2 TB RAM expandable to 4 TB), and 14 GPU Nodes (Dell PowerEdge R750, 48 cores, 512 GB RAM, 2x NVIDIA A100 80GB per node). Total: 3,072 CPU cores.
Q7: What storage is available on Innovator?
Innovator is attached to a 3 PB Arcastream Pixstor GPFS parallel filesystem: 2 PB usable research storage, 512 TB Flash Tier for faster read/write speeds, and 512 TB for RCi software. Each user gets 100 GB home directory quota. Scratch storage is available on request with no quota but a data expiration policy will apply.
Q8: What are the partitions on Innovator?
Innovator has 4 partitions: compute (46 nodes, 256 GB RAM, 14-day limit, default), bigmem (4 nodes, 2 TB RAM, 14-day limit), gpu (14 nodes, 512 GB RAM, 2x NVIDIA A100 80GB, 14-day limit), and quickq (46 nodes, 256 GB RAM, 12-hour limit for short/test jobs).
Q9: What GPU is available on Innovator?
Innovator GPU nodes have 2x NVIDIA A100 80GB cards per node across 14 GPU nodes. Use the gpu partition to access these resources with --gres=gpu:1 or --gres=gpu:2 in your job script.
Q10: What is the home directory path on Innovator?
For SDSU users the home directory is /home/jacks.local/username. For SDSMT users it is /home/SDSMT.LOCAL/username (case sensitive). The scratch directory follows the same format: /scratch/jacks.local/username.
Q11: What operating system does Innovator run?
Innovator runs on Rocky 9 Linux.
Q12: What is the network speed on Innovator?
Innovator uses 100 Gbps Infiniband for cluster data application processing and science data transfers, and 1 Gbps for cluster management.
Discovery Cluster
Q13: What are the hardware specifications of Discovery?
Discovery consists of: 10 Compute Nodes (Dell PowerEdge R650, 48 cores, 256 GB RAM), 2 Big Memory Nodes (Dell PowerEdge R750, 48 cores, 2 TB RAM), 5 Standard GPU Nodes (Dell PowerEdge R760xa, 48 cores, 512 GB RAM, 2x NVIDIA H100 80GB), and 2 Large GPU Nodes (Dell PowerEdge XE8640, 48 cores, 1 TB RAM, 4x NVIDIA H100 80GB SXM4). Total: 1,000 CPU cores with a primary focus on GPU resources.
Q14: What are the partitions on Discovery?
Discovery has 3 partitions: compute (10 nodes including 2 big memory, 256 GB RAM, 14-day limit, default), gpu (5 standard GPU nodes, 512 GB RAM, 2 GPUs per node, 14-day limit), and all-gpu (7 nodes including large GPU nodes lg001 and lg002 with 4 GPUs and 1 TB RAM each, 14-day limit).
Q15: What GPU is available on Discovery?
Discovery has NVIDIA H100 80GB GPUs. Standard GPU nodes (g001-g005) have 2x H100 80GB each. Large GPU nodes (lg001 and lg002) have 4x H100 80GB SXM4 each with 1 TB RAM. Use the gpu partition for standard GPU jobs or all-gpu partition to access all GPU nodes including the large ones.
Q16: What storage is available on Discovery?
Discovery is attached to a 1.6 PB RAW Arcastream Pixstor GPFS parallel filesystem. Each user gets 100 GB home directory quota. Scratch storage is available on request with no quota but a Scratch Data Retention Schedule will be applied.
Q17: What is the home directory path on Discovery?
For SDSU users the home directory is /home/jacks.local/username. The scratch directory is /scratch/jacks.local/username. Directory paths are case sensitive.
Q18: What operating system does Discovery run?
Discovery runs on RHEL 9 Linux.
Q19: What is the network speed on Discovery?
Discovery uses 200 Gbps Infiniband for cluster data application processing and science data transfers, and 1 Gbps for cluster management.
Q20: What is the difference between the gpu and all-gpu partitions on Discovery?
The gpu partition includes only the 5 standard GPU nodes (g001-g005) each with 2x NVIDIA H100 80GB and 512 GB RAM. The all-gpu partition includes all 7 GPU nodes including the 2 large GPU nodes (lg001 and lg002) which have 4x NVIDIA H100 80GB SXM4 and 1 TB RAM each. Use all-gpu when you need the large GPU nodes.
Logging into the Clusters
Q21: How do I log into Innovator via SSH?
Use SSH with your username in this format:
ssh john.doe@jacks.local@innovator.sdstate.edu
Replace john.doe with your first.last name. For students use jdoe@jacks.local format. You can use any SSH client including MobaXterm, PuTTY, or the terminal on Mac/Linux.
Q22: How do I log into Discovery via SSH?
Use SSH with your username in this format:
ssh john.doe@jacks.local@discovery.sdstate.edu
Replace john.doe with your first.last name. For students use jdoe@jacks.local format.
Q23: How do I log in using MobaXterm?
Open MobaXterm, click Session, select SSH. For Innovator enter hostname: innovator.sdstate.edu. For Discovery enter: discovery.sdstate.edu. Check Specify username and enter first.lastname@jacks.local. Click OK and enter your password when prompted. Passwords do not display while typing — type correctly and press Enter.
Q24: How do I log in using PuTTY?
Open PuTTY. In the Host Name field enter john.doe@jacks.local@innovator.sdstate.edu for Innovator or john.doe@jacks.local@discovery.sdstate.edu for Discovery. Set Port to 22 and Connection type to SSH. Click Open, accept the security prompt, and enter your password.
Q25: How do I access Innovator via Open OnDemand?
Open your browser and go to https://ondemand.sdstate.edu. Enter your email as first.lastname@jacks.sdstate.edu and your password, then click Sign In.
Q26: How do I access Discovery via Open OnDemand?
Open your browser and go to https://mydiscovery.sdstate.edu. Enter your email as first.lastname@jacks.sdstate.edu and your password, then click Sign In.
Q27: I am from USD, what domain do I use to log in?
USD users use @usd.local. For example: ssh jane.doe@usd.local@innovator.sdstate.edu.
Q28: I am from SDSMT, what domain do I use?
SDSMT users use @SDSMT.LOCAL (case sensitive). For example: ssh jane.doe@SDSMT.LOCAL@innovator.sdstate.edu.
Q29: What domains do BOR institutions use to log in?
SDSU uses @jacks.local, USD uses @usd.local, SDSMT uses @SDSMT.LOCAL (case sensitive), DSU uses @dsu.local, and BHSU uses @blackhills.local.
Q30: I cannot log in to the cluster, what should I do?
Check the following: confirm your username format is correct, ensure you are using the correct domain for your institution, verify you are connecting to the correct hostname, ensure your password is correct, and remember passwords do not display while typing in SSH terminals. If problems persist contact SDSU.HPC@sdstate.edu or submit a request at https://help.sdstate.edu/TDClient/2744/Portal/Requests/ServiceDet?ID=53689.
Q31: I am a new user, how do I get started with the HPC cluster?
Complete the onboarding form at https://help.sdstate.edu/TDClient/2744/Portal/Requests/TicketRequests/NewForm?ID=BJcNDsievG4_&RequestorType=Service. After approval access the cluster via SSH using MobaXterm or PuTTY, or via Open OnDemand at ondemand.sdstate.edu for Innovator or mydiscovery.sdstate.edu for Discovery.
Slurm Job Submission
Q32: What is Slurm?
Slurm (Simple Linux Utility for Resource Management) is an open-source job scheduler used on both Innovator and Discovery HPC clusters at SDSU. It allocates resources on worker nodes, allows users to submit and monitor jobs, and balances job submissions across all users. All computationally intensive work must be submitted through Slurm — never run heavy jobs directly on the login node.
Q33: How do I submit a job on the cluster?
Write a job submission script with #SBATCH directives, save it as myjob.slurm, then submit using:
sbatch myjob.slurm
Q34: How do I check the status of my jobs?
squeue -u $USER # View only your jobs
squeue # View all jobs
watch -n 30 squeue -u $USER # Monitor every 30 seconds
Job states: R means running, PD means pending (waiting for resources). Press Ctrl+C to stop watch.
Q35: How do I cancel a job?
scancel <job_id> # Cancel a specific job
scancel -u $USER # Cancel all your jobs
Q36: How do I run an interactive job?
srun --pty bash # Compute node
srun --pty -p bigmem bash # Big memory node
srun -N 1 -n 40 --time=1:00:00 --partition=gpu --gres=gpu:1 --pty bash # GPU node
Interactive jobs end when you log off. Use batch jobs for long running work.
Q37: Which partition should I use for my job?
On Innovator: use compute for general jobs (default), bigmem if you need more than 256 GB memory, gpu for GPU/ML jobs with NVIDIA A100 GPUs, quickq for short test jobs under 12 hours. On Discovery: use compute for general jobs (default), gpu for standard GPU jobs with NVIDIA H100 GPUs, all-gpu to access all GPU nodes including large nodes with 4 GPUs and 1 TB RAM.
Q38: How do I submit a GPU job?
Add these lines to your job script:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1 # Use gpu:2 for 2 GPUs
On Innovator GPU nodes have NVIDIA A100 80GB. On Discovery use gpu partition for NVIDIA H100 80GB nodes or all-gpu to include large GPU nodes.
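To confirm that Slurm actually granted your job a GPU, a common check is to run nvidia-smi inside a short interactive allocation (the 5-minute time limit here is just an illustrative value):

```shell
# Request 1 GPU on the gpu partition for a few minutes and
# print the visible GPUs; if no GPU was granted, nvidia-smi fails.
srun --partition=gpu --gres=gpu:1 --time=0:05:00 --pty nvidia-smi
```

If the output lists an A100 (Innovator) or H100 (Discovery), your --gres request is working.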
Q39: What is the maximum time limit for jobs?
On both Innovator and Discovery most partitions allow up to 14 days (14-00:00:00). The quickq partition on Innovator has a 12-hour limit. Use quickq for testing and short jobs as they typically start faster.
Q40: How many CPUs can I request per node?
All nodes on both Innovator and Discovery have 48 CPUs per node. Set #SBATCH --ntasks-per-node up to 48 for maximum CPU usage per node.
Q41: How do I view available partitions and node status?
sinfo # View all partitions and node states
sinfo -o "%P %l %m" # View partition time limits and memory
Node states: idle = all resources available, mix = partially used, alloc = fully used.
Q42: How do I submit a Slurm job array?
#SBATCH --array=0-4694%25 # Submit all jobs, run 25 at a time
Use $SLURM_ARRAY_TASK_ID in your script to process each item. Always throttle large arrays using % to avoid overwhelming the scheduler. Load modules after all #SBATCH lines. Set a realistic --time for each individual task.
Q43: Why is my job not starting?
Common reasons: requested resources are not available (nodes fully allocated), time limit exceeds partition limit, requested more CPUs or memory than available per node, or high cluster utilization. Check squeue -u $USER to see your job status and reason in parentheses. Check sinfo to see node availability. Contact SDSU.HPC@sdstate.edu if the job remains pending unusually long.
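To see the scheduler's pending reason directly, you can add the %R field to the squeue output format (a standard squeue format specifier; the other columns shown are illustrative):

```shell
# %R prints the reason a job is pending, e.g. (Resources) or (Priority);
# for running jobs it prints the allocated node list instead.
squeue -u $USER --format="%.10i %.9P %.15j %.2t %.10M %R"
```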
Q44: How do I request memory for my job?
#SBATCH --mem=32G # 32 GB total per node
#SBATCH --mem-per-cpu=8G # 8 GB per CPU core
Do not request more memory than available: 256 GB for compute, 2 TB for bigmem, 512 GB for gpu on Innovator.
Q45: Where does my job output go?
By default output goes to slurm-<jobid>.out in the directory where you submitted the job. Specify a custom file with #SBATCH --output=mylog.log. Use %j to include the job ID: #SBATCH --output=myjob_%j.log. Check this file if your job fails.
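While a job is running you can watch its log file grow. A minimal sketch (the job ID 123456 is a placeholder for your own):

```shell
tail -n 20 slurm-123456.out   # show the last 20 lines written so far
tail -f slurm-123456.out      # follow new output live; press Ctrl+C to stop
```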
Writing a Slurm Job Script — Line by Line Guide
Q46: What does #!/bin/bash mean in a Slurm script?
#!/bin/bash is called a shebang line. It must always be the very first line of your script. It tells the system to use the Bash shell to run this script. Without this line your script may not execute correctly.
Q47: What does #SBATCH --job-name do?
Gives your job a name that appears in the queue when you run squeue. Choose a short descriptive name. Example: #SBATCH --job-name=myjob. If not set, Slurm uses the script filename.
Q48: What does #SBATCH --nodes do?
Specifies how many compute nodes your job needs. Most jobs only need 1 node. Only set this higher if your software is designed to run across multiple nodes using MPI. Example: #SBATCH --nodes=1
Q49: What does #SBATCH --ntasks-per-node do?
Specifies how many CPU cores to use per node. The maximum on all nodes on both Innovator and Discovery is 48. Start with a smaller number like 4 or 8 unless your job is specifically designed to use many cores. Example: #SBATCH --ntasks-per-node=4
Q50: What does #SBATCH --output do?
Specifies the name of the log file where your job output will be saved. Example: #SBATCH --output=myjob.log. Use %j to include the job ID automatically: #SBATCH --output=myjob_%j.log
Q51: What does #SBATCH --partition do?
Tells Slurm which group of nodes to run your job on. On Innovator choose from: compute (default), bigmem, gpu, quickq. On Discovery choose from: compute (default), gpu, all-gpu. Example: #SBATCH --partition=compute
Q52: What does #SBATCH --time do?
Sets the maximum time your job is allowed to run. If exceeded the job is automatically cancelled. Format is days-hours:minutes:seconds. Examples:
#SBATCH --time=1-00:00:00 # 1 day
#SBATCH --time=8:00:00 # 8 hours
#SBATCH --time=0-01:30:00 # 1 hour 30 minutes
Set a reasonable estimate — do not always set 14 days if your job only needs a few hours.
Q53: What does #SBATCH --mem do?
Specifies the total memory your job needs per node. Example: #SBATCH --mem=32G. Do not request more than the node has: 256 GB for compute, 2 TB for bigmem, 512 GB for gpu nodes on Innovator.
Q54: What does #SBATCH --gres=gpu:1 do?
Requests 1 GPU for your job. You must include this when using the gpu or all-gpu partition, otherwise your job will not have access to any GPU. Use --gres=gpu:2 to request both GPUs on a node. Must be combined with --partition=gpu.
Q55: What does #SBATCH --array do?
Submits multiple similar jobs as a single job array. Example: #SBATCH --array=0-9 submits 10 jobs with indices 0 through 9. Use $SLURM_ARRAY_TASK_ID to reference each index. Always throttle: #SBATCH --array=0-999%20 runs 1000 jobs but only 20 at a time.
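A common pattern inside an array job script is to map each task index to one input file. A sketch, assuming your inputs are files matching data/*.csv (adjust the path and pattern to your own layout):

```shell
# Default the index to 0 so the snippet also runs outside Slurm for testing.
: "${SLURM_ARRAY_TASK_ID:=0}"
# Build a bash array of input files; task N processes the N-th file.
FILES=(data/*.csv)
INPUT="${FILES[$SLURM_ARRAY_TASK_ID]}"
echo "Task $SLURM_ARRAY_TASK_ID processing $INPUT"
```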
Q56: Can you show me a complete basic Slurm script with explanation?
#!/bin/bash # Always first line — use Bash shell
#SBATCH --job-name=myjob # Name your job
#SBATCH --nodes=1 # Request 1 compute node
#SBATCH --ntasks-per-node=4 # Use 4 CPU cores (max 48)
#SBATCH --mem=16G # Request 16 GB memory
#SBATCH --output=myjob_%j.log # Save output to log file
#SBATCH --partition=compute # Run on compute partition
#SBATCH --time=1-00:00:00 # Allow up to 1 day
module load python/3.11 # Load software (AFTER all #SBATCH lines)
python myscript.py # Your actual job command
Save as myjob.slurm and submit with: sbatch myjob.slurm
Q57: Can you show me a complete GPU Slurm script with explanation?
#!/bin/bash # Always first line
#SBATCH --job-name=gpu_job # Name your job
#SBATCH --nodes=1 # Request 1 GPU node
#SBATCH --ntasks-per-node=8 # Use 8 CPU cores
#SBATCH --mem=32G # Request 32 GB memory
#SBATCH --output=gpu_%j.log # Save output to log file
#SBATCH --partition=gpu # Use GPU partition
#SBATCH --gres=gpu:1 # Request 1 GPU (required!)
#SBATCH --time=8:00:00 # Allow up to 8 hours
module load cuda/11.8 # Load CUDA for GPU computing
module load python/3.11 # Load Python
python train_model.py # Your GPU training command
On Innovator: NVIDIA A100 80GB. On Discovery: NVIDIA H100 80GB. Save as gpu_job.slurm and submit with: sbatch gpu_job.slurm
Q58: How do I write a Slurm script for a job array?
#!/bin/bash
#SBATCH --job-name=array_job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=16G
#SBATCH --output=logs/%A_%a.log # %A = array job ID, %a = task index
#SBATCH --partition=compute
#SBATCH --time=2:00:00
#SBATCH --array=0-99%10 # 100 jobs, 10 at a time
module load python/3.11
echo "Processing task: $SLURM_ARRAY_TASK_ID"
python process_data.py --index $SLURM_ARRAY_TASK_ID
Create the logs folder first: mkdir -p logs. Then submit with: sbatch array_job.slurm
Q59: What are the most common mistakes when writing a Slurm script?
- Missing #!/bin/bash on the first line
- Loading modules before the #SBATCH lines — modules must always come AFTER all #SBATCH directives
- Requesting more CPUs than available — max is 48 per node
- Not including --gres=gpu:1 when using the gpu partition
- Setting --time too short so the job gets cancelled before finishing
- Using the wrong partition name — on Innovator use compute/bigmem/gpu/quickq, on Discovery use compute/gpu/all-gpu
- Requesting more memory than the node has available
- Copying scripts from email or documents where quote characters get changed and cause syntax errors
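Many of these mistakes can be caught before the job runs: sbatch has a --test-only flag that validates the script and resource request without actually submitting it:

```shell
# Validates the script and prints the estimated start time (or an error)
# without adding the job to the queue.
sbatch --test-only myjob.slurm
```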
Software Modules
Q60: How do I find available software on the cluster?
module avail # List all available modules
module avail python # Search for modules matching a name
Q61: How do I load a software module?
module load python/3.11 # Example: load a module by name and version
Load modules after all #SBATCH lines in your job scripts.
Q62: How do I check which modules are currently loaded?
module list
Q63: How do I unload a module?
module unload python/3.11 # Unload a specific module
module purge # Unload all loaded modules at once
Q64: The software I need is not available as a module, what do I do?
Submit a software request at https://help.sdstate.edu/TDClient/2744/Portal/Requests/ServiceDet?ID=53689. The RCi team will work with you to get the application installed on the cluster.
Open OnDemand
Q65: What is Open OnDemand?
Open OnDemand is a browser-based graphical interface for accessing HPC clusters without needing an SSH client. Through OnDemand you can open a web terminal, submit and monitor Slurm jobs, browse and manage files, launch interactive applications like Jupyter Notebooks and RStudio, and monitor cluster resources.
Q66: What is the URL for Innovator Open OnDemand?
Innovator Open OnDemand is accessible at https://ondemand.sdstate.edu. Log in with your email as first.lastname@jacks.sdstate.edu and your SDSU password.
Q67: What is the URL for Discovery Open OnDemand?
Discovery Open OnDemand is accessible at https://mydiscovery.sdstate.edu. Log in with your email as first.lastname@jacks.sdstate.edu and your SDSU password.
Q68: What applications can I launch through Open OnDemand?
Through Open OnDemand you can launch Jupyter Notebooks, RStudio sessions, and other web-based interactive applications. You can also access a web-based terminal, submit batch jobs, and manage your files directly in the browser.
Q69: Can I submit Slurm jobs through Open OnDemand?
Yes. Open OnDemand provides a job submission interface where you can submit, monitor, and manage Slurm jobs without using the command line. You can also view job status and cancel jobs through the web interface.
File Transfer
Q70: How do I transfer files to the cluster?
You can transfer files using SCP:
scp localfile.txt john.doe@jacks.local@innovator.sdstate.edu:/home/jacks.local/john.doe/
You can also use Globus for large data transfers, or the file manager in Open OnDemand for smaller files.
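For directories or transfers that may be interrupted, rsync over SSH is a common alternative to SCP. A sketch using the same username and path conventions as the SCP example above:

```shell
# -a preserves permissions/timestamps and recurses into directories,
# -v is verbose, -z compresses data in transit. Re-running the same
# command resumes by copying only files that changed.
rsync -avz mydata/ \
  john.doe@jacks.local@innovator.sdstate.edu:/scratch/jacks.local/john.doe/mydata/
```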
Q71: How do I use Globus to transfer data?
Create a Globus account at globus.org, install Globus Connect Personal on your local computer, then use the Globus web interface to transfer files between your computer and the cluster. Globus is recommended for large dataset transfers as it handles interruptions automatically.
Q72: Where should I store large datasets on the cluster?
Store large datasets in your scratch directory at /scratch/jacks.local/username. The scratch directory has no quota but is not backed up and a data expiration policy will be applied. Your home directory has a 100 GB quota and is intended for important persistent files.
Support and Contact
Q73: How do I contact HPC support?
- Email: SDSU.HPC@sdstate.edu
- Phone: 605-688-6776
- Support form: https://help.sdstate.edu/TDClient/2744/Portal/Requests/ServiceDet?ID=53689
Q74: My job failed, how do I get help?
First check your job output log file (slurm-<jobid>.out or the file specified with --output). Look for error messages. Common issues include wrong module names, incorrect file paths, or insufficient memory requests. If you cannot resolve it contact SDSU.HPC@sdstate.edu with your job ID and the error message.
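For jobs that have already finished, the sacct accounting command shows the final state, exit code, and memory high-water mark, which often pinpoints the failure (the job ID 123456 is a placeholder):

```shell
# ExitCode 0:0 means success; a State of OUT_OF_MEMORY or a MaxRSS
# near your --mem request suggests the job needed more memory.
sacct -j 123456 --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS
```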
Q75: How do I check my storage usage?
df -h /home # Check home directory usage
df -h /scratch # Check scratch usage
Contact SDSU.HPC@sdstate.edu if you need a quota increase or additional scratch storage.
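Note that df reports usage for the whole shared filesystem, not just your files. To total only what you own, du works on any directory:

```shell
# -s gives one summary line, -h prints human-readable sizes.
du -sh "$HOME"   # total size of everything under your home directory
# On the cluster the same works for scratch:
#   du -sh /scratch/jacks.local/$USER
```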