GPU Computing on Discovery Cluster


Understanding GPU Performance

If a CPU has 96 cores and a GPU has 14,000 cores, you might be tempted to think that a GPU will outperform a CPU for any operation that requires more than 96 parallel instructions. This turns out to be incorrect for a few reasons.

First, to use the GPU, the data must be copied from CPU memory to GPU memory, and the results must later be copied back. These transfers take time, which reduces overall performance. A good GPU programmer minimizes this penalty by overlapping computation on the CPU with the data transfers.

Second, to get the maximum performance out of the GPU, you must saturate it with work: the number of parallel operations needed to do so is typically an order of magnitude larger than the number of GPU cores. These two reasons explain why the breakeven point between running an algorithm on the CPU versus the GPU must be determined empirically by testing with your specific problem.

Precision and Specialized Hardware

As with the CPU, a GPU can perform calculations in single precision faster than in double precision. Single precision uses 32 bits to represent each number, while double precision uses 64 bits. For many scientific applications, single precision provides sufficient accuracy while offering better performance.
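To see the precision difference concretely, the following sketch (using only the Python standard library, assumed available as `python3`) squeezes a value through a 32-bit float and compares it with the 64-bit original:

```shell
# Round-trip 1/3 through a 32-bit float to expose the precision loss.
# 'f' packs as single precision (32 bits); Python floats are double precision (64 bits).
python3 - <<'EOF'
import struct

x = 1 / 3
f32 = struct.unpack('f', struct.pack('f', x))[0]  # value after 32-bit round trip
print(f"float64: {x:.17f}")
print(f"float32: {f32:.17f}")
print(f"difference: {abs(f32 - x):.2e}")
EOF
```

The difference is on the order of 1e-8: negligible for many simulations, but large enough to matter for ill-conditioned problems, which is why the single-versus-double choice depends on your application.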

Additionally, in recent years, GPU manufacturers have incorporated specialized units called Tensor Cores into their GPUs. These specialized units can perform certain operations in less than single precision, such as half precision, yielding even greater performance. This is particularly beneficial to researchers training artificial neural networks or, more generally, cases where matrix-matrix multiplications and related operations dominate the computation time. Our H100 GPUs on Discovery include these Tensor Cores, making them especially powerful for machine learning and AI workloads.

CPU and GPU
In scientific computing, a GPU is used as an accelerator: auxiliary hardware that works in tandem with a CPU to carry out numerically intensive operations quickly.

Is Your Code GPU-Enabled?

Not all code can use GPUs. Your software must be specifically programmed for GPU acceleration using technologies such as CUDA or OpenCL. Many popular scientific applications have GPU-enabled versions, but not all are currently installed on Discovery by default.

As of November 2025, Discovery provides TensorFlow (2.15, 2.19), PyTorch (2.6), and MATLAB R2024b — all CPU-optimized versions. GPU-enabled builds of these packages, as well as GPU versions of tools like GROMACS and NAMD, can be installed upon request by contacting the HPC team.

Before requesting a GPU, always verify that your software can actually use it. You can inspect module details by running:

module show tensorflow
module show pytorch
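For the Python frameworks, a quick runtime check can also confirm whether the loaded build can see a GPU. This is a sketch: it assumes a GPU-enabled module is already loaded and that you are running it on a GPU node (the check will report no GPU on the login node even for a GPU-enabled build). The `torch.cuda.is_available()` and `tf.config.list_physical_devices` calls are standard PyTorch and TensorFlow APIs:

```shell
# PyTorch: prints True only if the build has CUDA support and a GPU is visible
python -c "import torch; print(torch.cuda.is_available())"

# TensorFlow: prints a non-empty list only if a GPU device is visible
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```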

Using GPUs on Discovery

To request a GPU on Discovery, add specific directives to your Slurm job script. You’ll need to specify the GPU partition, request the number of GPUs, and ensure you’re loading a GPU-enabled version of your software. A typical job script might request one GPU along with several CPU cores and an appropriate amount of memory.
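A minimal job script might look like the sketch below. The partition name (`gpu`), module name, resource amounts, and `train.py` are illustrative assumptions; check the actual partition and module names on Discovery before submitting:

```shell
#!/bin/bash
#SBATCH --job-name=gpu-example     # illustrative job name
#SBATCH --partition=gpu            # assumed GPU partition name; verify with sinfo
#SBATCH --gres=gpu:1               # request one GPU
#SBATCH --cpus-per-task=4          # a few CPU cores for data loading/preprocessing
#SBATCH --mem=16G                  # host (CPU) memory
#SBATCH --time=01:00:00            # walltime limit

module load pytorch                # must be a GPU-enabled build (available on request)
python train.py                    # placeholder for your GPU-enabled program
```

Submit the script with `sbatch` and note the job ID it reports; you will need it for monitoring and for jobstats.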

While your job is running, you can check whether it’s actually using the GPU by running the nvidia-smi command inside your job (it will not work on the login node). This command shows real-time GPU utilization and memory usage. After your job completes, use the jobstats command with your job ID to generate a comprehensive efficiency report that includes GPU utilization statistics.
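The commands below sketch this workflow; `<jobid>` is a placeholder for your actual job ID, and the `--overlap` flag assumes a reasonably recent Slurm version:

```shell
# While the job runs: attach to the job's allocation and take a GPU reading
srun --jobid=<jobid> --overlap --pty nvidia-smi

# Or refresh the reading every 2 seconds to watch utilization over time
srun --jobid=<jobid> --overlap --pty nvidia-smi -l 2

# After the job completes: generate the efficiency report, including GPU utilization
jobstats <jobid>
```

Near-zero GPU utilization throughout the run is the telltale sign that your code is not actually using the GPU.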

Common Issues and Best Practices

The most common problem researchers encounter is requesting a GPU but running code that doesn't actually use it. This wastes valuable resources and can lead to longer queue times for your subsequent jobs. Always verify that your code is GPU-enabled before requesting GPU resources.

Another important consideration is problem size. For small datasets, the time spent copying data between CPU and GPU memory can exceed the time saved by faster computation. GPU acceleration typically becomes beneficial when you have large amounts of data and many operations to perform.

When developing GPU code, try to minimize data transfers between the CPU and GPU. Keeping data on the GPU for multiple operations, rather than copying it back and forth repeatedly, can significantly improve performance. Profile your code to understand where time is being spent and whether GPU acceleration is providing the speedup you expect.
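One common way to profile a GPU run is NVIDIA's Nsight Systems command-line tool; this sketch assumes `nsys` is available on the GPU node (it ships with recent CUDA toolkits) and that `train.py` stands in for your program:

```shell
# Trace a run and print a summary of where time is spent,
# including CUDA kernel time versus host-device memcpy time
nsys profile --stats=true -o myrun python train.py
```

If the report shows memory-copy time rivaling kernel time, that is a strong hint to restructure the code so data stays resident on the GPU across operations.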
