Using Discovery Cluster Resources Effectively
It is easy to accidentally waste resources on the Discovery cluster. Understanding how to request appropriate resources not only helps your jobs run more efficiently, but also reduces queue times and ensures fair access for all researchers. The guidelines below will help you make the most effective use of Discovery's resources.
Requesting the Right Resources
The more resources you request, the longer your job will spend in the queue waiting for those resources to become available. The scheduler must find a combination of available nodes with enough CPU cores, memory, and GPUs to satisfy your request. If you ask for more than you need, you may wait unnecessarily while a smaller allocation could run immediately. Try to specify your minimum requirements rather than requesting the maximum "just in case."
There are four key resources to consider when submitting a job. First is the number of CPU cores your job actually needs. Second is the amount of time required to run the job. Third is the amount of memory or RAM needed. Fourth is the number of GPUs, if any. For parallel codes that use multiple cores or nodes, you need to carry out a scaling analysis to determine how many cores actually improve performance. Simply requesting more cores does not guarantee better performance.
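In a Slurm script, these four resources map onto directives like the following; the values shown are placeholders rather than recommendations:

    #SBATCH --nodes=1          # number of nodes
    #SBATCH --ntasks=4         # CPU cores the job will actually use
    #SBATCH --time=02:00:00    # run time limit (HH:MM:SS)
    #SBATCH --mem-per-cpu=2G   # memory per CPU core
    #SBATCH --gres=gpu:1       # GPUs, only if the code can use them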
Time Limits and Queue Priority
The longer the requested run time limit, the longer your queue time tends to be. Jobs with shorter time limits can fit into smaller scheduling windows and often start sooner. Make sure you choose an accurate value for your time limit, but include some extra time for safety since the job will be killed if it does not complete before the run time expires. A good practice is to run a few test jobs to understand how long your work actually takes, then add perhaps 20-30% as a safety margin.
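For example, if test runs show that a job finishes in about 4 hours, requesting 5 hours gives a roughly 25% margin:

    # measured run time: ~4 hours; add ~25% as a safety margin
    #SBATCH --time=05:00:00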
Memory Allocation
For most jobs, the default memory per CPU core will be a reasonable choice. On Discovery, this is typically 2 GB per core. Requesting excess memory can cause your job to spend longer than necessary in the queue, as the scheduler must find nodes with sufficient free memory to accommodate your request. Unless you know your job requires more memory based on previous runs or documentation, start with the defaults and increase only if needed.
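If a previous run shows that the default is not enough, you can override it explicitly; the value below is only an illustration:

    # raise memory per core only when measurements justify it
    #SBATCH --mem-per-cpu=4G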
Using Nodes Efficiently
You should make every effort to use all of the CPU cores on a node before requesting an additional node. Discovery nodes have 96 cores each, and fragmenting nodes by requesting small numbers of cores across multiple nodes prevents other jobs from running efficiently. Always specify the number of nodes in your Slurm script: if you set ntasks without setting the number of nodes, the scheduler may split your job across several nodes unnecessarily.
A properly configured Slurm script requests either one full node with 96 cores or a smaller allocation that still fits on a single node. The key is being explicit about both the number of nodes and the number of tasks.
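As a sketch, with the program name as a placeholder, a full-node request might look like this:

    #!/bin/bash
    #SBATCH --job-name=myjob        # short name for the job
    #SBATCH --nodes=1               # state the node count explicitly
    #SBATCH --ntasks=96             # use all 96 cores on the node
    #SBATCH --time=02:00:00         # run time limit (HH:MM:SS)
    #SBATCH --mem-per-cpu=2G        # default-sized memory per core

    srun ./my_parallel_program      # placeholder executable

A smaller job would keep --nodes=1 and simply lower --ntasks (for example to 16), so that the allocation still fits on a single node.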
Understanding Low CPU Efficiency
When you check your job with the jobstats command and see low CPU efficiency, there are several common causes worth investigating.
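For reference, jobstats takes the job ID as its argument; the ID below is hypothetical:

    $ jobstats 1234567    # summarizes how efficiently the job used its allocated resources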
The most common cause is running a serial code on multiple CPU cores. If your code is not written to run in parallel, the extra cores simply sit idle and waste resources. Before allocating multiple CPU cores, check the documentation for your software to confirm that it supports parallel execution and to learn how to enable it.
Another common issue is using too many CPU cores for parallel jobs. Not all codes scale efficiently to large numbers of cores, and sometimes requesting fewer cores actually results in faster completion. You can find the optimal number of CPU cores by performing a scaling analysis where you run the same job with different core counts and measure the performance.
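A simple way to carry out such a scaling analysis is to submit the same job several times with different core counts; the snippet below assumes a submission script named job.slurm and overrides its core count from the command line:

    # submit the same job with a range of core counts
    for n in 1 2 4 8 16 32 64 96; do
        sbatch --ntasks=$n --job-name=scale_$n job.slurm
    done

    # after the jobs finish, compare elapsed times, for example with:
    #   sacct --name=scale_16 --format=JobID,Elapsed,NCPUS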
Pay attention to where your job writes output files. Actively running jobs should write output to the scratch filesystem, not to slower storage systems. Writing large amounts of data to slow storage during a job can create bottlenecks that make your CPU cores sit idle waiting for I/O operations to complete.
If you installed software using conda, check whether it's using an optimized MPI library. Some software installed through conda is built against generic MPI libraries that are not optimized for HPC systems. You can check this by running conda list after activating your environment and looking for generic libraries that might not perform well on Discovery.
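One quick way to check, with your environment already activated, is to filter the package list for MPI-related entries:

    $ conda list | grep -i mpi    # shows which MPI implementation your packages were built against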
For parallel codes, make sure you're using srun to launch your MPI programs rather than mpirun. The srun command is better integrated with Slurm and will respect your resource allocation more accurately. Consult the documentation or mailing list for your specific software to learn about additional causes of low CPU efficiency and their solutions.
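Inside a Slurm script this simply means launching the executable (a placeholder name below) with srun:

    # preferred: srun picks up the core count and node list from your allocation
    srun ./my_mpi_program

    # avoid: a hand-written mpirun launch may not match the allocation
    # mpirun -np 96 ./my_mpi_program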
Job Priority and Queue Times
The Slurm scheduler bases its decisions on job priority, so you should aim to keep your priority as high as possible. Many factors influence priority, but the one you control most directly is what you request when submitting a job. By requesting only what you actually need in terms of cores, nodes, memory, and time, you keep your priority higher and your queue times shorter.
Your recent usage history also affects priority through the fairshare system. If you've been using a large portion of cluster resources recently, your priority will be lower than users who have used less. This ensures fair access to resources over time. The best approach is to use resources efficiently when you do run jobs, which helps both your own priority and the overall cluster performance.
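If you want to inspect these values yourself, Slurm's standard sprio and sshare commands report the priority factors of pending jobs and your fairshare usage, respectively; the exact columns shown depend on the site configuration:

    $ sprio -u $USER     # priority factors for your pending jobs
    $ sshare -u $USER    # your recent usage and fairshare value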
Working with Files and Storage
Make sure that you use the scratch filesystem for job input and output files. This high-performance parallel filesystem is designed for the intensive read and write operations that occur during active jobs. Do not use other storage systems for actively running jobs, as they are much slower and can severely impact your job performance.
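A common pattern, assuming Discovery provides a per-user scratch directory such as /scratch/$USER (check the storage documentation for the actual path), is to run the job from scratch and copy back only the results you need to keep:

    # inside the Slurm script: work in scratch, then copy results home
    cd /scratch/$USER/myproject         # assumed scratch path; myproject is a placeholder
    srun ./my_program > results.out
    cp results.out $HOME/myproject/     # keep only what you need long term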
Our filesystems perform best with megabyte-sized files and with total file counts that stay below the hundreds of thousands. If you have a large number of small files, consider using tar or a similar tool to combine them into a single archive. This improves performance and helps you stay within filesystem limits. You can check your current file counts and limits by running the checkquota command.
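For example, a directory full of small files can be bundled into one compressed archive, after which checkquota will reflect the reduced file count; the names below are placeholders:

    $ tar -czf small_files.tar.gz small_files_dir/   # bundle and compress the directory
    $ checkquota                                     # review current usage and limits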
Getting the Most from Discovery
Effective use of Discovery resources comes down to understanding what your jobs actually need and requesting only those resources. Start with modest resource requests, measure performance with jobstats after your jobs complete, and adjust based on what you learn. Over time, you'll develop intuition for how to configure your jobs optimally.
When in doubt, our HPC support team can help you understand your resource usage and suggest ways to improve efficiency. Efficient resource usage benefits everyone by keeping queue times short and ensuring fair access to Discovery's capabilities.