This project is forked from hlrs-tasc/julia-on-hpc-systems.

Information on how to set up Julia on HPC systems

License: MIT License

Julia on HPC systems

The purpose of this repository is to document best practices for running Julia on HPC systems (i.e., "supercomputers"). At the moment, information relevant to both supercomputer operators and users is collected here. There is no guarantee that the information is permanent or up to date, nor that topics are ordered or categorized in a useful way.

For operators

Official Julia binaries vs. building from source

According to this Discourse post, the difference between compiling Julia from source with architecture-specific optimizations and using the official Julia binaries is negligible. Ludovic Räss confirmed this for an Nvidia DGX-1 system at CSCS, where no performance differences between a Spack-installed version and the official binaries were found either (April 2022).

Since installing from source using, e.g., Spack can be cumbersome, the general recommendation is to use the pre-built binaries unless benchmarks on the target system show a difference. This is also the current approach at NERSC, CSCS, and PC2.

In June 2022, a Julia PR was created (JuliaLang/julia#45641) that aims to add PGO (profile-guided optimization) and LTO (link-time optimization) to the Julia Makefile. Depending on the test, compile-time improvements of up to 30% have been reported, so it might be worth checking out once merged. The runtime performance of compiled Julia code is unaffected, though.

Last update: June 2022

Ensure correct libraries are loaded

When using Julia on a system with an environment-variable-based module system (such as Environment Modules or Lmod), the LD_LIBRARY_PATH variable may accumulate entries pointing to various packages and libraries. To prevent Julia from loading one of these libraries instead of the ones it ships with, make sure that Julia's lib directory always comes first in LD_LIBRARY_PATH.

One possibility to achieve this is to create a wrapper shell script that modifies LD_LIBRARY_PATH before calling the Julia executable. Inspired by a script from UCL's Owain Kenway:

#!/usr/bin/env bash

# This wrapper makes sure the Julia binary distribution picks up the GCC
# libraries provided with it correctly, meaning that it does not rely on
# the gcc-libs version.

# Dr Owain Kenway, 20th of July, 2021
# Source: https://github.com/UCL-RITS/rcps-buildscripts/blob/04b2e2ccfe7e195fd0396b572e9f8ff426b37f0e/files/julia/julia.sh

location=$(readlink -f "$0")
directory=$(readlink -f "$(dirname "${location}")/..")

export LD_LIBRARY_PATH="${directory}/lib/julia:${LD_LIBRARY_PATH}"
exec "${directory}/bin/julia" "$@"

Note that using readlink might not be optimal from a performance perspective if used in a massively parallel environment. Alternatively, hard-code the Julia path or set an environment variable accordingly.
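For example, a variant with a hard-coded installation prefix avoids the readlink calls entirely. This is only a sketch: the JULIA_HOME path below is hypothetical and must be adjusted to your site's actual Julia location.

```shell
#!/usr/bin/env bash

# Hypothetical install prefix -- adjust to your site's Julia location.
JULIA_HOME=/opt/julia/1.8.5

# Prepend Julia's private lib directory so its bundled libraries win over
# whatever the module system put into LD_LIBRARY_PATH.
export LD_LIBRARY_PATH="${JULIA_HOME}/lib/julia${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Hand over to the real binary (guarded so the sketch is harmless to run on
# machines without this installation).
if [ -x "${JULIA_HOME}/bin/julia" ]; then
    exec "${JULIA_HOME}/bin/julia" "$@"
fi
```

The `${VAR:+...}` expansion avoids appending a stray trailing colon when LD_LIBRARY_PATH was previously empty.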

Also note that fixing LD_LIBRARY_PATH does not seem to be a hard requirement, since this approach is not used universally (e.g., it is not necessary on NERSC's systems).

Last update: April 2022

Julia depot path

Since the available file systems can differ significantly between HPC centers, it is hard to make a general statement about where the Julia depot folder (by default on Unix-like systems: ~/.julia) should be placed (via JULIA_DEPOT_PATH). Generally speaking, the file system hosting the Julia depot should have

  • good (parallel) I/O
  • no tight quotas
  • read and write access
  • no mechanism for the automatic deletion of unused files (or the depot should be excluded as an exception)

On some systems, it resides in the user's home directory (e.g. at NERSC). On other systems, it is put on a parallel scratch file system (e.g. CSCS and PC2). At the time of writing (April 2022), there does not seem to be reliable performance data available that could help to make a data-based decision.

If multiple platforms (e.g., systems with different architectures) access the same Julia depot, for example because the file system is shared, it might make sense to create platform-dependent Julia depots by setting the JULIA_DEPOT_PATH environment variable appropriately, e.g.,

prepend-path JULIA_DEPOT_PATH $env(HOME)/.julia/$platform

where $platform contains the current system name (source).
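The same idea can be sketched in plain shell, for sites that configure environments via shell scripts rather than Tcl module files. Here the machine architecture serves as a stand-in for a site-specific platform identifier, which is an assumption for illustration:

```shell
# Fall back to the machine architecture if the site does not define $platform.
platform=${platform:-$(uname -m)}

# Per-platform depot; keep any pre-existing depot entries behind it.
export JULIA_DEPOT_PATH="${HOME}/.julia/${platform}${JULIA_DEPOT_PATH:+:${JULIA_DEPOT_PATH}}"
```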

MPI.jl

It is generally recommended to set

JULIA_MPI_BINARY=system

such that MPI.jl will always use a system MPI instead of the Julia artifact (i.e. MPI_jll.jl). For more configuration options see this part of the MPI.jl documentation.
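In a module file or job script, this amounts to something like the following sketch; the optional JULIA_MPI_PATH value is a purely illustrative path, not a recommendation:

```shell
# Tell MPI.jl to use the system MPI library instead of the MPI_jll.jl artifact.
export JULIA_MPI_BINARY=system

# Optionally point MPI.jl at a specific installation if autodetection fails
# (hypothetical path, adjust to your site):
# export JULIA_MPI_PATH=/opt/mpich/3.4.2
```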

Additionally, on the NERSC systems, there is a pre-built MPI.jl for each programming environment, which is loaded through a settings module. More information on the NERSC module file setup can be found here.

CUDA.jl

It seems to be generally advisable to set the environment variables

JULIA_CUDA_USE_BINARYBUILDER=false
JULIA_CUDA_USE_MEMORY_POOL=none

in the module files when loading Julia on a system with GPUs. Otherwise, Julia will try to download its own BinaryBuilder.jl-provided CUDA stack, which is typically not what you want on a production HPC system. Instead, you should make sure that Julia finds the local CUDA installation by setting relevant environment variables (see also the CUDA.jl docs). Disabling the memory pool is advisable to make CUDA-aware MPI work on multi-GPU nodes (see also the MPI.jl docs).
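A corresponding snippet for a module file or job script might look like the following; the CUDA toolkit path is an assumption for illustration and depends on how your site exposes its local CUDA installation:

```shell
# Use the system CUDA toolkit instead of downloading one via BinaryBuilder.jl.
export JULIA_CUDA_USE_BINARYBUILDER=false

# Disable CUDA.jl's memory pool so CUDA-aware MPI works on multi-GPU nodes.
export JULIA_CUDA_USE_MEMORY_POOL=none

# Help CUDA.jl locate the local installation (hypothetical path):
# export CUDA_HOME=/opt/cuda/11.7
```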

Modules file setup

Johannes Blaschke provides scripts and templates to set up module files for Julia on some of NERSC's systems:
https://gitlab.blaschke.science/nersc/julia/-/tree/main/modulefiles

There are a number of environment variables, including those discussed in the sections above, that one should consider setting through the module mechanism.

EasyBuild resources

Samuel Omlin and colleagues from CSCS provide their EasyBuild configuration files used for Piz Daint online at https://github.com/eth-cscs/production/tree/master/easybuild/easyconfigs/j/Julia. For example, configurations are available for Julia 1.7.2 and for Julia 1.7.2 with CUDA support. Looking at these files also helps with deciding which environment variables are useful to set.

Further resources

For users

HPC systems with Julia support

The following is an (incomplete) list of HPC systems that provide a Julia installation and/or support for using Julia to its users:

| Center | System | Installation | Support | Interactive | Architecture | Accelerators | Documentation |
|---|---|---|---|---|---|---|---|
| ARC, UCL | Myriad, Kathleen, Michael, Young | ? | | | various Intel Xeon | various GPUs | 1 |
| CSCS | Piz Daint | | | | Intel Xeon Broadwell + Haswell | Nvidia Tesla P100 | 1 |
| DESY IT | Maxwell | ? | | | various AMD EPYC/Intel Xeon | various GPUs | 1 |
| FASRC, Harvard U | Cannon | ? | | | Intel Xeon Cascade Lake | Nvidia V100, A100 | 1 |
| HLRS | Hawk | | | | AMD EPYC Rome | Nvidia Tesla A100 | 1 |
| HPC @ LLNL | various systems | ? | | | various processors | various GPUs | 1 |
| HPC2N, Umeå U | Kebnekaise | ? | | | Intel Xeon Broadwell + Skylake | Nvidia Tesla K80, Nvidia Tesla V100 | 1 |
| NERSC | Cori | ? | ? | | Intel Xeon Haswell | Intel Xeon Phi | 1 |
| NERSC | Perlmutter | ? | | | AMD EPYC Milan | Nvidia Ampere A100 | 1, 2 |
| NeSI | Mahuika, Māui | | | | Intel Xeon Broadwell/Cascade Lake + AMD EPYC Milan | Nvidia Tesla P100, A100 | 1 |
| PC2, U Paderborn | Noctua 1 | | | | Intel Xeon Skylake | Intel Stratix 10 + consumer GPUs | 1 |
| PC2, U Paderborn | Noctua 2 | | | | AMD EPYC Milan | Nvidia Ampere A100, Xilinx Alveo U280 | 1 |
| ULHPC, U Luxembourg | Aion, Iris | ? | | | AMD EPYC Rome + Intel Xeon Broadwell/Skylake | Nvidia Tesla V100 | 1 |
| ZDV, U Mainz | MOGON II | ? | ? | | Intel Xeon Broadwell + Skylake | no | 1 |

Nomenclature

  • Center: The HPC center's name
  • System: The compute system's "marketing" name
  • Installation: Is there a pre-installed Julia configuration available?
  • Support: Is Julia "officially" supported on the system, i.e., will Julia users be supported by HPC center staff if they have questions/problems?
  • Interactive: Is interactive computing with Julia supported, i.e., can you run parallel jobs on the system interactively via, e.g., Jupyter notebooks?
  • Architecture: The main CPU used in the system
  • Accelerators: The main accelerator (if anything) in the system
  • Documentation: Links to documentation for Julia users

Other HPC systems

There are a number of other HPC systems that have been reported to provide a Julia installation and/or Julia support, but for which there are not enough details available to include them in the list above:

  • Arjuna cluster at CMU
  • Various clusters at ANL

License and contributing

The contents of this repository are published under the MIT license (see LICENSE). Our main goal is to publicly curate information on using Julia on HPC systems, as a service from the community and for the community. Therefore, we are very happy to accept contributions from everyone, preferably in the form of a PR.

Authors

This repository is maintained by Michael Schlottke-Lakemper (University of Stuttgart, Germany).

The following people have provided valuable contributions, either in the form of PRs or via private communication:

Disclaimer

Everything is provided as is and without warranty. Use at your own risk!

