This project is forked from hlrs-tasc/julia-on-hpc-systems.

Information on how to set up Julia on HPC systems

License: MIT License

Julia on HPC systems

The purpose of this repository is to document best practices for running Julia on HPC systems (i.e., "supercomputers"). At the moment, information relevant to both supercomputer operators and users is collected here. There is no guarantee that the information is permanent or up to date, nor that topics are ordered or categorized in a useful way.

For operators

Official Julia binaries vs. building from source

According to this Discourse post, the difference between compiling Julia from source with architecture-specific optimizations and using the official Julia binaries is negligible. Ludovic Räss confirmed this for an Nvidia DGX-1 system at CSCS, where no performance differences between a Spack-installed version and the official binaries were found either (April 2022).

Since installing from source using, e.g., Spack can be cumbersome, the general recommendation is to use the pre-built binaries unless benchmarks on the target system show a difference. This is also the current approach at NERSC, CSCS, and PC2.

In June 2022, a Julia PR was created (JuliaLang/julia#45641) that aims to add PGO (profile-guided optimization) and LTO (link-time optimization) to the Julia Makefile. Depending on the test, compile-time improvements of up to 30% have been reported, so it might be worth checking out once merged. The runtime performance of compiled Julia code is unaffected, though.

Last update: June 2022

Ensure correct libraries are loaded

When using Julia on a system with an environment-variable-based module system (such as Environment Modules or Lmod), the LD_LIBRARY_PATH variable may accumulate entries pointing to various packages and libraries. To prevent Julia from loading one of these libraries instead of the ones it ships with, make sure that Julia's lib directory always comes first in LD_LIBRARY_PATH.

One possibility to achieve this is to create a wrapper shell script that modifies LD_LIBRARY_PATH before calling the Julia executable. Inspired by a script from UCL's Owain Kenway:

#!/usr/bin/env bash

# This wrapper makes sure the Julia binary distribution picks up the GCC
# libraries provided with it correctly, meaning that it does not rely on
# the gcc-libs version.

# Dr Owain Kenway, 20th of July, 2021
# Source: https://github.com/UCL-RITS/rcps-buildscripts/blob/04b2e2ccfe7e195fd0396b572e9f8ff426b37f0e/files/julia/julia.sh

location=$(readlink -f "$0")
directory=$(readlink -f "$(dirname "${location}")/..")

export LD_LIBRARY_PATH="${directory}/lib/julia:${LD_LIBRARY_PATH}"
exec "${directory}/bin/julia" "$@"

Note that using readlink might not be optimal from a performance perspective if used in a massively parallel environment. Alternatively, hard-code the Julia path or set an environment variable accordingly.
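For example, a variant with a hard-coded installation prefix avoids the readlink calls entirely. This is only a sketch: the JULIA_HOME path below is hypothetical and must be adjusted to your site's actual Julia location.

```shell
#!/usr/bin/env bash

# Hypothetical install prefix -- adjust to your site's Julia location.
JULIA_HOME=/opt/julia/1.8.5

# Prepend Julia's private lib directory so its bundled libraries win over
# whatever the module system put into LD_LIBRARY_PATH.
export LD_LIBRARY_PATH="${JULIA_HOME}/lib/julia${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Hand over to the real binary (guarded so the sketch is harmless to run on
# machines without this installation).
if [ -x "${JULIA_HOME}/bin/julia" ]; then
    exec "${JULIA_HOME}/bin/julia" "$@"
fi
```

The `${VAR:+...}` expansion avoids appending a stray trailing colon when LD_LIBRARY_PATH was previously empty.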

Also note that fixing LD_LIBRARY_PATH does not seem to be a hard requirement, since this approach is not used universally (e.g., it is not necessary on NERSC's systems).

Last update: April 2022

Julia depot path

Since the available file systems can differ significantly between HPC centers, it is hard to make a general statement about where the Julia depot folder (by default on Unix-like systems: ~/.julia) should be placed (via JULIA_DEPOT_PATH). Generally speaking, the file system hosting the Julia depot should have

  • good (parallel) I/O
  • no tight quotas
  • read and write access
  • no mechanism for the automatic deletion of unused files (or the depot should be excluded as an exception)

On some systems, it resides in the user's home directory (e.g. at NERSC). On other systems, it is put on a parallel scratch file system (e.g. CSCS and PC2). At the time of writing (April 2022), there does not seem to be reliable performance data available that could help to make a data-based decision.

If multiple platforms (e.g., systems with different architectures) access the same Julia depot, for example because the file system is shared, it might make sense to create platform-dependent Julia depots by setting the JULIA_DEPOT_PATH environment variable appropriately, e.g.,

prepend-path JULIA_DEPOT_PATH $env(HOME)/.julia/$platform

where $platform contains the current system name (source).
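The same idea can be sketched in plain shell, for sites that configure environments via shell scripts rather than Tcl module files. Here the machine architecture serves as a stand-in for a site-specific platform identifier, which is an assumption for illustration:

```shell
# Fall back to the machine architecture if the site does not define $platform.
platform=${platform:-$(uname -m)}

# Per-platform depot; keep any pre-existing depot entries behind it.
export JULIA_DEPOT_PATH="${HOME}/.julia/${platform}${JULIA_DEPOT_PATH:+:${JULIA_DEPOT_PATH}}"
```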

MPI.jl

It is generally recommended to set

JULIA_MPI_BINARY=system

such that MPI.jl will always use a system MPI instead of the Julia artifact (i.e. MPI_jll.jl). For more configuration options see this part of the MPI.jl documentation.
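In a module file or job script, this amounts to something like the following sketch; the optional JULIA_MPI_PATH value is a purely illustrative path, not a recommendation:

```shell
# Tell MPI.jl to use the system MPI library instead of the MPI_jll.jl artifact.
export JULIA_MPI_BINARY=system

# Optionally point MPI.jl at a specific installation if autodetection fails
# (hypothetical path, adjust to your site):
# export JULIA_MPI_PATH=/opt/mpich/3.4.2
```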

Additionally, on the NERSC systems, there is a pre-built MPI.jl for each programming environment, which is loaded through a settings module. More information on the NERSC module file setup can be found here.

CUDA.jl

It seems to be generally advisable to set the environment variables

JULIA_CUDA_USE_BINARYBUILDER=false
JULIA_CUDA_USE_MEMORY_POOL=none

in the module files when loading Julia on a system with GPUs. Otherwise, Julia will try to download its own BinaryBuilder.jl-provided CUDA stack, which is typically not what you want on a production HPC system. Instead, you should make sure that Julia finds the local CUDA installation by setting relevant environment variables (see also the CUDA.jl docs). Disabling the memory pool is advisable to make CUDA-aware MPI work on multi-GPU nodes (see also the MPI.jl docs).
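A corresponding snippet for a module file or job script might look like the following; the CUDA toolkit path is an assumption for illustration and depends on how your site exposes its local CUDA installation:

```shell
# Use the system CUDA toolkit instead of downloading one via BinaryBuilder.jl.
export JULIA_CUDA_USE_BINARYBUILDER=false

# Disable CUDA.jl's memory pool so CUDA-aware MPI works on multi-GPU nodes.
export JULIA_CUDA_USE_MEMORY_POOL=none

# Help CUDA.jl locate the local installation (hypothetical path):
# export CUDA_HOME=/opt/cuda/11.7
```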

Modules file setup

Johannes Blaschke provides scripts and templates to set up module files for Julia on some of NERSC's systems:
https://gitlab.blaschke.science/nersc/julia/-/tree/main/modulefiles

There are a number of environment variables, including those discussed in the sections above, that one should consider setting through the module mechanism.

EasyBuild resources

Samuel Omlin and colleagues from CSCS provide their EasyBuild configuration files used for Piz Daint online at https://github.com/eth-cscs/production/tree/master/easybuild/easyconfigs/j/Julia. For example, configurations are available for Julia 1.7.2 and for Julia 1.7.2 with CUDA support. Looking at these files also helps with deciding which environment variables are useful to set.

Further resources

For users

HPC systems with Julia support

The following is an (incomplete) list of HPC systems that provide a Julia installation and/or support for using Julia to its users:

| Center | System | Installation | Support | Interactive | Architecture | Accelerators | Documentation |
|---|---|---|---|---|---|---|---|
| ARC, UCL | Myriad, Kathleen, Michael, Young | ? | | | various Intel Xeon | various GPUs | 1 |
| CSCS | Piz Daint | | | | Intel Xeon Broadwell + Haswell | Nvidia Tesla P100 | 1 |
| DESY IT | Maxwell | ? | | | various AMD EPYC/Intel Xeon | various GPUs | 1 |
| FASRC, Harvard U | Cannon | ? | | | Intel Xeon Cascade Lake | Nvidia V100, A100 | 1 |
| HLRS | Hawk | | | | AMD EPYC Rome | Nvidia Tesla A100 | 1 |
| HPC @ LLNL | various systems | ? | | | various processors | various GPUs | 1 |
| HPC2N, Umeå U | Kebnekaise | ? | | | Intel Xeon Broadwell + Skylake | Nvidia Tesla K80, Nvidia Tesla V100 | 1 |
| NERSC | Cori | ? | ? | | Intel Xeon Haswell | Intel Xeon Phi | 1 |
| NERSC | Perlmutter | ? | | | AMD EPYC Milan | Nvidia Ampere A100 | 1, 2 |
| NeSI | Mahuika, Māui | | | | Intel Xeon Broadwell/Cascade Lake + AMD EPYC Milan | Nvidia Tesla P100, A100 | 1 |
| PC2, U Paderborn | Noctua 1 | | | | Intel Xeon Skylake | Intel Stratix 10 + consumer GPUs | 1 |
| PC2, U Paderborn | Noctua 2 | | | | AMD EPYC Milan | Nvidia Ampere A100, Xilinx Alveo U280 | 1 |
| ULHPC, U Luxembourg | Aion, Iris | ? | | | AMD EPYC Rome + Intel Xeon Broadwell/Skylake | Nvidia Tesla V100 | 1 |
| ZDV, U Mainz | MOGON II | ? | ? | | Intel Xeon Broadwell + Skylake | no | 1 |

Nomenclature

  • Center: The HPC center's name
  • System: The compute system's "marketing" name
  • Installation: Is there a pre-installed Julia configuration available?
  • Support: Is Julia "officially" supported on the system, i.e., will Julia users be supported by HPC center staff if they have questions/problems?
  • Interactive: Is interactive computing with Julia supported, i.e., can you run parallel jobs on the system interactively via, e.g., Jupyter notebooks?
  • Architecture: The main CPU used in the system
  • Accelerators: The main accelerator (if anything) in the system
  • Documentation: Links to documentation for Julia users

Other HPC systems

There are a number of other HPC systems that have been reported to provide a Julia installation and/or Julia support, but for which there are not enough details available to include them in the list above:

  • Arjuna cluster at CMU
  • Various clusters at ANL

License and contributing

The contents of this repository are published under the MIT license (see LICENSE). Our main goal is to publicly curate information on using Julia on HPC systems, as a service from the community and for the community. Therefore, we are very happy to accept contributions from everyone, preferably in the form of a PR.

Authors

This repository is maintained by Michael Schlottke-Lakemper (University of Stuttgart, Germany).

The following people have provided valuable contributions, either in the form of PRs or via private communication:

Disclaimer

Everything is provided as is and without warranty. Use at your own risk!

