
isc24-ahug-workshop's Introduction

Arm HPC User Group Workshop @ ISC24

The 2024 Arm HPC User Group (AHUG) Workshop is held in conjunction with ISC High Performance 2024 in Hamburg, Germany.

Date & Time: May 16th, 2024 @ 9:00am - 1:00pm
Location: Hall Y10 - 2nd floor, Congress Center Hamburg (CCH), Germany

Join the AHUG Slack channel!

Timetable and Agenda

| Time | Duration | Title | Speaker | Affiliation |
| --- | --- | --- | --- | --- |
| 09:00-09:10 | 10m | **Welcome Remarks & Plenary Address to the AHUG Community** | Filippo Spiga | AHUG |
| 09:10-09:45 | 35m | Invited Talk: **Isambard-3 and Isambard-AI** | Simon McIntosh-Smith | University of Bristol / GW4 |
| 09:45-10:10 | 25m | **Performance analyses of benchmark applications on different A64FX architectures** | Seydou Ba | RIKEN R-CCS |
| 10:10-10:35 | 25m | **NVIDIA Grace Superchip Early Evaluation for HPC Applications** | Fabio Banchelli | BSC |
| 10:35-11:00 | 25m | **Accelerating Hierarchical Collective Communication on next-gen ARM architectures** | Alon Zameret | Toga Networks - Huawei |
| 11:00-11:30 | 30m | _Coffee break_ | | |
| 11:30-11:55 | 25m | **Running Arm Accelerated Solutions for Engineering Workflows in the Cloud with Rescale** | Sam Zakrzewski | Rescale |
| 11:55-12:20 | 25m | **Extending Arm's Reach by Going EESSI** | Kenneth Hoste | Ghent University |
| 12:20-12:40 | 20m | **NVIDIA Grace Superchip** | Filippo Spiga | NVIDIA |
| 12:40-13:00 | 20m | **EDF R&D Code_Saturne performance on AWS HPC7g instance** | Conrad Hillairet | Arm Ltd & AWS |

Abstracts

Performance analyses of benchmark applications on different A64FX architectures

Speaker: Seydou Ba (RIKEN R-CCS)
Modern supercomputers are increasingly complex and use faster processors with denser core counts per node. This work compares the performance of different node architectures based on Fujitsu's A64FX, namely the FX1000 (Fugaku) and two versions of the FX700. The architectures differ mainly in that the FX1000 has 48 cores per node and uses the TofuD interconnect, while the FX700s use InfiniBand; one FX700 version has 48 cores per node and the other has 24 cores per node at a higher frequency. We monitor performance with profilers and analysis tools to conduct detailed performance studies with key benchmark applications. Furthermore, we aim to extend this study to analyze hardware options for interconnects, leaning toward photonics designs for next-gen supercomputers.

NVIDIA Grace Superchip Early Evaluation for HPC Applications

Speaker: Fabio Banchelli (BSC)
Arm-based systems have been a reality in HPC for more than a decade. However, a new chip entering the market always brings challenges, not only at the ISA level, but also with regard to the SoC integration, the memory subsystem, the board integration, the node interconnect, and finally the OS and all layers of the system software (compilers and libraries). Guided by the procurement of an NVIDIA Grace HPC cluster within the deployment of MareNostrum 5, and emulating the approach of a scientist who needs to migrate their scientific research to a new HPC system, we evaluated five complex scientific applications on engineering sample nodes of the NVIDIA Grace CPU Superchip and the NVIDIA Grace Hopper Superchip (CPU-only). We report intra-node and inter-node scalability and early performance results showing a speed-up between 1.3x and 4.28x for all codes compared to the current-generation MareNostrum 4, powered by Intel Skylake CPUs.

Accelerating Hierarchical Collective Communication on next-gen ARM architectures

Speaker: Alon Zameret (Toga Networks - Huawei)
In large-scale distributed computing, collective communication often poses a bottleneck due to the latency and bandwidth limitations of modern networks. In this work, we propose an innovative hierarchical approach to accelerate collective communication, based on multiple levels of communication aggregation (intra-node, inter-node, inter-rack). Assigning multiple representatives at each level enables communication and data partitioning, alleviating latency and bandwidth bounds in HPC and AI applications. Leveraging the SIMD extensions (e.g., SVE) offered in next-generation ARM architectures further improves performance by optimizing memory copy and reduction operations. The resulting communication component, MLMR, demonstrates up to a 5x reduction of AllReduce collective communication on an ARM cluster with more than 12k cores. Further work is underway to introduce payload pipelining to overlap inter-level communication and enhance performance.
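
The idea of several representatives per level, each owning one partition of the payload, can be illustrated with a small single-process simulation. The sketch below is not MLMR and is not taken from the talk. It assumes only two levels (intra-node and inter-node), a sum reduction, and hypothetical names (`split_chunks`, `hierarchical_allreduce`, `ranks_per_node`, `reps_per_node`) chosen for illustration. No real network communication or SIMD is involved.

```python
from typing import List

Vector = List[float]


def split_chunks(length: int, parts: int) -> List[range]:
    """Partition indices 0..length-1 into `parts` contiguous ranges."""
    base, extra = divmod(length, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def hierarchical_allreduce(values: List[Vector], ranks_per_node: int,
                           reps_per_node: int) -> List[Vector]:
    """Simulate a two-level sum-AllReduce with several representatives per node.

    Hypothetical illustration only: ranks are list indices, nodes are
    consecutive groups of `ranks_per_node` ranks.
    """
    n_ranks, length = len(values), len(values[0])
    if n_ranks % ranks_per_node != 0:
        raise ValueError("ranks must divide evenly into nodes")
    if not 1 <= reps_per_node <= ranks_per_node:
        raise ValueError("need between 1 and ranks_per_node representatives")

    nodes = [list(range(i, i + ranks_per_node))
             for i in range(0, n_ranks, ranks_per_node)]
    chunks = split_chunks(length, reps_per_node)

    # Intra-node step: representative r reduces chunk r over its node's ranks.
    node_partial = [
        [[sum(values[rank][i] for rank in node) for i in chunk]
         for chunk in chunks]
        for node in nodes
    ]

    # Inter-node step: representatives with the same index r combine chunk r
    # across nodes, so each one moves only its own partition of the payload.
    global_chunks = [
        [sum(node_partial[n][r][j] for n in range(len(nodes)))
         for j in range(len(chunk))]
        for r, chunk in enumerate(chunks)
    ]

    # Intra-node step: every rank collects all reduced chunks.
    result = [x for chunk in global_chunks for x in chunk]
    return [list(result) for _ in range(n_ranks)]


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [10, 20, 30, 40],
            [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
    print(hierarchical_allreduce(data, ranks_per_node=2, reps_per_node=2)[0])
```

With `reps_per_node=1` the sketch degenerates to a single leader per node. Raising it splits the inter-node traffic across more ranks, which is the partitioning effect the abstract describes.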

Running Arm Accelerated Solutions for Engineering Workflows in the Cloud with Rescale

Speaker: Sam Zakrzewski (Rescale)
As computational demands in engineering continue to rise, leveraging cloud computing resources becomes increasingly imperative. This presentation delves into the optimization and efficiency gains achieved by running engineering workflows in the cloud on Arm-based hardware through Rescale's platform. Case studies and demonstrations illustrate how Rescale's platform enables seamless migration of engineering workflows to Arm-based cloud instances, unlocking unparalleled performance and cost-effectiveness. Key topics covered include the technical considerations of transitioning to Arm architecture, performance benchmarks and cost comparisons illustrating the economic benefits of cloud-based Arm computing. Attendees will gain a comprehensive understanding of how adopting Arm accelerated solutions for engineering workflows on Rescale's platform empowers organizations to tackle complex simulations with unprecedented efficiency, driving innovation and competitiveness.

Extending Arm's Reach by Going EESSI

Speaker: Kenneth Hoste (Ghent University)
In the European Environment for Scientific Software Installation (EESSI) community project (https://eessi.io), we provide a stack of optimized scientific software installations that work on any Linux system, regardless of whether it is powered by Intel, AMD, or Arm CPUs (soon also RISC-V). This effort is currently funded through the EuroHPC Centre-of-Excellence MultiXscale (https://multixscale.eu). In this talk, we will share our experiences with building a wide range of scientific applications, libraries, and required dependencies for different Arm microarchitectures. We encountered (and fixed) various problems along the way, especially when targeting Arm Neoverse V1 and when running software test suites. Additionally, we will demonstrate how you can get access in a matter of minutes to a rich set of optimized software installations for Arm systems, including Raspberry Pis, cloud instances powered by an Arm CPU, and Arm-based EuroHPC supercomputers like Deucalion and JUPITER.

