Comments (13)

jithinjosepkl commented on September 5, 2024

Accelerated Networking (AN) is not available on HC VMs. Please set "accelerated_networking": false for HC.

from azurehpc.

Smahane commented on September 5, 2024

@jithinjosepkl Do you mean in the config file I shared? Will that fix the interconnect issue?

jithinjosepkl commented on September 5, 2024

Please follow this article to make IMPI pick the mlx provider.

The next update of the CentOS-HPC images will include IMPI 2019 Update 8, where you don't have to set this environment variable (it picks the mlx provider by default).
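
As a rough illustration of the provider setting discussed here, the sketch below builds an Intel MPI command line that passes FI_PROVIDER=mlx to every rank through `-genv`. The host names `node0001` and `node0002` and the helper `pingpong_command` are hypothetical. Using `-ppn 1` is an assumption made so that the two ranks land on different nodes and the test actually crosses the interconnect.

```python
import shlex


def pingpong_command(host_a, host_b, imb_path="IMB-MPI1", provider="mlx"):
    """Build an Intel MPI command line for a two-node IMB PingPong run."""
    return [
        "mpirun",
        "-n", "2",                        # two ranks in total
        "-ppn", "1",                      # one rank per node (assumption)
        "-hosts", f"{host_a},{host_b}",   # hypothetical host names
        "-genv", "FI_PROVIDER", provider, # same effect as exporting the variable
        imb_path, "pingpong",
    ]


cmd = pingpong_command("node0001", "node0002")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell on the head node, or the list can be handed to `subprocess.run` unchanged.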

Smahane commented on September 5, 2024

@jithinjosepkl I would appreciate a working scenario, as the cost is getting very high without us getting a job to run correctly.
I'm interested in a Standard_HC44rs cluster with Mellanox and a Gen2 VM.

Surprisingly, the OpenLogic:CentOS:7_7-gen2:latest image didn't come with any IMPI installed (it sounds like it gets overridden, maybe via the azurehpc scripts?), so I had to install IMPI manually.

When I set the provider to mlx, I get an error because the interconnect is not installed/set up correctly:

[hpcadmin@headnode lammps-avx512]$ export FI_PROVIDER=mlx
[hpcadmin@headnode lammps-avx512]$ mpirun -n 2 -ppn 44 /opt/intel/psxe_runtime/linux/mpi/intel64/bin/IMB-MPI1 pingpong
[0] MPI startup(): Intel(R) MPI Library, Version 2019 Update 8  Build 20200624 (id: 4f16ad915)
[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.10.1-impi
Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(136)........:
MPID_Init(1138)..............:
MPIDI_OFI_mpi_init_hook(1061): OFI addrinfo() failed (ofi_init.c:1061:MPIDI_OFI_mpi_init_hook:No data available)

garvct commented on September 5, 2024

The CentOS images do not contain any pre-installed OFED drivers or MPI libraries (including Intel MPI). Try using the CentOS-HPC images instead (e.g. OpenLogic:CentOS-HPC:7_7-gen2:latest). The CentOS-HPC images should be ready to go if you want to use InfiniBand on HC44 SKUs.

Smahane commented on September 5, 2024

@garvct I'm already using the OpenLogic:CentOS-HPC:7_7-gen2:latest image. Please see my original post.

jithinjosepkl commented on September 5, 2024

@Smahane, based on your config file, you are using OpenLogic:CentOS:7_7-gen2:latest:
"hpc_image": "OpenLogic:CentOS:7_7-gen2:latest",

You need the OpenLogic:CentOS-HPC:7_7-gen2:latest image instead for the MPIs to be pre-installed.
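
A small pre-deployment check could catch both problems raised in this thread (the non-HPC image and Accelerated Networking on HC) before any VMs are billed. The sketch below is only an illustration: it assumes a config with a `variables.hpc_image` entry and a `resources` map whose entries carry literal `vm_type` and `accelerated_networking` values. The resource name `compute` and the helper `check_config` are hypothetical.

```python
import json


def check_config(config):
    """Return warnings for an azurehpc-style config (layout is assumed)."""
    warnings = []

    # Image URNs have the form publisher:offer:sku:version.
    image = config.get("variables", {}).get("hpc_image", "")
    parts = image.split(":")
    offer = parts[1] if len(parts) == 4 else ""
    if offer != "CentOS-HPC":
        warnings.append(f"hpc_image offer is {offer!r}, expected 'CentOS-HPC'")

    # Accelerated Networking must be off on HC SKUs, per the thread.
    for name, res in config.get("resources", {}).items():
        vm_type = res.get("vm_type", "")
        if vm_type.startswith("Standard_HC") and res.get("accelerated_networking"):
            warnings.append(
                f"{name}: set accelerated_networking to false on {vm_type}"
            )
    return warnings


# Hypothetical config fragment reproducing the two mistakes from the thread.
sample = json.loads("""
{
  "variables": {"hpc_image": "OpenLogic:CentOS:7_7-gen2:latest"},
  "resources": {
    "compute": {"vm_type": "Standard_HC44rs", "accelerated_networking": true}
  }
}
""")

for warning in check_config(sample):
    print(warning)
```

Configs that reference values indirectly (for example through variables) would need those references resolved first; this sketch does not attempt that.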

garvct commented on September 5, 2024

Once you sort out your image, you can execute the IMB-MPI1 benchmark using the scripts in azurehpc/apps/imb-mpi. Examples of running IMB-MPI1 with different MPI libraries are provided.

Smahane commented on September 5, 2024

@garvct and @jithinjosepkl thank you for pointing me to this HPC image.

  • Should the headnode have this image too?
  • Should I still set "accelerated_networking": false? And what does it mean?
  • Are there any instructions on how to add my own post-installation script to the config file?

xpillons commented on September 5, 2024

@Smahane

  • The headnode can have the HPC image too.
  • Accelerated Networking offloads TCP processing and boosts the front-end NIC, but it is not available on all VM SKUs. See the public documentation here. Accelerated Networking doesn't apply to the InfiniBand NIC, and it is not yet supported on our HPC VM SKUs.
  • You can add your own scripts by:
    • Adding your scripts to the scripts directory where your config file is stored
    • Adding a custom tag to the resources you want your scripts applied to
    • Adding, in the install array, a section for each script you want applied, specifying whether sudo is required and adding params and deps files if needed
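
The three steps above can be sketched as a small config edit. This is an illustration under assumptions, not the project's documented schema: the key names `script`, `tag`, `tags`, `args` and `deps` are assumed (the comment above speaks of "params"), and the script name `install_lammps.sh`, the tag `lammps` and the helper `add_custom_script` are hypothetical. Check the example configs in the azurehpc repository for the exact field names.

```python
import json


def add_custom_script(config, resource, script, tag,
                      sudo=False, args=None, deps=None):
    """Tag a resource and append a matching entry to the install array."""
    # Step 2: add the custom tag to the resource (key name "tags" is assumed).
    tags = config["resources"][resource].setdefault("tags", [])
    if tag not in tags:
        tags.append(tag)

    # Step 3: one install section per script, stating whether sudo is needed.
    entry = {"script": script, "tag": tag, "sudo": sudo}
    if args:
        entry["args"] = list(args)   # assumed key for script parameters
    if deps:
        entry["deps"] = list(deps)   # extra files the script depends on
    config.setdefault("install", []).append(entry)
    return entry


# Hypothetical minimal config; step 1 (placing the script in the scripts
# directory next to the config file) happens on disk, outside this sketch.
config = {"resources": {"compute": {"tags": []}}, "install": []}
add_custom_script(config, "compute", "install_lammps.sh", "lammps", sudo=True)
print(json.dumps(config, indent=2))
```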

Smahane commented on September 5, 2024

This worked. Thank you, everyone.

Smahane commented on September 5, 2024

Hello @xpillons, @garvct and @jithinjosepkl. I ran LAMMPS on up to 8 Standard_HC44rs nodes, but I'm having performance issues at 8 nodes:

[image not preserved]

I think one or more nodes are bad. Do you know of any smoke test or script that can help me check the nodes and detect which one is not working well?

The azhpc_install_config/install/11_node_healthchecks.log doesn't show any errors.

Thank you.

xpillons commented on September 5, 2024

@Smahane, you can use the MPI PingPong test; we have an example here: https://github.com/Azure/azurehpc/tree/master/apps/imb-mpi
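
One way to turn pairwise PingPong results into a bad-node check is to compare each node's typical latency against the rest of the cluster. The sketch below is a possible approach, not the script in the linked directory: it assumes you have already collected one latency figure per node pair, and the node names, latency values, the 1.5x threshold and the helper `suspect_nodes` are all made up for illustration.

```python
from statistics import median


def suspect_nodes(pair_latency_us, factor=1.5):
    """Flag nodes whose median pairwise latency is well above the cluster's.

    pair_latency_us maps (node_a, node_b) to a measured latency in
    microseconds; factor is an arbitrary illustrative threshold.
    """
    per_node = {}
    for (a, b), latency in pair_latency_us.items():
        per_node.setdefault(a, []).append(latency)
        per_node.setdefault(b, []).append(latency)

    node_median = {node: median(vals) for node, vals in per_node.items()}
    baseline = median(node_median.values())
    return sorted(n for n, m in node_median.items() if m > factor * baseline)


# Made-up measurements: every pair that involves node "d" is slow.
results = {
    ("a", "b"): 1.6, ("a", "c"): 1.7, ("b", "c"): 1.6,
    ("a", "d"): 5.0, ("b", "d"): 5.2, ("c", "d"): 4.9,
}
print(suspect_nodes(results))
```

Using the median per node keeps a single slow link from flagging both of its endpoints, since a healthy node still has mostly fast pairs.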
