GithubHelp home page GithubHelp logo

isabella232 / msccl Goto Github PK

View Code? Open in Web Editor NEW

This project forked from microsoft/msccl

0.0 0.0 0.0 2.66 MB

Microsoft Collective Communication Library

License: Other

Shell 0.25% C++ 66.02% C 30.84% Cuda 0.67% Makefile 1.86% Dockerfile 0.36%

msccl's Introduction

MSCCL

Microsoft Collective Communication Library (MSCCL) is a platform to execute custom collective communication algorithms for multiple accelerators supported by Microsoft Azure.

Introduction

MSCCL is an inter-accelerator communication framework that is built on top of NCCL and uses its building blocks to execute custom-written collective communication algorithms. MSCCL vision is to provide a unified, efficient, and scalable framework for executing collective communication algorithms across multiple accelerators. To achieve this, MSCCL has multiple capabilities:

  • Programmibility: Inter-connection among accelerators have different latencies and bandwidths. Therefore, a generic collective communication algorithm does not necessarily well for all topologies and buffer sizes. MSCCL allows a user to write a hyper-optimized collective communication algorithm for a given topology and a buffer size. This is possbile through two main components: MSCCL toolkit and MSCCL runtime (this repo). MSCCL toolkit contains a high-level DSL (MSCCLang) and a compiler which generate an IR for the MSCCL runtime (this repo) to run on the backend. MSCCL will always MSCCL will automatically fall back to a NCCL's generic algorithm in case there is no custom algorithm. Example provides some instances on how MSCCL toolkit with the runtime works. Please refer to MSCCL toolkit for more information.
  • Profiling: MSCCL has a profiling tool NPKit which provides detailed timeline for each primitive send and receive operation to understand the bottlenecks in a given collective communication algorithms.

MSCCL is the product of many great researchers and interns at Microsoft Research. Below is a list of our publications:

Please consider citing our work if you use MSCCL in your research. Also, please contact us if you have any questions or need an optimized collective communication algorithm for a specific topology.

Example

In order to use MSCCL customized algorithms, you may follow these steps to use two different MSCCL algorithms for AllReduce on Azure NDv4 which has 8xA100 GPUs:

Steps to install MSCCL:

$ git clone https://github.com/microsoft/msccl.git
$ cd msccl/
$ make -j src.build
$ cd ../

Then, follow these steps to install nccl-tests for performance evaluation:

$ git clone https://github.com/nvidia/nccl-tests.git
$ cd nccl-tests/
$ make MPI=1 NCCL_HOME=../msccl/build/ -j 
$ cd ../

Next install MSCCL toolkit to compile a few custom algorithms:

$ git clone https://github.com/microsoft/msccl-tools.git
$ cd msccl-tools/
$ pip install .
$ cd ../
$ python msccl-tools/examples/mscclang/allreduce_a100_allpairs.py --protocol=LL 8 2 > test.xml
$ cd ../

The compiler's generated code is an XML file (test.xml) that is fed to MSCCL runtime. To evaluate its performance, execute the following command line on an Azure NDv4 node or any 8xA100 system:

$ mpirun -np 8 -x LD_LIBRARY_PATH=msccl/build/lib/:$LD_LIBRARY_PATH -x NCCL_DEBUG=INFO -x NCCL_DEBUG_SUBSYS=INIT,ENV -x MSCCL_XML_FILES=test.xml -x NCCL_ALGO=MSCCL,RING,TREE  nccl-tests/build/all_reduce_perf -b 128 -e 32MB -f 2 -g 1 -c 1 -n 1000 -w 1000 -z 0

If everything is installed correctly, you should see the following output in log:

[0] NCCL INFO Connected 1 MSCCL algorithms

test.xml is passed in to the runtime by MSCCL_XML_FILES in the command line. You may evaluate the performance of test.xml by comparing in-place (the new algorithm) vs out-of-place (default ring algorithm) and it should up-to 2x faster on 8xA100 NVLink-interconnected GPUs. MSCCL toolkit has a rich set of algorithms for different Azure SKUs and collective operations with significant speedups over vanilla NCCL.

Build

To build the library:

$ cd msccl
$ make -j src.build

If CUDA is not installed in the default /usr/local/cuda path, you can define the CUDA path with :

$ make src.build CUDA_HOME=<path to cuda install>

MSCCL will be compiled and installed in build/ unless BUILDDIR is set.

By default, MSCCL is compiled for all supported architectures. To accelerate the compilation and reduce the binary size, consider redefining NVCC_GENCODE (defined in makefiles/common.mk) to only include the architecture of the target platform :

$ make -j src.build NVCC_GENCODE="-gencode=arch=compute_80,code=sm_80"

Install

To install MSCCL on the system, create a package then install it as root.

Debian/Ubuntu :

$ # Install tools to create debian packages
$ sudo apt install build-essential devscripts debhelper fakeroot
$ # Build NCCL deb package
$ make pkg.debian.build
$ ls build/pkg/deb/

RedHat/CentOS :

$ # Install tools to create rpm packages
$ sudo yum install rpm-build rpmdevtools
$ # Build NCCL rpm package
$ make pkg.redhat.build
$ ls build/pkg/rpm/

OS-agnostic tarball :

$ make pkg.txz.build
$ ls build/pkg/txz/

PyTorch Integration

For integration with PyTorch, follow the dockerfile in this repo. It has an example for how to replace default NCCL with MSCCL.

Copyright

All source code and accompanying documentation is copyright (c) 2015-2020, NVIDIA CORPORATION. All rights reserved.

All modifications are copyright (c) 2020-2022, Microsoft Corporation. All rights reserved.

msccl's People

Contributors

sjeaugey avatar saeedmaleki avatar borisfom avatar nluehr avatar addyladdy avatar kwen2501 avatar chsigg avatar lukeyeager avatar olsaarik avatar kylefernandes avatar lowintelligence avatar jia-kai avatar abuccts avatar obihoernchen avatar peterhj avatar rajatchopra avatar rashikakheria avatar riatre avatar rongou avatar slayton58 avatar yzygitzh avatar aokomoriuta avatar microsoft-github-policy-service[bot] avatar sclarkson avatar madanmus avatar jonaszhou1 avatar ilya-biryukov avatar drpnd avatar sl1pkn07 avatar flx42 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.