GithubHelp home page GithubHelp logo

dufanxin / gloo Goto Github PK

View Code? Open in Web Editor NEW

This project forked from facebookincubator/gloo

0.0 1.0 0.0 625 KB

Collective communications library with various primitives for multi-machine training.

License: Other

Shell 0.30% CMake 5.22% C++ 88.75% Cuda 5.73%

gloo's Introduction

Gloo

Build Status

Gloo is a collective communications library. It comes with a number of collective algorithms useful for machine learning applications. These include a barrier, broadcast, and allreduce.

Transport of data between participating machines is abstracted so that IP can be used at all times, or InifiniBand (or RoCE) when available. In the latter case, if the InfiniBand transport is used, GPUDirect can be used to accelerate cross machine GPU-to-GPU memory transfers.

Where applicable, algorithms have an implementation that works with system memory buffers, and one that works with NVIDIA GPU memory buffers. In the latter case, it is not necessary to copy memory between host and device; this is taken care of by the algorithm implementations.

Requirements

Gloo is built to run on Linux and has no hard dependencies other than libstdc++. That said, it will generally only be useful when used in combination with a few optional dependencies below.

Optional dependencies are:

  • CUDA and NCCL -- for CUDA aware algorithms, tests, and benchmark
  • Google Test -- to build and run tests
  • Hiredis -- for coordinating machine rendezvous through Redis
  • MPI -- for coordinating machine rendezvous through MPI

Documentation

Please refer to docs/ for detailed documentation.

Building

You can build Gloo using CMake.

Since it is a library, it is most convenient to vendor it in your own project and include the project root in your own CMake configuration.

Test

Building the tests requires Google Test version 1.8 or higher. On Ubuntu, this version ships with version 17.10 and up. If you run an older version, you'll have to install Google Test yourself, and set the GTEST_ROOT CMake variable.

To build the tests, run:

mkdir -p build
cd build
cmake ../ -DBUILD_TEST=1 -DGTEST_ROOT=/some/path (if using custom install)
make
ls -l gloo/test/gloo_test*

To test the CUDA algorithms, specify USE_CUDA=ON as well, and the CUDA tests are built at gloo/test/gloo_test_cuda.

Benchmark

First install the dependencies required by the benchmark tool. On Ubuntu, you can do so by running:

sudo apt-get install -y libhiredis-dev

Then build the benchmark, run:

mkdir build
cd build
cmake ../ -DBUILD_BENCHMARK=1
ls -l gloo/benchmark/benchmark

Benchmarking

The benchmark tool depends on Redis/Hiredis for rendezvous. The benchmark tool for CUDA algorithms obviously also depends on both CUDA and NCCL.

To run a benchmark:

  1. Copy the benchmark tool to all participating machines

  2. Start a Redis server on any host (either a client machine or one of the machines participating in the test). Note that Redis Cluster is not supported.

  3. Determine some unique ID for the benchmark run (e.g. the uuid tool or some number).

  4. On each machine, run (or pass --help for more options):

    ./benchmark \
      --size <number of machines> \
      --rank <index of this machine, starting at 0> \
      --redis-host <Redis host> \
      --redis-port <Redis port> \
      --prefix <unique identifier for this run> \
      --transport tcp \
      --elements <number of elements; -1 for a sweep> \
      --iteration-time 1s \
      allreduce_ring_chunked
    

Example output (running on 4 machines with a 40GbE network):

   elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
          1        195        263        342        437       3921
          2        195        261        346        462       4039
          5        197        261        339        402       3963
         10        197        263        338        398       3749
         20        199        268        343        395       4146
         50        200        265        344        401       3889
        100        205        265        351        414       3645
        200        197        264        328        387       3960
        500        201        264        329        394       4274
       1000        200        267        330        380       3344
       2000        205        263        323        395       3682
       5000        240        335        424        460       3277
      10000        271        346        402        457       2721
      20000        283        358        392        428       2719
      50000        342        438        495        649       1654
     100000        413        487        669        799       1687
     200000       1113       1450       1837       2801        669
     500000       1099       1294       1665       1959        560
    1000000       1858       2286       2779       6100        320
    2000000       3546       3993       4364       4886        252
    5000000      10030      10608      11106      11628         92

License

Gloo is BSD-licensed. We also provide an additional patent grant.

gloo's People

Contributors

ajschumacher avatar andrewjcg avatar andrewwdye avatar andriigrynenko avatar apaszke avatar brettkoonce avatar gchanan avatar hgaiser avatar igorsugak avatar kirteshpatil avatar lukeyeager avatar manojkris avatar myleott avatar orionr avatar pietern avatar plapukhov avatar rainwoodman avatar rathir avatar slayton58 avatar soumith avatar teng-li avatar vgao1996 avatar virrages avatar wesolwsk avatar xw285cornell avatar yangqing avatar yfeldblum avatar zicky avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.