CoCoNet

CoCoNet is a DSL and a compiler to co-optimize computation and communication for distributed machine learning workloads. CoCoNet exposes optimizations across the abstraction boundary between communication and computation routines. For more details, please refer to the paper at https://arxiv.org/abs/2105.05720.

Directory Structure

  • examples/
    • sgd/ contains Parameter Update using Stochastic Gradient Descent (SGD) optimizer
    • adam/ contains Parameter Update using Adam Optimizer
    • lamb/ contains Parameter Update using LAMB Optimizer
  • src/ contains the source code of CoCoNet
  • nccl-src/ contains the modified NCCL source, cloned from https://github.com/NVIDIA/nccl/

CoCoNet Example

The key parts of a distributed machine learning program can be written and optimized in CoCoNet with only a few lines of code. This section uses the SGD example in examples/sgd/sgd.cpp. First, we declare the input tensors of our program. Each tensor has one of three layouts:

  1. Local: A Local tensor is present on all nodes of the distributed system, and each node holds different elements.
  2. Replicated: A Replicated tensor is present on all nodes of the distributed system and holds the same elements on every node.
  3. Sliced: A Sliced tensor is divided across all nodes of the distributed system, with the i-th node holding the i-th part of the tensor.
Variable N(Int32, "N");    //Length of g and w
Variable lr(Float32, "lr");  //Learning rate
Tensor g(Float32, N, Local, "g"); //Gradient
Tensor w(Float32, N, Replicated, "w"); //Weights (parameters)
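
For concreteness, suppose there are two nodes and N = 4: the Local tensor g holds a different set of four gradient values on each node; the Replicated tensor w holds the same four weights on both nodes; and a Sliced tensor of length 4 would place elements 0 and 1 on node 0 and elements 2 and 3 on node 1.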

Now we perform operations on these tensors to produce new tensors. CoCoNet supports both communication and computation operations. For SGD, we first do an AllReduce over the gradients and then perform the parameter update.

Stage g1 = AllReduce(Summation, g);
Stage w1 = Update(w, w - lr * g1);

Finally, we create the pipeline and generate code.

Pipeline pipeline({g,w,lr}, {w1});

pipeline.codegen("sgd-ar-c.cu");
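
The generated sgd-ar-c.cu is CUDA code that uses NCCL for the AllReduce. As a rough mental model only, the sketch below (hand-written, not the actual generated code; the names sgdStep and sgdUpdate are hypothetical) shows what each rank conceptually executes:

#include <cuda_runtime.h>
#include <nccl.h>

// Sketch only: initialization, error handling, and buffer management are omitted.
__global__ void sgdUpdate(float* w, const float* g, float lr, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) w[i] -= lr * g[i];                 //w1 = Update(w, w - lr * g1)
}

void sgdStep(float* w, const float* g, float* gSum, float lr, int n,
             ncclComm_t comm, cudaStream_t stream) {
  //g1 = AllReduce(Summation, g): sum the gradients across all ranks
  ncclAllReduce(g, gSum, n, ncclFloat, ncclSum, comm, stream);
  //Apply the SGD update to this rank's copy of w
  int threads = 256;
  int blocks = (n + threads - 1) / threads;
  sgdUpdate<<<blocks, threads, 0, stream>>>(w, gSum, lr, n);
}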

Prerequisites

Linux Installation: We recommend using Ubuntu 20.04 as the Linux OS.

Install Dependencies: The following commands must be executed on all nodes of the distributed system to install the dependencies.

sudo apt update && sudo apt install gcc linux-headers-$(uname -r) make g++ git python3-dev wget unzip python3-pip cmake openmpi* libopenmpi* *glog*

Install CUDA: In our experiments we used CUDA 11.3 on Ubuntu 20.04. We cannot use CUDA 11.5 because PyTorch currently supports only up to CUDA 11.3. The CUDA 11.3 toolkit can be downloaded from https://developer.nvidia.com/cuda-11.3.0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04&target_type=runfile_local. While installing, place CUDA in /usr/local/cuda and install the CUDA samples in your home directory.

Check CUDA Installation: To check the CUDA installation, go to the CUDA samples installed in your home directory and execute the following commands:

cd ~/NVIDIA_CUDA-11.3_Samples/1_Utilities/deviceQuery
make
./deviceQuery

Executing deviceQuery will show information about the GPUs on the node. If there is an error, then either no CUDA device is present or the CUDA driver is not installed correctly.

Set NVCC Path and CUDA Libraries Path: We assume that nvcc is present at /usr/local/cuda/bin/nvcc. Please make sure this path is valid and that this nvcc is from CUDA 11.3 by running nvcc --version. Then add the CUDA directories to your PATH and LD_LIBRARY_PATH variables:

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"

Install Anaconda: We will use Anaconda to download and build PyTorch. You can download Anaconda for Linux from https://www.anaconda.com/products/individual#linux. We recommend the installer available at https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh. After downloading, install Anaconda with:

sh ./Anaconda3-2021.11-Linux-x86_64.sh

Restart the shell and activate the Anaconda environment by

source /path/to/conda/bin/activate 

This activation is needed after every login to the shell. Anaconda installs mpich, but we need to remove it:

conda remove mpich 

Install matplotlib by

conda install matplotlib

Install PyTorch: Install PyTorch with conda for CUDA 11.3. The instructions are available at https://pytorch.org/get-started/locally/

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

Verify PyTorch and CUDA Installation: Execute the following in Python:

import torch
torch.cuda.is_available()

This should return True; otherwise, please check your PyTorch and CUDA installations.

Install NVIDIA Apex Optimizer: NVIDIA Apex (https://github.com/NVIDIA/apex) is one of the baselines. Install it by following the Quick Start guide in the README.md of the apex repository. The following commands are copied from there.

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Building Examples

The examples/ directory contains several examples. To build an example, say sgd, run:

cd examples/sgd
make 

This will generate three programs for SGD:

  1. sgd-ar-c: performs an AllReduce followed by the weight update.
  2. sgd-rs-c-ag: performs a ReduceScatter, then sliced weight updates, and then an AllGather to gather all updated weights (sketched below).
  3. sgd-fuse-rs-c-ag: fuses all operations of the previous program into a single FusedAllReduce call.
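
For intuition, the second variant replaces the single AllReduce with a ReduceScatter, a sliced update, and an AllGather. A minimal sketch in the DSL (illustrative only; the exact code in examples/sgd/sgd.cpp may differ):

Stage g1 = ReduceScatter(Summation, g); //Each node receives one reduced slice of g
Stage w1 = Update(w, w - lr * g1);      //Update only the local slice of the weights
Stage w2 = AllGather(w1);               //Gather the updated slices on all nodes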

Experiments

Running experiments requires setting two environment variables:

  • NPROC to specify the number of processes to invoke
  • MPI_ARGS to specify any arguments to MPI. The test script uses torch.distributed, which requires specifying the MASTER_ADDR and MASTER_PORT arguments. A hostfile can also be passed through MPI_ARGS.

Data Parallel Experiments: To run the experiments, invoke experiments/data-parallel-exp.py. The script requires one argument: a directory in which to store the results. For example, to run the experiments with 4 processes, MASTER_ADDR set to 127.0.0.1, MASTER_PORT set to 10000, and results stored in experiments/results:

cd experiments
export NPROC=4
export MPI_ARGS="-x MASTER_ADDR=127.0.0.1 -x MASTER_PORT=10000"
python data-parallel-exp.py results/

Generate graphs in PDF format by running experiments/gen-data-parallel-graphs.py and providing the results directory:

python gen-data-parallel-graphs.py results/

Graphs with the following names are stored in the experiments directory:

  • Figure10a.pdf: Shows the results for Adam optimizer
  • Figure10b.pdf: Shows the results for LAMB optimizer

Model Parallel Experiments: To run the experiments, invoke experiments/model-parallel-exp.py. The script requires one argument: a directory in which to store the results. For example, to run the experiments with 4 processes, MASTER_ADDR set to 127.0.0.1, MASTER_PORT set to 10000, and results stored in experiments/results:

cd experiments
export NPROC=4
export MPI_ARGS="-x MASTER_ADDR=127.0.0.1 -x MASTER_PORT=10000"
python model-parallel-exp.py results/

Generate graphs in PDF format by running experiments/gen-model-parallel-graphs.py and providing the results directory:

python gen-model-parallel-graphs.py results/

A graph named Figure11.pdf is stored in the experiments directory.

Pipeline Parallel Experiments: To run the experiments, invoke experiments/pipeline-parallel-exp.py. The script requires one argument: a directory in which to store the results. For example, to run the experiments with 4 processes, MASTER_ADDR set to 127.0.0.1, MASTER_PORT set to 10000, and results stored in experiments/results:

cd experiments
export NPROC=4
export MPI_ARGS="-x MASTER_ADDR=127.0.0.1 -x MASTER_PORT=10000"
python pipeline-parallel-exp.py results/

Generate graphs in PDF format by running experiments/gen-pipeline-parallel-graphs.py and providing the results directory:

python gen-pipeline-parallel-graphs.py results/

A graph named Figure12.pdf is stored in the experiments directory.
