
MLCommons™ Algorithmic Efficiency



Installation | Rules | Contributing | License

CI | Lint | License: Apache 2.0 | Code style: yapf


MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models. This repository holds the competition rules and the benchmark code to run it.

Installation

  1. Create new environment, e.g. via conda or virtualenv:

    Python 3.7 or newer is required (a quick version check is shown after this list).

     sudo apt-get install python3-venv
     python3 -m venv env
     source env/bin/activate
  2. Clone this repository:

    git clone https://github.com/mlcommons/algorithmic-efficiency.git
    cd algorithmic-efficiency
  3. We use pip to install the algorithmic_efficiency package.
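
As an optional sanity check for step 1 (not part of the official instructions), you can confirm that the environment's Python meets the minimum version:

python3 --version  # should report 3.7 or newer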

TL;DR to install the JAX version for GPU, run:

pip3 install -e '.[pytorch_cpu]'
pip3 install -e '.[jax_gpu]' -f 'https://storage.googleapis.com/jax-releases/jax_cuda_releases.html'
pip3 install -e '.[full]'
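
As an optional sanity check (assuming a CUDA-capable GPU and driver are present), you can confirm that JAX detects the GPU:

python3 -c "import jax; print(jax.devices())"  # should list GPU devices, not only CPU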

TL;DR to install the PyTorch version for GPU, run:

pip3 install -e '.[jax_cpu]'
pip3 install -e '.[pytorch_gpu]' -f 'https://download.pytorch.org/whl/torch_stable.html'
pip3 install -e '.[full]'
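
Similarly, an optional check that the PyTorch GPU build can see the hardware:

python3 -c "import torch; print(torch.cuda.is_available())"  # should print True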

Additional Details

You can also install the requirements for individual workloads, e.g. via

pip3 install -e '.[librispeech]'

or all workloads at once via

pip3 install -e '.[full]'

Depending on the framework you want to use (JAX or PyTorch), you also need to install it. You can either do this manually or by adding the corresponding options:

JAX (GPU)

pip3 install -e '.[jax_gpu]' -f 'https://storage.googleapis.com/jax-releases/jax_cuda_releases.html'

JAX (CPU)

pip3 install -e '.[jax_cpu]'

PyTorch (GPU)

pip3 install -e '.[pytorch_gpu]' -f 'https://download.pytorch.org/whl/torch_stable.html'

PyTorch (CPU)

pip3 install -e '.[pytorch_cpu]'

Development

To use the development tools such as pytest or pylint, use the dev option:

pip3 install -e '.[dev]'
pre-commit install
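
As a usage example, you can also run all installed pre-commit hooks against the whole repository manually:

pre-commit run --all-files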

To get an installation with the requirements for all workloads and development, use the argument [full_dev].
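
For example:

pip3 install -e '.[full_dev]'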

Setup

  1. Clone this repository:

    git clone https://github.com/mlcommons/algorithmic-efficiency.git
  2. Build Docker

    cd algorithmic-efficiency/ && sudo docker build -t algorithmic-efficiency .
  3. Run Docker

    sudo docker run --gpus all -it --rm -v $PWD:/home/ubuntu/algorithmic-efficiency --ipc=host algorithmic-efficiency

    Currently, the Docker method installs both PyTorch and JAX.
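
As an optional sanity check (this assumes the NVIDIA Container Toolkit is configured on the host; it is not part of the official instructions), you can verify that the container sees the GPUs:

sudo docker run --gpus all --rm algorithmic-efficiency nvidia-smi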

Running a workload

See the reference_algorithms/ directory for example implementations of various training algorithms (note that none of these are valid submissions because they contain workload-specific logic, which is why we refer to them as "algorithms" instead of "submissions").

JAX

python3 submission_runner.py \
    --framework=jax \
    --workload=mnist \
    --experiment_dir=/home/znado \
    --experiment_name=baseline \
    --submission_path=reference_algorithms/development_algorithms/mnist/mnist_jax/submission.py \
    --tuning_search_space=reference_algorithms/development_algorithms/mnist/tuning_search_space.json

PyTorch

python3 submission_runner.py \
    --framework=pytorch \
    --workload=mnist \
    --experiment_dir=/home/znado \
    --experiment_name=baseline \
    --submission_path=reference_algorithms/development_algorithms/mnist/mnist_pytorch/submission.py \
    --tuning_search_space=reference_algorithms/development_algorithms/mnist/tuning_search_space.json

When using multiple GPUs on a single node, it is recommended to use PyTorch's distributed data parallel. To do so, replace python3 with

torchrun --standalone --nnodes=1 --nproc_per_node=N_GPUS

where N_GPUS is the number of available GPUs on the node. To see output only from the first process, you can run the following to redirect the output from processes 1-7 to a log file:

torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=8
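
For example, combining this with the PyTorch MNIST command above on a node with 8 GPUs (same paths as in the example above):

torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=8 \
    submission_runner.py \
    --framework=pytorch \
    --workload=mnist \
    --experiment_dir=/home/znado \
    --experiment_name=baseline \
    --submission_path=reference_algorithms/development_algorithms/mnist/mnist_pytorch/submission.py \
    --tuning_search_space=reference_algorithms/development_algorithms/mnist/tuning_search_space.json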

Rules

The rules for the MLCommons Algorithmic Efficiency benchmark can be found in the separate rules document. Suggestions, clarifications and questions can be raised via pull requests.

Contributing

If you are interested in contributing to the work of the working group, feel free to join the weekly meetings, open issues, and see the MLCommons contributing guidelines.

Presubmit testing

We run basic presubmit checks with GitHub Actions, configured in the .github/workflows folder.

To run the below commands, use the versions installed via pip install -e '.[dev]'.

To automatically fix formatting errors, run the following (WARNING: this will edit your code, so it is suggested to make a git commit first!):

yapf -i -r -vv -p algorithmic_efficiency baselines datasets reference_algorithms tests *.py
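
If you only want to preview the formatting changes without editing any files, yapf also supports a diff mode (same targets as above; -d is a standard yapf flag):

yapf -d -r -p algorithmic_efficiency baselines datasets reference_algorithms tests *.py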

To sort all import orderings, run the following:

isort .

To just print out all offending import orderings, run the following:

isort . --check --diff

To print out all offending pylint issues, run the following:

pylint algorithmic_efficiency
pylint baselines
pylint datasets
pylint reference_algorithms
pylint submission_runner.py
pylint tests

You can also use python tests/reference_algorithm_tests.py to run a single model update and two model evals for each workload using the reference algorithm in reference_algorithms/development_algorithms/.
