GithubHelp home page GithubHelp logo

00mjk / p4app-switchml Goto Github PK

View Code? Open in Web Editor NEW

This project forked from p4lang/p4app-switchml

0.0 0.0 0.0 614 KB

Switch ML Application

Home Page: https://switchml.readthedocs.io/

License: Apache License 2.0

Makefile 4.31% C++ 50.10% NASL 3.87% C 0.16% P4 15.36% Shell 0.25% Python 23.66% Dockerfile 2.29%

p4app-switchml's Introduction

SwitchML: Switch-Based Training Acceleration for Machine Learning

SwitchML accelerates the Allreduce communication primitive commonly used by distributed Machine Learning frameworks. It uses a programmable switch dataplane to perform in-network computation, reducing the volume of exchanged data by aggregating vectors (e.g., model updates) from multiple workers in the network. It provides an end-host library that can be integrated with ML frameworks to provide an efficient solution that speeds up training for a number of real-world benchmark models.

The switch hardware is programmed with a P4 program for the Tofino Native Architecture (TNA) and managed at runtime through a Python controller using BFRuntime. The end-host library provides simple APIs to perform Allreduce operations using different transport protocols. We currently support UDP through DPDK and RDMA UC. The library has already been integrated with ML frameworks as a NCCL plugin.

Getting started

To run SwitchML you need to:

The examples folder provides simple programs that show how to use the APIs.

Repo organization

The SwitchML repository is organized as follows:

docs: project documentation
dev_root:
  ┣ p4: P4 code for TNA
  ┣ controller: switch controller program
  ┣ client_lib: end-host library
  ┣ examples: set of example programs
  ┣ benchmarks: programs used to test raw performance
  ┣ frameworks_integration: code to integrate with ML frameworks
  ┣ third_party: third party software
  ┣ protos: protobuf description for the interface between controller and end-host
  ┗ scripts: helper scripts

Testing

The benchmarks contain a benchmarks program that we used to measure SwitchML performances. In our experiments (see benchmark documentation for details) we observed a more than 2x speedup over NCCL when using RDMA. Moreover, differently from ring Allreduce, with SwitchML performance are constant with any number of workers.

Benchmarks

Publication

Scaling Distributed Machine Learning with In-Network Aggregation A. Sapio, M. Canini, C.-Y. Ho, J. Nelson, P. Kalnis, C. Kim, A. Krishnamurthy, M. Moshref, D. R. K. Ports, P. Richtarik. In Proceedings of NSDI’21, Apr 2021.

Contributing

This project welcomes contributions and suggestions. To learn more about making a contribution to SwitchML, please see our Contribution page.

The Team

SwitchML is a project driven by the P4.org community and is currently maintained by Amedeo Sapio, Omar Alama, Marco Canini, Jacob Nelson.

License

SwitchML is released with an Apache License 2.0, as found in the LICENSE file.

p4app-switchml's People

Contributors

amedeosapio avatar jnfoster avatar oasisartisan avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.