This project forked from caffe-mpi/caffe-mpi.github.io

License: Other

Caffe-MPI for Deep Learning

Introduction

Caffe-MPI is a deep learning framework designed for both efficiency and flexibility, developed by the HPC development team of Inspur. It is a GPU cluster version built on the BVLC single-GPU version (https://github.com/BVLC/caffe; for more details, visit http://caffe.berkeleyvision.org).

Features

(1) Design based on HPC systems

The design of Caffe-MPI is based on an HPC system architecture. The system hardware is Lustre + IB + GPU. Caffe-MPI uses multiple processes and multiple threads to read the training data in parallel, which achieves higher I/O throughput. Parameters are transmitted and the model is updated over the IB (InfiniBand) network. The software programming model is MPI + PThread + CUDA: MPI handles communication between nodes, while PThread and CUDA threads provide parallelism within a node.
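The idea of reading training data with several threads at once can be illustrated with a minimal sketch. This is not Caffe-MPI's implementation (which uses PThread in C++); it is a Python stand-in using a thread pool, and the names `read_shard` and `read_shards_parallel` as well as the temporary shard files are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_shard(path: Path) -> bytes:
    """Read one data shard from disk (stands in for a training-data file)."""
    return path.read_bytes()


def read_shards_parallel(paths, num_threads=4):
    """Read all shards concurrently; results keep the order of `paths`."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(read_shard, paths))


if __name__ == "__main__":
    # Create a few small dummy shards so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(8):
            p = Path(tmp) / f"shard_{i}.bin"
            p.write_bytes(bytes([i]) * 1024)
            paths.append(p)
        shards = read_shards_parallel(paths)
        print(len(shards), sum(len(s) for s in shards))
```

Because file reads spend most of their time waiting on I/O, several threads can overlap those waits; on a parallel file system such as Lustre, multiple processes on different nodes can apply the same idea at a larger scale.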

(2) High performance and high scalability

With Caffe-MPI, a model can be trained on a multi-node, multi-GPU platform, with a clear performance improvement over the BVLC Caffe-master version. Caffe-MPI can be used for large-scale data training: the performance we obtained when training GoogLeNet with Caffe-MPI is 13 times the performance obtained with BVLC Caffe-master. It scales to more than 16 GPUs, and the parallel efficiency can exceed 80%.
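As a rough illustration of how a speedup figure and a parallel-efficiency figure relate, parallel efficiency is commonly defined as the speedup divided by the number of workers. The sketch below assumes, which the text above does not state, that the 13x figure is a speedup over a single GPU measured on 16 GPUs; the function name is hypothetical.

```python
def parallel_efficiency(speedup: float, num_gpus: int) -> float:
    """Return parallel efficiency, defined as speedup / number of GPUs."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return speedup / num_gpus


if __name__ == "__main__":
    # Assumption for illustration only: 13x speedup measured on 16 GPUs.
    print(f"{parallel_efficiency(13.0, 16):.2%}")
```

Under that assumption the efficiency is 13 / 16 = 81.25%, which would be consistent with the "more than 80%" claim.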

(3) Good inheritance and ease of use

Caffe-MPI retains all the features of the original Caffe architecture, namely the pure C++/CUDA architecture, support of the command line, Python interfaces, and various programming methods. As a result, the cluster version of the Caffe framework is user-friendly, fast, modularized and open, and gives users the optimal application experience.

How to use it

See Caffe-MPI user guide.pdf

Try your first MPI Caffe

This program requires at least 2 processes.

cifar10

  1. Run data/cifar10/get_cifar10.sh to get the cifar10 data.
  2. Run examples/cifar10/create_cifar10.sh to convert the raw data to leveldb format.
  3. Run examples/cifar10/mpi_train_quick.sh to train the net. In the mpi_train_quick.sh script, change "-n 16" to set the number of parallel processes (16 in this case). If you use GPUs, the process number is m+node_num, where m is the number of GPUs. "-host node11" sets the node name.
  4. Example of the mpi_train_quick.sh script:

     mpirun -machinefile hostsib -n 20 ./build/tools/caffe train \
         --solver=examples/cifar10/cifar10_quick_solver.prototxt
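The process-count rule from step 3 can be sketched as a small helper that also assembles the mpirun command line from step 4. This is only an illustration: it assumes that m is the total number of GPUs and that node_num is the number of nodes, and the function names are hypothetical. The machinefile name and solver path are copied from the example script.

```python
import shlex


def mpi_process_count(num_gpus: int, num_nodes: int) -> int:
    """Process count m + node_num (assumed: total GPUs plus number of nodes)."""
    count = num_gpus + num_nodes
    if count < 2:
        raise ValueError("Caffe-MPI requires at least 2 processes")
    return count


def build_mpirun_command(
    num_gpus: int,
    num_nodes: int,
    machinefile: str = "hostsib",
    solver: str = "examples/cifar10/cifar10_quick_solver.prototxt",
) -> list:
    """Assemble the mpirun argument list used in mpi_train_quick.sh."""
    return [
        "mpirun",
        "-machinefile", machinefile,
        "-n", str(mpi_process_count(num_gpus, num_nodes)),
        "./build/tools/caffe", "train",
        f"--solver={solver}",
    ]


if __name__ == "__main__":
    # Hypothetical cluster: 16 GPUs spread over 4 nodes.
    print(shlex.join(build_mpirun_command(16, 4)))
```

With the hypothetical values of 16 GPUs on 4 nodes, the helper yields 20 processes, the same value as the "-n 20" in the example script, although the source does not say which cluster layout that script was written for.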

Reference

  • More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server

  • Deep Image: Scaling up Image Recognition

Ask Questions

  • To report bugs, please use the caffe-mpi/issues page or send us an email.

  • Email address: [email protected]

Author

Zhang, Qing; Wang, Yajuan; Gong, Zhan; Shen, Bo

Acknowledgements

The Caffe-MPI developers would like to thank QiHoo (Zhang, Gang; Dr. Hu, Jinhui) and Nvidia (Dr. Simon See; Jessy Huan; Joey Wang) for algorithm support, and Inspur for guidance during Caffe-MPI development.
