
Awesome Distributed Deep Learning

A curated list of awesome Distributed Deep Learning resources.

Table of Contents

  • Frameworks
  • Blogs
  • Papers

Frameworks

  1. MXNet - Lightweight, portable, flexible distributed/mobile deep learning framework with a dynamic, mutation-aware dataflow dependency scheduler; for Python, R, Julia, Go, JavaScript and more.
  2. go-mxnet-predictor - Go binding for MXNet c_predict_api to do inference with pre-trained model.
  3. deeplearning4j - Distributed Deep Learning Platform for Java, Clojure, Scala.
  4. Distributed Machine Learning Toolkit (DMTK) - A distributed machine learning (parameter server) framework from Microsoft that enables training models on large data sets across multiple machines. Tools currently bundled with it include LightLDA and Distributed (Multisense) Word Embedding.
  5. Elephas - An extension of Keras that lets you run distributed deep learning models at scale with Spark.
  6. Horovod - Distributed training framework for TensorFlow. (A minimal sketch of the ring-allreduce pattern it builds on follows this list.)
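
Several of these frameworks, Horovod in particular, average gradients across workers with a ring-allreduce. The snippet below is a minimal, single-process sketch of that communication pattern, assuming the usual reduce-scatter/all-gather formulation; the function name `ring_allreduce` and the in-memory "workers" are purely illustrative and do not correspond to any framework API.

```python
import numpy as np

def ring_allreduce(grads):
    """Average equally shaped gradient vectors as if each lived on its own worker."""
    n = len(grads)
    # Each "worker" splits its gradient into n chunks; chunk j ends up fully
    # reduced on worker (j - 1) % n and is then circulated to everyone.
    chunks = [list(np.array_split(g.astype(float), n)) for g in grads]

    # Phase 1: reduce-scatter. At every step each worker passes one chunk
    # to its right-hand neighbour, which adds it to its own copy.
    for step in range(n - 1):
        for i in range(n):
            idx = (i - step) % n
            chunks[(i + 1) % n][idx] = chunks[(i + 1) % n][idx] + chunks[i][idx]

    # Phase 2: all-gather. The fully reduced chunks are circulated around
    # the ring so that every worker ends up with the complete sum.
    for step in range(n - 1):
        for i in range(n):
            idx = (i + 1 - step) % n
            chunks[(i + 1) % n][idx] = chunks[i][idx].copy()

    # Divide by the number of workers to turn the sum into an average.
    return [np.concatenate(c) / n for c in chunks]

# Toy usage: four "workers", each with its own local gradient.
rng = np.random.default_rng(0)
local_grads = [rng.standard_normal(8) for _ in range(4)]
averaged = ring_allreduce(local_grads)
assert all(np.allclose(g, np.mean(local_grads, axis=0)) for g in averaged)
print("all workers hold the same averaged gradient:", np.round(averaged[0], 3))
```

The appeal of the ring layout is that each worker sends and receives roughly 2(n-1)/n of the gradient size in total, independent of the number of workers, which is why the pattern scales well as clusters grow.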

Blogs

  1. Keras + Horovod = Distributed Deep Learning on Steroids
  2. Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow
  3. Distributed Deep Learning, Part 1: An Introduction to Distributed Training of Neural Networks

Papers

General:

  1. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis: discusses the different types of concurrency in DNNs, synchronous and asynchronous stochastic gradient descent, distributed system architectures, communication schemes, and performance modeling. Based on these approaches, it also extrapolates potential directions for parallelism in deep learning.

Synchronization:

Synchronous techniques:

  1. Deep learning with COTS HPC systems: Uses Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with InfiniBand interconnects and MPI.
  2. FireCaffe: near-linear acceleration of deep neural network training on compute clusters: The speed and scalability of distributed algorithms are almost always limited by the overhead of communication between servers, and DNN training is no exception. The key consideration in this paper is therefore to reduce communication overhead wherever possible without degrading the accuracy of the trained DNN models.
  3. SparkNet: Training Deep Networks in Spark. In Proceedings of the International Conference on Learning Representations (ICLR).
  4. 1-Bit SGD: 1-Bit Stochastic Gradient Descent and Application to Data-Parallel Distributed Training of Speech DNNs. In Interspeech 2014. (A sketch of the 1-bit quantization idea follows this list.)
  5. Scalable Distributed DNN Training Using Commodity GPU Cloud Computing: Introduces a new method for scaling up distributed Stochastic Gradient Descent (SGD) training of Deep Neural Networks (DNNs). The method addresses the well-known communication bottleneck that arises in data-parallel SGD because compute nodes must frequently synchronize a replica of the model.
  6. Multi-GPU Training of ConvNets: Training of convolutional networks on multiple GPUs.
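
As a concrete illustration of the communication-reduction theme above, here is a minimal single-process sketch of 1-bit gradient quantization with error feedback in the spirit of the 1-Bit SGD paper (item 4). Scaling by the mean absolute value is one common choice and an assumption on our part, as are the names `quantize_1bit` and `residual`; the original paper derives its reconstruction values differently.

```python
import numpy as np

def quantize_1bit(grad, residual):
    """Quantize one gradient tensor to sign * scale, carrying the error forward."""
    corrected = grad + residual              # error feedback from the previous step
    scale = np.mean(np.abs(corrected))       # one scalar per tensor (an assumed choice)
    quantized = scale * np.sign(corrected)   # effectively 1 bit per element + the scale
    new_residual = corrected - quantized     # what the 1-bit code failed to transmit
    return quantized, new_residual

# Toy run: the quantized updates track the true gradient sum over many steps.
rng = np.random.default_rng(1)
residual = np.zeros(4)
true_sum = np.zeros(4)
quant_sum = np.zeros(4)
for _ in range(200):
    g = rng.standard_normal(4)
    q, residual = quantize_1bit(g, residual)
    true_sum += g
    quant_sum += q
print("true gradient sum:     ", np.round(true_sum, 2))
print("quantized gradient sum:", np.round(quant_sum, 2))
```

The key point is the error feedback: whatever the 1-bit code loses at one step is added back at the next, so the quantized updates track the true gradient sum over time while each worker transmits only a sign bit per parameter plus one scale.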

Stale-Synchronous techniques:

  1. Model Accuracy and Runtime Tradeoff in Distributed Deep Learning: A Systematic Study.
  2. A Fast Learning Algorithm for Deep Belief Nets: G. E. Hinton, S. Osindero, and Y.-W. Teh. 2006. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation 18, 7. 1527–1554.
  3. Heterogeneity-aware Distributed Parameter Servers.: J. Jiang, B. Cui, C. Zhang, and L. Yu. 2017. Heterogeneity-aware Distributed Parameter Servers. In Proc. 2017 ACM International Conference on Management of Data (SIGMOD ’17). 463–478.
  4. Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization: X. Lian, Y. Huang, Y. Li, and J. Liu. 2015. Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization. In Proc. 28th Int’l Conf. on NIPS - Volume 2. 2737–2745.
  5. Staleness-Aware Async-SGD for Distributed Deep Learning: W. Zhang, S. Gupta, X. Lian, and J. Liu. 2016. Staleness-aware async-SGD for Distributed Deep Learning. In Proc. Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI’16). 2350–2356. (A sketch of the bounded-staleness rule follows this list.)
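
The stale-synchronous (bounded-staleness) model behind several of these papers is easy to state: a fast worker may run ahead of the slowest worker by at most a fixed number of iterations. Below is a minimal conceptual sketch of that admission rule; `may_proceed`, `worker_clocks`, and `staleness` are illustrative names, not any framework's API.

```python
def may_proceed(worker_clocks, worker_id, staleness):
    """Return True if worker `worker_id` may start its next iteration under SSP."""
    next_clock = worker_clocks[worker_id] + 1
    slowest = min(worker_clocks)
    # staleness == 0 degenerates to fully synchronous SGD; an unbounded
    # staleness degenerates to fully asynchronous SGD. SSP sits in between.
    return next_clock - slowest <= staleness

# Example: worker 0 has raced ahead to clock 7 while worker 2 sits at clock 3.
clocks = [7, 5, 3]
print(may_proceed(clocks, worker_id=0, staleness=3))  # False: must wait for the straggler
print(may_proceed(clocks, worker_id=2, staleness=3))  # True: the slowest worker may always run
```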

Asynchronous techniques:

  1. Taming the Wild: A Unified Analysis of HOGWILD!-style Algorithms: C. De Sa, C. Zhang, K. Olukotun, and C. Ré. 2015. Taming the Wild: A Unified Analysis of HOGWILD!-style Algorithms. In Proc. 28th Int’l Conf. on NIPS - Volume 2. 2674–2682.
  2. Large Scale Distributed Deep Networks: J. Dean et al. 2012. Large Scale Distributed Deep Networks. In Proc. 25th International Conference on Neural Information Processing Systems - Volume 1 (NIPS’12). 1223–1231.
  3. Asynchronous Parallel Stochastic Gradient Descent: J. Keuper and F. Pfreundt. 2015. Asynchronous Parallel Stochastic Gradient Descent: A Numeric Core for Scalable Distributed Machine Learning Algorithms. In Proc. Workshop on MLHPC. 1:1–1:11.
  4. Dogwild!-Distributed Hogwild for CPU & GPU: C. Noel and S. Osindero. 2014. Dogwild!-Distributed Hogwild for CPU & GPU. In NIPS Workshop on Distributed Machine Learning and Matrix Computations.
  5. GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training: T. Paine, H. Jin, J. Yang, Z. Lin, and T. S. Huang. 2013. GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training. arXiv:1312.6186.
  6. HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent: B. Recht, C. Re, S. Wright, and F. Niu. 2011. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. In Advances in Neural Information Processing Systems 24. 693–701. (A lock-free update sketch follows this list.)
  7. Asynchronous stochastic gradient descent for DNN training: S. Zhang, C. Zhang, Z. You, R. Zheng, and B. Xu. 2013. Asynchronous stochastic gradient descent for DNN training. In IEEE International Conference on Acoustics, Speech and Signal Processing. 6660–6663.
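
To make the lock-free idea concrete, here is a minimal single-process sketch of HOGWILD!-style updates: several threads apply SGD steps to one shared parameter vector with no locking at all. Because of Python's GIL (and because the toy problem is dense rather than sparse), this illustrates only the update pattern, not the speedup; the regression problem, the learning rate, and every name in it are our own assumptions.

```python
import threading
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0, 0.5, 4.0])
X = rng.standard_normal((4000, 4))
y = X @ true_w + 0.01 * rng.standard_normal(4000)

w = np.zeros(4)   # shared parameters; every thread reads and writes these with no lock
lr = 0.01

def sgd_worker(rows):
    """Run plain SGD on a slice of the data, updating the shared w in place."""
    global w
    for i in rows:
        xi, yi = X[i], y[i]
        grad = (xi @ w - yi) * xi      # gradient of 0.5 * (x.w - y)^2 w.r.t. w
        w -= lr * grad                 # lock-free read-modify-write on shared state

threads = [threading.Thread(target=sgd_worker, args=(range(t, 4000, 4),))
           for t in range(4)]          # 4 threads striding over the data
for t in threads:
    t.start()
for t in threads:
    t.join()

print("recovered weights:", np.round(w, 2))   # should be close to [2, -3, 0.5, 4]
```

Occasionally two threads overwrite each other's updates; the HOGWILD! paper shows that when updates are sparse and mostly non-overlapping this has little effect on convergence, which is what makes the lock-free approach viable.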

Feedback: If you have ideas for improving this list or want other content added, feel free to contribute.
