Awesome System for Machine Learning

Path to system for AI [Whitepaper You Must Read]

A curated list of research on machine learning systems. Links to code are included where available. I also summarize papers that I find especially interesting.

Resources

Book
Video
Course
Survey
Tool
Project with code

Papers

AI for System

Resource Management
Advanced Theory
Traditional System Optimization

PR template

- Title [[Paper]](link) [[GitHub]](link)
  - Author (*conference(journal) year*)
  - Summary:

Book

Computer Architecture: A Quantitative Approach [Must read]
Streaming Systems [Book]
Kubernetes in Action (start to read) [Book]

Video

SysML 2019: [YouTube]
ScaledML 2019: David Patterson, Ion Stoica, Dawn Song and so on [YouTube]
ScaledML 2018: Jeff Dean, Ion Stoica, Yangqing Jia and so on [YouTube] [Slides]
A New Golden Age for Computer Architecture: History, Challenges, and Opportunities. David Patterson [YouTube]
How to Have a Bad Career. David Patterson (I am a big fan) [YouTube]
SysML 18: Perspectives and Challenges. Michael Jordan [YouTube]
SysML 18: Systems and Machine Learning Symbiosis. Jeff Dean [YouTube]

Course

CS294: AI For Systems and Systems For AI. [UC Berkeley] (Strong Recommendation)
CSE 599W: System for ML. [Chen Tianqi] [University of Washington]
CSE 291F: Advanced Data Analytics and ML Systems. [UCSD]
CSci 8980: Machine Learning in Computer Systems [University of Minnesota, Twin Cities]

Survey

Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools [Paper]
- RUBEN MAYER, HANS-ARNO JACOBSEN
- Summary:
How (and How Not) to Write a Good Systems Paper [Advice]
Applied machine learning at Facebook: a datacenter infrastructure perspective [Paper]
- Hazelwood, Kim, et al. (HPCA 2018)
Infrastructure for Usable Machine Learning: The Stanford DAWN Project
- Bailis, Peter, Kunle Olukotun, Christopher Ré, and Matei Zaharia. (preprint 2017)
Hidden technical debt in machine learning systems [Paper]
- Sculley, David, et al. (NIPS 2015)
- Summary:
End-to-end arguments in system design [Paper]
- Saltzer, Jerome H., David P. Reed, and David D. Clark.
System Design for Large Scale Machine Learning [Thesis]
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications [Paper]
- Park, Jongsoo, Maxim Naumov, Protonu Basu et al. arXiv 2018
- Summary: This paper characterizes DL models used for inference and then discusses the resulting design principles for DL hardware.

Useful Tools

Netron: Visualizer for deep learning and machine learning models [GitHub]
Facebook/FBGEMM: FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference. [GitHub]
XiaoMi/mobile-ai-bench: Benchmarking Neural Network Inference on Mobile Devices [GitHub]
Dslabs: Distributed Systems Labs and Framework for UW system course [GitHub]
Machine Learning Model Zoo [Website]
MLPerf Benchmark Suite/Inference: Reference implementations of inference benchmarks [GitHub]
Pytorch-Memory-Utils: track GPU memory usage during training with PyTorch. [GitHub]
Faiss: A library for efficient similarity search and clustering of dense vectors [GitHub]
torchstat: a lightweight neural network analyzer based on PyTorch. [GitHub]
Microsoft/MMdnn: A comprehensive, cross-framework solution to convert, visualize and diagnose deep neural network models. [GitHub]
Popular Network memory consumption and FLOP counts [GitHub]
Intel® VTune™ Amplifier [Website]
- Stop guessing why software is slow. Advanced sampling and profiling techniques quickly analyze your code, isolate issues, and deliver insights for optimizing performance on modern processors
NVIDIA DALI [GitHub]
- A library containing both highly optimized building blocks and an execution engine for data pre-processing in deep learning applications
gpushare-scheduler-extender [GitHub]
- A Kubernetes scheduler extender that lets multiple tasks run on the same NVIDIA GPU device to increase GPU utilization
TensorRT [NVIDIA]
- It is designed to work in a complementary fashion with training frameworks such as TensorFlow, Caffe, PyTorch, MXNet, etc. It focuses specifically on running an already trained network quickly and efficiently on a GPU for the purpose of generating a result
TensorStream: A library for real-time video stream decoding to CUDA memory [GitHub]

Project

Machine Learning for .NET [GitHub]
- ML.NET is a cross-platform open-source machine learning framework which makes machine learning accessible to .NET developers.
- ML.NET allows .NET developers to develop their own models and infuse custom machine learning into their applications, using .NET, even without prior expertise in developing or tuning machine learning models.
ONNX: Open Neural Network Exchange [GitHub]
BentoML: Machine Learning Toolkit for packaging and deploying models [GitHub]
ModelDB: A system to manage ML models [GitHub] [MIT short paper]
EuclidesDB: A multi-model machine learning feature embedding database [GitHub]
Prefect: Prefect is a new workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine. [GitHub]
MindsDB: MindsDB's goal is to make it very simple for developers to use the power of artificial neural networks in their projects [GitHub]
PAI: OpenPAI is an open source platform that provides complete AI model training and resource management capabilities. [Microsoft Project]
Bistro: Scheduling Data-Parallel Jobs Against Live Production Systems [Facebook Project]
Osquery: A SQL-powered operating system instrumentation, monitoring, and analytics framework. [Facebook Project]
Horovod: Distributed training framework for TensorFlow, Keras, and PyTorch. [GitHub]
Seldon: Seldon Core is an open source platform for deploying machine learning models on a Kubernetes cluster. [GitHub]
Kubeflow: Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable. [GitHub]

Data Processing

Google/jax: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more [GitHub]
CuPy: NumPy-like API accelerated with CUDA [GitHub]
Modin: Speed up your Pandas workflows by changing a single line of code [GitHub]
Weld: Weld is a runtime for improving the performance of data-intensive applications. [Project Website]
Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines [Project Website]
- Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, Saman Amarasinghe. (PLDI 2013)
- Summary: Halide is a programming language designed to make it easier to write high-performance image and array processing code on modern machines.

Model Serving

{PRETZEL}: Opening the Black Box of Machine Learning Prediction Serving Systems. [Paper]
- Lee, Y., Scolari, A., Chun, B.G., Santambrogio, M.D., Weimer, M. and Interlandi, M., 2018. (OSDI 2018)
- Summary:
Brusta: PyTorch model serving project [GitHub]
Model Server for Apache MXNet: Model Server for Apache MXNet is a tool for serving neural net models for inference [GitHub]
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform [Paper] [Website]
- Baylor, Denis, et al. (KDD 2017)
- Summary:
Tensorflow-serving: Flexible, high-performance ml serving [Paper] [GitHub]
- Olston, Christopher, et al.
IntelAI/OpenVINO-model-server: Inference model server implementation with gRPC interface, compatible with TensorFlow serving API and OpenVINO™ as the execution backend. [GitHub]
Clipper: A Low-Latency Online Prediction Serving System [Paper] [GitHub]
- Crankshaw, Daniel, et al. (NSDI 2017)
- Summary: Adaptive batching.
InferLine: ML Inference Pipeline Composition Framework [Paper]
- Crankshaw, Daniel, et al. (Preprint)
- Summary: An updated version of Clipper.
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments [Paper]
- Dakkak, Abdul, et al. (Preprint)
- Summary: Addresses the model cold-start problem.
Rafiki: machine learning as an analytics service system [Paper] [GitHub]
- Wang, Wei, Jinyang Gao, Meihui Zhang, Sheng Wang, Gang Chen, Teck Khim Ng, Beng Chin Ooi, Jie Shao, and Moaz Reyad.
- Summary: Covers both training and inference. Automatic hyperparameter search for training. Ensemble models for inference. Uses DRL to balance the trade-off between accuracy and latency.
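
The adaptive batching mentioned in the Clipper summary above can be illustrated with a small additive-increase/multiplicative-decrease (AIMD) controller, the kind of scheme the Clipper paper describes for tuning batch size against a latency objective. This is a minimal sketch and not Clipper's code: the class name `AimdBatchSizer`, the step size, the backoff factor, and the linear latency model are all assumptions made for illustration.

```python
class AimdBatchSizer:
    """Hypothetical AIMD controller that picks an inference batch size."""

    def __init__(self, latency_slo_ms, step=1, backoff=0.9, max_batch=1024):
        self.latency_slo_ms = latency_slo_ms  # latency objective per batch
        self.step = step                      # additive increase (assumed)
        self.backoff = backoff                # multiplicative decrease (assumed)
        self.max_batch = max_batch
        self.batch_size = 1

    def update(self, observed_latency_ms):
        """Adjust the batch size after one batch has been served."""
        if observed_latency_ms <= self.latency_slo_ms:
            # Objective met: probe for more throughput with a larger batch.
            self.batch_size = min(self.batch_size + self.step, self.max_batch)
        else:
            # Objective missed: back off by a constant factor.
            self.batch_size = max(1, int(self.batch_size * self.backoff))
        return self.batch_size


def toy_latency_ms(batch_size):
    """Assumed latency model: fixed overhead plus a per-item cost."""
    return 5.0 + 0.5 * batch_size


def run(rounds=200, slo_ms=20.0):
    """Serve `rounds` batches and record (batch_size, latency) per round."""
    sizer = AimdBatchSizer(latency_slo_ms=slo_ms)
    history = []
    for _ in range(rounds):
        latency = toy_latency_ms(sizer.batch_size)
        history.append((sizer.batch_size, latency))
        sizer.update(latency)
    return history
```

Under the assumed latency model, the batch size climbs until a batch misses the objective and then oscillates just below that point, which is the behavior an AIMD scheme is meant to produce.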

Machine Learning System Papers (Inference)

Dynamic Space-Time Scheduling for GPU Inference [Paper]
- Jain, Paras, et al. (NIPS 18, System for ML)
- Summary:
Dynamic Scheduling For Dynamic Control Flow in Deep Learning Systems [Paper]
- Wei, Jinliang, Garth Gibson, Vijay Vasudevan, and Eric Xing. (On going)
Accelerating Deep Learning Workloads through Efficient Multi-Model Execution. [Paper]
- D. Narayanan, K. Santhanam, A. Phanishayee and M. Zaharia. (NeurIPS Systems for ML Workshop 2018)
- Summary: Their system, HiveMind, takes as input models grouped into model batches that are amenable to co-optimization and co-execution. It consists of a compiler and a runtime.

Machine Learning System Papers (Training)

Mesh-TensorFlow: Deep Learning for Supercomputers [Paper] [GitHub]
- Shazeer, Noam, Youlong Cheng, Niki Parmar, Dustin Tran, et al. (NIPS 2018)
- Summary: Generalizes data parallelism to model parallelism for large language models.
PyTorch-BigGraph: A Large-scale Graph Embedding System [Paper] [GitHub]
- Lerer, Adam and Wu, Ledell and Shen, Jiajun and Lacroix, Timothee and Wehrstedt, Luca and Bose, Abhijit and Peysakhovich, Alex (SysML 2019)
Beyond data and model parallelism for deep neural networks [Paper] [GitHub]
- Jia, Zhihao, Matei Zaharia, and Alex Aiken. (SysML 2019)
- Summary: SOAP (sample, operation, attribute, and parameter) parallelism. Operator graph, device topology, and execution optimizer. MCMC search algorithm and execution simulator.
Device placement optimization with reinforcement learning [Paper]
- Mirhoseini, Azalia, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, and Jeff Dean. (ICML 17)
- Summary: Uses REINFORCE to learn a device placement policy. Groups operations for execution. Needs a lot of GPUs.
Spotlight: Optimizing device placement for training deep neural networks [Paper]
- Gao, Yuanxiang, Li Chen, and Baochun Li (ICML 18)
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [Paper] [GitHub] [News]
- Huang, Yanping, et al. (arXiv preprint arXiv:1811.06965 (2018))
- Summary:
Gandiva: Introspective cluster scheduling for deep learning. [Paper]
- Xiao, Wencong, et al. (OSDI 2018)
- Summary: Improves the efficiency of hyper-parameter search in a cluster. Aware of hardware utilization.
Optimus: an efficient dynamic resource scheduler for deep learning clusters [Paper]
- Peng, Yanghua, et al. (EuroSys 2018)
- Summary: Job scheduling on clusters. Uses total completion time as the metric.

Machine Learning Compiler

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning [Project Website]
- {TVM}: An Automated End-to-End Optimizing Compiler for Deep Learning [Paper]
  - Chen, Tianqi, et al. (OSDI 2018)
Facebook TC: Tensor Comprehensions (TC) is a fully-functional C++ library to automatically synthesize high-performance machine learning kernels using Halide, ISL and NVRTC or LLVM. [GitHub]
Tensorflow/mlir: "Multi-Level Intermediate Representation" Compiler Infrastructure [GitHub]
PyTorch/glow: Compiler for Neural Network hardware accelerators [GitHub]

Deep Reinforcement Learning System

Ray: A Distributed Framework for Emerging {AI} Applications [GitHub]
- Moritz, Philipp, et al. (OSDI 2018)
- Summary: Distributed DRL training, simulation, and inference system. Can be used as a high-performance Python framework.
Elf: An extensive, lightweight and flexible research platform for real-time strategy games [Paper] [GitHub]
- Tian, Yuandong, Qucheng Gong, Wenling Shang, Yuxin Wu, and C. Lawrence Zitnick. (NIPS 2017)
- Summary:
Horizon: Facebook's Open Source Applied Reinforcement Learning Platform [Paper] [GitHub]
- Gauci, Jason, et al. (preprint 2019)
RLgraph: Modular Computation Graphs for Deep Reinforcement Learning [Paper] [GitHub]
- Schaarschmidt, Michael, Sven Mika, Kai Fricke, and Eiko Yoneki. (SysML 2019)
- Summary:

Video System papers

CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video [Paper]
- Mao, Huizi, Taeyoung Kong, and William J. Dally. (SysML 2019)
Live Video Analytics at Scale with Approximation and Delay-Tolerance [Paper]
- Zhang, Haoyu, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. (NSDI 2017)
Chameleon: scalable adaptation of video analytics [Paper]
- Jiang, Junchen, et al. (SIGCOMM 2018)
- Summary: A configuration controller for balancing accuracy and resource usage. The golden configuration is a good design. The cost of periodic profiling often exceeds any resource savings gained by adapting the configurations.
Noscope: optimizing neural network queries over video at scale [Paper] [GitHub]
- Kang, Daniel, John Emmons, Firas Abuzaid, Peter Bailis, and Matei Zaharia. (VLDB 2017)
- Summary:
SVE: Distributed video processing at Facebook scale [Paper]
- Huang, Qi, et al. (SOSP 2017)
- Summary:
Scanner: Efficient Video Analysis at Scale [Paper] [GitHub]
- Poms, Alex, Will Crichton, Pat Hanrahan, and Kayvon Fatahalian (SIGGRAPH 2018)
- Summary:
A cloud-based large-scale distributed video analysis system [Paper]
- Wang, Yongzhe, et al. (ICIP 2016)
Rosetta: Large scale system for text detection and recognition in images [Paper]
- Borisyuk, Fedor, Albert Gordo, and Viswanath Sivakumar. (KDD 2018)
- Summary:
Neural adaptive content-aware internet video delivery. [Paper] [GitHub]
- Yeo, H., Jung, Y., Kim, J., Shin, J. and Han, D., 2018. (OSDI 2018)
- Summary: Combine video super-resolution and ABR

Edge or Mobile Papers

NestDNN: Resource-Aware Multi-Tenant On-Device Deep Learning for Continuous Mobile Vision [Paper]
- Fang, Biyi, Xiao Zeng, and Mi Zhang. (MobiCom 2018)
- Summary: Borrows ideas from network pruning. The pruned model can then be recovered at runtime to trade off compute resources against accuracy.
Lavea: Latency-aware video analytics on edge computing platform [Paper]
- Yi, Shanhe, et al. (Second ACM/IEEE Symposium on Edge Computing. ACM, 2017.)
Scaling Video Analytics on Constrained Edge Nodes [Paper] [GitHub]
- Canel, C., Kim, T., Zhou, G., Li, C., Lim, H., Andersen, D. G., Kaminsky, M., and Dulloo (SysML 2019)

Resource Management

Resource management with deep reinforcement learning [Paper] [GitHub]
- Mao, Hongzi, Mohammad Alizadeh, Ishai Menache, and Srikanth Kandula (ACM HotNets 2016)
- Summary: Highly cited paper with a nice problem definition. An example solution that translates the problem of packing tasks with multiple resource demands into a learning problem and then uses DRL to solve it.
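
To make the problem in the summary above concrete, here is a minimal sketch of packing tasks with multiple resource demands, posed as an environment that a learning agent could act in. It is not the paper's implementation: the class `PackingEnv`, the action encoding, and the reward (minus the number of unfinished tasks per time step) are assumptions chosen for brevity. The first-fit function stands in for the policy that a DRL agent would learn.

```python
class PackingEnv:
    """Hypothetical single-machine environment with multi-resource tasks.

    Each task is (demand_vector, duration). The sketch assumes every task
    fits on an empty machine; otherwise an episode would never finish.
    """

    def __init__(self, capacity, tasks):
        self.free = list(capacity)   # free amount of each resource
        self.waiting = list(tasks)   # tasks not yet scheduled
        self.running = []            # [demand_vector, remaining_time] entries
        self.time = 0

    def fits(self, demand):
        return all(d <= f for d, f in zip(demand, self.free))

    def step(self, action):
        """Schedule waiting[action], or advance time if action is None.

        Returns the reward for this step.
        """
        if action is not None and self.fits(self.waiting[action][0]):
            demand, duration = self.waiting.pop(action)
            self.free = [f - d for f, d in zip(self.free, demand)]
            self.running.append([demand, duration])
            return 0  # scheduling itself costs nothing; time does not move

        # No (valid) scheduling action: advance the clock by one step.
        self.time += 1
        reward = -(len(self.waiting) + len(self.running))
        still_running = []
        for demand, remaining in self.running:
            if remaining > 1:
                still_running.append([demand, remaining - 1])
            else:
                # Task finished: release its resources.
                self.free = [f + d for f, d in zip(self.free, demand)]
        self.running = still_running
        return reward

    def done(self):
        return not self.waiting and not self.running


def first_fit_policy(env):
    """Baseline policy: pick the first waiting task that currently fits."""
    for i, (demand, _) in enumerate(env.waiting):
        if env.fits(demand):
            return i
    return None


def run_episode(env, policy):
    """Run until all tasks finish; return (total_reward, elapsed_time)."""
    total = 0
    while not env.done():
        total += env.step(policy(env))
    return total, env.time
```

With this reward, the total over an episode equals minus the sum of task completion times, so a policy that maximizes reward packs tasks to finish them sooner. For example, `run_episode(PackingEnv((4, 4), [((2, 1), 2), ((2, 2), 1), ((3, 3), 1)]), first_fit_policy)` runs the baseline on three tasks with two resource types.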

Advanced Theory

Differentiable MPC for End-to-end Planning and Control [Paper] [GitHub]
- Amos, Brandon, Ivan Jimenez, Jacob Sacks, Byron Boots, and J. Zico Kolter (NIPS 2018)

Traditional System Optimization Papers

AutoScale: Dynamic, Robust Capacity Management for Multi-Tier Data Centers [Paper]
- Gandhi, Anshul, et al. (TOCS 2012)
