Awesome System for Machine Learning
[Whitepaper You Must Read]
Path to system for AI
A curated list of research in machine learning systems. Links to code are included where available. I also summarize papers that I find especially interesting.
Table of Contents
Resources
Papers
System for AI
- Data Processing
- Distributed Training
- Model Serving
- Inference Optimization
- Machine Learning Compiler
- Deep Reinforcement Learning System
- Edge AI
- Video System
AI for System
PR template
- Title [[Paper]](link) [[GitHub]](link)
- Author (*conference(journal) year*)
- Summary:
Book
- Computer Architecture: A Quantitative Approach [Must read]
- Streaming Systems [Book]
- Kubernetes in Action (start to read) [Book]
Video
- SysML 2019: [YouTube]
- ScaledML 2019: David Patterson, Ion Stoica, Dawn Song and so on [YouTube]
- ScaledML 2018: Jeff Dean, Ion Stoica, Yangqing Jia and so on [YouTube] [Slides]
- A New Golden Age for Computer Architecture: History, Challenges, and Opportunities. David Patterson [YouTube]
- How to Have a Bad Career. David Patterson (I am a big fan) [YouTube]
- SysML 18: Perspectives and Challenges. Michael Jordan [YouTube]
- SysML 18: Systems and Machine Learning Symbiosis. Jeff Dean [YouTube]
Course
- CS294: AI For Systems and Systems For AI. [UC Berkeley] (Strongly recommended)
- CSE 599W: System for ML. [Chen Tianqi] [University of Washington]
- CSE 291F: Advanced Data Analytics and ML Systems. [UCSD]
- CSci 8980: Machine Learning in Computer Systems [University of Minnesota, Twin Cities]
Survey
- Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools [Paper]
- Mayer, Ruben, and Hans-Arno Jacobsen.
- Summary:
- How (and How Not) to Write a Good Systems Paper [Advice]
- Applied machine learning at Facebook: a datacenter infrastructure perspective [Paper]
- Hazelwood, Kim, et al. (HPCA 2018)
- Infrastructure for Usable Machine Learning: The Stanford DAWN Project
- Bailis, Peter, Kunle Olukotun, Christopher Ré, and Matei Zaharia. (preprint 2017)
- Hidden technical debt in machine learning systems [Paper]
- Sculley, David, et al. (NIPS 2015)
- Summary:
- End-to-end arguments in system design [Paper]
- Saltzer, Jerome H., David P. Reed, and David D. Clark.
- System Design for Large Scale Machine Learning [Thesis]
- Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications [Paper]
- Park, Jongsoo, Maxim Naumov, Protonu Basu et al. arXiv 2018
- Summary: This paper characterizes DL models and then presents new design principles for DL hardware.
Useful Tools
- Netron: Visualizer for deep learning and machine learning models [GitHub]
- Facebook/FBGEMM: FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference. [GitHub]
- XiaoMi/mobile-ai-bench: Benchmarking Neural Network Inference on Mobile Devices [GitHub]
- Dslabs: Distributed Systems Labs and Framework for UW system course [GitHub]
- Machine Learning Model Zoo [Website]
- MLPerf Benchmark Suite/Inference: Reference implementations of inference benchmarks [GitHub]
- Pytorch-Memory-Utils: track GPU memory usage during training with PyTorch. [GitHub]
- Faiss: A library for efficient similarity search and clustering of dense vectors [GitHub]
- torchstat: a lightweight neural network analyzer based on PyTorch. [GitHub]
- Microsoft/MMdnn: A comprehensive, cross-framework solution to convert, visualize and diagnose deep neural network models. [GitHub]
- Popular Network memory consumption and FLOP counts [GitHub]
- Intel® VTune™ Amplifier [Website]
- Stop guessing why software is slow. Advanced sampling and profiling techniques quickly analyze your code, isolate issues, and deliver insights for optimizing performance on modern processors
- NVIDIA DALI [GitHub]
- A library containing both highly optimized building blocks and an execution engine for data pre-processing in deep learning applications
- gpushare-scheduler-extender [GitHub]
- A Kubernetes scheduler extender that lets multiple tasks run on the same NVIDIA GPU device to increase GPU utilization
- TensorRT [NVIDIA]
- It is designed to work in a complementary fashion with training frameworks such as TensorFlow, Caffe, PyTorch, MXNet, etc. It focuses specifically on running an already trained network quickly and efficiently on a GPU for the purpose of generating a result
- TensorStream: A library for real-time video stream decoding to CUDA memory [GitHub]
Project
- Machine Learning for .NET [GitHub]
- ML.NET is a cross-platform open-source machine learning framework which makes machine learning accessible to .NET developers.
- ML.NET allows .NET developers to develop their own models and infuse custom machine learning into their applications, using .NET, even without prior expertise in developing or tuning machine learning models.
- ONNX: Open Neural Network Exchange [GitHub]
- BentoML: Machine Learning Toolkit for packaging and deploying models [GitHub]
- ModelDB: A system to manage ML models [GitHub] [MIT short paper]
- EuclidesDB: A multi-model machine learning feature embedding database [GitHub]
- Prefect: Prefect is a new workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine. [GitHub]
- MindsDB: MindsDB's goal is to make it very simple for developers to use the power of artificial neural networks in their projects [GitHub]
- PAI: OpenPAI is an open source platform that provides complete AI model training and resource management capabilities. [Microsoft Project]
- Bistro: Scheduling Data-Parallel Jobs Against Live Production Systems [Facebook Project]
- Osquery is a SQL powered operating system instrumentation, monitoring, and analytics framework. [Facebook Project]
- Horovod: Distributed training framework for TensorFlow, Keras, and PyTorch. [GitHub]
- Seldon: Seldon Core is an open source platform for deploying machine learning models on a Kubernetes cluster. [GitHub]
- Kubeflow: Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable. [GitHub]
Data Processing
- Google/jax: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more [GitHub]
- CuPy: NumPy-like API accelerated with CUDA [GitHub]
- Modin: Speed up your Pandas workflows by changing a single line of code [GitHub]
- Weld: Weld is a runtime for improving the performance of data-intensive applications. [Project Website]
- Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines [Project Website]
- Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, Saman Amarasinghe. (PLDI 2013)
- Summary: Halide is a programming language designed to make it easier to write high-performance image and array processing code on modern machines.
Model Serving
- PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems. [Paper]
- Lee, Y., Scolari, A., Chun, B.G., Santambrogio, M.D., Weimer, M. and Interlandi, M., 2018. (OSDI 2018)
- Summary:
- Brusta: PyTorch model serving project [GitHub]
- Model Server for Apache MXNet: Model Server for Apache MXNet is a tool for serving neural net models for inference [GitHub]
- TFX: A TensorFlow-Based Production-Scale Machine Learning Platform [Paper] [Website]
- Baylor, Denis, et al. (KDD 2017)
- Summary:
- TensorFlow-Serving: Flexible, high-performance ML serving [Paper] [GitHub]
- Olston, Christopher, et al.
- IntelAI/OpenVINO-model-server: Inference model server implementation with gRPC interface, compatible with TensorFlow serving API and OpenVINO™ as the execution backend. [GitHub]
- Clipper: A Low-Latency Online Prediction Serving System [Paper] [GitHub]
- Crankshaw, Daniel, et al. (NSDI 2017)
- Summary: Adaptive batching.
- InferLine: ML Inference Pipeline Composition Framework [Paper]
- Crankshaw, Daniel, et al. (Preprint)
- Summary: An updated version of Clipper.
- TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments [Paper]
- Dakkak, Abdul, et al. (Preprint)
- Summary: Addresses the model cold-start problem.
- Rafiki: machine learning as an analytics service system [Paper] [GitHub]
- Wang, Wei, Jinyang Gao, Meihui Zhang, Sheng Wang, Gang Chen, Teck Khim Ng, Beng Chin Ooi, Jie Shao, and Moaz Reyad.
- Summary: Covers both training and inference. Automatic hyper-parameter search for training. Ensemble models for inference. Uses DRL to balance the trade-off between accuracy and latency.
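
The "adaptive batch" idea noted in the Clipper summary above can be illustrated with an additive-increase/multiplicative-decrease (AIMD) controller: grow the batch size while a latency target is met, and shrink it when the target is exceeded. The following is a minimal sketch under stated assumptions, not Clipper's code. The class name `AIMDBatchSizer`, its parameters, and the linear latency model are all hypothetical.

```python
class AIMDBatchSizer:
    """Toy AIMD controller for choosing an inference batch size.

    Hypothetical helper for illustration; not part of any serving library.
    """

    def __init__(self, slo_ms, step=1, backoff=0.9, max_batch=1024):
        self.slo_ms = slo_ms        # latency target per batch, in milliseconds
        self.step = step            # additive increase when under the target
        self.backoff = backoff      # multiplicative decrease when over the target
        self.max_batch = max_batch  # hard upper bound on the batch size
        self.batch_size = 1

    def update(self, observed_latency_ms):
        """Adjust the batch size after observing one batch's latency."""
        if observed_latency_ms <= self.slo_ms:
            self.batch_size = min(self.batch_size + self.step, self.max_batch)
        else:
            self.batch_size = max(1, int(self.batch_size * self.backoff))
        return self.batch_size


def simulated_latency_ms(batch_size):
    # Assumed cost model: 2 ms fixed overhead plus 0.5 ms per query.
    return 2.0 + 0.5 * batch_size


sizer = AIMDBatchSizer(slo_ms=20.0)
history = []  # batch size used at each step
for _ in range(100):
    history.append(sizer.batch_size)
    sizer.update(simulated_latency_ms(sizer.batch_size))
```

With this assumed cost model the controller climbs to the largest batch that fits the 20 ms target, overshoots by one step, backs off, and repeats, so the batch size settles into a narrow band just below the limit.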
Machine Learning System Papers (Inference)
- Dynamic Space-Time Scheduling for GPU Inference [Paper]
- Jain, Paras, et al. (NIPS 18, System for ML)
- Summary:
- Dynamic Scheduling For Dynamic Control Flow in Deep Learning Systems [Paper]
- Wei, Jinliang, Garth Gibson, Vijay Vasudevan, and Eric Xing. (Ongoing)
- Accelerating Deep Learning Workloads through Efficient Multi-Model Execution. [Paper]
- D. Narayanan, K. Santhanam, A. Phanishayee and M. Zaharia. (NeurIPS Systems for ML Workshop 2018)
- Summary: Their system, HiveMind, takes as input models grouped into model batches that are amenable to co-optimization and co-execution. Its components include a compiler and a runtime.
Machine Learning System Papers (Training)
- Mesh-TensorFlow: Deep Learning for Supercomputers [Paper] [GitHub]
- Shazeer, Noam, Youlong Cheng, Niki Parmar, Dustin Tran, et al. (NIPS 2018)
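- Summary: Generalizes data parallelism to model parallelism by splitting arbitrary tensor dimensions across a mesh of processors. Demonstrated on Transformer language models.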
- PyTorch-BigGraph: A Large-scale Graph Embedding System [Paper] [GitHub]
- Lerer, Adam and Wu, Ledell and Shen, Jiajun and Lacroix, Timothee and Wehrstedt, Luca and Bose, Abhijit and Peysakhovich, Alex (SysML 2019)
- Beyond data and model parallelism for deep neural networks [Paper] [GitHub]
- Jia, Zhihao, Matei Zaharia, and Alex Aiken. (SysML 2019)
- Summary: SOAP (sample, operation, attribute and parameter) parallelism. Operator graph, device topology and execution optimizer. MCMC search algorithm and execution simulator.
- Device placement optimization with reinforcement learning [Paper]
- Mirhoseini, Azalia, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, and Jeff Dean. (ICML 17)
- Summary: Uses REINFORCE to learn a device placement policy. Groups operations for execution. Needs a lot of GPUs.
- Spotlight: Optimizing device placement for training deep neural networks [Paper]
- Gao, Yuanxiang, Li Chen, and Baochun Li (ICML 18)
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [Paper] [GitHub] [News]
- Huang, Yanping, et al. (arXiv preprint arXiv:1811.06965 (2018))
- Summary:
- Gandiva: Introspective cluster scheduling for deep learning. [Paper]
- Xiao, Wencong, et al. (OSDI 2018)
- Summary: Improves the efficiency of hyper-parameter search in a cluster. Aware of hardware utilization.
- Optimus: an efficient dynamic resource scheduler for deep learning clusters [Paper]
- Peng, Yanghua, et al. (EuroSys 2018)
- Summary: Job scheduling on clusters. Total completion time as the metric.
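
The device placement entry above mentions using REINFORCE to learn a placement policy. As a rough illustration of that idea only, here is a toy sketch in which each operation has an independent softmax distribution over devices, and the reward is the negated runtime under a made-up cost model (the load of the busiest device). The operation costs, the cost model, the moving-average baseline, and all names are assumptions for illustration. The paper's actual policy network and runtime measurements are not reproduced here.

```python
import math
import random

OP_COSTS = [4.0, 3.0, 2.0, 1.0]  # assumed per-operation compute cost
NUM_DEVICES = 2


def placement_runtime(placement):
    """Toy cost model: runtime is the total load of the busiest device."""
    load = [0.0] * NUM_DEVICES
    for cost, device in zip(OP_COSTS, placement):
        load[device] += cost
    return max(load)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # One independent categorical distribution over devices per operation.
    logits = [[0.0] * NUM_DEVICES for _ in OP_COSTS]
    baseline = None
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        placement = [rng.choices(range(NUM_DEVICES), weights=p)[0] for p in probs]
        reward = -placement_runtime(placement)
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        # Moving-average baseline to reduce gradient variance.
        baseline = 0.9 * baseline + 0.1 * reward
        for op, device in enumerate(placement):
            for d in range(NUM_DEVICES):
                # Gradient of log pi(device) w.r.t. logit d for a softmax policy.
                grad_log_prob = (1.0 if d == device else 0.0) - probs[op][d]
                logits[op][d] += lr * advantage * grad_log_prob
    return logits


logits = train()
# Greedy placement: pick the most likely device for each operation.
greedy = [max(range(NUM_DEVICES), key=lambda d: row[d]) for row in logits]
```

Calling `placement_runtime(greedy)` gives the toy runtime of the learned placement. Because the policy is trained from sampled placements, it may settle on a placement that is not optimal.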
Machine Learning Compiler
- TVM: An Automated End-to-End Optimizing Compiler for Deep Learning [Project Website]
- TVM: An Automated End-to-End Optimizing Compiler for Deep Learning [Paper]
- Chen, Tianqi, et al. (OSDI 2018)
- Facebook TC: Tensor Comprehensions (TC) is a fully-functional C++ library to automatically synthesize high-performance machine learning kernels using Halide, ISL and NVRTC or LLVM. [GitHub]
- Tensorflow/mlir: "Multi-Level Intermediate Representation" Compiler Infrastructure [GitHub]
- PyTorch/glow: Compiler for Neural Network hardware accelerators [GitHub]
Deep Reinforcement Learning System
- Ray: A Distributed Framework for Emerging AI Applications [GitHub]
- Moritz, Philipp, et al. (OSDI 2018)
- Summary: Distributed DRL training, simulation and inference system. Can be used as a high-performance Python framework.
- ELF: An extensive, lightweight and flexible research platform for real-time strategy games [Paper] [GitHub]
- Tian, Yuandong, Qucheng Gong, Wenling Shang, Yuxin Wu, and C. Lawrence Zitnick. (NIPS 2017)
- Summary:
- Horizon: Facebook's Open Source Applied Reinforcement Learning Platform [Paper] [GitHub]
- Gauci, Jason, et al. (preprint 2019)
- RLgraph: Modular Computation Graphs for Deep Reinforcement Learning [Paper] [GitHub]
- Schaarschmidt, Michael, Sven Mika, Kai Fricke, and Eiko Yoneki. (SysML 2019)
- Summary:
Video System papers
- CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video [Paper]
- Mao, Huizi, Taeyoung Kong, and William J. Dally. (SysML 2019)
- Live Video Analytics at Scale with Approximation and Delay-Tolerance [Paper]
- Zhang, Haoyu, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. (NSDI 2017)
- Chameleon: scalable adaptation of video analytics [Paper]
- Jiang, Junchen, et al. (SIGCOMM 2018)
- Summary: A configuration controller for balancing accuracy and resource usage. The golden configuration is a good design. The cost of periodic profiling often exceeded any resource savings gained by adapting the configurations.
- NoScope: optimizing neural network queries over video at scale [Paper] [GitHub]
- Kang, Daniel, John Emmons, Firas Abuzaid, Peter Bailis, and Matei Zaharia. (VLDB 2017)
- Summary:
- SVE: Distributed video processing at Facebook scale [Paper]
- Huang, Qi, et al. (SOSP 2017)
- Summary:
- Scanner: Efficient Video Analysis at Scale [Paper] [GitHub]
- Poms, Alex, Will Crichton, Pat Hanrahan, and Kayvon Fatahalian (SIGGRAPH 2018)
- Summary:
- A cloud-based large-scale distributed video analysis system [Paper]
- Wang, Yongzhe, et al. (ICIP 2016)
- Rosetta: Large scale system for text detection and recognition in images [Paper]
- Borisyuk, Fedor, Albert Gordo, and Viswanath Sivakumar. (KDD 2018)
- Summary:
- Neural adaptive content-aware internet video delivery. [Paper] [GitHub]
- Yeo, H., Jung, Y., Kim, J., Shin, J. and Han, D., 2018. (OSDI 2018)
- Summary: Combines video super-resolution and adaptive bitrate (ABR) streaming.
Edge or Mobile Papers
- NestDNN: Resource-Aware Multi-Tenant On-Device Deep Learning for Continuous Mobile Vision [Paper]
- Fang, Biyi, Xiao Zeng, and Mi Zhang. (MobiCom 2018)
- Summary: Borrows ideas from network pruning. The pruned model can then be recovered to trade off compute resources against accuracy at runtime.
- Lavea: Latency-aware video analytics on edge computing platform [Paper]
- Yi, Shanhe, et al. (Second ACM/IEEE Symposium on Edge Computing. ACM, 2017.)
- Scaling Video Analytics on Constrained Edge Nodes [Paper] [GitHub]
- Canel, C., Kim, T., Zhou, G., Li, C., Lim, H., Andersen, D. G., Kaminsky, M., and Dulloo (SysML 2019)
Resource Management
- Resource management with deep reinforcement learning [Paper] [GitHub]
- Mao, Hongzi, Mohammad Alizadeh, Ishai Menache, and Srikanth Kandula (ACM HotNets 2016)
- Summary: Highly cited paper. Nice problem definition. An example solution that translates the problem of packing tasks with multiple resource demands into a learning problem and then uses DRL to solve it.
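
To make the "packing tasks with multiple resource demands" formulation above more concrete, here is a minimal sketch of such a scheduling problem framed as a step-based environment. It is a toy under stated assumptions rather than the paper's setup: the `PackingEnv` class, the per-step reward (minus the number of unfinished jobs), and the first-fit baseline policy are hypothetical choices for illustration. A DRL agent would take the place of `first_fit_policy`.

```python
from dataclasses import dataclass


@dataclass
class Job:
    demand: tuple   # demand per resource type, e.g. (cpu, memory)
    duration: int   # time steps the job runs once started


class PackingEnv:
    """Toy multi-resource packing environment (hypothetical, for illustration)."""

    def __init__(self, capacity, jobs):
        self.capacity = tuple(capacity)
        self.waiting = list(jobs)  # jobs not yet started
        self.running = []          # [job, remaining_steps] pairs
        for job in self.waiting:
            if any(d > c for d, c in zip(job.demand, self.capacity)):
                raise ValueError("job can never fit in the cluster")

    def free(self):
        used = [sum(job.demand[i] for job, _ in self.running)
                for i in range(len(self.capacity))]
        return tuple(c - u for c, u in zip(self.capacity, used))

    def fits(self, job):
        return all(d <= f for d, f in zip(job.demand, self.free()))

    def done(self):
        return not self.waiting and not self.running

    def step(self, action):
        """Start waiting job `action` now, or pass None to advance time."""
        if action is not None:
            job = self.waiting[action]
            if not self.fits(job):
                raise ValueError("job does not fit in the free resources")
            self.waiting.pop(action)
            self.running.append([job, job.duration])
            return 0.0  # starting a job does not advance time
        # Advance one time step: each unfinished job costs one unit of reward.
        reward = -float(len(self.waiting) + len(self.running))
        for entry in self.running:
            entry[1] -= 1
        self.running = [e for e in self.running if e[1] > 0]
        return reward


def first_fit_policy(env):
    """Baseline: start the first waiting job that fits, else let time pass."""
    for i, job in enumerate(env.waiting):
        if env.fits(job):
            return i
    return None


def run_episode(env, policy):
    total = 0.0
    while not env.done():
        total += env.step(policy(env))
    return total


jobs = [Job((2, 1), 2), Job((2, 2), 1), Job((1, 1), 3)]
env = PackingEnv(capacity=(3, 3), jobs=jobs)
total_reward = run_episode(env, first_fit_policy)
```

Under this assumed reward, with all jobs present from the start, the episode return equals the negated sum of job completion times, so a policy that maximizes it packs jobs to finish them sooner.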
Advanced Theory
- Differentiable MPC for End-to-end Planning and Control [Paper] [GitHub]
- Amos, Brandon, Ivan Jimenez, Jacob Sacks, Byron Boots, and J. Zico Kolter (NIPS 2018)
Traditional System Optimization Papers
- AutoScale: Dynamic, Robust Capacity Management for Multi-Tier Data Centers [Paper]
- Gandhi, Anshul, et al. (TOCS 2012)