A curated list of research in machine learning systems, with links to code where available. I also summarize papers I find particularly interesting.
I categorized them myself. Pull requests are kindly welcome!
- Model Deployment
- Inference Optimization
- Distributed Training
- Resource Management
- Deep Reinforcement Learning System
- Edge AI
- Video System
- Advanced Theory
- Traditional System Optimization
- Computer Architecture: A Quantitative Approach [Book] [Must read]
- Streaming Systems [Book]
- Kubernetes in Action (currently reading) [Book]
- A New Golden Age for Computer Architecture: History, Challenges, and Opportunities. David Patterson [YouTube]
- How to Have a Bad Career. David Patterson (I am a big fan) [YouTube]
- SysML 18: Perspectives and Challenges. Michael Jordan [YouTube]
- SysML 18: Systems and Machine Learning Symbiosis. Jeff Dean [YouTube]
- CS294: AI For Systems and Systems For AI. [UC Berkeley] (strongly recommended)
- CSE 599W: Systems for ML. [Tianqi Chen] [University of Washington]
- CSE 291F: Advanced Data Analytics and ML Systems. [UCSD]
- CSci 8980: Machine Learning in Computer Systems [University of Minnesota, Twin Cities]
- Hidden technical debt in machine learning systems [Paper]
- Sculley, David, et al. (NIPS 2015)
- Summary: Argues that ML systems accrue hidden maintenance costs beyond ordinary code, including entanglement, hidden feedback loops, glue code, and configuration debt.
- End-to-end arguments in system design [Paper]
- Saltzer, Jerome H., David P. Reed, and David D. Clark. (ACM TOCS 1984)
- System Design for Large Scale Machine Learning [Thesis]
- Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications [Paper]
- Park, Jongsoo, Maxim Naumov, Protonu Basu, et al. (arXiv 2018)
- Summary: This paper presents a characterization of Facebook's DL inference workloads and then derives new design principles for DL hardware.
- Intel® VTune™ Amplifier [Website]
- Stop guessing why software is slow. Advanced sampling and profiling techniques quickly analyze your code, isolate issues, and deliver insights for optimizing performance on modern processors.
- NVIDIA DALI [GitHub]
- A library containing both highly optimized building blocks and an execution engine for data pre-processing in deep learning applications
- gpushare-scheduler-extender [GitHub]
- A Kubernetes scheduler extension that lets multiple tasks share the same NVIDIA GPU device to increase GPU utilization
- TensorRT [NVIDIA]
- It is designed to work in a complementary fashion with training frameworks such as TensorFlow, Caffe, PyTorch, MXNet, etc. It focuses specifically on running an already trained network quickly and efficiently on a GPU for the purpose of generating a result.
- Weld: a runtime for improving the performance of data-intensive applications. [Project Website]
- MindsDB: MindsDB's goal is to make it very simple for developers to use the power of artificial neural networks in their projects [GitHub]
- PAI: OpenPAI is an open source platform that provides complete AI model training and resource management capabilities. [Microsoft Project]
- Bistro: Scheduling Data-Parallel Jobs Against Live Production Systems [Facebook Project]
- Osquery: a SQL-powered operating system instrumentation, monitoring, and analytics framework. [Facebook Project]
- TVM: An Automated End-to-End Optimizing Compiler for Deep Learning [Project Website]
- Horovod: Distributed training framework for TensorFlow, Keras, and PyTorch. [GitHub]
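For flavor, a minimal Horovod + PyTorch setup (assumes `horovod` and `torch` are installed; the model and learning rate are placeholders):

```python
# Launch with e.g. `horovodrun -np 4 python train.py` (one process per GPU).
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())      # pin this process to one GPU

model = torch.nn.Linear(10, 1).cuda()        # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradients are averaged across workers via ring-allreduce,
# and start every worker from identical weights.
opt = hvd.DistributedOptimizer(opt, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
```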
- Seldon: Seldon Core is an open-source platform for deploying machine learning models on a Kubernetes cluster. [GitHub]
- Kubeflow: Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable. [GitHub]
- Clipper: A Low-Latency Online Prediction Serving System [Paper] [GitHub]
- Crankshaw, Daniel, et al. (NSDI 2017)
- Summary: Adaptive batching under latency SLOs (the maximum batch size is tuned with an AIMD scheme), plus caching and model selection in a modular serving layer.
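A minimal sketch of the deadline-driven batching idea (my simplification, not Clipper's code; `predict_batch`, `max_batch`, and `slo_ms` are assumed):

```python
import time
from queue import Queue, Empty

def batching_loop(requests: Queue, predict_batch, max_batch=32, slo_ms=100.0):
    """Flush a batch when it is full or the oldest request nears its deadline."""
    batch, deadline = [], None
    while True:
        timeout = None if deadline is None else max(deadline - time.time(), 0.0)
        try:
            batch.append(requests.get(timeout=timeout))
            if deadline is None:                             # first request in batch
                deadline = time.time() + slo_ms / 1000.0 / 2 # leave compute headroom
        except Empty:
            pass
        if batch and (len(batch) >= max_batch or time.time() >= deadline):
            predict_batch(batch)   # one framework call amortizes per-query overhead
            batch, deadline = [], None
```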
- InferLine: ML Inference Pipeline Composition Framework [Paper]
- Crankshaw, Daniel, et al. (Preprint)
- Summary: an updated version of Clipper that composes and provisions whole ML inference pipelines.
- TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments [Paper]
- Dakkak, Abdul, et al. (Preprint)
- Summary: addresses the model cold-start problem by transparently sharing models across FaaS functions.
- Dynamic Space-Time Scheduling for GPU Inference [Paper]
- Jain, Paras, et al. (NeurIPS Systems for ML Workshop 2018)
- Summary:
- Dynamic Scheduling For Dynamic Control Flow in Deep Learning Systems [Paper]
- Wei, Jinliang, Garth Gibson, Vijay Vasudevan, and Eric Xing. (Ongoing)
- Accelerating Deep Learning Workloads through Efficient Multi-Model Execution. [Paper]
- D. Narayanan, K. Santhanam, A. Phanishayee and M. Zaharia. (NeurIPS Systems for ML Workshop 2018)
- Summary: Their system, HiveMind, takes as input models grouped into batches that are amenable to co-optimization and co-execution; it consists of a compiler and a runtime.
- Beyond data and model parallelism for deep neural networks [Paper]
- Jia, Zhihao, Matei Zaharia, and Alex Aiken. (SysML 2019)
- Summary: SOAP (sample, operator, attribute, and parameter) parallelism. The operator graph and device topology feed an execution optimizer, which runs an MCMC search algorithm on top of an execution simulator.
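A toy version of the Metropolis-style search loop (my sketch; `random_neighbor` and `simulate_runtime` stand in for FlexFlow's strategy mutations and execution simulator):

```python
import math
import random

def mcmc_search(init, simulate_runtime, random_neighbor, steps=10_000, beta=0.05):
    """Minimize simulated runtime over parallelization strategies."""
    current, cur_cost = init, simulate_runtime(init)
    best, best_cost = current, cur_cost
    for _ in range(steps):
        cand = random_neighbor(current)   # e.g. re-split one operator's SOAP dims
        cost = simulate_runtime(cand)     # predicted by the simulator, not measured
        # Always accept improvements; accept regressions with Boltzmann probability.
        if cost <= cur_cost or random.random() < math.exp(-beta * (cost - cur_cost)):
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand, cost
    return best, best_cost
```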
- Device placement optimization with reinforcement learning [Paper]
- Mirhoseini, Azalia, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, and Jeff Dean. (ICML 17)
- Summary: uses REINFORCE to learn a device placement policy; operations are grouped before being placed. Needs a lot of GPUs.
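The core update, reduced to a toy (a fake `measure_runtime` is assumed; the actual paper uses an RNN policy over grouped ops, not independent per-op logits):

```python
import numpy as np

def reinforce_placement(num_ops, num_devices, measure_runtime, iters=500, lr=0.1):
    """Learn per-op device logits by policy gradient on measured runtime."""
    logits = np.zeros((num_ops, num_devices))
    baseline = 0.0
    for step in range(iters):
        z = logits - logits.max(axis=1, keepdims=True)      # stable softmax
        probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        placement = [np.random.choice(num_devices, p=p) for p in probs]
        reward = -measure_runtime(placement)                # faster => higher reward
        baseline = reward if step == 0 else 0.9 * baseline + 0.1 * reward
        advantage = reward - baseline                       # variance reduction
        for op, dev in enumerate(placement):
            grad = -probs[op]                               # d log pi / d logits
            grad[dev] += 1.0
            logits[op] += lr * advantage * grad
    return logits.argmax(axis=1)
```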
- Spotlight: Optimizing device placement for training deep neural networks [Paper]
- Gao, Yuanxiang, Li Chen, and Baochun Li. (ICML 18)
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [Paper][GitHub] [News]
- Huang, Yanping, et al. (arXiv preprint arXiv:1811.06965, 2018)
- Summary: partitions a network's layers across accelerators and pipelines micro-batches through the partitions, using re-materialization to keep activation memory low.
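The schedule in miniature (my illustration): with K stages and M micro-batches, the pipeline finishes in K + M - 1 ticks instead of K × M, so the bubble fraction (K-1)/(M+K-1) shrinks as M grows:

```python
def pipeline_schedule(num_stages: int, num_micro: int):
    """Yield, per clock tick, the (stage, micro_batch) pairs that run concurrently."""
    for t in range(num_stages + num_micro - 1):
        yield [(k, t - k) for k in range(num_stages) if 0 <= t - k < num_micro]

# 3 stages, 4 micro-batches:
# tick 0: [(0, 0)]
# tick 1: [(0, 1), (1, 0)]
# tick 2: [(0, 2), (1, 1), (2, 0)]  <- all stages busy
for tick, work in enumerate(pipeline_schedule(3, 4)):
    print(tick, work)
```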
- Resource management with deep reinforcement learning [Paper] [GitHub]
- Mao, Hongzi, Mohammad Alizadeh, Ishai Menache, and Srikanth Kandula. (ACM HotNets 2016)
- Summary: Highly cited paper with a nice problem definition. An example solution that translates the problem of packing tasks with multiple resource demands into a learning problem and then uses DRL to solve it.
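A sketch of the paper's reward shaping (the Job fields are simplified; the paper's state is an image-like tensor of cluster and queue occupancy):

```python
from dataclasses import dataclass

@dataclass
class Job:
    duration: int        # ideal completion time T_j in time steps
    cpu: float           # fraction of cluster CPU demanded per step
    mem: float           # fraction of cluster memory demanded per step

def step_reward(jobs_in_system):
    # Paying -1/T_j per step for every job still in the system means the
    # episode return is minus the sum of slowdowns, so maximizing the
    # return minimizes average job slowdown.
    return -sum(1.0 / j.duration for j in jobs_in_system)
```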
- Ray: A Distributed Framework for Emerging AI Applications [GitHub]
- Moritz, Philipp, et al. (OSDI 2018)
- Summary: a distributed system for DRL training, simulation, and inference; it can also be used as a general high-performance Python framework.
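Ray's task API in a few lines (real API; `simulate` is a placeholder workload):

```python
import ray

ray.init()  # local runtime; pass an address to join an existing cluster

@ray.remote
def simulate(seed: int) -> float:
    return seed * 0.5     # stand-in for an expensive rollout

futures = [simulate.remote(s) for s in range(8)]   # scheduled in parallel
print(ray.get(futures))                            # block and gather results
```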
- Live Video Analytics at Scale with Approximation and Delay-Tolerance [Paper]
- Zhang, Haoyu, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. (NSDI 2017)
- Chameleon: scalable adaptation of video analytics [Paper]
- Jiang, Junchen, et al. (SIGCOMM 2018)
- Summary: a configuration controller that balances accuracy against resource cost; checking candidate configurations against a "golden configuration" is a nice design. Because the cost of naive periodic profiling often exceeds any resource savings gained by adapting the configurations, Chameleon exploits temporal and cross-camera correlations to keep profiling cheap.
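The adaptation step, boiled down (my sketch; a `profile_accuracy` helper is assumed, scoring a configuration's output against the golden configuration on a few profiling frames):

```python
def choose_config(configs, profile_accuracy, accuracy_target=0.8):
    """configs: list of (cost, config). Pick the cheapest config that still
    matches the golden configuration closely enough; re-run periodically."""
    viable = [(cost, c) for cost, c in configs
              if profile_accuracy(c) >= accuracy_target]
    if not viable:                                   # nothing meets the target:
        return max(configs, key=lambda t: t[0])[1]   # fall back to the priciest
    return min(viable, key=lambda t: t[0])[1]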
- Noscope: optimizing neural network queries over video at scale [Paper] [GitHub]
- Kang, Daniel, John Emmons, Firas Abuzaid, Peter Bailis, and Matei Zaharia. (VLDB 2017)
- Summary: answers queries over video by cascading cheap specialized models and difference detectors in front of the reference NN, which is invoked only on frames the cheap models are unsure about.
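A generic inference cascade in the same spirit (the thresholds and model interfaces are illustrative assumptions, not the paper's):

```python
def label_frame(frame, prev_frame, diff_detector, cheap_model, full_model,
                diff_thresh=0.1, low=0.2, high=0.8):
    """Return True/False for the query, or None to reuse the previous label."""
    if diff_detector(frame, prev_frame) < diff_thresh:
        return None                       # frame barely changed: skip entirely
    p = cheap_model(frame)                # tiny specialized model, very fast
    if p >= high:
        return True                       # confident positive
    if p <= low:
        return False                      # confident negative
    return full_model(frame) >= 0.5       # uncertain: pay for the reference NN
```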
- SVE: Distributed video processing at Facebook scale [Paper]
- Huang, Qi, et al. (SOSP 2017)
- Summary:
- Scanner: Efficient Video Analysis at Scale [Paper][GitHub]
- Poms, Alex, Will Crichton, Pat Hanrahan, and Kayvon Fatahalian. (SIGGRAPH 2018)
- Summary:
- NestDNN: Resource-Aware Multi-Tenant On-Device Deep Learning for Continuous Mobile Vision [Paper]
- Fang, Biyi, Xiao Zeng, and Mi Zhang. (MobiCom 2018)
- Summary: borrows ideas from network pruning; pruned models can then be recovered at runtime to trade off computational resources against accuracy.
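The runtime selection step, sketched (the descendant list and FLOPs-based cost model are assumptions; NestDNN additionally shares weights between descendants):

```python
def pick_descendant(descendants, budget):
    """descendants: list of (flops, accuracy, model) variants of one network,
    produced by iterative pruning. Pick the most accurate one that fits."""
    feasible = [d for d in descendants if d[0] <= budget]
    if not feasible:
        return min(descendants, key=lambda d: d[0])[2]   # degrade gracefully
    return max(feasible, key=lambda d: d[1])[2]
```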
- Lavea: Latency-aware video analytics on edge computing platform [Paper]
- Yi, Shanhe, et al. (ACM/IEEE Symposium on Edge Computing (SEC), 2017)
- Differentiable MPC for End-to-end Planning and Control [Paper] [GitHub]
- Amos, Brandon, Ivan Jimenez, Jacob Sacks, Byron Boots, and J. Zico Kolter (NIPS 2018)
- AutoScale: Dynamic, Robust Capacity Management for Multi-Tier Data Centers [Paper]
- Gandhi, Anshul, et al. (TOCS 2012)