Youhe Jiang's Projects
Training and serving large-scale neural networks with auto parallelization.
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
A tool for recursive autoshard
Modeling and simulating experiments on communication and computing costs.
Bagua is a deep learning training acceleration framework for PyTorch. It provides a one-stop training acceleration solution, including faster distributed training compared to PyTorch DDP, faster dataloader, kernel fusion, and more.
TensorFlow code and pre-trained models for BERT
An algorithm to display tree structure
Colossal-AI: A Unified Deep Learning System for Big Model Era
Examples of training models with hybrid parallelism using ColossalAI
CUDA Templates for Linear Algebra Subroutines
A Conditional Independence Test in the Presence of Discretization
Pretrained DeepLabv3 and DeepLabv3+ for Pascal VOC & Cityscapes
Deep reinforcement learning code for study. Currently, only these algorithms are implemented: DQN, C51, QR-DQN, IQN, QUOTA.
Example models using DeepSpeed
A repo with the model and links to the dataset from Egocentric Gesture Recognition for Head-Mounted AR devices (ISMAR 2018 Adjunct)
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
TensorFlow CNN for fast style transfer
Transformer-related optimization, including BERT and GPT
Fast and memory-efficient exact attention
A distributed deep learning framework that supports flexible parallelization strategies.
Running large language models on a single GPU for throughput-oriented scenarios.