DeepNet: An Implementation based on Colossal-AI
This is a re-implementation of the DeepNet model from the paper DeepNet: Scaling Transformers to 1,000 Layers.
DeepNet scales transformer models to 1,000 layers by applying DeepNorm. This Colossal-AI based implementation supports data parallelism, pipeline parallelism, and 1D tensor parallelism for training.
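DeepNorm replaces the usual post-LN residual update x_{l+1} = LN(x_l + G(x_l)) with x_{l+1} = LN(α·x_l + G(x_l)), and down-scales the initialization of certain projections by β. A minimal plain-Python sketch of the idea, using the decoder-only constants α = (2N)^(1/4) and β = (8N)^(-1/4) from the paper (the helper names here are illustrative, not from this repo):

```python
import math

def deepnet_constants(num_layers):
    """Decoder-only DeepNet scaling constants (values from the DeepNet paper)."""
    alpha = (2 * num_layers) ** 0.25    # scales the residual branch
    beta = (8 * num_layers) ** -0.25    # scales init of ffn/value/output weights
    return alpha, beta

def layer_norm(x, eps=1e-5):
    """Minimal layer norm over a list of floats (no learned scale/bias)."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

def deepnorm_residual(x, sublayer_out, alpha):
    """DeepNorm residual: x_{l+1} = LN(alpha * x_l + G(x_l))."""
    return layer_norm([alpha * a + b for a, b in zip(x, sublayer_out)])
```

For a 1,000-layer decoder this gives α ≈ 6.69, so the residual stream dominates each sublayer's output, which is what bounds the update magnitude and keeps very deep training stable.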
The decoder-only DeepNet model is modified from the GPT model. In this example, we use the WebText dataset for training; the dataset is prepared the same way as in the Colossal-AI based GPT example.
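The Colossal-AI GPT example stores WebText as a JSON-lines file in which each line is an object with a "text" field. Assuming that same layout (the format and loader below are a sketch, not code from this repo), loading the raw samples could look like:

```python
import json

def load_webtext(path):
    """Load a jsonl file where each non-empty line is {"text": "..."}."""
    samples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                samples.append(json.loads(line)["text"])
    return samples
```

Tokenization and sequence packing then happen downstream, exactly as in the GPT example's data pipeline.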
```shell
#!/usr/bin/env sh
export DATA=/path/to/train_data.json

torchrun --standalone --nproc_per_node=<num_gpus> train_deepnet_decoder.py --config=decoder_configs/deepnet_pp1d.py --from_torch
```
Please replace `DATA` and `<num_gpus>` with the path to your dataset and the number of GPUs, respectively.
You can also modify the config file `decoder_configs/deepnet_pp1d.py` to further change the parallel settings, training hyperparameters, and model details.
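For orientation, a Colossal-AI config combining pipeline and 1D tensor parallelism typically looks like the fragment below. The field names follow Colossal-AI's config conventions, but the concrete values are illustrative and are not necessarily those shipped in `decoder_configs/deepnet_pp1d.py`:

```python
from colossalai.amp import AMP_TYPE

BATCH_SIZE = 8
NUM_EPOCHS = 10
SEQ_LEN = 1024
NUM_MICRO_BATCHES = 4

# 2 pipeline stages x 2-way 1D tensor parallelism = 4 GPUs per replica;
# any remaining GPUs are used for data parallelism automatically.
parallel = dict(
    pipeline=2,
    tensor=dict(size=2, mode='1d'),
)

# optional mixed-precision training
fp16 = dict(mode=AMP_TYPE.NAIVE)
```

Setting `pipeline=1` and `tensor=dict(size=1, mode=None)` would fall back to pure data parallelism.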
- Decoder-only DeepNet
- Encoder-Decoder DeepNet