Dropout Reduces Underfitting

Official PyTorch implementation for Dropout Reduces Underfitting

Dropout Reduces Underfitting
Zhuang Liu*, Zhiqiu Xu*, Joseph Jin, Zhiqiang Shen, Trevor Darrell (* equal contribution)
Meta AI, UC Berkeley and MBZUAI

Figure: We propose early dropout and late dropout. Early dropout helps underfitting models fit the data better and achieve lower training loss. Late dropout helps improve the generalization performance of overfitting models.

Results on ImageNet-1K

Early Dropout

results with basic recipe (s.d. = stochastic depth)

model	ViT-T	Mixer-S	Swin-F	ConvNeXt-F
no dropout	73.9	71.0	74.3	76.1
standard dropout	67.9	67.1	71.6	-
standard s.d.	72.6	70.5	73.7	75.5
early dropout	74.3	71.3	74.7	-
early s.d.	74.4	71.7	75.2	76.3

results with improved recipe

model	ViT-T	Swin-F	ConvNeXt-F
no dropout	76.3	76.1	77.5
standard dropout	71.5	73.5	-
standard s.d.	75.6	75.6	77.4
early dropout	76.7	76.6	-
early s.d.	76.7	76.6	77.7

Late Dropout

results with basic recipe

model	ViT-B	Mixer-B
standard s.d.	81.6	78.0
late s.d.	82.3	78.6

Installation

Please check INSTALL.md for installation instructions.

Training

Basic Recipe

We list commands for early dropout, early stochastic depth on ViT-T and late stochastic depth on ViT-B.

To train other models, change --model accordingly, e.g., to vit_tiny, mixer_s32, convnext_femto, mixer_b16, or vit_base.
Our results were produced with 4 nodes, each with 8 GPUs. Below we give example commands for both multi-node and single-machine setups.

Early dropout

multi-node

python run_with_submitit.py --nodes 4 --ngpus 8 \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 1 \
--dropout 0.1 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/

single-machine

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--dropout 0.1 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
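
The multi-node and single-machine commands above differ only in --nodes/--ngpus and --update_freq. A minimal sketch of the arithmetic, assuming (as in the ConvNeXt codebase this repository builds on) that --batch_size is the per-GPU batch size and --update_freq is the number of gradient accumulation steps; the helper name effective_batch_size is hypothetical and not part of this repository:

```python
def effective_batch_size(nodes, gpus_per_node, batch_size, update_freq):
    """Samples contributing to one optimizer step.

    Assumes batch_size is per GPU and update_freq counts gradient
    accumulation steps (an assumption, not confirmed by this README).
    """
    return nodes * gpus_per_node * batch_size * update_freq


# multi-node command: 4 nodes x 8 GPUs, --batch_size 128, --update_freq 1
multi_node = effective_batch_size(4, 8, 128, 1)

# single-machine command: 1 node x 8 GPUs, --batch_size 128, --update_freq 4
single_machine = effective_batch_size(1, 8, 128, 4)

print(multi_node, single_machine)  # 4096 4096
```

Under these assumptions both setups take optimizer steps over the same number of samples, so when training on fewer GPUs, scale --update_freq up to keep the product constant.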

Early stochastic depth

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--drop_path 0.5 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/

Late stochastic depth

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_base --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--drop_path 0.4 --drop_mode late --drop_schedule constant --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/

Standard dropout / no dropout (replace $p with 0.1 / 0.0)

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--dropout $p --drop_mode standard \
--data_path /path/to/data/ \
--output_dir /path/to/results/
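
The --drop_mode, --drop_schedule, and --cutoff_epoch flags used above can be read as a drop-rate schedule over training. The sketch below is illustrative only and is not the repository's scheduler: the function name drop_rate_at is hypothetical, the schedule is computed per epoch for simplicity, and "linear" is assumed to mean a linear decay from the initial rate to 0 at the cutoff epoch.

```python
def drop_rate_at(epoch, rate, mode="standard", schedule="constant", cutoff_epoch=0):
    """Drop rate to use at a given epoch (illustrative, epoch-level granularity).

    standard: the rate is applied for the whole of training.
    early:    the rate is applied only before cutoff_epoch, either held
              constant or decayed linearly to 0 (assumed meaning of "linear").
    late:     the rate is 0 before cutoff_epoch and constant afterwards.
    """
    if mode == "standard":
        return rate
    if mode == "early":
        if epoch >= cutoff_epoch:
            return 0.0
        if schedule == "linear":
            return rate * (1 - epoch / cutoff_epoch)
        return rate
    if mode == "late":
        return rate if epoch >= cutoff_epoch else 0.0
    raise ValueError(f"unknown drop mode: {mode}")


# Early dropout as in the ViT-T command: --dropout 0.1, linear, cutoff at epoch 50
early = [drop_rate_at(e, 0.1, "early", "linear", 50) for e in (0, 25, 50, 299)]

# Late stochastic depth as in the ViT-B command: --drop_path 0.4, constant, cutoff at epoch 50
late = [drop_rate_at(e, 0.4, "late", "constant", 50) for e in (0, 49, 50, 299)]
```

With these settings the early schedule starts at 0.1 and reaches 0 at epoch 50, while the late schedule stays at 0 until epoch 50 and then holds 0.4 for the rest of training.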

Improved Recipe

Our improved recipe extends training from 300 to 600 epochs and reduces both mixup and cutmix to 0.3.

Early dropout

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 600 --mixup 0.3 --cutmix 0.3 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--dropout 0.1 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/

Early stochastic depth

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 600 --mixup 0.3 --cutmix 0.3 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--drop_path 0.5 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/

Evaluation

single-GPU

python main.py --model vit_tiny --eval true \
--resume /path/to/model \
--data_path /path/to/data

multi-GPU

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --eval true \
--resume /path/to/model \
--data_path /path/to/data

We will release ImageNet-1K model weights soon.

Acknowledgement

This repository is built using the timm library and the ConvNeXt codebase.

License

This project is released under the CC-BY-NC 4.0 license. Please see the LICENSE file for more information.

Citation

If you find this repository helpful, please consider citing:

@article{liu2023dropout,
  title={Dropout Reduces Underfitting},
  author={Zhuang Liu and Zhiqiu Xu and Joseph Jin and Zhiqiang Shen and Trevor Darrell},
  year={2023},
  journal={arXiv preprint arXiv:2303.01500},
}
