This is the official PyTorch/PyTorch Lightning implementation of the paper:
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S.-H. Gary Chan
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
We propose a simple yet fast and effective partial convolution (PConv), as well as a latency-efficient family of architectures called FasterNet.
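As a quick sanity check on the arithmetic behind PConv: a regular k×k convolution on an h×w feature map with c channels costs h·w·k²·c² FLOPs, while PConv applies the convolution to only c_p channels, costing h·w·k²·c_p². With the partial ratio r = c_p/c = 1/4 used in the paper, that is a 16× reduction. A minimal sketch (the feature-map size below is illustrative, not taken from the paper):

```python
def conv_flops(h, w, k, c_in, c_out):
    # MAC-based FLOPs of a k x k conv on an h x w map (stride 1, same padding).
    return h * w * k * k * c_in * c_out

h, w, k, c = 56, 56, 3, 64      # illustrative layer shape
cp = c // 4                     # partial ratio r = 1/4, as in the paper

regular = conv_flops(h, w, k, c, c)
partial = conv_flops(h, w, k, cp, cp)   # conv applied to cp channels only
ratio = regular / partial                # 16x fewer FLOPs at r = 1/4
```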
Create a new conda virtual environment:

```
conda create -n fasternet python=3.9.12 -y
conda activate fasternet
```
Clone this repo and install the required packages:

```
git clone https://github.com/JierunChen/FasterNet
cd FasterNet
pip install -r requirements.txt
```
Download the ImageNet-1K classification dataset and structure the data as follows:
```
/path/to/imagenet-1k/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
```
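A quick way to confirm the dataset is laid out as expected before training is to check that both splits contain class subfolders. The helper below is hypothetical (not part of this repo) and uses only the standard library; it builds a toy tree matching the layout above and validates it:

```python
from pathlib import Path
import tempfile

def check_imagenet_layout(root):
    # Expect root/{train,val}/<class folder>/<image files>.
    root = Path(root)
    for split in ("train", "val"):
        classes = [d for d in (root / split).iterdir() if d.is_dir()]
        assert classes, f"no class folders under {split}/"
    return True

# Build a toy tree matching the layout above and validate it.
tmp = Path(tempfile.mkdtemp())
for split in ("train", "val"):
    (tmp / split / "class1").mkdir(parents=True)
    (tmp / split / "class1" / "img.jpeg").touch()

check_imagenet_layout(tmp)
```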
name | resolution | top-1 acc. (%) | #params | FLOPs | model |
---|---|---|---|---|---|
FasterNet-T0 | 224x224 | 71.9 | 3.9M | 0.34G | model |
FasterNet-T1 | 224x224 | 76.2 | 7.6M | 0.85G | model |
FasterNet-T2 | 224x224 | 78.9 | 15.0M | 1.90G | model |
FasterNet-S | 224x224 | 81.3 | 31.1M | 4.55G | model |
FasterNet-M | 224x224 | 83.0 | 53.5M | 8.72G | model |
FasterNet-L | 224x224 | 83.5 | 93.4M | 15.49G | model |
We give an example evaluation command for an ImageNet-1K pre-trained FasterNet-T0 on a single GPU:
```
python train_test.py -c cfg/fasternet_t0.yaml \
--checkpoint_path model_ckpt/fasternet_t0-epoch=281-val_acc1=71.9180.pth \
--data_dir ../../data/imagenet --test_phase -g 1 -e 125
```
- For evaluating other model variants, change `-c` and `--checkpoint_path` accordingly. You can get the pre-trained models from the table above.
- For multi-GPU evaluation, change `-g` to a larger number or a list, e.g., `8` or `0,1,2,3,4,5,6,7`. Note that the batch size for evaluation should be changed accordingly, e.g., change `-e` from `125` to `1000`.
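The 125 → 1000 example above suggests `-e` is the total evaluation batch size, so it should scale linearly with the GPU count to keep the per-GPU batch constant. A hypothetical helper (not part of the repo) capturing that rule:

```python
def scaled_eval_batch(single_gpu_batch, num_gpus):
    # Hypothetical helper: scale the total batch size (-e) with the GPU count,
    # matching the README's example of 125 -> 1000 when going from 1 to 8 GPUs.
    return single_gpu_batch * num_gpus

scaled_eval_batch(125, 8)  # → 1000, the -e value suggested for 8 GPUs
```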
To measure the latency on CPU/ARM and throughput on GPU (if any), run
```
python train_test.py -c cfg/fasternet_t0.yaml \
--checkpoint_path model_ckpt/fasternet_t0-epoch=281-val_acc1=71.9180.pth \
--data_dir ../../data/imagenet --test_phase -g 1 -e 32 --measure_latency --fuse_conv_bn
```
`-e` controls the input batch size on GPU, while on CPU/ARM the batch size is fixed internally to 1.
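The `--measure_latency` flag handles timing internally; purely as an illustration of the general procedure (warmup iterations to exclude one-time costs, then averaging over many runs), and not the repository's actual code, a sketch might look like:

```python
import time

def measure_latency(fn, warmup=10, iters=100):
    # Warm up first so one-time costs (allocations, cache fills) are excluded.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters  # mean seconds per call

avg = measure_latency(lambda: sum(range(1000)))
```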
Note: There are two issues related to latency/throughput measurement in the paper v1. Although they do not affect the conclusion that PConv and FasterNet achieve higher accuracy-latency efficiency, we clarify that:
- PConv and FasterNet use the `"slicing"` type for faster inference and latency/throughput measurement. However, it implicitly modifies the shortcut, making the computation inconsistent with `"split_cat"`. To fix that, we may:
  - clone the input via `x = x.clone()` before applying partial convolution, but this introduces additional latency and can defeat the benefit of using `"slicing"` over `"split_cat"`;
  - move the shortcut after the PConv operator, which resolves the issue and is likely to maintain the effectiveness. The modified models are under retraining and will be released once finished.
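The inconsistency above can be reproduced in a few lines. The NumPy sketch below (illustrative only, using a doubling function as a stand-in for the conv on the partial channels) shows that the in-place `"slicing"` variant mutates the tensor the shortcut still points at, so the residual sum differs from `"split_cat"`:

```python
import numpy as np

def pconv_split_cat(x, f, cp):
    # Untouched channels are concatenated back; x is never mutated.
    return np.concatenate([f(x[:cp]), x[cp:]], axis=0)

def pconv_slicing(x, f, cp):
    # In-place write: the caller's x is modified.
    x[:cp] = f(x[:cp])
    return x

f = lambda t: t * 2.0                  # stand-in for the conv on partial channels
x1 = np.ones((4, 2, 2))
x2 = np.ones((4, 2, 2))

shortcut1 = x1                          # shortcut taken before the PConv
out1 = shortcut1 + pconv_split_cat(x1, f, cp=1)   # channel 0: 1 + 2 = 3

shortcut2 = x2
out2 = shortcut2 + pconv_slicing(x2, f, cp=1)     # channel 0: 2 + 2 = 4
```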
- Latency and throughput are measured after merging the BatchNorm into the Conv for all models, where applicable. Due to an implementation bug in the initial version, the bias term after merging was wrongly omitted. After fixing the issue, most of the models, including other works compared, will be a bit slower than the statistics reported in the paper v1. We will update the statistics soon.
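The Conv-BN folding formula makes clear why the bias matters: with scale s = γ/√(var + ε), the fused weight is w·s and the fused bias is (b − mean)·s + β, and dropping the latter shifts every output. A minimal NumPy sketch of the standard folding, written here for a 1×1 conv expressed as a matrix (illustrative, not the repo's implementation):

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    # Fold BN statistics into the conv's weight and bias:
    #   BN(conv(x)) = gamma * (w @ x + b - mean) / sqrt(var + eps) + beta
    scale = gamma / np.sqrt(var + eps)
    w_fused = w * scale[:, None]            # per-output-channel scaling
    b_fused = (b - mean) * scale + beta     # the bias term that must not be omitted
    return w_fused, b_fused

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3))             # 1x1 conv as an 8x3 matrix
b = rng.standard_normal(8)
gamma, beta = rng.standard_normal(8), rng.standard_normal(8)
mean, var = rng.standard_normal(8), rng.random(8) + 0.1
x = rng.standard_normal(3)

ref = gamma * ((w @ x + b) - mean) / np.sqrt(var + 1e-5) + beta
wf, bf = fuse_conv_bn(w, b, gamma, beta, mean, var)
fused = wf @ x + bf                         # matches conv followed by BN
```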
FasterNet-T0 training on ImageNet-1K with an 8-GPU node:
```
python train_test.py -g 0,1,2,3,4,5,6,7 --num_nodes 1 -n 4 -b 4096 -e 2000 \
--data_dir ../../data/imagenet --pin_memory --wandb_project_name fasternet \
--model_ckpt_dir ./model_ckpt/$(date +'%Y%m%d_%H%M%S') --cfg cfg/fasternet_t0.yaml
```
To train other FasterNet variants, `--cfg` needs to be changed accordingly. You may also want to change the training batch size `-b`.
This repository is built using the timm, poolformer, ConvNeXt and mmdetection repositories.
If you find this repository helpful, please consider citing:
```bibtex
@article{chen2023run,
  title={Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks},
  author={Chen, Jierun and Kao, Shiu-hong and He, Hao and Zhuo, Weipeng and Wen, Song and Lee, Chul-Ho and Chan, S-H Gary},
  journal={arXiv preprint arXiv:2303.03667},
  year={2023}
}
```