
facebookresearch / resnext

Implementation of a classification framework from the paper Aggregated Residual Transformations for Deep Neural Networks

License: Other

Lua 100.00%


ResNeXt: Aggregated Residual Transformations for Deep Neural Networks

By Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He

UC San Diego, Facebook AI Research

Table of Contents

  1. Introduction
  2. Citation
  3. Requirements and Dependencies
  4. Training
  5. ImageNet Pretrained Models
  6. Third-party re-implementations

News

  • Congrats to the ILSVRC 2017 classification challenge winner WMW. ResNeXt is the foundation of their new SENet architecture (a ResNeXt-152 (64 x 4d) with the Squeeze-and-Excitation module)!
  • Check out Figure 6 in the new Memory-Efficient Implementation of DenseNets paper for a comparison between ResNeXts and DenseNets. "DenseNet cosine" is DenseNet trained with a cosine learning rate schedule.

Introduction

This repository contains a Torch implementation for the ResNeXt algorithm for image classification. The code is based on fb.resnet.torch.

ResNeXt is a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call “cardinality” (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width.


Figure: Training curves on ImageNet-1K. (Left): ResNet/ResNeXt-50 with the same complexity (~4.1 billion FLOPs, ~25 million parameters); (Right): ResNet/ResNeXt-101 with the same complexity (~7.8 billion FLOPs, ~44 million parameters).
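
The aggregation idea above can be shown with a minimal numeric sketch (plain Python, purely illustrative, not the repository's Torch implementation): a ResNeXt block sums the outputs of C branches sharing the same topology and adds the residual shortcut.

```python
# Conceptual sketch of a ResNeXt block: y = x + sum of C identical-topology
# transformations of x. The branches here are toy scalar functions; in the
# real network each branch is a 1x1 -> 3x3 -> 1x1 bottleneck.
def resnext_block(x, branches):
    return x + sum(branch(x) for branch in branches)

# Cardinality C = 32: thirty-two branches with the same (toy) topology.
branches = [lambda v: 0.1 * v for _ in range(32)]
y = resnext_block(1.0, branches)
```

In practice the C branches are implemented efficiently as a single grouped convolution with the number of groups equal to the cardinality.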

Citation

If you use ResNeXt in your research, please cite the paper:

@article{Xie2016,
  title={Aggregated Residual Transformations for Deep Neural Networks},
  author={Saining Xie and Ross Girshick and Piotr Dollár and Zhuowen Tu and Kaiming He},
  journal={arXiv preprint arXiv:1611.05431},
  year={2016}
}

Requirements and Dependencies

See the fb.resnet.torch installation instructions for a step-by-step guide.

Training

Please follow fb.resnet.torch for the general usage of the code, including how to use pretrained ResNeXt models for your own task.

There are two new hyperparameters that need to be specified to determine the bottleneck template:

-baseWidth and -cardinality

1x Complexity Configurations Reference Table

baseWidth   cardinality
   64            1
   40            2
   24            4
   14            8
    4           32
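
The pairs in this table keep the per-block complexity roughly constant. As a rough sanity check (my own sketch following the complexity formula in the paper, assuming 256 input/output channels for the template and ignoring BN/bias terms), one can count the parameters of the 1x1 → grouped 3x3 → 1x1 bottleneck:

```python
# Approximate parameter count of one ResNeXt bottleneck template:
# 256 -> C*d (1x1) -> C*d (grouped 3x3, C groups of width d) -> 256 (1x1).
def bottleneck_params(base_width, cardinality, channels=256):
    d = base_width * cardinality  # total width of the grouped 3x3 layer
    return (channels * d                                     # 1x1 reduce
            + 3 * 3 * base_width * base_width * cardinality  # grouped 3x3
            + d * channels)                                  # 1x1 expand

for w, c in [(64, 1), (40, 2), (24, 4), (14, 8), (4, 32)]:
    print(f"baseWidth={w:2d} cardinality={c:2d} params={bottleneck_params(w, c)}")
```

All five configurations land near ~70k parameters for this template, which is why they are called "1x complexity" configurations.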

To train ResNeXt-50 (32x4d) on 8 GPUs for ImageNet:

th main.lua -dataset imagenet -bottleneckType resnext_C -depth 50 -baseWidth 4 -cardinality 32 -batchSize 256 -nGPU 8 -nThreads 8 -shareGradInput true -data [imagenet-folder]

To reproduce the CIFAR results (e.g., ResNeXt 16x64d for CIFAR-10) on 8 GPUs:

th main.lua -dataset cifar10 -bottleneckType resnext_C -depth 29 -baseWidth 64 -cardinality 16 -weightDecay 5e-4 -batchSize 128 -nGPU 8 -nThreads 8 -shareGradInput true

To get comparable results using 2 or 4 GPUs, you should change the batch size and the corresponding learning rate:

th main.lua -dataset cifar10 -bottleneckType resnext_C -depth 29 -baseWidth 64 -cardinality 16 -weightDecay 5e-4 -batchSize 64 -nGPU 4 -LR 0.05 -nThreads 8 -shareGradInput true
th main.lua -dataset cifar10 -bottleneckType resnext_C -depth 29 -baseWidth 64 -cardinality 16 -weightDecay 5e-4 -batchSize 32 -nGPU 2 -LR 0.025 -nThreads 8 -shareGradInput true

Note: The CIFAR datasets will be automatically downloaded and processed the first time they are used. Note that in the arXiv paper, CIFAR results are based on pre-activated bottleneck blocks and a batch size of 256. We found that better CIFAR test accuracy can be achieved using the original bottleneck blocks and a batch size of 128.

ImageNet Pretrained Models

ImageNet pretrained models are licensed under CC BY-NC 4.0.


Single-crop (224x224) validation error rate

Network              GFLOPs   Top-1 Error   Download
ResNet-50 (1x64d)    ~4.1     23.9          Original ResNet-50
ResNeXt-50 (32x4d)   ~4.1     22.2          Download (191MB)
ResNet-101 (1x64d)   ~7.8     22.0          Original ResNet-101
ResNeXt-101 (32x4d)  ~7.8     21.2          Download (338MB)
ResNeXt-101 (64x4d)  ~15.6    20.4          Download (638MB)

Third-party re-implementations

Besides our Torch implementation, we also recommend the following third-party re-implementations and extensions:

  1. Training code in PyTorch (code)
  2. Conversion of the ImageNet pretrained model to a PyTorch model (source and code)
  3. Training code in MXNet with pretrained ImageNet models (code)
  4. Caffe prototxt, pretrained ImageNet models (including ResNeXt-152), and training curves (code)

resnext's People

Contributors

kaiminghe, s9xie, wanyenlo


resnext's Issues

How do you compute GFLOPs?

Hi,
I'm trying to compute GFLOPs using this code: https://github.com/apaszke/torch-opCounter
The input is [1, 3, 224, 224], and the computed value is about 22.84 GFLOPs for ResNeXt-50 (32x4d), which differs from the reported 4.1 GFLOPs.

So is there anything wrong with my approach? How do you compute GFLOPs?
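
For reference, the convention most classification papers use is to count multiply-adds. A minimal sketch of that count for a single convolution layer (my own illustration, not the repository's code or the exact method used by the authors) is:

```python
# Multiply-add count of a 2-D convolution layer; counting each
# multiply-accumulate as one FLOP is the usual paper convention.
def conv_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    return h_out * w_out * c_out * (c_in // groups) * kernel * kernel

# Example: the stem 7x7/2 convolution of ResNet/ResNeXt on a 224x224 input
# produces a 112x112 map, so it alone costs about 0.118 G multiply-adds.
print(conv_macs(3, 64, 7, 112, 112))
```

Tools that count a multiply and an add as two FLOPs, or that include elementwise and batch-norm operations, will report larger numbers than the ~4.1 GFLOPs figure.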

Why doesn't ResNeXt use pre-activation?

May I ask why ResNeXt doesn't use pre-activation as described in Identity Mappings in Deep Residual Networks? I didn't see the reason given in Aggregated Residual Transformations for Deep Neural Networks.

Caffe pretrained model

Hi, can you provide the pretrained Caffe model of ResNeXt? The model link provided seems broken!

5k pretrained models

Is there any chance we can get your 5k-way pretrained models? I suspect these will work much better for pretraining.

Question about style block

Hi,

In appendix "A. Implementation Details: CIFAR" of paper "Aggregated Residual Transformations for Deep Neural Networks", it is written that: "We adopt the pre-activation style block as in [14]", where [14] is the paper "Identity mappings in deep residual networks.". However, when I look at the resnext_bottleneck_B, I see it has the original style block with BN before the sum and ReLU after. Did you get your results with the original style block or with the pre-activation style block ?

Results on CIFAR-10

I trained ResNeXt-64x8 and ResNeXt-64x16 on CIFAR-10, but the results are slightly worse than those reported in the paper. I trained these models on three GPUs:
th main.lua -dataset cifar10 -bottleneckType resnext_C -depth 29 -baseWidth 64 -cardinality 16/8 -weightDecay 5e-4 -batchSize 128 -nGPU 3 -nThreads 8 -shareGradInput true
The best results I got for each run are:
ResNeXt-64x8: 3.68%, 3.80%, 3.86%, 3.90%, 3.76%, 3.84%
ResNeXt-64x16: 3.84%, 3.60%
The average results in the paper are 3.65% and 3.58% for these two networks.
Does anyone have the same question? Any suggestions?
Thanks a lot.

About license

Hi, I have a question about license of Imagenet pre-trained model.
I want to do fine-tuning using Imagenet pre-trained model on my own dataset.
In this case, which license should I follow? Is it non-commercial?

The error value suddenly jumps to a huge number

Recently I have been trying to reproduce your result in Torch, and my command is

th main.lua -dataset cifar10 -bottleneckType resnext_C -depth 29 -baseWidth 64 -cardinality 16 -weightDecay 5e-6 -batchSize 32 -nGPU 2 -LR 0.025 -nThreads 8 -shareGradInput true | tee -a ./cifar10_2gpu_torch.log

I copied the command from README.md (CIFAR-10 and 2 GPUs).

The problem starts at epoch 71.
Here is my log file (for readability, I picked out part of it):

 * Finished epoch # 60     top1:   6.610  top5:   0.140
 * Finished epoch # 61     top1:   7.050  top5:   0.150
 * Finished epoch # 62     top1:   7.680  top5:   0.240
 * Finished epoch # 63     top1:   7.180  top5:   0.230
 * Finished epoch # 64     top1:   7.100  top5:   0.220
 * Finished epoch # 65     top1:   6.980  top5:   0.160
 * Finished epoch # 66     top1:   6.850  top5:   0.170
 * Finished epoch # 67     top1:   6.870  top5:   0.180
 * Finished epoch # 68     top1:   7.010  top5:   0.270
 * Finished epoch # 69     top1:   6.910  top5:   0.220
 * Finished epoch # 70     top1:   6.290  top5:   0.130
 * Finished epoch # 71     top1:  85.740  top5:  34.780
 * Finished epoch # 72     top1:  81.790  top5:  33.700
 * Finished epoch # 73     top1:  80.220  top5:  28.920
 * Finished epoch # 74     top1:  79.200  top5:  31.640
 * Finished epoch # 75     top1:  78.980  top5:  27.150
 * Finished epoch # 76     top1:  79.540  top5:  30.260
 * Finished epoch # 77     top1:  81.540  top5:  29.620

And the epoch output for a single batch:

 | Epoch: [78][1158/1563]    Time 1.024  Data 0.000  Err 1528913280.0000  top1  81.250  top5  28.125
 | Epoch: [78][1159/1563]    Time 0.881  Data 0.000  Err 1559899264.0000  top1  81.250  top5  15.625
 | Epoch: [78][1160/1563]    Time 0.975  Data 0.000  Err 8231911424.0000  top1  87.500  top5  40.625
 | Epoch: [78][1161/1563]    Time 0.928  Data 0.000  Err 554394944.0000  top1  78.125  top5  28.125
 | Epoch: [78][1162/1563]    Time 1.012  Data 0.000  Err 4567331328.0000  top1  93.750  top5  40.625
 | Epoch: [78][1163/1563]    Time 1.146  Data 0.000  Err 2310403584.0000  top1  78.125  top5  34.375
 | Epoch: [78][1164/1563]    Time 0.947  Data 0.000  Err 2803231744.0000  top1  81.250  top5  25.000
 | Epoch: [78][1165/1563]    Time 0.956  Data 0.000  Err 2265360896.0000  top1  87.500  top5  50.000
 | Epoch: [78][1166/1563]    Time 0.867  Data 0.000  Err 1953190016.0000  top1  84.375  top5  21.875
 | Epoch: [78][1167/1563]    Time 1.014  Data 0.000  Err 2912053760.0000  top1  93.750  top5  28.125
 | Epoch: [78][1168/1563]    Time 1.007  Data 0.000  Err 4222694656.0000  top1  84.375  top5  31.250
 | Epoch: [78][1169/1563]    Time 0.895  Data 0.000  Err 5509958144.0000  top1  81.250  top5  37.500
 | Epoch: [78][1170/1563]    Time 0.979  Data 0.000  Err 5301891584.0000  top1  84.375  top5  34.375
 | Epoch: [78][1171/1563]    Time 0.920  Data 0.000  Err 3593149184.0000  top1  87.500  top5  28.125
 | Epoch: [78][1172/1563]    Time 1.020  Data 0.000  Err 7279746560.0000  top1  90.625  top5  31.250
 | Epoch: [78][1173/1563]    Time 1.002  Data 0.000  Err 10108009472.0000  top1  87.500  top5  31.250
 | Epoch: [78][1174/1563]    Time 0.861  Data 0.001  Err 2861270528.0000  top1  87.500  top5  28.125
 | Epoch: [78][1175/1563]    Time 0.862  Data 0.000  Err 4651573760.0000  top1  87.500  top5  31.250
 | Epoch: [78][1176/1563]    Time 1.051  Data 0.000  Err 92108896.0000  top1  75.000  top5  31.250
 | Epoch: [78][1177/1563]    Time 1.024  Data 0.000  Err 2649925888.0000  top1  87.500  top5  43.750
 | Epoch: [78][1178/1563]    Time 0.967  Data 0.000  Err 2876758784.0000  top1  71.875  top5  18.750
 | Epoch: [78][1179/1563]    Time 0.942  Data 0.000  Err 2976156928.0000  top1  71.875  top5  15.625
 | Epoch: [78][1180/1563]    Time 0.882  Data 0.000  Err 838116416.0000  top1  78.125  top5  43.750
 | Epoch: [78][1181/1563]    Time 1.028  Data 0.000  Err 6477106688.0000  top1  78.125  top5  37.500
 | Epoch: [78][1182/1563]    Time 1.004  Data 0.000  Err 5051654144.0000  top1  84.375  top5  31.250
 | Epoch: [78][1183/1563]    Time 0.859  Data 0.000  Err 5013932544.0000  top1  87.500  top5  34.375
 | Epoch: [78][1184/1563]    Time 0.848  Data 0.001  Err 2034009088.0000  top1  93.750  top5  25.000
 | Epoch: [78][1185/1563]    Time 1.060  Data 0.000  Err 3669680640.0000  top1  78.125  top5  25.000
 | Epoch: [78][1186/1563]    Time 1.028  Data 0.000  Err 4146675200.0000  top1  93.750  top5  28.125
 | Epoch: [78][1187/1563]    Time 0.966  Data 0.000  Err 2259935488.0000  top1  84.375  top5  34.375
 | Epoch: [78][1188/1563]    Time 0.956  Data 0.000  Err 1698448512.0000  top1  75.000  top5  25.000
 | Epoch: [78][1189/1563]    Time 0.864  Data 0.000  Err 4151320064.0000  top1  90.625  top5  56.250
 | Epoch: [78][1190/1563]    Time 1.035  Data 0.000  Err 1942320000.0000  top1  87.500  top5  31.250
 | Epoch: [78][1191/1563]    Time 1.026  Data 0.000  Err 1455451520.0000  top1  81.250  top5  31.250
 | Epoch: [78][1192/1563]    Time 0.867  Data 0.000  Err 2734585856.0000  top1  90.625  top5  40.625
 | Epoch: [78][1193/1563]    Time 0.965  Data 0.000  Err 36324916.0000  top1  81.250  top5  18.750
 | Epoch: [78][1194/1563]    Time 0.913  Data 0.000  Err 6873055744.0000  top1  90.625  top5  50.000
 | Epoch: [78][1195/1563]    Time 1.004  Data 0.000  Err 1242362112.0000  top1  84.375  top5  31.250

Has anyone had the same problem, or does anyone know how to solve it? I have been running this for nearly two days, and the result is really disappointing.

Thanks a lot.

How to cite your ResNeXt work?

Dear authors,
Very great work for deep learning. I want to know how I can cite your work in BibTeX; the given citation seems incomplete. Could you please update it?

Thanks a lot.

@article{Xie2016, title={Aggregated Residual Transformations for Deep Neural Networks}, author={Saining Xie and Ross Girshick and Piotr Dollár and Zhuowen Tu and Kaiming He}, journal={arXiv preprint arXiv:1611.05431}, year={2016} }
