akamaster / pytorch_resnet_cifar10

Proper implementation of ResNet-s for CIFAR10/100 in pytorch that matches description of the original paper.

License: BSD 2-Clause "Simplified" License

Python 98.33% Shell 1.67%
resnet resnet20 resnet32 resnet44 resnet56 resnet110 resnet1202 pytorch resnet-cifar cifar torchvision-models-cifar

pytorch_resnet_cifar10's Introduction

Proper ResNet Implementation for CIFAR10/CIFAR100 in PyTorch

The torchvision model zoo provides a number of implementations of various state-of-the-art architectures; however, most of them are defined and implemented for ImageNet. It is usually straightforward to use the provided models on other datasets, but some cases require manual setup.

For instance, very few pytorch repositories with ResNets on CIFAR10 provide the implementation as described in the original paper. If you simply use torchvision's models on CIFAR10, you get a model that differs in the number of layers and parameters. This is unacceptable if you want to directly compare ResNet-s on CIFAR10 with the original paper. The purpose of this repo is to provide a valid pytorch implementation of ResNet-s for CIFAR10 as described in the original paper. The following models are provided:

 
| Name       | # layers | # params | Test err (paper) | Test err (this impl.) |
|------------|----------|----------|------------------|-----------------------|
| ResNet20   | 20       | 0.27M    | 8.75%            | 8.27%                 |
| ResNet32   | 32       | 0.46M    | 7.51%            | 7.37%                 |
| ResNet44   | 44       | 0.66M    | 7.17%            | 6.90%                 |
| ResNet56   | 56       | 0.85M    | 6.97%            | 6.61%                 |
| ResNet110  | 110      | 1.7M     | 6.43%            | 6.32%                 |
| ResNet1202 | 1202     | 19.4M    | 7.93%            | 6.18%                 |

This implementation matches the description in the original paper, with comparable or better test error.

How to run?

git clone https://github.com/akamaster/pytorch_resnet_cifar10
cd pytorch_resnet_cifar10
chmod +x run.sh && ./run.sh

Details of training

Our implementation follows the paper in a straightforward manner, with some caveats. First, the training in the paper uses a 45k/5k train/validation split of the training data, and selects the best-performing model based on performance on the validation set. We do not perform this validation step; keep this in mind if you need a head-to-head comparison of your ResNet results with the original paper. Second, if you want to train ResNet1202, keep in mind that it requires about 16GB of GPU memory.
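
The training setup above can be sketched as follows. This is a minimal illustration, not the repository's exact trainer: SGD with momentum 0.9 and weight decay 1e-4 matches the paper, while the epoch milestones of 100/150 and the total of 200 epochs are assumptions made here for illustration.

```python
import torch
import torch.nn as nn

# Minimal sketch of a CIFAR-10 ResNet training schedule: SGD with momentum
# 0.9, weight decay 1e-4, initial lr 0.1, divided by 10 at two milestones.
# `model` is a placeholder; substitute e.g. resnet20() from resnet.py.
model = nn.Linear(10, 10)  # placeholder model for illustration

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150], gamma=0.1)

for epoch in range(200):
    # ... one pass over the training set would go here ...
    scheduler.step()
```

The milestone epochs roughly mirror the paper's 32k/48k-iteration learning-rate drops at batch size 128.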

Pretrained models for download

  1. ResNet20, 8.27% err
  2. ResNet32, 7.37% err
  3. ResNet44, 6.90% err
  4. ResNet56, 6.61% err
  5. ResNet110, 6.32% err
  6. ResNet1202, 6.18% err

If you find this implementation useful and want to cite/mention this page, here is a bibtex citation:

@misc{Idelbayev18a,
  author       = "Yerlan Idelbayev",
  title        = "Proper {ResNet} Implementation for {CIFAR10/CIFAR100} in {PyTorch}",
  howpublished = "\url{https://github.com/akamaster/pytorch_resnet_cifar10}",
  note         = "Accessed: 20xx-xx-xx"
}

pytorch_resnet_cifar10's People

Contributors

akamaster, karthikramesh55, marisakirisame

pytorch_resnet_cifar10's Issues

What version of PyTorch was used?

I tried running training with pytorch 1.0.0 and I got the following error, which I suspect is due to a runtime version mismatch. Which version of pytorch was this code developed under?

Traceback (most recent call last):
  File "trainer.py", line 303, in <module>
    main()
  File "trainer.py", line 134, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "trainer.py", line 183, in train
    output = model(input_var)
  File "/localtmp/mp3t/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/localtmp/mp3t/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/localtmp/mp3t/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/localtmp/mp3t/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/localtmp/mp3t/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/localtmp/mp3t/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/localtmp/mp3t/Projects/doggett/pytorch_resnet_cifar10/resnet.py", line 110, in forward
    out = F.relu(self.bn1(self.conv1(x)))
  File "/localtmp/mp3t/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/localtmp/mp3t/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCGeneral.cpp:405

an issue about resnet implementation

In your resnet.py file, the forward function of the BasicBlock class returns the relu of the sum of the shortcut and the out variable, but shouldn't it return the sum of the shortcut and the relu of out? Something like this:

def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = F.relu(self.bn2(self.conv2(out)))
    out += self.shortcut(x)
    return out

About learning rate

Greetings, I found in your code that the starting learning rate is set to 0.1, which seems too large for the resnet to improve, and in the first 100 epochs the accuracy of the model stays around 10%. Is this designed on purpose?

Shortcuts don't follow the original paper.

According to the paper:

We can also use a square matrix Ws in Eqn.(1). But we will show by experiments that the identity mapping is sufficient for addressing the degradation problem and is economical, and thus Ws is only used when matching dimensions.

In this implementation, the shortcut is only added when the dimensions are mismatched, and option A or B determines whether an identity (padding) shortcut or a projection matrix is used.

if stride != 1 or in_planes != planes:
    if option == 'A':
        """
        For CIFAR10 ResNet paper uses option A.
        """
        self.shortcut = LambdaLayer(lambda x:
            F.pad(x[:, :, ::2, ::2],
                  (0, 0, 0, 0, planes // 4, planes // 4), "constant", 0))
    elif option == 'B':
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_planes, self.expansion * planes,
                      kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(self.expansion * planes)
        )

However, the correct implementation is to add the identity layer if it's option A and the dimensions are matched, or to add a projection matrix otherwise.

can't download the pretrained models

Hello! Your project is very useful!
But I can't download the pretrained models; the page returns a 404. Could you refresh the link? Thank you very much!

about the mean and stddev

Hi, I computed the mean and stddev from the CIFAR10 train dataset. The results are:
mean: [125.30691805 122.95039414 113.86538318]
stddev: [62.99321928 62.08870764 66.70489964]
The normalized mean and stddev are:
mean: [0.49139968 0.48215841 0.44653091]
stddev: [0.24703223 0.24348513 0.26158784]

However, in the code, the mean and stddev are:
mean=[0.485, 0.456, 0.406]
std=[0.229, 0.224, 0.225]
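
The per-channel statistics quoted above can be reproduced with a small helper. `channel_stats` is a hypothetical function written here for illustration; the values in the code (0.485/0.456/0.406 etc.) are the ImageNet statistics commonly used with torchvision models, which is presumably where they came from.

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean/std of an (N, H, W, C) uint8 image array,
    returned on the [0, 1] scale used by transforms.Normalize."""
    data = images.astype(np.float64) / 255.0
    return data.mean(axis=(0, 1, 2)), data.std(axis=(0, 1, 2))

# Usage (assumption: torchvision's CIFAR10 exposes the raw uint8 array as .data):
# from torchvision import datasets
# train = datasets.CIFAR10(root='./data', train=True, download=True)
# mean, std = channel_stats(train.data)
# per the numbers in this issue: mean ~ [0.4914, 0.4822, 0.4465],
#                                std  ~ [0.2470, 0.2435, 0.2616]
```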

an issue about resnet implementation

Dear professor, I have two questions.
1. Why is it (line 81) out += self.shortcut(x) and not out += x?

2. For ImageNet/iNaturalist the ResNet paper uses option 'B', and for CIFAR10/CIFAR100 it uses option 'A'? I'm not sure I'm understanding this right.

Looking forward to your kind advice.

Loss output

Hi,
can you also report the loss you got on the trained model?
Thanks a lot

Issues on loading pre-trained model

Thank you for providing these very useful pre-trained models. However, I ran into trouble loading them. What I did is as follows:

res20 = resnet20()
weights = torch.load('pytorch_resnet_cifar10/pretrained_models/resnet20.th')
res20.load_state_dict(weights)

It fails because the keys are not matching, e.g., "conv1.weight" in the constructed model while "module.conv1.weight" in the pre-trained weights.

So I'm wondering: is it possible to provide example code for loading the pre-trained models? Or how can I solve this problem? Thanks.
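
As a sketch of a possible fix, assuming (per the error message) that the checkpoints were saved from an nn.DataParallel-wrapped model and stored as a dict with a "state_dict" entry, the "module." prefix can be stripped before loading. `load_pretrained` is a hypothetical helper, not part of this repo:

```python
import torch
from collections import OrderedDict

def load_pretrained(model, path):
    """Load a DataParallel-saved checkpoint into a bare (unwrapped) model."""
    ckpt = torch.load(path, map_location='cpu')
    state = ckpt.get('state_dict', ckpt)   # unwrap if the weights are nested
    # Strip the "module." prefix that nn.DataParallel adds to every key.
    cleaned = OrderedDict(
        (k[len('module.'):] if k.startswith('module.') else k, v)
        for k, v in state.items())
    model.load_state_dict(cleaned)
    return model

# Usage (assuming resnet.py from this repo is importable):
# from resnet import resnet20
# model = load_pretrained(resnet20(), 'pretrained_models/resnet20.th')
```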

Learning rate for ResNet 110

I am trying to train resnet110. After the first epoch, the learning rate does not return to 0.1; it stays at 0.01. I believe this is not intended.

Epochs chosen different than the paper

Hi @akamaster,

The train set has 45,000 images.
Taking into account that the batch size is 128, that yields 352 iterations per epoch.
In the paper they train the network for 64,000 iterations, which corresponds to about 181 epochs of training.

Please, let me know if you agree

Adding LICENSE

Could you add LICENSE (e.g. MIT) to help people better benefit from your work?
Thank you.

How to save the model?

Hi,

When I set the BasicBlock's option to "A" and save the model with the following code:
torch.save(model, 'test.pkl')
I get this error:
"AttributeError: Can't pickle local object 'BasicBlock.__init__.<locals>.<lambda>'"

What's the problem?

Thanks!
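
A common workaround, sketched here under the assumption that the failure comes from the lambda inside option 'A''s LambdaLayer: save only the state_dict (plain tensors) instead of the whole module object, and rebuild the architecture before loading. nn.Sequential below is a stand-in for the real model:

```python
import torch
import torch.nn as nn

# torch.save(model, ...) pickles the module object itself, which fails when
# the model closes over a local lambda (as option 'A' does). Saving only the
# state_dict avoids pickling the lambda entirely.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1))  # stand-in for resnet20()

torch.save(model.state_dict(), 'test.pth')             # save weights only

# Reconstruct the architecture in code, then load the weights into it.
rebuilt = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1))
rebuilt.load_state_dict(torch.load('test.pth'))
```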

We need correct checkpoints

Hi, @akamaster! Great work!

Actually, to compare results fairly we need to train the ResNets on 45k images (train), choose the best model on 5k (validation), and report accuracy on 10k (test). Your better results suggest slight "overfitting" to the test set, because the best model was chosen by test accuracy, which is not correct. So, if possible, please re-train the models with the 45k/5k split.

How to load a pretrained model?

It seems that although there is a flag for a pre-trained model in trainer.py, it is not used to load the model, and training proceeds from scratch.
Note: I ended up loading it using the load-checkpoint function.

How to load TRAINED model

Hello,

Thank you very much for this repo. I'm looking for trained model parameters, do you happen to have it for resnet20 and resnet32?

Thank you.

pretrained models

Thanks for your code and pretrained models.
But it seems that your pretrained models were not fully saved. When loading a pretrained model, we found that the shortcut was not saved in it. Could you please update the pretrained models?

Thanks a lot.

Running Evaluation on pretrained model

While running evaluate mode directly from the command line:

python trainer.py --arch resnet20 --pretrained --evaluate --print-freq 5

Files already downloaded and verified
Test: [0/79] Time 13.405 (13.405) Loss 55.9279 (55.9279) Prec@1 6.250 (6.250)
Test: [5/79] Time 0.006 (2.239) Loss 48.1283 (57.7435) Prec@1 3.906 (5.599)
Test: [10/79] Time 0.005 (1.224) Loss 56.3518 (57.2674) Prec@1 5.469 (6.250)
Test: [15/79] Time 0.006 (0.843) Loss 54.6870 (57.2780) Prec@1 7.031 (6.201)
Test: [20/79] Time 0.006 (0.644) Loss 60.2548 (57.5978) Prec@1 7.031 (5.952)
Test: [25/79] Time 0.006 (0.521) Loss 58.7205 (57.5096) Prec@1 9.375 (6.160)
Test: [30/79] Time 0.006 (0.438) Loss 52.3959 (57.5041) Prec@1 7.031 (6.174)
Test: [35/79] Time 0.006 (0.378) Loss 54.9284 (57.7886) Prec@1 6.250 (6.033)
Test: [40/79] Time 0.005 (0.333) Loss 60.2110 (57.8033) Prec@1 6.250 (6.040)
Test: [45/79] Time 0.005 (0.297) Loss 57.6955 (57.9990) Prec@1 6.250 (6.063)
Test: [50/79] Time 0.004 (0.268) Loss 60.3893 (58.1968) Prec@1 5.469 (6.066)
Test: [55/79] Time 0.004 (0.245) Loss 66.5257 (58.4542) Prec@1 4.688 (5.957)
Test: [60/79] Time 0.006 (0.225) Loss 64.6813 (58.2924) Prec@1 7.812 (6.019)
Test: [65/79] Time 0.005 (0.209) Loss 51.3470 (57.9437) Prec@1 7.031 (6.120)
Test: [70/79] Time 0.006 (0.194) Loss 64.6101 (57.9421) Prec@1 8.594 (6.184)
Test: [75/79] Time 0.004 (0.182) Loss 60.6900 (57.9247) Prec@1 3.125 (6.075)

For example, the final classification rate was not printed out. I tried different models but the issue remains.

About the number of epochs?

How many epochs did you run to obtain the accuracy/error you report? I haven't found it documented anywhere. Could you explain your training policy a bit more?

License File

Dear Akamaster,

Would you be able to add a License file to your code repository? This makes it easier to use for people in companies :)

I suggest an MIT license, which is the most permissive one can have :)

Regards,
Tijmen

Learning rate and optimizer

Could you provide the optimizer you used for training, including learning rate and momentum? My test error only reaches 20% on resnet20.

test accuracy

Hi, I used your pretrained model (resnet56) to initialize a resnet56 model and ran prediction on the test set, and got the result below. I think I did something incorrectly, but I cannot figure out what. I attached the code I used. Can you please check it? Thank you.

Test: [0/79] Time 5.170 (5.170) Loss 3.2016 (3.2016) Prec@1 11.719 (11.719)
Test: [50/79] Time 0.040 (0.145) Loss 3.1556 (3.2240) Prec@1 7.031 (10.003)

  • Prec@1 10.000

My testing code is below.

parser.add_argument('--testmodel', default='./pretrained_models/resnet56-4bfd9763.th',
                    type=str, metavar='TESTM', help='path to test model')

def test():
    global args, best_prec1
    args = parser.parse_args()
    # Check the save_dir exists or not
    if not os.path.exists(args.save_dir):
        os.makedirs(args.save_dir)
    model = resnet.__dict__[args.arch]()
    model.load_state_dict(torch.load(args.testmodel), strict=False)
    model = torch.nn.DataParallel(model)
    # model = torch.nn.DataParallel(model, device_ids=GPUS).cuda()
    model.cuda()

    cudnn.benchmark = True

    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    train_loader = torch.utils.data.DataLoader(
        datasets.CIFAR10(root='./data', train=True, transform=transforms.Compose([
            transforms.RandomHorizontalFlip(),
            transforms.RandomCrop(32, 4),
            transforms.ToTensor(),
            normalize,
        ]), download=True),
        batch_size=args.batch_size, shuffle=True,
        num_workers=args.workers, pin_memory=True)

    val_loader = torch.utils.data.DataLoader(
        datasets.CIFAR10(root='./data', train=False, transform=transforms.Compose([
            transforms.ToTensor(),
            normalize,
        ])),
        batch_size=128, shuffle=False,
        num_workers=args.workers, pin_memory=True)

    # define loss function (criterion) and optimizer
    criterion = nn.CrossEntropyLoss().cuda()

    args.evaluate = True
    if args.evaluate:
        validate(val_loader, model, criterion)
        return
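
A likely cause of the chance-level (~10%) accuracy above, offered as an assumption: the checkpoint is a dict whose weights live under the "state_dict" key with "module."-prefixed names (the models appear to have been saved under nn.DataParallel), and strict=False silently skips every mismatched key, leaving the network at its random initialization. One hedged fix is to wrap the model in DataParallel *before* loading, so the key names match; nn.Linear below is a stand-in for resnet56() from this repo:

```python
import torch
import torch.nn as nn

# Stand-in for resnet56(); the real checkpoints pair a DataParallel-wrapped
# model's state_dict with extra entries such as 'best_prec1'.
saved = nn.DataParallel(nn.Linear(8, 2))
ckpt = {'state_dict': saved.state_dict(), 'best_prec1': 0.0}  # placeholder value

# Wrapping the fresh model in DataParallel gives its parameters the same
# "module." prefix as the checkpoint keys, so strict loading succeeds and
# no mismatch gets silently swallowed by strict=False.
model = nn.DataParallel(nn.Linear(8, 2))
model.load_state_dict(ckpt['state_dict'])
```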

About accuracy

Hello, thanks for your contribution. I ran your code, but I can't obtain the result you reported: for resnet20 on CIFAR10 the accuracy is only 88.73%. I hope you can give me some help.

Reproduce "test" accuracy

I'm having trouble reproducing the test accuracy quoted in the README. In particular, running your exact code for ResNet56, I get a best validation error rate of 7.36(16)%, which differs from your quoted value of 6.61% by 5 sigma. How exactly did you determine the quoted test accuracy of your model?

Failed to load pretrained model Resnet20

When I try to use the pretrained model, it returns the following.

RuntimeError: Error(s) in loading state_dict for ResNet:
Missing key(s) in state_dict: "conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "layer1.0.conv1.weight", "layer1.0.bn1.weight", "layer1.0.bn1.bias", "layer1.0.bn1.running_mean", "layer1.0.bn1.running_var", "layer1.0.conv2.weight", "layer1.0.bn2.weight", "layer1.0.bn2.bias", "layer1.0.bn2.running_mean", "layer1.0.bn2.running_var", "layer1.1.conv1.weight", "layer1.1.bn1.weight", "layer1.1.bn1.bias", "layer1.1.bn1.running_mean", "layer1.1.bn1.running_var", "layer1.1.conv2.weight", "layer1.1.bn2.weight", "layer1.1.bn2.bias", "layer1.1.bn2.running_mean", "layer1.1.bn2.running_var", "layer1.2.conv1.weight", "layer1.2.bn1.weight", "layer1.2.bn1.bias", "layer1.2.bn1.running_mean", "layer1.2.bn1.running_var", "layer1.2.conv2.weight", "layer1.2.bn2.weight", "layer1.2.bn2.bias", "layer1.2.bn2.running_mean", "layer1.2.bn2.running_var", "layer2.0.conv1.weight", "layer2.0.bn1.weight", "layer2.0.bn1.bias", "layer2.0.bn1.running_mean", "layer2.0.bn1.running_var", "layer2.0.conv2.weight", "layer2.0.bn2.weight", "layer2.0.bn2.bias", "layer2.0.bn2.running_mean", "layer2.0.bn2.running_var", "layer2.1.conv1.weight", "layer2.1.bn1.weight", "layer2.1.bn1.bias", "layer2.1.bn1.running_mean", "layer2.1.bn1.running_var", "layer2.1.conv2.weight", "layer2.1.bn2.weight", "layer2.1.bn2.bias", "layer2.1.bn2.running_mean", "layer2.1.bn2.running_var", "layer2.2.conv1.weight", "layer2.2.bn1.weight", "layer2.2.bn1.bias", "layer2.2.bn1.running_mean", "layer2.2.bn1.running_var", "layer2.2.conv2.weight", "layer2.2.bn2.weight", "layer2.2.bn2.bias", "layer2.2.bn2.running_mean", "layer2.2.bn2.running_var", "layer3.0.conv1.weight", "layer3.0.bn1.weight", "layer3.0.bn1.bias", "layer3.0.bn1.running_mean", "layer3.0.bn1.running_var", "layer3.0.conv2.weight", "layer3.0.bn2.weight", "layer3.0.bn2.bias", "layer3.0.bn2.running_mean", "layer3.0.bn2.running_var", "layer3.1.conv1.weight", "layer3.1.bn1.weight", "layer3.1.bn1.bias", "layer3.1.bn1.running_mean", 
"layer3.1.bn1.running_var", "layer3.1.conv2.weight", "layer3.1.bn2.weight", "layer3.1.bn2.bias", "layer3.1.bn2.running_mean", "layer3.1.bn2.running_var", "layer3.2.conv1.weight", "layer3.2.bn1.weight", "layer3.2.bn1.bias", "layer3.2.bn1.running_mean", "layer3.2.bn1.running_var", "layer3.2.conv2.weight", "layer3.2.bn2.weight", "layer3.2.bn2.bias", "layer3.2.bn2.running_mean", "layer3.2.bn2.running_var", "linear.weight", "linear.bias".
Unexpected key(s) in state_dict: "best_prec1", "state_dict".
