weiaicunzai / pytorch-cifar100

Practice on cifar100 (ResNet, DenseNet, VGG, GoogleNet, InceptionV3, InceptionV4, Inception-ResNetv2, Xception, Resnet In Resnet, ResNext, ShuffleNet, ShuffleNetv2, MobileNet, MobileNetv2, SqueezeNet, NasNet, Residual Attention Network, SENet, WideResNet)

Python 100.00%

pytorch image-classification deep-learning cifar100 resnet googlenet inceptionv4 xception resnext inceptionv3


pytorch-cifar100's Issues

Is the learning rate increasing?

Training Epoch: 1 [3968/8000] Loss: 1.3380 LR: 0.004960
Training Epoch: 1 [3976/8000] Loss: 0.3088 LR: 0.004970
Training Epoch: 1 [3984/8000] Loss: 0.6474 LR: 0.004980
Training Epoch: 1 [3992/8000] Loss: 0.4500 LR: 0.004990
Training Epoch: 1 [4000/8000] Loss: 0.6452 LR: 0.005000
Training Epoch: 1 [4008/8000] Loss: 0.9984 LR: 0.005010
Training Epoch: 1 [4016/8000] Loss: 0.7139 LR: 0.005020
Training Epoch: 1 [4024/8000] Loss: 0.6220 LR: 0.005030
Training Epoch: 1 [4032/8000] Loss: 0.4329 LR: 0.005040
Training Epoch: 1 [4040/8000] Loss: 0.4127 LR: 0.005050
Training Epoch: 1 [4048/8000] Loss: 0.4696 LR: 0.005060
Training Epoch: 1 [4056/8000] Loss: 0.5181 LR: 0.005070
Training Epoch: 1 [4064/8000] Loss: 0.4105 LR: 0.005080
Training Epoch: 1 [4072/8000] Loss: 0.7041 LR: 0.005090
Training Epoch: 1 [4080/8000] Loss: 0.3864 LR: 0.005100
Training Epoch: 1 [4088/8000] Loss: 0.6991 LR: 0.005110
Training Epoch: 1 [4096/8000] Loss: 0.3007 LR: 0.005120
Training Epoch: 1 [4104/8000] Loss: 0.3111 LR: 0.005130
Training Epoch: 1 [4112/8000] Loss: 0.3763 LR: 0.005140
Training Epoch: 1 [4120/8000] Loss: 0.5825 LR: 0.005150
Training Epoch: 1 [4128/8000] Loss: 0.5528 LR: 0.005160
Training Epoch: 1 [4136/8000] Loss: 0.3553 LR: 0.005170
Training Epoch: 1 [4144/8000] Loss: 0.2654 LR: 0.005180
Training Epoch: 1 [4152/8000] Loss: 0.3935 LR: 0.005190
Training Epoch: 1 [4160/8000] Loss: 0.2935 LR: 0.005200
Training Epoch: 1 [4168/8000] Loss: 0.2382 LR: 0.005210
Training Epoch: 1 [4176/8000] Loss: 0.2893 LR: 0.005220
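
The steadily rising LR in this log is what a linear warm-up schedule would produce during the first epoch. The sketch below is not the repository's code; it assumes a batch size of 8 and a base learning rate of 0.01 (both inferred from the log, not stated in it) and uses a hypothetical helper `warmup_lr` to reproduce the logged values.

```python
def warmup_lr(base_lr, iteration, total_iters):
    """Linearly scale the learning rate from 0 up to base_lr over total_iters steps."""
    return base_lr * iteration / total_iters

# Assumed values inferred from the log: batch size 8, 8000 samples, base LR 0.01.
batch_size, num_samples, base_lr = 8, 8000, 0.01
total_iters = num_samples // batch_size  # 1000 iterations in the warm-up epoch

for seen in (3968, 4000, 4176):
    lr = warmup_lr(base_lr, seen // batch_size, total_iters)
    print(f"[{seen}/{num_samples}] LR: {lr:.6f}")
```

Under these assumptions the LR reaches the base value at the end of the warm-up epoch, after which a regular schedule would normally take over.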

GPU utilization

Hello, when I train, GPU utilization stays at only about 10%, with batch_size set to 64 and num_workers set to 8. Is there any way to increase the utilization?

tensorboardX path

Traceback (most recent call last):
File "D:\Anaconda\envs\PyTorch\lib\site-packages\tensorboardX\record_writer.py", line 47, in directory_check
factory = REGISTERED_FACTORIES[prefix]
KeyError: 'runs\vgg16\2020-03-08T16'

Memory computing problems

How is the memory cost calculated? The code for it does not seem to be provided.
Thanks!

Error: for batch_index, (images, labels) in enumerate(cifar100_training_loader)

Hello, when I run your program, the line for batch_index, (images, labels) in enumerate(cifar100_training_loader) at line 33 of the train function in train.py raises the following error:
Exception raised: TypeError
Traceback (most recent call last):
File "/home/zbs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/zbs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in default_collate
return [default_collate(samples) for samples in transposed]
File "/home/zbs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in
return [default_collate(samples) for samples in transposed]
File "/home/zbs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 140, in default_collate
raise TypeError((error_msg.format(type(batch[0]))))
TypeError: batch must contain tensors, numbers, dicts or lists; found <class 'torchvision.transforms.Compose'>
File "/home/zbs/Desktop/pytorch-cifar100/train.py", line 33, in train
for batch_index, (images, labels) in enumerate(cifar100_training_loader):
File "/home/zbs/Desktop/pytorch-cifar100/train.py", line 160, in
train(epoch)
Could you help me solve this? Thank you very much. Nice work!

The accuracy of vgg16-bn and vgg19_bn

How do I switch to my own dataset?

My dataset has three categories in three folders. How can I change the code that reads the data?
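
One way to read a one-folder-per-class layout is to map each subfolder name to an integer label. The following is a minimal stdlib-only sketch; `list_samples` is a hypothetical helper, not part of the repository, and loading the image files themselves is left out.

```python
import os

def list_samples(root):
    """Return (samples, class_to_idx) for a root folder with one subfolder per class."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    samples = []
    for name in classes:
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            # Each sample is a (file path, integer label) pair.
            samples.append((os.path.join(folder, fname), class_to_idx[name]))
    return samples, class_to_idx
```

torchvision.datasets.ImageFolder follows the same one-folder-per-class convention, so it may be a drop-in replacement for the CIFAR-100 dataset class. The final layer of the chosen model would presumably also need to output 3 classes instead of 100.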

A little difference in shufflenetV2

In the original paper, there is a maxpool layer in the 'pre' block between the image and stage2, but I can't find it in this code.

CUDA out of memory problem

It seems that some of the nets defined in models have a hidden bug.
For example, with senet I get a CUDA out of memory error even though my batch_size is only 64 and my GPU has 11G of memory.

But when I use the model file here
https://github.com/moskomule/senet.pytorch/tree/master/senet
it occupies only 7G of memory with batch_size=90.

I find that senet.py, resnext.py, and inceptionv4.py all have a similar problem, and maybe more models do.

resnet50 results

Why are the results much better than those in the table? I made no modifications; resnet50, 183-best-pth.
Did I test it incorrectly?

Boolean arguments are not parsed correctly in train.py

In train.py, the arguments gpu and s are meant to be booleans. However, if you run the command in the usual way, like this:
python train.py -gpu False -s False

you will find that arg.gpu and arg.s are both True, which means the gpu and s arguments are not parsed correctly.
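
This behavior is expected if the flags are declared with type=bool (an assumption about train.py), because argparse then calls bool("False"), and any non-empty string is truthy. A minimal sketch of one common workaround, using a hypothetical `str2bool` converter:

```python
import argparse

def str2bool(value):
    """Convert common textual spellings of booleans into a real bool."""
    text = value.lower()
    if text in ("true", "t", "yes", "y", "1"):
        return True
    if text in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

parser = argparse.ArgumentParser()
# str2bool replaces type=bool so that the text "False" really becomes False.
parser.add_argument("-gpu", type=str2bool, default=True)
parser.add_argument("-s", type=str2bool, default=True)

args = parser.parse_args(["-gpu", "False", "-s", "False"])
print(args.gpu, args.s)
```

An alternative design is to declare the options with action="store_true" so that the flag's presence alone enables the feature.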

The function of WarmUpLR() in the utils.py?

Thanks for your great code.
I am confused about the purpose of WarmUpLR() in utils.py. And what about lr_finder.py?

Model Accuracy

I used your model "seresnet18" to train on the CIFAR-100 dataset and followed your training details: "init lr = 0.1, divide by 5 at the 60th, 120th, and 160th epochs, train for 200 epochs with batch size 128 and weight decay 5e-4". However, I cannot reproduce the reported figure of 23.56; I only get 33.01.
Can you help me?

WinError123

[WinError 123] The filename, directory name, or volume label syntax is incorrect: 'runs\vgg16\2019-03-16T19:09:02.288156'
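
Windows does not allow ':' in file or directory names, so a log directory named after an ISO timestamp fails there. The sketch below builds the directory name with a colon-free format instead; the format string mirrors the checkpoint folder name quoted in another issue on this page (Tuesday_04_August_2020_15h_05m_17s), and `make_log_dir` is a hypothetical helper rather than the repository's code.

```python
from datetime import datetime
import os

# Colons are not allowed in Windows file names, so avoid datetime.isoformat() here.
TIME_FORMAT = "%A_%d_%B_%Y_%Hh_%Mm_%Ss"

def make_log_dir(root, net, now=None):
    """Build a log directory path whose timestamp contains no ':' characters."""
    now = now or datetime.now()
    return os.path.join(root, net, now.strftime(TIME_FORMAT))

print(make_log_dir("runs", "vgg16", datetime(2019, 3, 16, 19, 9, 2)))
```

The truncated key in the tensorboardX KeyError reported above ('runs\vgg16\2020-03-08T16') may well come from the same colon, although that is a guess from the traceback alone.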

About the number of filters in preactresnet.py

At line 86 of the 'pytorch-cifar100/models/preactresnet.py' file, as shown below:

pytorch-cifar100/models/preactresnet.py

Line 86 in 1f02733

self.stage4 = self._make_layers(block, num_block[3], 516, 2)

I think the 516 filters here should be 512.

accuracy of shufflenet v2

I tried to train shufflenet v2 with your script and the default hyperparameters you set.
It should yield a best test accuracy of 69.51%, but I got 61.68%. Which hyperparameters did you use?

RuntimeError: /pytorch/torch/csrc/jit/fuser/cuda/fused_kernel.cpp:196: NVRTC_ERROR unknown

When I use googlenet or vgg, it works normally.
But when I use resnet, this error occurs:
RuntimeError: /pytorch/torch/csrc/jit/fuser/cuda/fused_kernel.cpp:196: NVRTC_ERROR unknown

Are the reported training results the best over the 200 epochs, or the result at the 200th epoch?

Also, thanks for providing these PyTorch learning materials.

Unable to execute code on test set

Hi,

I am trying to test the model on Google Colab using the command you have put up in your README.md file ( !python test.py -net vgg16 -weights ./checkpoint/vgg16/Tuesday_04_August_2020_15h_05m_17s/vgg16-171-best.pth). However, I seem to be running into this issue

To my understanding, this issue only comes up when the model is on the GPU but data is on the CPU. I checked the code to see what the issue could be but found that not only is the model being loaded to the Colab GPU, but also the labels and images. I'm not sure if this problem arises due to it being executed on Google Colab (I do not have access to a GPU locally). Would appreciate your help regarding this.

Model estimates criterion

I noticed that you train on the full training set and select the best model directly on the test set. Won't this practice underestimate the test error? I usually split off a validation set and choose the model based on its performance on the validation set. Thanks in advance.
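
A held-out validation set of the kind described here can be carved out of the training indices before training starts. This is a minimal sketch with a hypothetical `split_indices` helper; the 10% fraction and the seed are illustrative choices, not values from the repository.

```python
import random

def split_indices(num_samples, val_fraction=0.1, seed=0):
    """Shuffle sample indices once and split them into train and validation parts."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    num_val = int(num_samples * val_fraction)
    return indices[num_val:], indices[:num_val]

# CIFAR-100 ships 50,000 training images; hold out 10% of them for model selection.
train_idx, val_idx = split_indices(50000, val_fraction=0.1)
print(len(train_idx), len(val_idx))
```

The two index lists could then be passed to torch.utils.data.Subset or SubsetRandomSampler, so that the best checkpoint is chosen on the validation part and the test set is evaluated only once at the end.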

print(torch.cuda.memory_summary(), end='')

AttributeError: module torch.cuda has no attribute memory_summary

What does "torch.cuda.memory_summary()" mean?
What does the author intend with the statement "print(torch.cuda.memory_summary(), end='')"?
See train.py, line 100.

As a beginner, I would like to ask: does "the weights file you want to test" refer to the checkpoint file of the trained network?

TypeError: forward() takes 2 positional arguments but 13 were given

When I run train.py, I get the following error:
TypeError: forward() takes 2 positional arguments but 13 were given
The failing line is:
writer.add_graph(net, Variable(input_tensor, requires_grad=True))

Do you know how to fix this?

No softmax layer

Do none of the models have a softmax layer? Does this affect the test accuracy?
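
Softmax is strictly increasing in each logit, so it does not change which class has the largest score; top-1 predictions from raw logits and from softmax probabilities agree. The sketch below illustrates this with made-up logits in plain Python. It also assumes, without checking the repository, that training uses a loss such as nn.CrossEntropyLoss, which applies log-softmax internally and therefore expects raw logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)

logits = [1.2, -0.3, 4.5, 0.7]   # made-up raw outputs for a 4-class example
probs = softmax(logits)
print(argmax(logits) == argmax(probs))
```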

Multiple GPUs

Many thanks for sharing the code!

One question: does training support multiple GPUs? Please correct me if I am mistaken, but as far as I can tell, the code currently supports only a single GPU.

The histogram is empty

I am doing a classification task where I have replaced the CIFAR dataset with my custom dataset. When the first epoch starts, the loss is very large, and after some iterations it becomes "nan". After 1 epoch completes, the program crashes.

Traceback (most recent call last):
  File "/media/khawar/HDD_Khawar1/CVPR/pytorch-cifar100/train.py", line 213, in <module>
    train(epoch)
  File "/media/khawar/HDD_Khawar1/CVPR/pytorch-cifar100/train.py", line 71, in train
    writer.add_histogram("{}/{}".format(layer, attr), param, epoch)
  File "/home/khawar/.local/lib/python3.6/site-packages/torch/utils/tensorboard/writer.py", line 425, in add_histogram
    histogram(tag, values, bins, max_bins=max_bins), global_step, walltime)
  File "/home/khawar/.local/lib/python3.6/site-packages/torch/utils/tensorboard/summary.py", line 226, in histogram
    hist = make_histogram(values.astype(float), bins, max_bins)
  File "/home/khawar/.local/lib/python3.6/site-packages/torch/utils/tensorboard/summary.py", line 264, in make_histogram
    raise ValueError('The histogram is empty, please file a bug report.')
ValueError: The histogram is empty, please file a bug report.

resnet50 accuracy is a little bit worse than resnet18

Over three runs of resnet18, I got an average acc of around 76.1. However, with resnet50 I got 76.0.
Does anyone have the same problem? By the way, resnet101 works fine with acc 78.1. I think resnet50's acc is supposed to be around 77.

my own dataset

Can it train on my own dataset? If so, how?

Densenet: wrong structure of transition layer

According to the original densenet implementation, the transition layer should be BN-ReLU-Conv-Pool, but the code in this repository is BN-Conv-Pool. The ReLU after BN is missing, which may hurt the accuracy of the model.

the densenet (from paper author):
https://github.com/liuzhuang13/DenseNet/blob/cf511e4add35a7d7a921901101ce7fa8f704aee2/models/densenet.lua#L37-L52

this repo:

pytorch-cifar100/models/densenet.py

Lines 47 to 58 in 2149cb5

    class Transition(nn.Module):
        def __init__(self, in_channels, out_channels):
            super().__init__()
            #"""The transition layers used in our experiments
            #consist of a batch normalization layer and an 1×1
            #convolutional layer followed by a 2×2 average pooling
            #layer""".
            self.down_sample = nn.Sequential(
                nn.BatchNorm2d(in_channels),
                nn.Conv2d(in_channels, out_channels, 1, bias=False),
                nn.AvgPool2d(2, stride=2)
            )

By the way, the description in the paper may be misleading:

The transition layers used in our experiments consist of a batch normalization layer and an 1×1 convolutional layer followed by a 2×2 average pooling layer.

Mistakes of MobileNetV2

Dear sir, I found a mistake when using the MobileNetV2 model. You can print the architecture to check it. The first layer produces an output of [32, 226], whereas [32, 112] is expected. I'd like to fix this error.

The 1x1 conv should not group in_channels

nn.Conv2d(in_channels, C * D, kernel_size=1, groups=C, bias=False) doesn't make sense.
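
With groups=C, each output channel of the 1x1 convolution sees only in_channels / C input channels, so the layer no longer mixes information across all channels. Counting weights makes the difference visible. The sketch below uses the standard Conv2d weight shape (out_channels, in_channels / groups, k, k); the channel numbers are illustrative and `conv_weight_count` is a hypothetical helper.

```python
def conv_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a 2D convolution without bias."""
    assert in_channels % groups == 0 and out_channels % groups == 0
    return out_channels * (in_channels // groups) * kernel_size * kernel_size

in_channels, C, D = 64, 32, 4   # illustrative values, not taken from the repository

grouped = conv_weight_count(in_channels, C * D, 1, groups=C)
dense = conv_weight_count(in_channels, C * D, 1)
print(grouped, dense)
```

In the ResNeXt block as usually described, the grouped convolution is the 3x3 one, with ungrouped 1x1 convolutions before and after it.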

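I would like to know why, after running it many times, my results are all far better than those in the author's table. Did I test it incorrectly? (resnet50)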

Possible mistake in Xception

Hi, thanks for your nice work!

While studying Xception, I read the paper and compared it with your implementation.
I am confused about model/xception/SeperableConv2d:

In the paper, the author says that Xception applies the 1x1 conv first, but in your implementation you apply the depthwise conv first, which is not the 1x1 conv, right?

    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)

        return x

Are there any pretrained models available for download?

Is the data augmented once or every epoch?

Is the data augmented once or every epoch? I want to know whether the data augmentation function "transform_train" is executed once or every epoch. If the augmentation is executed only once, it makes little difference.
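
torchvision datasets apply their transform inside __getitem__, so random augmentations are re-drawn every time a sample is fetched, which means every epoch. The toy classes below are hypothetical stand-ins that mimic this pattern without any dependencies and count how often the transform runs.

```python
import random

class ToyDataset:
    """Tiny stand-in for a dataset that applies its transform inside __getitem__."""

    def __init__(self, items, transform):
        self.items = items
        self.transform = transform

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        # The transform runs on every access, not once when the dataset is built.
        return self.transform(self.items[index])

class CountingFlip:
    """Reverse a row with probability 0.5 and count how often it is called."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.calls = 0

    def __call__(self, row):
        self.calls += 1
        return row[::-1] if self.rng.random() < 0.5 else row

flip = CountingFlip()
dataset = ToyDataset([[1, 2, 3], [4, 5, 6]], flip)

for epoch in range(3):
    for index in range(len(dataset)):
        sample = dataset[index]

print(flip.calls)  # 3 epochs x 2 samples = 6 transform calls
```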

lower accuracy

Thanks for your great work! I trained mobilenet with warm set to 2 or 1, but the top-1 acc is only 0.5747, which differs from yours by about 9%. Do you know what is wrong? My PyTorch version is 1.2; could this affect the result? Looking forward to your reply!

Using my own training set

What file layout is expected? I keep getting the error Permission denied: 'F:\rock_image\shierlei_kc\train'

Hi, I can't find the model weights after training. What is the default path where the weights are saved?

Pretrained Models

Are your pre-trained weights available to download for any of your experimental runs?

Is there any problem with your model implementation? Many models, such as shufflenet and mobilenet, will eventually downsample to 7x7, but some model implementations only downsample to 28x28.

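Is there something wrong with the implementation of networks such as models.resnet?

My setup is pytorch==1.2, torchvision=0.4.0.
I tried training with train.py using the ResNet implemented in torchvision.models, and the accuracy on the test set was only about 60%. When I train with the author's ResNet implementation, the accuracy roughly matches expectations.
There is another difference: with Batch_size=128, training with the torchvision.models network uses only about 1.4G of GPU memory, whereas the author's implementation takes about 5G, as if a separate model were allocated for every image in the batch.
Is this caused by a difference in how the networks are implemented, or by our different PyTorch versions?
Thanks.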

Links at README.md

Thank you for the code.
Please note that all three links under the paragraph "Training Details" refer to the same paper.

Training stage

Do you train the model from scratch, or start from weights pre-trained on ImageNet?

vgg.py

nn.Linear(512, 4096)
nn.Linear(512 * 7 * 7, 4096)
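
The two input sizes for the first fully connected layer follow from the input resolution. VGG halves the spatial size with five 2x2 max-pools, so a 32x32 CIFAR image ends at 1x1 while a 224x224 ImageNet image ends at 7x7. A small sketch of the arithmetic, using a hypothetical `vgg_flat_features` helper and assuming the 3x3 convolutions preserve spatial size:

```python
def vgg_flat_features(input_size, channels=512, num_pools=5):
    """Flattened feature count after num_pools 2x2 max-pools with stride 2."""
    side = input_size
    for _ in range(num_pools):
        side //= 2
    return channels * side * side

print(vgg_flat_features(32))    # CIFAR-sized 32x32 input
print(vgg_flat_features(224))   # ImageNet-sized 224x224 input
```

On this reading, nn.Linear(512, 4096) would be the consistent choice for 32x32 inputs rather than a mistake.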

Is this a mistake in senet for fair comparison?

I notice that stage4 of the other ResNet variants uses 512 while SENet uses 516.
Is this a bug?
SENet:

pytorch-cifar100/models/senet.py

Line 126 in 1fe2ec4

self.stage4 = self._make_stage(block, block_num[3], 516, 2)

ResNet:

pytorch-cifar100/models/resnet.py

Line 96 in 1fe2ec4

self.conv5_x = self._make_layer(block, 512, num_block[3], 2)

ResNeXt:

pytorch-cifar100/models/resnext.py

Line 80 in 1fe2ec4

self.conv5 = self._make_layer(block, num_blocks[3], 512, 2)

ImportError: cannot import name 'OperatorExportTypes'

Hello, I encountered this problem ("ImportError: cannot import name 'OperatorExportTypes'") when running the code; my dataset was downloaded in advance. Any guidance would be appreciated, thank you.

Is the shufflenet model definition not quite right here?

pytorch-cifar100/models/shufflenet.py

Line 98 in 2149cb5

groups=groups

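According to the ShuffleNet paper, the first bottleneck layer of the second stage should not use a group convolution, but judging from the code here, the author does use grouping. Is this incorrect?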

Awesome work! Can you share your trained parameters for the neural network by the way?

Tensor sizes mismatches

Hi,

I am implementing ResNet18 models using the CIFAR-100 dataset.

I checked all the dimensions but got this error:

RuntimeError: The size of tensor a (100) must match the size of tensor b (32) at non-singleton dimension 3

Can you please tell me how to fix it?

Thanks

model better than torchvision model on cifar100

Hi,

I thank you for your repo.
I tried to compare the accuracy of your resnet18 with the torchvision one. I do not understand why, without pretraining, your model reaches 70+ accuracy while the torchvision model reaches only 55. Have you implemented something specific for the CIFAR dataset (cifar100)?
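
A plausible explanation is the network stem. torchvision's ResNet is built for ImageNet and starts with a 7x7 stride-2 convolution followed by a 3x3 stride-2 max-pool, which shrinks a 32x32 CIFAR image to 8x8 before the first residual stage. The sketch below compares feature-map sizes under that stem with a CIFAR-style stem; the CIFAR-style stem (a single 3x3 stride-1 convolution, no pooling) is an assumption about this repository, and both helpers are hypothetical.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def stage_sizes(size, strides=(1, 2, 2, 2)):
    """Feature-map side length after each of the four residual stages."""
    sizes = []
    for stride in strides:
        size = out_size(size, 3, stride, 1)
        sizes.append(size)
    return sizes

# ImageNet-style stem: 7x7 conv with stride 2, then 3x3 max-pool with stride 2.
imagenet_stem = out_size(out_size(32, 7, 2, 3), 3, 2, 1)
# CIFAR-style stem (assumed here): a single 3x3 conv with stride 1 and no pooling.
cifar_stem = out_size(32, 3, 1, 1)

print(stage_sizes(imagenet_stem))
print(stage_sizes(cifar_stem))
```

Feature maps that are 4 times larger per side hold 16 times as many activations, which would also be consistent with the higher GPU memory use reported for this repository's models in another issue on this page.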

