
weiaicunzai / pytorch-cifar100

4.1K 35.0 1.1K 543 KB

Practice on cifar100(ResNet, DenseNet, VGG, GoogleNet, InceptionV3, InceptionV4, Inception-ResNetv2, Xception, Resnet In Resnet, ResNext,ShuffleNet, ShuffleNetv2, MobileNet, MobileNetv2, SqueezeNet, NasNet, Residual Attention Network, SENet, WideResNet)

Python 100.00%
pytorch image-classification deep-learning cifar100 resnet googlenet inceptionv4 xception resnext inceptionv3

pytorch-cifar100's People

Contributors

developer0hye, weiaicunzai

pytorch-cifar100's Issues

Is the learning rate increasing?

Training Epoch: 1 [3968/8000] Loss: 1.3380 LR: 0.004960
Training Epoch: 1 [3976/8000] Loss: 0.3088 LR: 0.004970
Training Epoch: 1 [3984/8000] Loss: 0.6474 LR: 0.004980
Training Epoch: 1 [3992/8000] Loss: 0.4500 LR: 0.004990
Training Epoch: 1 [4000/8000] Loss: 0.6452 LR: 0.005000
Training Epoch: 1 [4008/8000] Loss: 0.9984 LR: 0.005010
Training Epoch: 1 [4016/8000] Loss: 0.7139 LR: 0.005020
Training Epoch: 1 [4024/8000] Loss: 0.6220 LR: 0.005030
Training Epoch: 1 [4032/8000] Loss: 0.4329 LR: 0.005040
Training Epoch: 1 [4040/8000] Loss: 0.4127 LR: 0.005050
Training Epoch: 1 [4048/8000] Loss: 0.4696 LR: 0.005060
Training Epoch: 1 [4056/8000] Loss: 0.5181 LR: 0.005070
Training Epoch: 1 [4064/8000] Loss: 0.4105 LR: 0.005080
Training Epoch: 1 [4072/8000] Loss: 0.7041 LR: 0.005090
Training Epoch: 1 [4080/8000] Loss: 0.3864 LR: 0.005100
Training Epoch: 1 [4088/8000] Loss: 0.6991 LR: 0.005110
Training Epoch: 1 [4096/8000] Loss: 0.3007 LR: 0.005120
Training Epoch: 1 [4104/8000] Loss: 0.3111 LR: 0.005130
Training Epoch: 1 [4112/8000] Loss: 0.3763 LR: 0.005140
Training Epoch: 1 [4120/8000] Loss: 0.5825 LR: 0.005150
Training Epoch: 1 [4128/8000] Loss: 0.5528 LR: 0.005160
Training Epoch: 1 [4136/8000] Loss: 0.3553 LR: 0.005170
Training Epoch: 1 [4144/8000] Loss: 0.2654 LR: 0.005180
Training Epoch: 1 [4152/8000] Loss: 0.3935 LR: 0.005190
Training Epoch: 1 [4160/8000] Loss: 0.2935 LR: 0.005200
Training Epoch: 1 [4168/8000] Loss: 0.2382 LR: 0.005210
Training Epoch: 1 [4176/8000] Loss: 0.2893 LR: 0.005220
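
The log is consistent with a linear warm-up during the first epoch: each logged LR equals the iteration count times 1e-5. A minimal sketch of that reading, assuming a base learning rate of 0.01 and 1000 warm-up iterations (both inferred from the log above, not confirmed by the author); warmup_lr is a hypothetical helper name:

```python
def warmup_lr(base_lr, iteration, warmup_iters):
    """Linear warm-up: scale base_lr by the fraction of warm-up completed."""
    if iteration >= warmup_iters:
        return base_lr
    return base_lr * iteration / warmup_iters


# Assumed values: 8000 samples / batch size 8 = 1000 iterations per epoch.
for it in (496, 500, 522):
    print(f"iter {it}: lr = {warmup_lr(0.01, it, 1000):.6f}")
```

If that reading is right, the rate stops rising once the warm-up iterations are used up and the regular schedule takes over.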

GPU utilization

Hello, when I train, GPU utilization stays at only about 10%, with batch_size 64 and num_workers 8. Is there any way to improve utilization?

tensorboardX path

Traceback (most recent call last):
File "D:\Anaconda\envs\PyTorch\lib\site-packages\tensorboardX\record_writer.py", line 47, in directory_check
factory = REGISTERED_FACTORIES[prefix]
KeyError: 'runs\vgg16\2020-03-08T16'

Error: for batch_index, (images, labels) in enumerate(cifar100_training_loader)

Hello, when I run your program, line 33 of the train function in train.py, for batch_index, (images, labels) in enumerate(cifar100_training_loader), raises the following error:
Exception raised: TypeError
Traceback (most recent call last):
File "/home/zbs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/zbs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in default_collate
return [default_collate(samples) for samples in transposed]
File "/home/zbs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "/home/zbs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 140, in default_collate
raise TypeError((error_msg.format(type(batch[0]))))
TypeError: batch must contain tensors, numbers, dicts or lists; found <class 'torchvision.transforms.Compose'>
File "/home/zbs/Desktop/pytorch-cifar100/train.py", line 33, in train
for batch_index, (images, labels) in enumerate(cifar100_training_loader):
File "/home/zbs/Desktop/pytorch-cifar100/train.py", line 160, in <module>
train(epoch)
Could you help me resolve this? Thank you very much. Nice work!

CUDA out of memory problem

It seems that some of the networks defined in models have a hidden bug.
For example, when I use senet I get a CUDA out of memory error, even though my batch_size is only 64 and my GPU has 11 GB of memory.

But when I use the model file here
https://github.com/moskomule/senet.pytorch/tree/master/senet
it occupies only 7 GB of memory with batch_size=90.

I find that senet.py, resnext.py, and inceptionv4.py all have a similar problem, and there may be more such models.

resnet50 results

Why are my results much better than those in the table? I made no modifications; this is resnet50, 183-best-pth.
Did I test it incorrectly?

Boolean arguments are not parsed correctly in train.py

In train.py, the gpu and s arguments are declared as booleans. However, if you run a command such as:
python train.py -gpu False -s False

you will find that args.gpu and args.s are both True. In other words, the gpu and s arguments are not parsed as the user intended.
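
The behavior described matches how argparse handles type=bool: it calls bool("False"), and any non-empty string is truthy. A minimal sketch of one common workaround, an explicit string-to-bool converter; str2bool and build_parser are hypothetical names, and the defaults shown are assumptions rather than the repository's actual values:

```python
import argparse


def str2bool(value):
    """Convert common true/false spellings to a real bool for argparse."""
    if isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text in ("true", "t", "yes", "y", "1"):
        return True
    if text in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Parser with the two flags from the issue (defaults are assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-gpu", type=str2bool, default=True, help="use GPU or not")
    parser.add_argument("-s", type=str2bool, default=True, help="shuffle the dataset or not")
    return parser


args = build_parser().parse_args(["-gpu", "False", "-s", "False"])
print(args.gpu, args.s)
```

An alternative design is action="store_true", which turns each option into a switch that takes no value at all.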

Model Accuracy

I used your "seresnet18" model to train on CIFAR-100 and followed your training details: "init lr = 0.1, divided by 5 at the 60th, 120th, and 160th epochs, train for 200 epochs with batch size 128 and weight decay 5e-4". However, I cannot reproduce the reported 23.56; I only get 33.01.
Can you help me?
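
For anyone checking their settings against the quoted recipe, the schedule can be written out explicitly. A minimal sketch, assuming 0-indexed epochs and that each drop applies from the milestone epoch onward; step_lr is a hypothetical helper, not a function from the repository:

```python
def step_lr(epoch, base_lr=0.1, milestones=(60, 120, 160), factor=0.2):
    """Learning rate for a 0-indexed epoch under a divide-by-5 step schedule."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


for epoch in (0, 59, 60, 120, 160, 199):
    print(epoch, f"{step_lr(epoch):.4f}")
```

In PyTorch the same shape is usually obtained with torch.optim.lr_scheduler.MultiStepLR and gamma=0.2. Whether the drop lands exactly on epoch 60 or one epoch later depends on when the scheduler is stepped.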

WinError123

[WinError 123] The filename, directory name, or volume label syntax is incorrect: 'runs\vgg16\2019-03-16T19:09:02.288156'
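
The timestamp in the path contains colons (19:09:02), which Windows does not allow in file or directory names. A minimal sketch of a colon-free timestamp for the log directory; make_run_dir is a hypothetical helper and the format string is only one possible choice:

```python
import os
from datetime import datetime


def make_run_dir(root, net, now=None):
    """Build a log directory path whose timestamp contains no ':' characters."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%Hh_%Mm_%Ss")
    return os.path.join(root, net, stamp)


print(make_run_dir("runs", "vgg16", datetime(2019, 3, 16, 19, 9, 2)))
```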

accuracy of shufflenet v2

I tried to train ShuffleNet v2 with your script, using the default hyperparameters you set.
It should reach a best test accuracy of 69.51%, but I only got 61.68%. Which hyperparameters did you use?

Unable to execute code on test set

Hi,

I am trying to test the model on Google Colab using the command you have put up in your README.md file ( !python test.py -net vgg16 -weights ./checkpoint/vgg16/Tuesday_04_August_2020_15h_05m_17s/vgg16-171-best.pth). However, I seem to be running into this issue

[Screenshot: 2020-08-04 at 18:07:46]

To my understanding, this issue only comes up when the model is on the GPU but data is on the CPU. I checked the code to see what the issue could be but found that not only is the model being loaded to the Colab GPU, but also the labels and images. I'm not sure if this problem arises due to it being executed on Google Colab (I do not have access to a GPU locally). Would appreciate your help regarding this.

Model estimates criterion

I have noticed that you use the full training set to train the model and select the best model directly on the test set. Won't this practice underestimate the test error? I usually split off a validation set and choose the model based on its performance on that set. Thanks in advance.
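
The hold-out procedure described here can be sketched in a few lines. This is only an illustration: CIFAR-100 has 50,000 training images, while the 10% fraction, the seed, and the split_indices name are assumptions.

```python
import random


def split_indices(num_samples, val_fraction=0.1, seed=0):
    """Shuffle sample indices and split them into train and validation lists."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    num_val = int(num_samples * val_fraction)
    return indices[num_val:], indices[:num_val]


train_idx, val_idx = split_indices(50000)
print(len(train_idx), len(val_idx))
```

The two index lists could then be handed to something like torch.utils.data.Subset, so that checkpoints are selected on the validation part and the test set is touched only once at the end.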

print(torch.cuda.memory_summary(), end='')

AttributeError: module torch.cuda has no attribute memory_summary

What does "torch.cuda.memory_summary()" mean?
What is the author trying to do with the statement print(torch.cuda.memory_summary(), end='') at train.py line 100?

No softmax layer

Do none of the models have a softmax layer? Will this affect the test accuracy?
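
Softmax is a monotonic transformation, so the class with the largest logit is also the class with the largest probability, and top-1 predictions do not change when it is omitted. A small pure-Python check of that property; it assumes the training loss is a cross-entropy that applies log-softmax internally (as nn.CrossEntropyLoss does), which is not confirmed here for every script in the repository:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


logits = [2.0, -1.0, 0.5]
probs = softmax(logits)
print(argmax(logits) == argmax(probs))
```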

Multiple GPUs

Many thanks for sharing the code!

One question: does training support multiple GPUs? Please correct me if I am mistaken, but as far as I can tell, the code currently supports only a single GPU.

The histogram is empty

I am working on a classification task in which I replaced the CIFAR dataset with my own custom dataset. In the first epoch the loss starts out very large, and after some iterations it becomes "nan". After completing one epoch, the program crashes.

Traceback (most recent call last):
  File "/media/khawar/HDD_Khawar1/CVPR/pytorch-cifar100/train.py", line 213, in <module>
    train(epoch)
  File "/media/khawar/HDD_Khawar1/CVPR/pytorch-cifar100/train.py", line 71, in train
    writer.add_histogram("{}/{}".format(layer, attr), param, epoch)
  File "/home/khawar/.local/lib/python3.6/site-packages/torch/utils/tensorboard/writer.py", line 425, in add_histogram
    histogram(tag, values, bins, max_bins=max_bins), global_step, walltime)
  File "/home/khawar/.local/lib/python3.6/site-packages/torch/utils/tensorboard/summary.py", line 226, in histogram
    hist = make_histogram(values.astype(float), bins, max_bins)
  File "/home/khawar/.local/lib/python3.6/site-packages/torch/utils/tensorboard/summary.py", line 264, in make_histogram
    raise ValueError('The histogram is empty, please file a bug report.')
ValueError: The histogram is empty, please file a bug report.
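
One plausible reading, not confirmed in the thread, is that the NaN loss has already propagated into the parameters, so the histogram call receives no finite values. A minimal guard that could run before logging, under that assumption; all_finite is a hypothetical helper:

```python
import math


def all_finite(values):
    """Return True only if every value is a finite number (no NaN or inf)."""
    return all(math.isfinite(v) for v in values)


loss_value = float("nan")  # stand-in for the loss reported in the issue
if not all_finite([loss_value]):
    print("non-finite loss; skipping histogram logging for this epoch")
```

A guard like this only hides the symptom. The diverging loss itself usually points to the learning rate or to the input normalization for the custom dataset.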

resnet50 accuracy is a little bit worse than resnet18

Over three runs of resnet18, I got an average accuracy of around 76.1. With resnet50, however, I got 76.0.
Does anyone have the same problem? By the way, resnet101 works fine, with an accuracy of 78.1. I think resnet50's accuracy should be around 77.

Densenet: wrong structure of transition layer

According to the original DenseNet implementation, the transition layer should be BN-ReLU-Conv-Pool, but the code in this repository is BN-Conv-Pool. The ReLU after BN is missing, which may hurt the accuracy of the model.

The DenseNet implementation from the paper's author:
https://github.com/liuzhuang13/DenseNet/blob/cf511e4add35a7d7a921901101ce7fa8f704aee2/models/densenet.lua#L37-L52

this repo:

class Transition(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        #"""The transition layers used in our experiments
        #consist of a batch normalization layer and an 1×1
        #convolutional layer followed by a 2×2 average pooling
        #layer""".
        self.down_sample = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.AvgPool2d(2, stride=2)
        )

By the way, the description in the paper may be misleading:

The transition layers used in our experiments consist of a batch normalization layer and an 1×1 convolutional layer followed by a 2×2 average pooling layer.
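
For comparison, a sketch of the transition layer in the BN-ReLU-Conv-Pool order that the issue proposes. It is an illustration of the suggested change, not the repository's code; the forward method and the small shape check are additions for the example:

```python
import torch
import torch.nn as nn


class Transition(nn.Module):
    """Transition layer in BN-ReLU-Conv-Pool order."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.down_sample = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),  # the activation the issue reports as missing
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.AvgPool2d(2, stride=2),
        )

    def forward(self, x):
        return self.down_sample(x)


# Shape check on a dummy batch: channels 8 -> 4, spatial size 32 -> 16.
x = torch.randn(2, 8, 32, 32)
print(Transition(8, 4)(x).shape)
```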

Mistakes of MobileNetV2

Dear sir, I found a mistake when using the MobileNetV2 model. You can print the architecture to check it. The first layer produces an output of [32,226], where [32,112] is expected. I would like to fix this error.

Possible mistake in Xception

Hi, thanks for your nice work!

While studying Xception, I read the paper and compared it with your implementation.
I am confused about SeperableConv2d in your model/xception:

In the paper, the author says that Xception first applies the 1x1 conv, but your implementation applies the depthwise conv first, which is not a 1x1 conv, right?

    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)

        return x

Is the data augmented once or every epoch?

I want to know whether the data augmentation function "transform_train" is executed once or every epoch. If the augmentation is executed only once, it makes little difference.
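
torchvision datasets apply their transform inside __getitem__, so a random augmentation is re-sampled every time a sample is fetched, which means every epoch. A toy, torch-free sketch of that mechanism, assuming the training loader wraps a dataset of this kind; ToyDataset and random_flip are hypothetical stand-ins:

```python
import random


class ToyDataset:
    """Stand-in for a dataset that applies its transform on every access."""

    def __init__(self, samples, transform):
        self.samples = samples
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.transform(self.samples[index])


calls = {"count": 0}


def random_flip(sample):
    """Toy augmentation: reverse the sequence with probability 0.5."""
    calls["count"] += 1
    return sample[::-1] if random.random() < 0.5 else sample


dataset = ToyDataset([[1, 2, 3], [4, 5, 6]], random_flip)
for epoch in range(3):
    batch = [dataset[i] for i in range(len(dataset))]
print(calls["count"])  # 6: the transform ran once per sample per epoch
```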

lower accuracy

Thanks for your great work! I trained mobilenet with warm set to 2 or 1, but the top-1 accuracy is only 0.5747, about 9% lower than yours. Do you know what is wrong? My PyTorch version is 1.2; could that affect the result? Looking forward to your reply!

Using my own training set

What file layout is expected? I keep getting the error Permission denied: 'F:\rock_image\shierlei_kc\train'

Pretrained Models

Are your pre-trained weights available to download for any of your experimental runs?

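Is there a problem with the implementation of the networks in models.resnet?

My environment is pytorch==1.2, torchvision=0.4.0.
I tried running train.py with the ResNet implemented in torchvision.models, and the accuracy on the test set was only about 60%. When I train with the ResNet you implemented, the accuracy roughly matches expectations.
There is one more difference: with batch_size=128, training with the torchvision.models network uses only about 1.4 GB of GPU memory, whereas your implementation uses about 5 GB. It feels as if a separate model were allocated for every image in the batch.
Is this due to a difference in how the networks are implemented, or to a difference in our PyTorch versions?
Thank you.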
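
A likely explanation for both gaps is the stem. The torchvision ResNet starts with a 7x7 stride-2 convolution followed by a 3x3 stride-2 max pool, which suits 224x224 ImageNet inputs but shrinks a 32x32 CIFAR image very quickly. The arithmetic below assumes the repository's ResNet uses a single 3x3 stride-1 convolution with no pooling; that is an assumption about this repository, and the helper names are hypothetical:

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def final_map_size(size, stem):
    """Apply the stem, then four stages with strides 1, 2, 2, 2 (3x3, padding 1)."""
    for kernel, stride, padding in stem:
        size = conv_out(size, kernel, stride, padding)
    for stride in (1, 2, 2, 2):
        size = conv_out(size, 3, stride, 1)
    return size


imagenet_stem = [(7, 2, 3), (3, 2, 1)]  # 7x7 stride-2 conv, then 3x3 stride-2 max pool
cifar_stem = [(3, 1, 1)]                # assumed: one 3x3 stride-1 conv, no pooling

print(final_map_size(32, imagenet_stem))  # 1
print(final_map_size(32, cifar_stem))     # 4
```

Under these assumptions the ImageNet-style network ends with a 1x1 feature map on CIFAR input, while the CIFAR-style one keeps 4x4 and holds 16 times more activations at every stage. That would account for both the higher accuracy and the higher memory use without any per-image copy of the model.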

Links at README.md

Thank you for the code.
Please note that all three links under the paragraph "Training Details" refer to the same paper.

Training stage

Do you train the models from scratch, or start from weights pre-trained on ImageNet?

vgg.py

nn.Linear(512, 4096)
nn.Linear(512 * 7 * 7, 4096)
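
The two lines presumably contrast the classifier input size used for CIFAR with the one in the original ImageNet VGG. The difference follows from the input resolution. A short arithmetic sketch, assuming five 2x2 stride-2 max-pool layers and size-preserving convolutions as in standard VGG; vgg_flat_features is a hypothetical helper:

```python
def vgg_flat_features(input_size, num_pools=5, channels=512):
    """Flattened feature count after num_pools 2x2, stride-2 max-pool layers."""
    size = input_size
    for _ in range(num_pools):
        size //= 2
    return channels * size * size


print(vgg_flat_features(32))   # 512
print(vgg_flat_features(224))  # 25088 = 512 * 7 * 7
```

Under those assumptions nn.Linear(512, 4096) is the matching first classifier layer for 32x32 inputs, and 512 * 7 * 7 applies only to 224x224 inputs.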

Is this a mistake in senet for fair comparison?

I notice that stage4 uses 512 channels in the other ResNet variants, while SENet uses 516.
Is this a bug?
SENet:

self.stage4 = self._make_stage(block, block_num[3], 516, 2)

ResNet:
self.conv5_x = self._make_layer(block, 512, num_block[3], 2)

ResNeXt:
self.conv5 = self._make_layer(block, num_blocks[3], 512, 2)

Tensor sizes mismatches

Hi,

I am implementing ResNet18 models using the CIFAR-100 dataset.

I checked all the dimensions but got this error:

RuntimeError: The size of tensor a (100) must match the size of tensor b (32) at non-singleton dimension 3

Can you please tell me how to fix it?

Thanks

model better than torchvision model on cifar100

Hi,

Thank you for your repo.
I tried to compare the accuracy of your resnet18 with the torchvision one. I do not understand why, without pretraining, your model reaches 70+ accuracy while the torchvision one reaches only 55. Have you implemented something specific for the CIFAR dataset (CIFAR-100)?
