This repository contains implementations of several CNN architectures for CIFAR-10.
All of these models are implemented with Keras and TensorFlow.
(A PyTorch version may follow if I have time.)
- Python (3.5.2)
- Keras (2.1.2)
- tensorflow-gpu (1.4.1)
- The first CNN model: LeNet
- Network in Network
- Vgg19 Network
- Residual Network
- Wide Residual Network
- ResNeXt
- DenseNet
- SENet
There are also some documents and tutorials in `doc` and issues/3.
Grab them if you need them.
network | dropout | preprocess | GPU | params | training time | accuracy(%) |
---|---|---|---|---|---|---|
Lecun-Network | - | meanstd | GTX980TI | 62k | 30 min | 76.27 |
Network-in-Network | 0.5 | meanstd | GTX1060 | 0.96M | 1 h 30 min | 91.25 |
Network-in-Network_bn | 0.5 | meanstd | GTX980TI | 0.97M | 2 h 20 min | 91.75 |
Vgg19-Network | 0.5 | meanstd | GTX980TI | 39M | 4 h | 93.53 |
Residual-Network110 | - | meanstd | GTX980TI | 1.7M | 8 h 58 min | 94.10 |
Wide-resnet 16x8 | - | meanstd | GTX1060 | 11.3M | 11 h 32 min | 95.14 |
DenseNet-100x12 | - | meanstd | GTX980TI | 0.85M | 30 h 40 min | 95.15 |
ResNeXt-4x64d | - | meanstd | GTX1080TI | 20M | 22 h 50 min | 95.51 |
SENet(ResNeXt-4x64d) | - | meanstd | GTX1080 | 20M | - | - |
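The `meanstd` preprocessing in the table is per-channel mean/std normalization, with the statistics computed on the training set and applied to both splits. A minimal NumPy sketch (the function name is mine; the repo's scripts may hard-code the CIFAR-10 channel statistics instead of recomputing them):

```python
import numpy as np

def meanstd_normalize(x_train, x_test):
    # Compute per-channel mean/std on the training set only,
    # then apply the same statistics to both train and test images.
    x_train = x_train.astype("float32")
    x_test = x_test.astype("float32")
    mean = x_train.mean(axis=(0, 1, 2))  # one value per RGB channel
    std = x_train.std(axis=(0, 1, 2))
    return (x_train - mean) / std, (x_test - mean) / std
```

Using the test-set statistics here would leak information, which is why both splits are normalized with the training-set mean and std.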
Now I have fixed some bugs and retrained all of the following models on a GTX 1080 Ti.
In particular:
- Change the batch size according to your GPU's memory.
- Modifying the learning rate schedule may improve accuracy!
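A step-decay schedule like the ones used in these experiments can be sketched as a plain function and passed to Keras's `LearningRateScheduler` callback. The milestones `(81, 122)` below are illustrative, matching one schedule from the tables; the helper name is mine:

```python
def lr_schedule(epoch, base_lr=0.1, milestones=(81, 122), gamma=0.1):
    """Step decay: multiply the learning rate by `gamma` (i.e. divide
    by 10) at each milestone epoch. Tune the milestones together with
    the total epoch count and the batch size for your GPU."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

With Keras this would be plugged in as `model.fit(..., callbacks=[keras.callbacks.LearningRateScheduler(lr_schedule)])`.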
network | GPU | params | batch size | epoch | training time | accuracy(%) |
---|---|---|---|---|---|---|
Lecun-Network | GTX1080TI | 62k | 128 | 200 | 30 min | 76.25 |
Network-in-Network | GTX1080TI | 0.97M | 128 | 200 | 1 h 40 min | 91.63 |
Vgg19-Network | GTX1080TI | 39M | 128 | 200 | 1 h 53 min | 93.53 |
Residual-Network20 | GTX1080TI | 0.27M | 128 | 200 | 44 min | 91.82 |
Residual-Network32 | GTX1080TI | 0.47M | 128 | 200 | 1 h 7 min | 92.68 |
Residual-Network50 | GTX1080TI | 0.76M | 128 | 200 | 1 h 42 min | 93.18 |
Residual-Network110 | GTX1080TI | 1.7M | 128 | 200 | 3 h 38 min | 93.93 |
Wide-resnet 16x8 | GTX1080TI | 11.3M | 128 | 200 | 4 h 55 min | 95.13 |
DenseNet-100x12 | GTX1080TI | 0.85M | 64 | 250 | 17 h 20 min | 94.91 |
DenseNet-100x24 | GTX1080TI | 3.3M | 64 | 250 | 22 h 27 min | 95.30 |
DenseNet-160x24 | 1080 x 2 | 7.5M | 64 | 250 | 50 h 20 min | 95.90 |
ResNeXt-4x64d | GTX1080TI | 20M | 120 | 250 | 21 h 3 min | 95.19 |
SENet(ResNeXt-4x64d) | GTX1080TI | 20M | 120 | 250 | 21 h 57 min | 95.60 |
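The Residual-Network depths above (20, 32, 50, 110) are consistent with the standard CIFAR ResNet layout of 6n+2 layers: three stages of n basic blocks, two conv layers per block, plus the initial conv and the classifier. Assuming this repo follows that layout, a quick sanity check:

```python
def resnet_depth(n):
    # CIFAR-style ResNet: 3 stages x n blocks x 2 conv layers,
    # plus the first conv layer and the final fully connected layer.
    return 6 * n + 2
```

So n = 3, 5, 8, 18 gives the depths 20, 32, 50, 110 used in the table.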
Different learning rate schedules may yield different training/testing accuracy!
The original paper starts with a learning rate of 0.1 and divides it by 10 at epoch 81 and epoch 122 (200 epochs in total).
I ran some experiments; see ResNet_CIFAR for more details.
network | start learning rate | learning rate decay | epoch | batch size | accuracy(%) |
---|---|---|---|---|---|
Residual-Network20 | 0.1 | [81,122] | 200 | 128 | 91.82 |
Residual-Network32 | 0.1 | [81,122] | 200 | 128 | 92.68 |
Residual-Network50 | 0.1 | [81,122] | 200 | 128 | 93.18 |
Residual-Network110 | 0.1 | [81,122] | 200 | 128 | 93.93 |
- | - | - | - | - | - |
Residual-Network20 | 0.1 | [100,150] | 200 | 128 | 92.02 |
Residual-Network32 | 0.1 | [100,150] | 200 | 128 | 92.53 |
Residual-Network50 | 0.1 | [100,150] | 200 | 128 | 93.25 |
Residual-Network110 | 0.1 | [100,150] | 200 | 128 | 93.61 |
- | - | - | - | - | - |
Residual-Network20 | 0.1 | [150,225] | 300 | 128 | 91.95 |
Residual-Network32 | 0.1 | [150,225] | 300 | 128 | 93.07 |
Residual-Network50 | 0.1 | [150,225] | 300 | 128 | 93.12 |
Residual-Network110 | 0.1 | [150,225] | 300 | 128 | 94.13 |
Since I don't have enough machines to train the larger networks, I only trained the smallest networks described in the papers. You can find results for the larger models in liuzhuang13/DenseNet and prlz77/ResNeXt.pytorch.
Please feel free to contact me if you have any questions!