
Deep Residual Network For MXNet

A Deep Residual Network example for MXNet on the CIFAR-10 dataset

Paper: Deep Residual Learning for Image Recognition on arxiv.org

Chinese Version

If you read Chinese, please see my Chinese blog post.

This example has been merged into MXNet.

Notes:

  1. This example has several differences from the paper; it mainly demonstrates the rule: the deeper, the better.
  2. You are welcome to discuss your point of view with me. For example, you may think batch normalization should be applied in different places, since the author does not state this very clearly. I hope we can reach better performance together.

Commands & Setups:

  • in example/image-classification/train_model.py
    • set momentum = 0.9, wd = 0.0001, initializer = mx.init.Xavier(rnd_type="gaussian", factor_type="in", magnitude=2.0) (see the sketch after this list)
  • in the get_symbol function in example/image-classification/symbol_resnet-28-small.py
    • set n = 3 (n = 3 gives 20 layers, n = 9 gives 56 layers)
  • first, train the network with lr=0.1 for 80 epochs:
python example/image-classification/train_cifar10.py --network resnet-28-small --num-examples 50000 --lr 0.1 --num-epochs 80 --model-prefix cifar10/resnet --batch-size 128
  • second, resume from epoch 80 and train with lr=0.01 until epoch 120, then with lr=0.001 from epoch 121 to epoch 160:
python example/image-classification/train_cifar10.py --network resnet-28-small --num-examples 50000 --model-prefix cifar10/resnet --load-epoch 80 --lr 0.01 --lr-factor 0.1 --lr-factor-epoch 40 --num-epochs 200 --batch-size 128
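For reference, here is a minimal sketch of how those settings plug into MXNet's old FeedForward API. The hyperparameter values are the ones listed above; everything else (the placeholder symbol, variable names) is illustrative, not a copy of train_model.py:

    import mxnet as mx

    # In the repo, `net` comes from get_symbol() in symbol_resnet-28-small.py;
    # a trivial placeholder symbol is used here so the sketch stands alone.
    data = mx.symbol.Variable('data')
    fc = mx.symbol.FullyConnected(data=data, num_hidden=10)
    net = mx.symbol.SoftmaxOutput(data=fc, name='softmax')

    model = mx.model.FeedForward(
        ctx=mx.gpu(0),        # all models here are trained on a single GPU
        symbol=net,
        num_epoch=80,         # first run: 80 epochs at lr=0.1
        learning_rate=0.1,
        momentum=0.9,         # as set in train_model.py
        wd=0.0001,            # weight decay, as set in train_model.py
        initializer=mx.init.Xavier(rnd_type="gaussian", factor_type="in",
                                   magnitude=2.0))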

In the paper, the authors train on CIFAR-10 for 160 epochs. I set num-epochs to 200 because I wanted to see whether dropping to lr=0.0001 helps further.
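The --lr-factor 0.1 --lr-factor-epoch 40 flags in the second command produce this piecewise schedule (0.01 from epoch 81, 0.001 from epoch 121, and 0.0001 from epoch 161 if you continue to 200). A minimal sketch of the equivalent scheduler object, assuming 50000 training examples and batch size 128 as above (how train_cifar10.py wires this up internally is my assumption):

    import mxnet as mx

    # one epoch = number of parameter updates = examples / batch size
    epoch_size = 50000 // 128   # 390 updates per epoch

    # multiply lr by 0.1 every 40 epochs, matching --lr-factor 0.1
    # --lr-factor-epoch 40; the scheduler's step is counted in updates
    scheduler = mx.lr_scheduler.FactorScheduler(step=40 * epoch_size, factor=0.1)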

Since training takes 160+ epochs, please be patient. I train with a batch size of 128, and all models are trained on a single GPU.

Test Accuracy:

  • 20-layer ResNet: accuracy = 0.905+ (0.9125 in the paper)
  • 32-layer ResNet: accuracy = 0.908+ (0.9239 in the paper)
  • 56-layer ResNet: accuracy = 0.915+ (0.9303 in the paper)

Though the numbers are a little lower than the paper's, they do obey the rule:

the deeper, the better

Differences from the paper in the CIFAR-10 network setup

  1. In the paper, the authors use identity shortcuts when dealing with increasing dimensions, while I use 1x1 convolutions instead (see the sketch below).
  2. In the paper, 4 pixels are padded on each side and a 32x32 crop is randomly sampled from the padded image. I use the dataset provided by MXNet, so the input is 28x28; as a result, the output map sizes for the three groups of 2n layers are 28x28, 14x14, and 7x7, instead of 32x32, 16x16, and 8x8 in the paper.

The above two reasons might explain why the accuracy is a bit lower than the paper's, I suppose. Of course, there might be other reasons (for example, the true network architecture may differ from my script, since my script is just my understanding of the paper); if you find any, please tell me.
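To make difference 1 concrete, here is a minimal sketch of a residual unit with a 1x1 projection shortcut in MXNet's symbol API. The function name and the BN/ReLU placement are my own illustration (as noted above, the paper leaves the exact batch-norm placement open), not a copy of symbol_resnet-28-small.py:

    import mxnet as mx

    def residual_unit(data, num_filter, stride, dim_match, name):
        # two 3x3 convolutions, each preceded by batch norm + ReLU
        # (this placement is one plausible choice, not the only one)
        bn1 = mx.symbol.BatchNorm(data=data, name=name + '_bn1')
        act1 = mx.symbol.Activation(data=bn1, act_type='relu', name=name + '_relu1')
        conv1 = mx.symbol.Convolution(data=act1, num_filter=num_filter,
                                      kernel=(3, 3), stride=stride, pad=(1, 1),
                                      name=name + '_conv1')
        bn2 = mx.symbol.BatchNorm(data=conv1, name=name + '_bn2')
        act2 = mx.symbol.Activation(data=bn2, act_type='relu', name=name + '_relu2')
        conv2 = mx.symbol.Convolution(data=act2, num_filter=num_filter,
                                      kernel=(3, 3), stride=(1, 1), pad=(1, 1),
                                      name=name + '_conv2')
        if dim_match:
            shortcut = data   # identity shortcut: shapes already agree
        else:
            # 1x1 projection shortcut (this repo's choice when the number of
            # filters or the spatial size changes between stages)
            shortcut = mx.symbol.Convolution(data=data, num_filter=num_filter,
                                             kernel=(1, 1), stride=stride,
                                             name=name + '_proj')
        return conv2 + shortcut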

Contact information:

Thanks
