ikhlestov / vision_networks

Repo about neural networks for images handling

License: MIT License

Python 100.00%

densenet machine-learning tensorflow computer-vision

vision_networks's Introduction

DenseNet with TensorFlow

Two types of Densely Connected Convolutional Networks (DenseNets) are available:

DenseNet - without bottleneck layers
DenseNet-BC - with bottleneck layers

Each model can be tested on the following datasets:

Cifar10
Cifar10+ (with data augmentation)
Cifar100
Cifar100+ (with data augmentation)
SVHN

The number of layers, blocks, growth rate, image normalization and other training parameters can be changed through the shell or inside the source code.

Example run:

python run_dense_net.py --train --test --dataset=C10

List all available options:

python run_dense_net.py --help

There are also many other implementations, which may be useful too.

Citation:

@article{Huang2016Densely,
       author = {Huang, Gao and Liu, Zhuang and Weinberger, Kilian Q.},
       title = {Densely Connected Convolutional Networks},
       journal = {arXiv preprint arXiv:1608.06993},
       year = {2016}
}

Test run

Test error rates (%) on various datasets. Per-channel image normalization was used. Results reported in the paper are given in parentheses. For the Cifar+ datasets, image normalization was performed before augmentation. This may cause slightly worse results than those reported in the paper.

Model type	Depth	C10	C10+	C100	C100+
DenseNet(k = 12)	40	6.67(7.00)	5.44(5.24)	27.44(27.55)	25.62(24.42)
DenseNet-BC(k = 12)	100	5.54(5.92)	4.87(4.51)	24.88(24.15)	22.85(22.27)

Approximate training time for models on GeForce GTX TITAN X GM200 (12 GB memory):

DenseNet(k = 12, d = 40) - 17 hrs
DenseNet-BC(k = 12, d = 100) - 1 day 18 hrs

Difference compared to the original implementation

The existing model should use identical hyperparameters to the original code. If you notice any errors, please open an issue.

It may also be useful to check my blog post, Notes on the implementation DenseNet in tensorflow.

Dependencies

Model was tested with Python 3.4.3+ and Python 3.5.2 with and without CUDA.
Model should work as expected with TensorFlow >= 0.10. Tensorflow 1.0 support was recently included.

The repo comes with requirements files, so the easiest way to install everything is to run:

for CPU usage: pip install -r requirements/cpu.txt
for GPU usage: pip install -r requirements/gpu.txt


vision_networks's Issues

Bias is used? Other models can't run!

Bias is used?
As I understand your code, bias is only used in 'trainsition_layer_to_classes'.
In the PyTorch source code, it looks like 'conv2d + bias' is used.
Other models can't run!
Your implementation can run only two cases:

DenseNet(k = 12) d=40
DenseNet-BC(k = 12) d=100

The others can't run because of OOM.
The PyTorch version has a "Memory Efficient Implementation of DenseNets" implementation detail:
https://github.com/liuzhuang13/DenseNet/tree/master/models
The main idea is to use shared variables.
As I understand your code, you try to reuse the 'out' variable (to reduce memory?).
I think that will build another variable in the TensorFlow graph.

I want to implement the memory reduction. Do you have any ideas?
My main idea is to use shared variables (https://www.tensorflow.org/programmers_guide/variable_scope).
But I think there is a problem with tf.concat?

Is there some way to get the softmax (probability values) from the saved models using the repo's code?

Hi, is there some way to get the softmax probability values from the saved models instead of the final class predictions?

l2 regularization for bias too.. is it necessary?

Thank you for your effort, it is really helping me in my project. Illarion, I have a small question about the implementation.

When applying l2_regularization, you have applied it to all the parameters, i.e. weights and biases. Is that advisable, given that L2 is mostly applied only to weights?

l2_loss = tf.add_n( [tf.nn.l2_loss(var) for var in tf.trainable_variables()])

Shouldn't we just add the tf.nn.l2_loss for weights only ?
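
One common way to restrict the penalty to weights is to filter variables by name before summing. The sketch below uses NumPy arrays in a plain dict as stand-ins for the variables; the variable names and the "kernel" naming convention are assumptions for illustration and are not taken from the repo. The helper `l2_loss` mirrors what `tf.nn.l2_loss` computes, `sum(t ** 2) / 2`.

```python
import numpy as np


def l2_loss(t):
    """Same quantity as tf.nn.l2_loss: half the sum of squares."""
    return float(np.sum(np.square(t)) / 2.0)


# Hypothetical variables: name -> value (stand-ins for trainable variables).
variables = {
    "block_0/conv/kernel": np.ones((3, 3, 2, 4), dtype=np.float32),
    "block_0/bn/beta": np.full((4,), 2.0, dtype=np.float32),
    "fc/kernel": np.ones((4, 10), dtype=np.float32),
    "fc/bias": np.full((10,), 3.0, dtype=np.float32),
}

# Penalty over every variable, as in the line quoted above.
l2_all = sum(l2_loss(v) for v in variables.values())

# Penalty over weights only: assumes weight tensors have "kernel" in their name.
l2_weights = sum(
    l2_loss(v) for name, v in variables.items() if "kernel" in name
)

print(l2_all, l2_weights)
```

In TensorFlow 1.x the same filter could be applied to `var.name` while iterating over `tf.trainable_variables()`, provided the naming convention in the graph is known.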

SVHN normalization issue

The normalization process for SVHN images has issues. I have changed it, which yields much better results.

elif normalization_type == 'mean_0':
    # shape indices kept as posted (axes 0, 1 and 3)
    train_n = np.full((images.shape[0], images.shape[1], images.shape[3]), 255.0, dtype=np.float32)
    pixel_depth = 255.0
    images = ((images - train_n) / 2) / pixel_depth

batch_norm in dense_net.py

Hi, we found that batch_norm in dense_net.py uses the tf.contrib.layers.batch_norm API rather than the tf.nn.fused_batch_norm API. Is there any specific reason for using contrib.layers.batch_norm? With the current Intel MKL-DNN backend, contrib.layers has much more overhead than nn.fused_batch_norm.

Problem regarding the number of features generated at the initial convolution layer

The DenseNet paper mentions that for non-BC architectures the number of output features of the initial conv layer should be 16, and for BC architectures it should be twice the growth rate. However, in your implementation, the number is always twice the growth rate.
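
The rule described in this issue can be written as a small helper. This is only a sketch of the paper's convention as stated above; the function name `first_conv_features` and the `bc_mode` flag are hypothetical and do not refer to the repo's code.

```python
def first_conv_features(growth_rate, bc_mode):
    """Output channels of the initial convolution, per the rule described above.

    Non-BC models use a fixed 16 channels; BC models use twice the growth rate.
    """
    return 2 * growth_rate if bc_mode else 16


print(first_conv_features(12, bc_mode=False))  # 16
print(first_conv_features(12, bc_mode=True))   # 24
```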

Why did you remove the max pooling layer after the initial convolution?

I remember there is a 3*3 max pooling layer after the initial convolution. Why remove it?

can you provide your trained models?

Hi ikhlestov,
You really did a great job. Thanks for sharing.
Can you provide your trained models?

Don't you have to update batch norm statistics?

Hi,

thank you for the implementation!
I can't find the operation that updates the statistics.
From the documentation:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)

https://www.tensorflow.org/api_docs/python/tf/contrib/layers/batch_norm

How about performance for flower dataset?

I am interested in flower dataset which is a small data set of a few thousand flower images spread across 5 labels: daisy, dandelion, roses, sunflowers, tulips. Could you write the python file for the flower dataset, besides cifar and svhn? I want to use your code to train it. Thanks

The script to download and separate into the train and validation folder is at https://github.com/tensorflow/models/blob/master/research/inception/inception/data/download_and_preprocess_flowers.sh

Is there any memory-efficient TensorFlow implementation?

Hi, I have run into the same GPU memory problem. Is there any memory-efficient TensorFlow implementation?
Thanks very much!

ResourceExhaustedError: OOM when allocating tensor

Hi,

When trying "--growth_rate=12 --depth=100 --dataset=C100", it returned "ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[64,372,32,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc"

From the GPU usage, I found that it only used GPU[0] and hit OOM:

How to resolve it?
Regards
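
For a sense of scale, the size of the single tensor named in the error message can be computed directly. This is a back-of-the-envelope sketch that assumes "type float" means 32-bit floats (4 bytes per element); `tensor_bytes` is a hypothetical helper, not part of the repo.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed for one dense tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


shape = [64, 372, 32, 32]          # shape from the OOM message
size = tensor_bytes(shape)
print(size, size / 2**20)          # 97517568 bytes, 93.0 MiB
```

One such tensor is about 93 MiB. During training the network holds many activations of similar size at once, so the total can exceed GPU memory even though no single tensor looks large. Lowering the batch size (the leading 64) reduces every one of them proportionally.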

'by_chanels' normalization issue between different splits

Hi Illarion,

Thanks for the beautiful code.

I have a question regarding the 'by_chanels' normalization on the CIFAR-10 test set. It seems that, when preprocessing the CIFAR-10 test set, you compute the means of the test set instead of the training set, since each CifarDataSet object computes the mean of its own split (train/val/test). However, shouldn't the preprocessing statistics come only from the training set?
(Reference: http://cs231n.github.io/neural-networks-2/, 'common pitfalls' paragraph)
Correct me if I misread your code.

Again, thanks for your work. It is a great implementation of DenseNet.
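
The approach suggested in this issue, computing per-channel statistics on the training split and reusing them for the other splits, can be sketched with NumPy as follows. The helper names and the random stand-in data are assumptions for illustration; this is not the repo's data provider code.

```python
import numpy as np


def channel_stats(images):
    """Per-channel mean and std over an (N, H, W, C) array."""
    return images.mean(axis=(0, 1, 2)), images.std(axis=(0, 1, 2))


def normalize(images, mean, std):
    """Normalize with the given per-channel statistics (broadcast over C)."""
    return (images - mean) / std


rng = np.random.default_rng(0)
train = rng.uniform(0, 255, size=(100, 8, 8, 3))  # stand-in for the train split
test = rng.uniform(0, 255, size=(20, 8, 8, 3))    # stand-in for the test split

mean, std = channel_stats(train)        # statistics from the training set only
train_n = normalize(train, mean, std)
test_n = normalize(test, mean, std)     # reuse training statistics for test
```

With this scheme the test split is not exactly zero-mean after normalization, which is expected: it is transformed the same way as the data the model was trained on.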

Shouldn't bottlenecks go in the transition layers ?

Hello,
Thanks for this implementation. I'm trying to follow along and don't understand a fine point.

In the paper they put the bottlenecks as part of the transition layers, whereas you placed the bottlenecks in each internal layer of each block

I suspect that having them in the transition layers is the correct approach, since the point of the bottleneck is to reduce the size of the accrued feature maps due to concatenation. Within each internal layer we aren't accruing much, and I suspect that having the bottlenecks there actually increases the number of parameters as the size of each feature map is less than 4*growth_rate.
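
A quick parameter count can help check this intuition. The sketch assumes a bottleneck layer is a 1x1 convolution producing 4*growth_rate feature maps followed by a 3x3 convolution producing growth_rate maps, and compares it with a single 3x3 convolution; biases and batch norm parameters are ignored, and all function names are hypothetical.

```python
def conv_params(c_in, c_out, kernel):
    """Weights in a kernel x kernel convolution, ignoring bias."""
    return kernel * kernel * c_in * c_out


def plain_layer_params(c_in, k):
    """Single 3x3 convolution from c_in input maps to k output maps."""
    return conv_params(c_in, k, 3)


def bottleneck_layer_params(c_in, k):
    """1x1 convolution to 4*k maps, then 3x3 convolution to k maps."""
    return conv_params(c_in, 4 * k, 1) + conv_params(4 * k, k, 3)


k = 12
for c_in in (24, 240):
    print(c_in, plain_layer_params(c_in, k), bottleneck_layer_params(c_in, k))
# 24  -> 2592 vs 6336   (bottleneck costs more)
# 240 -> 25920 vs 16704 (bottleneck costs less)
```

Under these assumptions the bottleneck only saves parameters once the number of input maps exceeds 7.2 times the growth rate, which happens in the later layers of a block as concatenated features accumulate.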

Choose GPU device

Hi,

Thanks a lot for this "vision_networks"!

I have two GPUs of exactly the same type in my machine. The first one is also connected to the monitors, and I found that it has less available RAM than the second one.

Found device 0 with properties:
name: GeForce GTX 1070 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 5.14GiB

Found device 1 with properties:
name: GeForce GTX 1070 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:02:00.0
totalMemory: 7.92GiB freeMemory: 7.70GiB

Two questions:
First, is there a way to use two GPUs in parallel to speed up the training process? Currently it always uses the first GPU and the second one is idle.

Second, is there a way to choose which GPU is used if only one GPU is supported?

Regards
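
For the second question, one general option that does not depend on this repo is the `CUDA_VISIBLE_DEVICES` environment variable, which controls which GPUs a process can see. A minimal sketch, assuming the second GPU (index 1) is the one wanted:

```python
import os

# Must be set before TensorFlow (or any other CUDA library) is imported.
# "1" exposes only the second physical GPU to this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The same effect can be had from the shell by prefixing the command, for example `CUDA_VISIBLE_DEVICES=1 python run_dense_net.py --train --test --dataset=C10`. Inside the process the exposed GPU is then numbered as device 0.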

Problem about the running time.

Hi!
Thanks for kindly sharing! I have a problem when running your code for Cifar10 classification. When I change the kernel size of the convolutional layers in each block from 3x3 to 1x1, the running time for each epoch goes from 3.05s to about 4.11s on a Titan X. However, a 3x3 convolution should always consume more computational resources than a 1x1 convolution, so I am confused. Can you help analyze whether there is a problem in your code or in the TensorFlow optimization?
Thanks again!

Error while running SVHN Dataset

The code does not execute beyond the data provider for SVHN. It works well for both C10 and C100.
Log File:
Prepare training data...
Traceback (most recent call last):
File "run_nn_pruning.py", line 154, in <module>
data_provider = get_data_provider_by_name(args.dataset, train_params)
File "/home/gkrish19/TCAD/DenseNet/data_providers/utils.py", line 17, in get_data_provider_by_name
return SVHNDataProvider(**train_params)
File "/home/gkrish19/TCAD/DenseNet/data_providers/svhn.py", line 85, in __init__
images, labels = self.get_images_and_labels(part, one_hot)
File "/home/gkrish19/TCAD/DenseNet/data_providers/svhn.py", line 117, in get_images_and_labels
data = scipy.io.loadmat(filename)
File "/home/gkrish19/anaconda3/lib/python3.6/site-packages/scipy/io/matlab/mio.py", line 142, in loadmat
matfile_dict = MR.get_variables(variable_names)
File "/home/gkrish19/anaconda3/lib/python3.6/site-packages/scipy/io/matlab/mio5.py", line 292, in get_variables
res = self.read_var_array(hdr, process)
File "/home/gkrish19/anaconda3/lib/python3.6/site-packages/scipy/io/matlab/mio5.py", line 252, in read_var_array
return self._matrix_reader.array_from_header(header, process)
File "mio5_utils.pyx", line 675, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header
File "mio5_utils.pyx", line 705, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header
File "mio5_utils.pyx", line 778, in scipy.io.matlab.mio5_utils.VarReader5.read_real_complex
File "mio5_utils.pyx", line 450, in scipy.io.matlab.mio5_utils.VarReader5.read_numeric
File "mio5_utils.pyx", line 355, in scipy.io.matlab.mio5_utils.VarReader5.read_element
File "streams.pyx", line 195, in scipy.io.matlab.streams.ZlibInputStream.read_string
File "streams.pyx", line 188, in scipy.io.matlab.streams.ZlibInputStream.read_into
OSError: could not read bytes

reported error results

Hello,
I have a question regarding the reported test (error) results. Is it the mean cross entropy or something else? Please explain numbers such as "6.67(7.00)".

Test Results on C10+?

Hello, I wonder if you have test results on C10+ dataset?

Thanks!

The order of image normalization and augmentation seems to be the same as the original implementation?

You mentioned that "For Cifar+ datasets image normalization was performed before augmentation. This may cause a little bit lower results than reported in paper."
However, I checked the image preprocessing parts of the two implementations and found that both apply normalization before augmentation, so there must be another reason for the difference in performance.

How to normalize image in pre-processing step

While reading your notes, I found an interesting point: "normalized by mean/std of all images in the dataset(train or test), not by its own only".

next I’ve implemented per channel normalization… And networks began works even worse. It was not clear for me why. .... After precise debugging, it becomes apparent that images should be normalized by mean/std of all images in the dataset(train or test), not by its own only.

Take a training dataset of 100 grayscale images of size 256x256. From your notes, you mean that we compute the mean and std over all 100 images and normalize each image using these values. Is that right? An implementation would look like

mean_all=mean[image1, image2,...image100]
std_all=std[image1, image2,...image100]
image1_normalize=(image1-mean_all)/std_all
image2_normalize=(image2-mean_all)/std_all
...

In my opinion, using the global mean (the mean of all images in the dataset) may be sensitive to images with high illumination. Instead, I think normalization based on the mean/std of the image itself would be better. It would look like

image1_normalize=(image1-image1.mean())/image1.std()
image2_normalize=(image2-image2.mean())/image2.std()
...

Did you try the above case? One more question: if you use global means, do you need to recompute the global mean for the testing set?
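
The two schemes compared in this question can be made concrete with NumPy. This is a sketch on random stand-in data (smaller than the 256x256 images mentioned, to keep it light); the array names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.uniform(0, 255, size=(100, 16, 16))  # 100 grayscale stand-in images

# Dataset-level normalization: one mean/std shared by all images.
mean_all, std_all = images.mean(), images.std()
global_norm = (images - mean_all) / std_all

# Per-image normalization: each image uses its own mean/std.
per_mean = images.mean(axis=(1, 2), keepdims=True)
per_std = images.std(axis=(1, 2), keepdims=True)
per_image_norm = (images - per_mean) / per_std
```

Dataset-level normalization preserves brightness differences between images, while per-image normalization removes them, so the two schemes present different information to the network.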

requirements issue

If I want to use the most recent tensorflow-gpu (note that it is not tensorflow)

then I would run pip install tensorflow-gpu

And tensorflow-gpu requires

enum34>=1.1.6
six>=1.10.0
tensorflow-tensorboard<0.5.0,>=0.4.0rc1
numpy>=1.12.1
wheel>=0.26
protobuf>=3.3.0
werkzeug>=0.11.10
markdown>=2.6.8
html5lib==0.9999999
bleach==1.5.0
...

So in the requirement.txt file

I think it would be better to write

protobuf>=3.3.0
numpy>=1.12.1

instead of

numpy==1.12.0
protobuf==3.1.0.post1

Also, I think it is better to use >= instead of == for all entries in the requirements.txt file.

Otherwise, when somebody runs pip install -r requirements.txt, they may install an older version of a package.

Train on images on my desktop

How can I train this model on the images on my desktop?

Images in the form as shown in the screenshot.

SVHN performance

Hi,

Thanks for providing the code.
I ran your code with SVHN, but it only worked when I set normalization to "by_chanels".
(This is more an FYI than an issue)

Best
Armin

Thanks!

Roy

How to use 2 gpus?

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          On   | 00000000:04:00.0 Off |                  Off |
| N/A   68C    P0   132W / 235W |   4267MiB / 12205MiB |     86%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          On   | 00000000:84:00.0 Off |                    0 |
| N/A   42C    P0    61W / 235W |     80MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    143163      C   python                                      4254MiB |
|    1    143163      C   python                                        69MiB |
+-----------------------------------------------------------------------------+

On my server there are two GPUs. When I run the code, it seems that only one K40 is running.