sacmehta / espnetv2

A light-weight, power efficient, and general purpose convolutional neural network

License: MIT License

Python 100.00%

convolutional-neural-networks cnn pytorch imagenet lightweight semantic-segmentation efficient deep-learning machine-learning

espnetv2's Issues

About Power consumption

First, I'm impressed by your good work.

I wonder how you measured power consumption on the TX board.

I can't find more details about measuring power consumption in your paper.

Thanks

Getting 8% mIoU when training on the ADE dataset with 7 classes

Hello, first of all, thanks for sharing!
Recently, I have been using your model 'cityscapes.bisenet.R18.speed' to train on my own dataset, which is processed into 7 classes, with the pretrained model 'R 18', but I got only 8% class IoU.
Could you help me find where the fault is?

When running the segmentation code, I got an error!

Cityscape Dataset Error

Hi, thank you for your work and for open sourcing your code!
When I run your code, I have a problem:
First, I used this code (https://github.com/mcordts/cityscapesScripts/blob/master/cityscapesscripts/preparation/createTrainIdLabelImgs.py) to convert the label images to trainIDs, but I still got an error!

Can you give me some advice on how to solve this error?

A strange problem encountered during training

I redesigned the network architecture using the EESP module and trained it with the method from the open-source code, but I observed a strange phenomenon. The mIoU on the training set increases steadily, but the mIoU on the validation set behaves like a random value. By the end of training it stops changing and remains very small!

train/eval for another dataset

Hello, how can I train/evaluate the network on datasets other than ImageNet? For example, for people classification or something similar? I tried to run the same command as in the README, changing only the number of classes in the model variable and the dataset path (my dataset is divided into train and val folders, each with one folder per class), and got some errors. Can you help me? I want to use this network on a Raspberry Pi.

MNasNet

Hi, thank you for your work and for open sourcing your code!

As you don't list email addresses in the paper, I am abusing the GitHub issue tracker to ask a question about the paper:
Have you compared your work to MNasNet (https://arxiv.org/abs/1807.11626)?
MNasNet is stronger than MobileNet v2 across different FLOP settings and achieves 75% top-1 ImageNet accuracy with a compute envelope comparable to the one you used for Figure 3c. Is there a reason why you excluded MNasNet from your comparisons?

Thanks,
Christoph

Have you used ImageNet pretrained weights to train ESPNetv2 for segmentation?

I want to check whether ESPNetv2 is trained with ImageNet pretrained weights. This is not clear from the paper or the code.

Looking forward to your reply.

Performance Issues on NVIDIA GTX1080

I trained the same ESPNetV2 on my GTX 1080 CUDA GPU for 10-class semantic segmentation. I made some modifications to the code so that it works for my 10 classes. The input image size was 640x480, and I got an mIoU of 62% on validation, which is really good. However, I was expecting better speed: I got 52 fps at inference, measured as the average over all samples[5:]. Why is there such a large difference between the speed claimed in the paper (140 fps) and the implementation (50 fps)? This is how I ran the code:
python main.py --batch_size 10 --s 1.0 --inWidth 640 --inHeight 480 --max_epochs 350 --batch_size 32 --classes 10 --csvfile ~/data/Cityscape_v2/class_dict_grouped.csv --data_dir pwd
The last parameter is one I added to train on my dataset.
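
For readers comparing their own numbers with published ones, the measurement described above (averaging over samples[5:], i.e. discarding warm-up iterations) can be sketched as follows. This is a minimal sketch, not the repository's benchmark code: `measure_latencies` and `fps_after_warmup` are hypothetical helper names, and the timings in the example are made up. On a GPU, the timed function should also wait for the device to finish (for example with `torch.cuda.synchronize()` in PyTorch), because CUDA kernels are launched asynchronously. Input resolution, batch size, and GPU model all change the result, so figures are only comparable when those match.

```python
import time


def measure_latencies(fn, n_runs=50):
    """Call fn() n_runs times and return the wall-clock duration of each call in seconds."""
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()  # on a GPU, fn should synchronize the device before returning
        latencies.append(time.perf_counter() - start)
    return latencies


def fps_after_warmup(latencies, warmup=5, batch_size=1):
    """Average throughput in images per second, ignoring the first `warmup` calls."""
    steady = latencies[warmup:]
    if not steady:
        raise ValueError("need more runs than warm-up iterations")
    return batch_size * len(steady) / sum(steady)


# Made-up timings: five slow warm-up calls, then 20 ms per call.
timings = [0.5] * 5 + [0.02] * 10
print(round(fps_after_warmup(timings), 1))
```

Including the five warm-up calls in the average would report a much lower throughput, which is why they are dropped.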

Can this model be used for regression tasks?

I want to use this model for a regression task. Can you confirm that it works for regression tasks?

Network parameters of segmentation model in the paper

Thanks for your impressive work and code @sacmehta .
After I read your ESPNetv2 (CVPR'19) paper and previous versions on arXiv,
I have a question about the number of network parameters of segmentation model.

In Figure 7(c) of the v1 and v2 papers, there are 2 models which have 725K and 99K parameters, respectively.
However, in Figure 7(a) of the v3 paper, there is a model that is more accurate than the previous ones, but its number of parameters is not given.

Can you tell me the number of parameters of your model in Figure 7(a) of the v3 paper?
Alternatively, the value of 's', the scaling parameter of your model, would be enough.
Thank you.

Question about predict on espnetv2

How can I use ESPNetv2 to run inference on arbitrary images? In your repository and in the EdgeNets repository I only found ways to evaluate on a specific dataset, but I would like to pass external images as a parameter and see the network's result. How can I do this?

a strange question

Downsampling

What was the reasoning for making the depthwise convolution strided in the downsampling block, instead of making the pointwise convolution strided?

Can anyone upload a pretrained model?

Hi, dear all,
Can anyone kindly upload a pretrained model? Training is really time-consuming, and I just want to evaluate the performance. Thank you.

Hi, error when I train segmentation with two classes

Hi, when I run the segmentation with two classes, I get the error below:

Number of classes may not be reasonable

Hello, authors, first of all thanks for sharing.
I found that the Cityscapes dataset includes 34 classes, while the segmentation code is written for only 20 classes. Do you transform the label values to 20 classes before training?

About cyclic learning rate

It only appears in the ImageNet task. Why not in the segmentation task?

Question about the comparison between ESPNet and ESPNetv2 on your paper.

Hi,

I have a question about the graph in your ESPNetv2 paper (Section 4.2 Semantic segmentation -- Results -- (a) ESPNet vs. ESPNetv2 (validation set)).
There are 6 points in the graph: 3 are for ESPNetv2 and the other 3 are for ESPNet.
I am wondering whether those points correspond to p=2, q=3,5,8 for ESPNet and s=0.5,1.0,1.5 for ESPNetv2, or to some other parameters.

Thanks.

Is there a problem here ?

ESPNetv2/segmentation/main.py

Line 209 in 6c70184

is_best = mIOU_val > best_val

The value of best_val is always 0, which means that model_best.pth stores not the best epoch but the last one!
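
The fix this report points towards is to carry the running maximum forward after each comparison. The snippet below is a minimal sketch of that pattern, not the repository's code: `update_best` is a hypothetical helper, and the mIoU history is made up for illustration.

```python
def update_best(miou_val, best_val):
    """Return (is_best, new_best) so the caller can carry the running maximum forward."""
    is_best = miou_val > best_val
    return is_best, max(miou_val, best_val)


best_val = 0.0
history = [0.31, 0.43, 0.40, 0.45, 0.44]  # made-up validation mIoU per epoch
saved_epochs = []
for epoch, miou_val in enumerate(history):
    is_best, best_val = update_best(miou_val, best_val)
    if is_best:
        saved_epochs.append(epoch)  # this is where model_best.pth would be written

print(saved_epochs, best_val)
```

With the update in place, only epochs 0, 1 and 3 count as improvements. Without it, every epoch with a positive mIoU would overwrite the "best" checkpoint.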

Using sum of 2 losses

Hi, thank you for your work and for sharing it.

I found that there are two outputs from EESPNet_Seg:
output1 from level 4 (used in inference)
output2 from level 2 (only used in the training stage)

I'm wondering why you used the sum of 2 losses:
loss1 = criterion(output1, target)
loss2 = criterion(output2, target)
loss = loss1 + loss2

What about trying "loss = criterion(output1+output2, target)" and using "output1+output2" as the final segmentation output, which is similar to the "skip layer" used in FCN-8s?
What about using one more output from level 3?

If you have already tried those combinations, can you share the details of what you tried and explain why you didn't use them?
If not, what do you think about that concept? What can we expect?
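
To make the two alternatives in this question concrete, here is a small numeric sketch for a single pixel with three classes. It does not reproduce the authors' reasoning; it only shows that the two objectives are different quantities. `cross_entropy` is a hand-written stand-in for `criterion`, and the logits are made up.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one pixel: -log softmax(logits)[target], computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


output1 = [2.0, 0.5, -1.0]  # made-up logits from the final head
output2 = [0.2, 1.5, 0.0]   # made-up logits from the auxiliary head
target = 0

# Option A: each head is penalised on its own prediction.
loss_sum = cross_entropy(output1, target) + cross_entropy(output2, target)

# Option B: only the fused prediction is penalised.
fused = [a + b for a, b in zip(output1, output2)]
loss_fused = cross_entropy(fused, target)

print(round(loss_sum, 4), round(loss_fused, 4))
```

In option A the auxiliary head is wrong on its own (it prefers class 1) and is penalised for it, which is the usual idea behind auxiliary or deep supervision. In option B that error is partly hidden because the sum of the two heads still favours the correct class.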

What is the actual latency on ARM or CPU?

Thanks for the great work! I also found that ESPNetv2 has comparable performance to MobileNet or ShuffleNet with even fewer FLOPs, but is the actual inference speed (images / sec) on ARM or CPU faster than that of the other lightweight architectures?

How to train own dataset?

Hello, I'm a student studying machine learning.

I found ESPNet v2 while looking for a network for real-time semantic segmentation on the TX2.

I don't have the Cityscapes dataset, so I want to train on my own dataset.

I'm a beginner in machine learning.

Could you provide a tutorial on training with one's own dataset?

Thank you in advance.

Espnet .pth with another scale factor

I searched for the .pth files in your repository with s == 1.0 trained with COCO / VOC and couldn't find it. Could you make these files available?

You have following values as class labels: [ 0 1 2 4 5 6 7 8 10 11 13 255] Some problem with labels. Please check image file: ./city/gtFine/train/cologne/cologne_000000_000019_gtFine_labelTrainIds.png Exiting!!

Hello, could you please tell me how to solve this problem?
Thank you!

Turning ESPNet into c++ cudastream problem

I want to turn your python code into C++/Cuda Code.
So I want to convert your model into pt / ONNX model.
I found this conversion problem:
/home/wjl/project/ESPNetv2/segmentation/cnn/Model.py:139: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if w2 == w1:
/home/wjl/project/ESPNetv2/segmentation/cnn/Model.py:91: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if expanded.size() == input.size():

It seems that "if ==" is not supported in cudastream. How can I avoid it?

The tool to calculate FLOPs

How do you calculate those FLOPs? Which tool do you use? Could you tell me? Thank you!!
: )

I cannot reach your accuracy after following your steps to train on the Cityscapes dataset.

Here is my result:
Epoch: 124 Overall Acc (Tr): 0.8973 Overall Acc (Val): 0.8530 mIOU (Tr): 0.4831 mIOU (Val): 0.4305
Per Class Training Acc: [0.9563952 0.87780535 0.9068668 0.43044552 0.5774933 0.5071546
0.43101206 0.6368849 0.921697 0.7282442 0.9652385 0.788574
0.32997596 0.94641626 0.22630975 0.26235422 0.14418961 0.17280169
0.6403003 0.64186233]
Per Class Validation Acc: [0.9250288 0.7664717 0.9041689 0.2594584 0.4326968 0.47890368
0.36922717 0.6052691 0.89565104 0.5700432 0.92304826 0.7061085
0.36114156 0.89745665 0.21361473 0.30452752 0.04327752 0.09240897
0.6547894 0.64558077]
Per Class Training mIOU:
[0.93848205 0.6837373 0.826016 0.31308392 0.36571708 0.32630438
0.29591504 0.4236251 0.853054 0.5105453 0.8888189 0.5311395
0.23751496 0.84709704 0.18447216 0.21306016 0.12266205 0.13488355
0.41068017 0.55468637]
Per Class Validation mIOU:
[0.8795196 0.59463114 0.78535455 0.20169742 0.21248965 0.32275337
0.2726844 0.45283318 0.82542694 0.3319856 0.82660824 0.43451232
0.2278607 0.7965249 0.138431 0.2273348 0.02951417 0.07179707
0.40751818 0.57027286]
The mIoU does not improve any further.
I would like to know whether this situation also happened during your training. Thanks a lot.
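
As a sanity check on the log above, the overall validation mIoU should be the plain average of the per-class values. The sketch below copies the 20 per-class validation numbers from the log and also lists the weakest classes; the class indices are positions in that list, and which Cityscapes class each index stands for is not stated in the log.

```python
# Per-class validation IoU values copied from the log above (20 entries).
val_iou = [
    0.8795196, 0.59463114, 0.78535455, 0.20169742, 0.21248965, 0.32275337,
    0.2726844, 0.45283318, 0.82542694, 0.3319856, 0.82660824, 0.43451232,
    0.2278607, 0.7965249, 0.138431, 0.2273348, 0.02951417, 0.07179707,
    0.40751818, 0.57027286,
]

mean_iou = sum(val_iou) / len(val_iou)

# Indices of the three classes with the lowest IoU, worst first.
worst = sorted(range(len(val_iou)), key=lambda i: val_iou[i])[:3]

print(f"{mean_iou:.4f}", worst)
```

The mean comes out at about 0.4305, matching the logged mIOU (Val), and the three weakest entries are at indices 16, 17 and 14, so the low overall score is dominated by a handful of classes with IoU below 0.15.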

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 136 and 135 in dimension 2 at /opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THC/generic/THCTensorMath.cu:87

Thank you for developing ESPNet!
I have three questions

・About labels to be ignored
My own dataset has 11 classes, not counting the background, and we assigned the label 255 to the background.
So in Dataset.py and loadData.py I changed
label_img[label_img == 255] = 19
to
label_img[label_img == 255] = 11
and executed
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --data_dir ./izunuma --classes 11 --batch_size 10 --s 1.0

Labels can take value between 0 and number of classes 10.
You have following values as class labels:
[0 1 11]
Some problem with labels. Please check image file

I encountered this error.
To solve this, I included the background in the number of classes and passed 12 to the --classes argument. Is this solution correct?
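
The label check behind this error message can be sketched in a few lines. This is an illustration under assumptions, not the repository's loader: `remap_ignore_label` and `invalid_labels` are hypothetical helpers, and the tiny label image is made up. It assumes, as the error text suggests, that valid labels run from 0 to the number of classes minus 1.

```python
def remap_ignore_label(label_rows, num_classes, ignore_value=255):
    """Map ignore_value to the last class index (num_classes - 1)."""
    return [
        [num_classes - 1 if v == ignore_value else v for v in row]
        for row in label_rows
    ]


def invalid_labels(label_rows, num_classes):
    """Return the sorted label values that fall outside 0 .. num_classes - 1."""
    seen = {v for row in label_rows for v in row}
    return sorted(v for v in seen if not 0 <= v < num_classes)


# Made-up 2x3 label image: two real classes plus background stored as 255.
label = [[0, 1, 255],
         [1, 1, 255]]

remapped = remap_ignore_label(label, num_classes=12)  # 255 becomes 11
print(invalid_labels(remapped, num_classes=11))  # 11 is out of range for 11 classes
print(invalid_labels(remapped, num_classes=12))  # every value fits with 12 classes
```

Under that assumption, mapping 255 to 11 only passes the check when the class count is 12, which is consistent with the workaround described above.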

・About errors during training
I set the number of classes as described above and started training, and the following error occurred.
The image size is 640 wide by 360 high.
~/github/ESPNetv2/segmentation$ CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --data_dir ./izunuma --classes 12 --batch_size 10 --s 1.0
Total network parameters: 338430
Data statistics
[131.47914 144.9144 134.75436] [76.80522 68.83018 71.792274]
[ 9.698015 9.98296 7.912603 8.275558 3.726631 10.492059
10.192185 4.4507203 10.4207115 10.338895 10.329051 1.9822153]
Learning rate: 0.0005
Traceback (most recent call last):
  File "main.py", line 263, in <module>
    trainValidateSegmentation(parser.parse_args())
  File "main.py", line 200, in trainValidateSegmentation
    train(args, trainLoader_scale1, model, criteria, optimizer, epoch)
  File "/home/nouki/github/ESPNetv2/segmentation/train_utils.py", line 89, in train
    output1, output2 = model(input)
  File "/home/nouki/.pyenv/versions/anaconda3-5.3.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nouki/.pyenv/versions/anaconda3-5.3.1/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/nouki/.pyenv/versions/anaconda3-5.3.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nouki/github/ESPNetv2/segmentation/cnn/SegmentationModel.py", line 62, in forward
    merge_l2 = self.project_l2(torch.cat([out_l2, out_up_l3], 1))
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 136 and 135 in dimension 2 at /opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THC/generic/THCTensorMath.cu:87

I have not had time to learn PyTorch, so I could not work out what this error means.
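
One plausible reading of the size error above is that a feature map with an odd height is halved by a strided convolution and then upsampled by 2, which no longer matches the skip connection it is concatenated with. The sketch below only does that arithmetic. It assumes every downsampling stage behaves like a 3x3 convolution with stride 2 and padding 1 and that the decoder upsamples by exactly 2; the number of levels and the multiple of 32 are illustrative assumptions, not values taken from the repository.

```python
def conv_stride2_out(n):
    """Output length of a 3x3 convolution with stride 2 and padding 1."""
    return (n - 1) // 2 + 1


def first_mismatch(size, levels=5):
    """Return (level, upsampled, skip) at the first level where 2x upsampling
    of the downsampled map no longer matches the skip connection, else None."""
    cur = size
    for level in range(1, levels + 1):
        down = conv_stride2_out(cur)
        if down * 2 != cur:
            return level, down * 2, cur
        cur = down
    return None


def round_up_to_multiple(size, multiple=32):
    """Smallest multiple of `multiple` that is >= size."""
    return ((size + multiple - 1) // multiple) * multiple


print(first_mismatch(135))        # an odd height of 135 gives the pair 136 vs 135
print(first_mismatch(360))        # 360 halves cleanly only down to 45
print(first_mismatch(512))        # None: every level is even
print(round_up_to_multiple(360))  # a size that halves cleanly five times
```

Under these assumptions, the pair "136 and 135" in the error is exactly what an odd height of 135 at some level would produce, and resizing or padding the input so that both sides are multiples of 32 avoids the mismatch.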

・Code mistake?
Perhaps the code of this part is wrong
loadData.py line 41
label_img [label_img = 255] = 19
↓
label_img [label_img == 255] = 19

DataSet.py
label [label = 255] = 19
↓
label [label == 255] = 19

Thank you!

What does 's' mean?

What does the network scale parameter 's' that is set during training mean?
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --batch_size 10 --s 1.0

On lines 34-37 of SegmentationModel.py:
if s <= 0.5:
    p = 0.1
else:
    p = 0.2

and on line 46:
self.project_l1 = nn.Sequential(nn.Dropout2d(p=p), C(self.net.level1.act.num_parameters + classes, classes, 1, 1))

I interpreted 's' as a dropout parameter. However, looking at lines 34-37, there seems to be no difference between s = 1.0 and s = 1.5, since both give p = 0.2.

What does 's' mean?

numpy is not imported in Dataset.py

I found a very trivial error in your code in Dataset.py:
if 255 in np.unique(label): label[label==255] = 19

np is not defined yet.

Comparison with ShuffleNet V2

@sacmehta
Hi, after reading your paper and running the code, I tried comparing it with ShuffleNet v2 1× (implemented in TensorFlow) for image classification. Their speeds on a GPU (1080Ti) are as follows:
ShuffleNet v2 1×: 3 ms per image
ESPNet V2: 13 ms per image
I want to make sure my test procedure is right (or is the actual forward inference speed as above?). Thanks.

Is it possible to get probability(likelihood) for foreground class label?

Hello Sachin, I am very much a beginner in deep learning, so I apologize if my question is stupid.

I trained with my own annotation images (background is 0, foreground is 1) and set "--classes=2".
Now I want to know the probability (likelihood) of label 1 for each pixel of the test images.
In other words, I want to get a heat map of the probability of the foreground label.
Is it possible?
I also want to know whether 2-class segmentation with this network is a correct approach or not.
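
A per-pixel foreground probability is usually obtained by applying a softmax over the class scores of each pixel and keeping the entry for the foreground class. The sketch below shows this in plain Python on a made-up 2x2 map of two-class logits. It assumes the network output consists of raw, unnormalised scores per class; `foreground_heatmap` is a hypothetical helper. With a PyTorch output of shape (N, C, H, W), the same idea would be `torch.softmax(output, dim=1)[:, 1]`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def foreground_heatmap(logit_map, fg_index=1):
    """logit_map[row][col] holds the per-class logits of that pixel.
    Returns a map of the same height and width with P(class == fg_index)."""
    return [[softmax(pixel)[fg_index] for pixel in row] for row in logit_map]


# Made-up logits for a 2x2 image, ordered [background, foreground].
logits = [[[2.0, -1.0], [0.0, 0.0]],
          [[-0.5, 1.5], [-3.0, 3.0]]]

heatmap = foreground_heatmap(logits)
for row in heatmap:
    print([round(p, 3) for p in row])
```

Taking the argmax over classes instead of the softmax gives the hard label map, so the heat map is a by-product of the same forward pass rather than a separate model.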

Training issues (from pretrained weights)

First, great work!
Unfortunately, I ran into three weird behaviors when training your network.
I'm trying to fine-tune your network for the KITTI dataset, which provides a mere 200 labeled images. KITTI uses the same labels as Cityscapes. Images are 1242 pixels wide and 375 pixels high.

First, I updated train.txt and val.txt, then I relabeled the images using Cityscapes' mapping from labelId to trainId. I then encountered a problem concerning the 255 label, which I simply discard in main.py. Besides this small inconvenience, everything so far is good (it would be nice to add an --ignore_id argument to the main script).
When training, regardless of whether I use pretrained weights, I would expect to have to use the arguments you provide, basically 'inWidth' and 'inHeight'. Nevertheless, when using such values the training breaks:
Running:

CUDA_VISIBLE_DEVICES=0,1 python3 main.py --batch_size 8 --s 1.0 --data_dir ./kitti --cached_data_file kitti.p --inWidth 1242 --inHeight 375

Outputs:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 72 and 71 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:83

Similar Issues: #6 #13

Interestingly enough, leaving the default values makes it run...

Even more weirdly, when using the pretrained weights, I encounter the following issue:
Running:

CUDA_VISIBLE_DEVICES=0,1 python3 main.py --batch_size 10 --s 1.0 --data_dir ./kitti --cached_data_file kitti.p --pretrained ./pretrained_weights/espnetv2_segmentation_s_1.0.pth

Outputs:

Traceback (most recent call last):
  File "main.py", line 269, in <module>
    trainValidateSegmentation(parser.parse_args())
  File "main.py", line 30, in trainValidateSegmentation
    model = net.EESPNet_Seg(args.classes, s=args.s, pretrained=args.pretrained, gpus=num_gpus)
  File "segmentation/cnn/SegmentationModel.py", line 25, in __init__
    classificationNet.load_state_dict(torch.load(pretrained))
  File "venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
	Missing key(s) in state_dict: "module.level1.conv.weight", "module.level1.bn.weight", "module.level1.bn.bias", "module.level1.bn.running_mean",  ............A long list....................
	Unexpected key(s) in state_dict: "module.net.level1.conv.weight", "module.net.level1.bn.weight", "module.net.level1.bn.bias", "module.net.level1.bn.running_mean", "module.net.level1.bn.running_var", "module.net.level1.bn.num_batche, ............Another long list....................

Just as interestingly, if I run the same command with the pretrained weights for object classification, it runs... (this is quite confusing)
Running:

CUDA_VISIBLE_DEVICES=0,1 python3 main.py --batch_size 8 --s 1.0 --data_dir ./kitti --cached_data_file kitti.p --pretrained ../imagenet/pretrained_weights/espnetv2_s_1.0.pth

Outputs:

Model initialized with pretrained weights
Total network parameters: 340782
Data statistics
[ 98.74988 102.4303   97.34298] [80.21684 78.51919 75.99867]
[ 3.5197217  7.555117   5.793313  10.019754   9.771379   9.226719
 10.181938   9.925934   3.145523   5.8163567  5.3573713 10.378751
  8.682274   6.40663   10.266428  10.30295   10.258763  10.4816475
 10.429212   7.707068 ]
Learning rate: 0.0005
Train: epoch 0
[0/23] loss: 6.070 time:6.91
[1/23] loss: 5.943 time:0.52
[2/23] loss: 5.866 time:0.48
[3/23] loss: 5.786 time:0.49
[4/23] loss: 5.752 time:0.48
[5/23] loss: 5.597 time:0.48
...

How can I train your network using the pretrained weights for segmentation using the kitti dataset?
The problem is that if I run the network on the kitti dataset with the pretrained weights only, it does not look very nice.
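
The state_dict error above lists missing keys such as "module.level1.conv.weight" and unexpected keys such as "module.net.level1.conv.weight", so the two checkpoints appear to differ by a "net." component in the key names. The sketch below shows how such keys could be inspected and renamed with plain dictionaries. It is an assumption-laden illustration, not the repository's loading code: `remap_keys` is a hypothetical helper, the key "module.project_l1.0.weight" is invented to stand for a decoder-only parameter, and the values are placeholders instead of tensors.

```python
def remap_keys(state_dict, old_prefix="module.net.", new_prefix="module."):
    """Rename keys starting with old_prefix; return (remapped, skipped_keys).
    Keys without the prefix (e.g. decoder-only layers) are left out and reported."""
    remapped = {}
    skipped = []
    for key, value in state_dict.items():
        if key.startswith(old_prefix):
            remapped[new_prefix + key[len(old_prefix):]] = value
        else:
            skipped.append(key)
    return remapped, skipped


# Placeholder checkpoint: the first two keys appear in the error message,
# the third is a hypothetical decoder parameter.
segmentation_ckpt = {
    "module.net.level1.conv.weight": "w0",
    "module.net.level1.bn.weight": "w1",
    "module.project_l1.0.weight": "w2",
}

backbone_weights, skipped = remap_keys(segmentation_ckpt)
print(sorted(backbone_weights))
print(skipped)
```

Renaming keys this way would initialise only the backbone and drop the segmentation head. To keep the head as well, the checkpoint would have to be loaded into the full segmentation model rather than through the code path that expects classification weights.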