Comments (64)
I see, so probably it'll take 4 days to converge.
from pytorch-cpn.
@Tiamo666
I trained the CPN101-384x288 model from scratch on a single 1080ti GPU with epoch=32.
One key difference is that batch_size is set to 18.
It took almost 9 days to train from scratch.
One more thing to note is that I used the GT bboxes to train the above model.
from pytorch-cpn.
Hi @mkocabas ,
Thanks for your interest in my implementation.
There are at least two configurations to test, ResNet-50+384x288 and ResNet-101+384x288. Which one do you prefer to test? Or do you want to test both of them?
I've modified the code a little, so please clone/pull the latest version before you run it. Please follow the README to configure the environment.
You can train a ResNet-50+384x288 model directly in the 384.288.model directory by running train.py.
You may need to modify the batch size in config.py, and use -g to specify the number of GPUs you use. For example, you may set batch_size = 12 and run python3 train.py -g 2 when you use 2 x 1080 GPUs to train the model.
To train a ResNet-101+384x288 model, you need to set model='CPN101' in config.py, and then train the model the same way.
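As a rough aid for choosing batch_size and -g, here is a minimal sketch of how a total batch divides across GPUs. It assumes the batch_size in config.py is the total batch that gets split evenly over the GPUs, which this thread suggests but does not confirm; per_gpu_batch is a hypothetical helper, not part of the repository.

```python
def per_gpu_batch(total_batch_size, num_gpus):
    """Evenly split a total batch size over num_gpus devices.

    Assumption: the configured batch size is the total across all GPUs.
    Any remainder is handed out one sample at a time to the first devices.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    base, extra = divmod(total_batch_size, num_gpus)
    return [base + (1 if i < extra else 0) for i in range(num_gpus)]


# batch_size = 12 with `-g 2` would put 6 samples on each GPU.
two_gpu_split = per_gpu_batch(12, 2)
single_gpu_split = per_gpu_batch(18, 1)
```

If the split really works this way, the per-GPU share is the number to compare against what a single card can hold in memory.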
If you have any questions, feel free to contact me. You can also mail me at [email protected] or [email protected].
from pytorch-cpn.
Cool, so I can start with ResNet-50+384x288. After that I can try ResNet-101.
I'll use 2 x 1080ti with the default hyperparameters as in config. Am I correct?
from pytorch-cpn.
@GengDavid we have a little problem. 1080tis have 11GB memory. batch_size=6 barely fits the memory. This means that we can train with batch_size=12 using 2 GPUs. What do you think?
from pytorch-cpn.
If you are using 1080tis, I think you can set batch_size to more than 12 with 2 GPUs while running the ResNet-50+384x288 model.
from pytorch-cpn.
@mkocabas The ResNet-50+384x288 model with batch_size=12 takes about 8G of memory in my experiment.
from pytorch-cpn.
I'm consistently getting an OOM error, but let me check. I'll restart the computer; maybe there are some blocking processes. I'll inform you about the progress.
from pytorch-cpn.
@GengDavid, restarting solved the problem. Thanks for pointing that out! I'll update this issue as training continues.
How many epochs did you train the 256x192 model for?
from pytorch-cpn.
@mkocabas About 25 epochs. I don't remember the exact figure.
from pytorch-cpn.
Fine, thanks.
from pytorch-cpn.
Epoch 6 (tested with GT bboxes)
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.688
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.894
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.750
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.654
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.742
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.719
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.904
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.776
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.681
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.777
Epoch 13 (tested with GT bboxes)
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.726
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.914
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.785
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.690
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.781
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.754
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.924
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.810
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.716
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.812
from pytorch-cpn.
@GengDavid do you have the weights from the 5th epoch of the ResNet50-256x192 model?
from pytorch-cpn.
Yes, I have saved the 5th-epoch pre-trained model.
But I'm sorry to tell you that my code differs from the original paper in some respects, as @Tiamo666 mentioned in issue #4.
The results seem very close, but I'm still going to modify the network and then re-test it.
from pytorch-cpn.
Yeah I saw the discussion. Please let me know about the results after modification. If you don't have enough GPUs, I can test the corrected model.
from pytorch-cpn.
I'll let you know the results, but it may take a while since I only have 1*1080 free to run the code. Maybe you can test the ResNet-50+384x288 model first.
Thanks!
from pytorch-cpn.
I've started to train the fixed ResNet-50+384x288 on a Titan V w/ batch-size=24
from pytorch-cpn.
Hi, @mkocabas
I've updated the ResNet-50+256*192 results. Have you got any results yet?
Thx.
from pytorch-cpn.
Hi, David, I've trained the ResNet-50+384*288 model with ground truth bboxes.
The test result at epoch 32 is as follows:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.737
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.915
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.806
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.706
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.792
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.767
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.929
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.826
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.729
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.824
Due to network limitations, I could not download the person detection results on COCO, so I just used the ground truth.
from pytorch-cpn.
@Tiamo666 Great job!
Can you provide the pre-trained model so that I can test it with detection results?
I think you can open a PR with a link on it to download the pre-trained model.
from pytorch-cpn.
@Tiamo666 Or if you do not want to open a PR, could you just provide a link to download the model? Google Drive, Onedrive, Dropbox and Baidu Yun are all fine.
from pytorch-cpn.
OK, I guess Baidu Yun is a good choice. I will try to share the pretrained model on it and provide you the link as soon as I've uploaded the model.
from pytorch-cpn.
Hi, David, I've already uploaded the model to BaiduYun.
Here is the link:
https://pan.baidu.com/s/1fdy5_0HQm63QtlOzxKbpuw
from pytorch-cpn.
Great! I'll test it and update the result later.
from pytorch-cpn.
@Tiamo666 I've updated the results.
from pytorch-cpn.
That's cool!
I'll have time to train with Resnet101+384*288; I'll share the model after training finishes.
from pytorch-cpn.
@Tiamo666 That's great! If you have any problem, feel free to contact me.
from pytorch-cpn.
Hi, David. I've uploaded the model of cpn384*288 with Resnet101 on Baidu Yun.
Here is the link:
https://pan.baidu.com/s/1toikUHSqHhHP3DkIOkNctA
from pytorch-cpn.
@Tiamo666 Great! Thanks a lot. I'll update the results soon.
from pytorch-cpn.
Hello, David, I've just found that last week I trained with the old code, which has the "Color Normalized bug". I'm sorry about that; I can retrain the model this week.
from pytorch-cpn.
@Tiamo666 Retraining is the better choice but may take more time. I think we can just fine-tune the trained model. This may affect the result a little but saves time. However, I currently do not have free GPUs to do this work.
What do you think about that?
from pytorch-cpn.
OK, thank you for your advice. I think fine-tuning the model is a good idea.
Another thing I wanted to mention, regarding issue #7: it doesn't matter whether nn.Conv2d has a bias, because the batchnorm subtracts the mean value, so adding a constant will not affect the result.
from pytorch-cpn.
@Tiamo666 Yep, the bias has little influence on the result. However, it is better to avoid adding a bias to conv2d with batchnorm.
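The point about the bias can be checked numerically. The sketch below is a plain-Python stand-in for batch normalization over a single channel, not the repository's layers: it omits the learned scale and shift, and the sample activations and bias value are made up for illustration.

```python
from statistics import fmean, pvariance


def batch_norm(values, eps=1e-5):
    """Normalize one channel: subtract the batch mean, divide by the batch std."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Hypothetical conv outputs for one channel across a batch.
activations = [0.3, -1.2, 2.5, 0.9]
bias = 4.0  # a constant per-channel conv bias

plain = batch_norm(activations)
shifted = batch_norm([a + bias for a in activations])

# The bias moves the mean by the same constant and leaves the variance
# unchanged, so the normalized outputs agree up to floating-point error.
max_diff = max(abs(p - s) for p, s in zip(plain, shifted))
```

Because the constant cancels in the mean subtraction, a bias on a convolution that feeds directly into batchnorm only adds parameters without changing the normalized output.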
from pytorch-cpn.
@Tiamo666 Here is what I did. You can modify the training code like this (from line 38):
    if args.resume:
        if isfile(args.resume):
            print("=> loading checkpoint '{}'".format(args.resume))
            checkpoint = torch.load(args.resume)
            checkpoint_state_dict = checkpoint['state_dict']
            new_dict = {}
            for k, v in checkpoint_state_dict.items():
                if k == 'module.global_net.upsamples.0.1.bias':
                    continue
                if k == 'module.global_net.upsamples.1.1.bias':
                    continue
                if k == 'module.global_net.upsamples.2.1.bias':
                    continue
                new_dict[k] = v
            model.load_state_dict(new_dict)
            args.start_epoch = checkpoint['epoch']
            # optimizer.load_state_dict(checkpoint['optimizer'])
Use --resume to continue training, and set --epochs to one or two more than the epoch of the checkpoint you load.
(Also make sure to change the learning rate to a proper value.)
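The key filtering in that snippet can be written more compactly. This is a minimal sketch using plain dicts in place of real tensors; drop_keys and the sample checkpoint contents are hypothetical, and only the three bias key names come from the code above.

```python
def drop_keys(state_dict, unwanted):
    """Return a copy of state_dict without the listed keys."""
    unwanted = set(unwanted)
    return {k: v for k, v in state_dict.items() if k not in unwanted}


# The three bias entries that the updated network no longer has.
STALE_BIAS_KEYS = [
    "module.global_net.upsamples.%d.1.bias" % i for i in range(3)
]

# Stand-in for checkpoint['state_dict']; the values would be tensors in practice.
checkpoint_state = {
    "module.global_net.upsamples.0.1.weight": "w0",
    "module.global_net.upsamples.0.1.bias": "b0",
    "module.global_net.upsamples.2.1.bias": "b2",
}

filtered = drop_keys(checkpoint_state, STALE_BIAS_KEYS)
```

In PyTorch, passing strict=False to load_state_dict also skips unexpected keys, but it silently ignores every mismatch, so filtering the known keys explicitly is arguably the safer choice here.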
from pytorch-cpn.
Ok, That's cool, thanks a lot.
from pytorch-cpn.
Hello, David. I've uploaded the model of cpn384x288 with resnet101 to BaiduYun.
Here is the link:
https://pan.baidu.com/s/1e_meK3xnGRZXJEBaFVXB3A
from pytorch-cpn.
Cool. @Tiamo666 Could you please tell me the results you got before and after the fine-tuning process (using gt bbox)?
from pytorch-cpn.
Hello, David, the results after fine-tuning are:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.740
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.923
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.806
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.711
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.787
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.770
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.931
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.829
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.736
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.821
Before fine-tuning:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.075
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.154
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.063
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.100
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.043
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.084
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.165
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.073
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.109
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.049
from pytorch-cpn.
@Tiamo666 Thanks! I'm a little busy these days, I'll update the results and model soon.
from pytorch-cpn.
I've used commit 8e85af2 to train the ResNet50 + 256x192 model from scratch with GT bbox input and the default parameter settings. With epoch set to 32, the overall result of 70.8, shown below, is slightly worse than the reported 71.2:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.708
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.905
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.782
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.683
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.749
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.740
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.918
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.804
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.710
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.786
How many epochs did you set to achieve 71.2 for ResNet50 + 256x192?
As for the ResNet50 + 384x288 model trained from scratch with GT bbox input and the default parameter settings, the epoch=32 result is slightly better than the reported 73.7, as follows:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.741
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.925
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.805
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.706
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.795
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.768
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.932
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.825
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.730
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.826
from pytorch-cpn.
@mkocabas Sorry that I have not updated the results yet. 71.2 is the old result.
It is strange that the results after fixing the bugs are lower than the results before. I'll update all the results this weekend, but I still have not figured out the reason. Maybe we need to adjust the parameter settings, since they were chosen for the old code.
from pytorch-cpn.
@GengDavid Now, the ResNet 50 + 256x192 with detection GT bboxes is slightly worse than the old result, but the ResNet 50 + 384x288 is slightly better than the old result.
from pytorch-cpn.
Cool, so I think some slight differences are acceptable. Could you share your pre-trained ResNet 50 + 384x288 with us? It would be great.
from pytorch-cpn.
@GengDavid Please see my comments #3 (comment)
from pytorch-cpn.
Sorry, I don't clearly understand what you mean by referencing comment-424928303😳
from pytorch-cpn.
@GengDavid
Sorry, I misunderstood your comment.
The trained model for ResNet50 + 384x288 can be found at GoogleDrive.
from pytorch-cpn.
Hi @Tiamo666 @mingloo
I've updated all the pre-trained models and results.
Sorry for taking a long time to update. Thanks for your great work!
from pytorch-cpn.
However, it is a little confusing that the CPN-101-384x288 model performs even worse than CPN-50-384x288.
@Tiamo666 Could you show me the parameter settings you used to fine-tune the model? Thanks!
Have a good National Day.
from pytorch-cpn.
@GengDavid @Tiamo666 Thanks for updating the result.
I'll try to train CPN-ResNet101-384x288 from scratch on my side.
from pytorch-cpn.
@mingloo Great! Thanks.
from pytorch-cpn.
@GengDavid, thanks a lot, I just came back from my holiday. I didn't change any other parameters; I just replaced the learning rate scheduler with PyTorch's built-in optim.lr_scheduler. Here is my code:
    # fine tune
    for k, v in pretrained_dict.items():
        if k in ['module.global_net.upsamples.0.1.bias',
                 'module.global_net.upsamples.1.1.bias',
                 'module.global_net.upsamples.2.1.bias']:
            continue
        new_dict[k] = v
    model.load_state_dict(new_dict)

    # adjust lr rate
    scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=cfg.lr_dec_epoch, gamma=cfg.lr_gamma)
    for epoch in range(args.start_epoch, args.epochs):
        # lr = adjust_learning_rate(optimizer, epoch, cfg.lr_dec_epoch, cfg.lr_gamma)
        scheduler.step(epoch)
        lr = optimizer.state_dict()['param_groups'][0]['lr']
        print('\nEpoch: %d | LR: %.8f' % (epoch + 1, lr))
The following is part of my log.txt. I fine-tuned from epoch 32, and the total number of epochs is 35:
30.000000 0.000031 102.073177
31.000000 0.000016 101.399609
32.000000 0.000016 101.165480
33.000000 0.000016 101.801196
34.000000 0.000016 101.328027
35.000000 0.000016 101.059933
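The learning rates in this log are consistent with a step schedule that multiplies the rate by a fixed factor at each milestone. Here is a minimal pure-Python sketch of that rule. The base rate of 5e-4 matches the "LR: 0.00050000" line printed at epoch 1 elsewhere in this thread, but GAMMA and MILESTONES are assumptions chosen only to reproduce the logged values; the actual cfg.lr_gamma and cfg.lr_dec_epoch are not shown here.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate after decaying by gamma at every milestone <= epoch."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


BASE_LR = 5e-4
GAMMA = 0.5                       # assumed decay factor
MILESTONES = [6, 12, 18, 24, 30]  # hypothetical decay epochs (0-indexed)

# 0-indexed epoch 29 is printed as epoch 30 in the log, epoch 30 as 31.
lr_logged_epoch_30 = multistep_lr(BASE_LR, MILESTONES, GAMMA, 29)
lr_logged_epoch_31 = multistep_lr(BASE_LR, MILESTONES, GAMMA, 30)
```

Under these assumptions the two values are 3.125e-05 and 1.5625e-05, which round to the 0.000031 and 0.000016 shown in the log, so the fine-tuning epochs 33 to 35 ran at the smallest rate of the schedule.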
from pytorch-cpn.
I just tested the epoch-35 model with ground truth bboxes; it seems to get slightly higher performance:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.744
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.924
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.816
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.712
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.791
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.772
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.932
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.834
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.739
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.824
from pytorch-cpn.
@Tiamo666 Thanks! So the number of epochs is the key.
from pytorch-cpn.
I've trained the CPN101-384x288 model from scratch. The model can be downloaded from GoogleDrive.
The evaluation result is as follows:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.740
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.924
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.815
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.710
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.787
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.770
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.934
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.832
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.736
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.822
from pytorch-cpn.
@mingloo great job!
Could you please tell me how many epochs you trained for?
from pytorch-cpn.
@mingloo Thanks a lot, I got it.
from pytorch-cpn.
@Tiamo666
Sorry. I've double-checked: the CPN101-384x288 model trained from scratch uses the default parameter settings. So please ignore the previous #3 (comment).
from pytorch-cpn.
@mingloo Thanks a lot.
I wonder whether you have tested the trained model at different epochs or just the last epoch (32)?
from pytorch-cpn.
@GengDavid
All of my tests are for epoch=32.
from pytorch-cpn.
@GengDavid Hi, I have run into some problems with training. Can you share your log file for ResNet 50+256x192? Thanks
from pytorch-cpn.
@YoungZiyu
Sure, you can find training log here
from pytorch-cpn.
@Tiamo666 @GengDavid
How can I use the models to test a single image?
Is there any inference script?
from pytorch-cpn.
@GengDavid @aidarikako @mingloo
Hello, why do I get such a large loss, like:
Total params: 104.55MB
Epoch: 1 | LR: 0.00050000
iteration 100 | loss: 362.8368835449219, global loss: 246.98593711853027, refine loss: 115.85093688964844, avg loss: 403.03418150042546
I have changed lr to 1e-6, but it does not help.
Any advice? Thanks
from pytorch-cpn.