Comments (12)
You need to increase the learning rate when you increase the number of GPUs.
from py-r-fcn-multigpu.
Thanks. To be concrete: if I use 8 GPUs, should the lr be 8x the 1-GPU value (same iter_size)?
That worked for me, but it may not always hold.
Got it. In your coco branch, it seems the lr is still set to 1e-3 for training, while the stepsize has been set to 90000. I mean the settings in models/coco/ResNet-101/rfcn_end2end/solver_ohem.prototxt.
I created this repo for multi-GPU training, and it was tuned for 2 GPUs with iter_size 1 on PASCAL. That step-down would probably be too early for COCO; I likely didn't optimize the parameters for COCO when I created this repo.
The soft-nms repo contains the training schedule for MS-COCO which gets 35.1 mAP, with lr set to 0.008. But again, that is dataset-specific and specific to 8 GPUs.
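For reference, a COCO solver along these lines might look like the sketch below. Only base_lr: 0.008 is taken from this thread; the net path, stepsize, and remaining fields are illustrative assumptions, not values copied from the soft-nms repo:

```prototxt
# Hypothetical solver sketch; only base_lr comes from the discussion above.
train_net: "models/coco/ResNet-101/rfcn_end2end/train_agnostic_ohem.prototxt"  # assumed path
base_lr: 0.008        # from the thread: tuned for 8 GPUs on MS-COCO
lr_policy: "step"
gamma: 0.1
stepsize: 90000       # illustrative; tune per dataset and schedule
momentum: 0.9
weight_decay: 0.0005
display: 20
iter_size: 1
```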
I'll update this repo too in a month or so, so that master has all the features.
Awesome soft-nms repo. R-FCN in this repo gets 30.8%, while the soft-nms repo gets 33.9%. One difference between them is the test set: COCO 2014 vs. 2015 minival (though I think the 2015 minival is the same as the 2014 minival). Another difference is the PSRoI pooling: soft-nms uses aligned PSRoI pooling (proposed in Mask R-CNN). Does aligned PSRoI pooling improve results by 3.1%? I would like to reproduce the results given in soft-nms.
It is not entirely due to Mask R-CNN's RoIAlign. I implemented what I could understand from the paper and saw around a 1% improvement from fixing the alignment issue. I also reduced the RPN min size from 32 to 16. Training ran for 160k iterations; training longer would probably help more. In my experience, test-dev gives about 0.2% more for R-FCN, so you should get 35.3 on test-dev.
Thanks a lot.
I will try to reproduce soft-nms experiments.
"I also reduced the RPN min size from 32 to 16" — does this refer to the parameters __C.TRAIN.RPN_MIN_SIZE and __C.TEST.RPN_MIN_SIZE? It looks like they went from 16 to 8, not from 32 to 16. Am I right?
Yes.
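In py-faster-rcnn-style code these proposal-filtering thresholds live in the config as __C.TRAIN.RPN_MIN_SIZE and __C.TEST.RPN_MIN_SIZE. A minimal self-contained sketch of the change confirmed above (using SimpleNamespace as a stand-in for the repo's cfg object):

```python
from types import SimpleNamespace

# Stand-in for the repo's cfg (fast_rcnn.config); per the exchange above,
# the defaults in this repo were 16, not 32.
cfg = SimpleNamespace(
    TRAIN=SimpleNamespace(RPN_MIN_SIZE=16),
    TEST=SimpleNamespace(RPN_MIN_SIZE=16),
)

# Halving the minimum proposal size keeps smaller RPN proposals,
# which can help on datasets with many small objects such as MS-COCO.
cfg.TRAIN.RPN_MIN_SIZE = 8
cfg.TEST.RPN_MIN_SIZE = 8
```

The min-size filter drops RPN proposals whose width or height (at the input scale) falls below the threshold, so lowering it trades extra small proposals for more RPN output to process.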