I don't exactly understand the approximate joint training method. I know RPN and detec

step by step understanding approximate joint training method #192 about simple-faster-rcnn-pytorch HOT 5 OPEN

sanhai77 commented on June 16, 2024


from simple-faster-rcnn-pytorch.

Comments (5)

m-evdokimov commented on June 16, 2024

In the approximate joint training method you train both the RPN and the detection head simultaneously. The point is that you don't pass gradients from the detection head to the RPN through the proposal coordinates.
To do that, you detach the RPN output from the computational graph (simply rpn_output.detach() in PyTorch) and pass it to the detection head. If you don't detach the output, it becomes the non-approximate joint training method.
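
A minimal sketch of what the detach does, under loud assumptions: `rpn` and `head` here are hypothetical one-layer stand-ins, not the modules from this repository, and the toy head consumes the box coordinates directly instead of going through RoI pooling. It only shows that rpn_output.detach() keeps the head's loss from reaching the RPN parameters.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins (not the repository's modules): a tiny "RPN" that
# regresses one box, and a tiny "head" that scores the box coordinates.
features = torch.randn(1, 8)
rpn = torch.nn.Linear(8, 4)    # predicts (x1, y1, x2, y2)
head = torch.nn.Linear(4, 1)   # toy assumption: reads coordinates directly


def backprop_head_loss(detach_proposals: bool) -> None:
    """Run one forward-backward step of the head loss only."""
    rpn.zero_grad()
    head.zero_grad()
    rpn_output = rpn(features)
    if detach_proposals:
        rpn_output = rpn_output.detach()   # proposals become constants
    head(rpn_output).sum().backward()


backprop_head_loss(detach_proposals=True)
rpn_grad_detached = rpn.weight.grad          # stays None: nothing reached the RPN
head_grad_detached = head.weight.grad.clone()

backprop_head_loss(detach_proposals=False)
rpn_grad_attached = rpn.weight.grad.clone()  # now populated by the head loss
```

With the detach, only the head receives a gradient; without it, the same loss also updates the RPN.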


sanhai77 commented on June 16, 2024

OK, we use rpn_output.detach(). But why?
Is it possible to differentiate roi() w.r.t. the coordinates?

Does d(roi(feature_map, rois))/d{x1,y1,x2,y2} exist?

I mean the cropping part of RoI pooling.

Does d(feature_map[x1:x2, y1:y2])/d{x1,y1,x2,y2} exist?


m-evdokimov commented on June 16, 2024

"OK, we use rpn_output.detach(). But why?"

If the RPN output is detached, you don't propagate gradients from the detection head to the RPN. That way the detection head is just a function of the crops (not of the whole input image and the anchor box parameters); this is what the approximate joint method does. You can think of it as taking your image dataset, extracting and caching the crops made by the RPN once, and then training the detection head on them.

"Is it possible to differentiate roi() w.r.t. the coordinates?"

Yes, it's possible. I assume the main reason the detection head and the RPN were trained "separately" in the paper is a lack of computational resources.
Nowadays we can train all parts of such models at the same time, which is intuitively better.
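
Whether that derivative is useful depends on how the crop reads the feature map. Here is a rough one-dimensional sketch in plain Python, with a made-up feature row and coordinate, comparing a quantized lookup (the RoI pooling style) with a linearly interpolated lookup (the idea behind RoI Align). The function names are illustrative only and do not come from any library.

```python
import math


def sample_quantized(feature, x):
    """RoI-pooling-style lookup: round the coordinate down, then index."""
    return feature[math.floor(x)]


def sample_linear(feature, x):
    """Interpolated lookup: blend the two neighbouring feature values."""
    i = math.floor(x)
    a = x - i
    return (1.0 - a) * feature[i] + a * feature[i + 1]


def finite_difference(fn, feature, x, eps=1e-4):
    """Central-difference estimate of d fn / d x."""
    return (fn(feature, x + eps) - fn(feature, x - eps)) / (2 * eps)


feature = [0.0, 1.0, 4.0, 9.0, 16.0]   # made-up 1-D feature row
x = 2.5                                # made-up continuous coordinate

grad_quantized = finite_difference(sample_quantized, feature, x)
grad_linear = finite_difference(sample_linear, feature, x)
```

The quantized lookup is piecewise constant in x, so its derivative is zero everywhere except at the integer jumps, where it is undefined. The interpolated lookup has slope feature[3] - feature[2] between 2 and 3, so it does give the coordinate a usable gradient.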


sanhai77 commented on June 16, 2024

I apologize for my many questions, but I am confused and I can't find an answer through any research.
RoI pooling involves non-differentiable operations like indexing (quantizing a coordinate such as 3.5 to the integer 3). So why do we detach the proposals during backpropagation? How would gradients flow from the detector back into the RPN and the feature extraction network? I don't understand: detaching the proposals seems unnecessary when gradients can't flow from the RoI pooling layer to the RPN head and are stopped automatically.
On the other hand, unlike RoI Align, the outputs of RoI pooling are not directly related to the coordinates (proposals). (Actually, I did not find a mathematical relation between roi_output and the coordinate inputs, i.e. between the RoI pooling outputs and {x1,y1,x2,y2}.)
So again, detaching the proposals is not necessary when there is no relationship between the RoI pooling output and the coordinate inputs.
If d(roi_pool_outputs)/d{x1,y1,x2,y2} does not even exist, why should we detach {x1,y1,x2,y2} to make them constant?

I'm really confused.


m-evdokimov commented on June 16, 2024

The trick is that in the joint training method you don't take derivatives w.r.t. the coordinates produced by the RPN.

Actually, there are two ways to train Faster R-CNN:
a) Train the RPN and the detection head separately. Back in the days when most people didn't have enough computational resources to train both parts in parallel, the recipe was simple: train the RPN alone, then extract from the training data the crops predicted by the pretrained RPN, and finally train the detection head on the extracted crops. Detaching the RPN output is just a way to simulate separate training of both parts in a single forward-backward step.
b) Train all parts of the model at the same time. In this method the detection head output becomes a function of the input image (compared to method a, where you have two separate functions, one of the input image and one of the crops of the feature map). In the part of the model where you make crops from the RPN output, you don't take gradients w.r.t. the coordinates of the crops. You can think of this operation as a simple element-wise multiplication of the feature map and a binary mask in which 1 marks the pixels of the crop. This trick lets gradients flow from the detection head to the RPN.
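
The binary-mask picture in (b) can be checked with a small PyTorch sketch. The feature map and the proposal below are made up, and the slicing follows the feature_map[x1:x2, y1:y2] notation used earlier in this thread; it is not the repository's RoI pooling code.

```python
import torch

# Made-up 5x5 feature map that tracks gradients.
feature_map = torch.arange(25.0).reshape(5, 5).clone().requires_grad_(True)

# Made-up proposal, already quantized to integers. Python ints carry no
# gradient, so nothing can flow back into the coordinates themselves.
x1, y1, x2, y2 = 1, 2, 4, 4

crop = feature_map[x1:x2, y1:y2]   # plain indexing, i.e. the cropping step
crop.sum().backward()

# Binary mask with 1 on the pixels of the crop, 0 elsewhere.
mask = torch.zeros(5, 5)
mask[x1:x2, y1:y2] = 1.0

grad_matches_mask = torch.equal(feature_map.grad, mask)
```

The gradient that reaches the feature map is exactly the mask: ones inside the crop and zeros outside. So the crop still passes gradients to the shared features, even though the coordinates are treated as constants.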

