sorry to trouble you again,I have some questions about the code and the paper 1.cu


A few question sorry to trouble about cnn_face_detection HOT 3 CLOSED

anson0910 commented on June 2, 2024

A few question sorry to trouble

from cnn_face_detection.

Comments (3)

anson0910 commented on June 2, 2024

Hi,

The 2 corresponds to the pooling layer of the first 12-net, since the pooling layer scales the input image down by a factor of 2.

Yes, shrinking the image means finding larger faces. If you want to find smaller faces, you can just decrease the min_face_size argument of the detect_faces_net function.

The number of detection windows generated can be calculated as follows:
We wish to find 40 × 40 faces, so we first downscale the original image by a factor of 12/40, which results in an image of size 240 x 180. Sliding a 12 x 12 window over it with 4-pixel spacing generates ((240 - 12) / 4 + 1) * ((180 - 12) / 4 + 1) = 58 * 43 = 2494 windows. Depending on the resizing factor used to create the pyramid, the number of detection windows may vary.
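
As a rough check of this kind of count, here is a minimal Python sketch. The helper name `count_windows` is hypothetical and not part of the repository. It assumes a square window, equal to the 12 x 12 net input, slid at a fixed pixel spacing over the already-resized image.

```python
def count_windows(width, height, window=12, spacing=4):
    """Count sliding-window positions over a width x height image.

    Assumes the window is placed every `spacing` pixels, starting at 0,
    and must fit entirely inside the image.
    """
    if width < window or height < window:
        return 0
    across = (width - window) // spacing + 1  # positions along the width
    down = (height - window) // spacing + 1   # positions along the height
    return across * down


# Resized image from the example above: 240 x 180, 12 x 12 window, spacing 4.
print(count_windows(240, 180))
```

Changing `spacing` or the resized image size changes the count, which is why the total depends on how the pyramid is built.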

For the 12-net it says 12 × 12 detection windows. Is that because the net input is 12 x 12, so the window is 12?
Yes

Does the 4-pixel spacing correspond to train_val.prototxt, and how is the 4 calculated?
The spacing can be any value, because according to the description of the original paper, crops are taken out of the image and fed into the 12-net one at a time.
However, I have modified the 12-net to be a fully convolutional neural network, so that much of the redundant computation is avoided.
You can take a look at this link if you're interested.

You're welcome. I'm from Taiwan!


tangyudi commented on June 2, 2024

Thank you. Why is the factor calculated as 12/40, and what does the factor mean?
I fed a 466 x 699 image to the network. After resize_image it is 139 x 209, and for out = net_12c_full_conv.blobs['prob'].data[0][1, :, :], out.shape is 64 x 99. Does each confidence value in the 64 x 99 map give the probability of a face, and if so, how can a single point represent a rectangle?
The paper says the image pyramid is resized by 12/F. Does that mean w x 12/F and h x 12/F?
I used 10,000 face images and 10,000 background images without faces to train the 12-net. Is that enough?


anson0910 commented on June 2, 2024

The factor resizes the image so that, after resizing, each 12 x 12 block corresponds to a min_face_size x min_face_size block in the original image.
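
To make the sizes in the question concrete, here is a small Python sketch. The helper names are hypothetical, and three things are assumptions rather than facts from the repository: min_face_size is taken to be 40, the resize truncates to an integer, and the fully convolutional 12-net behaves like a 12-pixel window moved with a stride of 2. Under those assumptions it reproduces the 139 x 209 and 64 x 99 figures reported above.

```python
def resized_length(length, min_face_size, net_size=12):
    """Length of one image side after scaling by net_size / min_face_size.

    Truncation to an integer is an assumption about how the resize rounds.
    """
    return int(length * net_size / min_face_size)


def output_length(resized, net_size=12, stride=2):
    """Approximate side length of the output map of the fully convolutional net.

    Assumes a net_size-pixel receptive field moved with the given stride.
    """
    return (resized - net_size) // stride + 1


for side in (466, 699):
    small = resized_length(side, min_face_size=40)
    print(side, "->", small, "->", output_length(small))
```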

How can a point represent a rectangle?
A point in the last output feature map represents a 12 x 12 block in the resized image, which in turn corresponds to a min_face_size x min_face_size block in the original image.
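
A minimal Python sketch of that mapping follows. The function `point_to_box` is hypothetical, not the repository's code, and it assumes a stride of 2 between neighbouring output points (the pooling factor mentioned earlier in this thread) and a resize factor of exactly 12 / min_face_size.

```python
def point_to_box(row, col, min_face_size, net_size=12, stride=2):
    """Map an output-map point to (x, y, width, height) in the original image.

    Assumes each output point covers a net_size x net_size block of the
    resized image, and that neighbouring points are `stride` pixels apart.
    """
    # Top-left corner of the block in the resized image.
    x_resized = col * stride
    y_resized = row * stride
    # Undo the net_size / min_face_size resize to get original coordinates.
    x = x_resized * min_face_size / net_size
    y = y_resized * min_face_size / net_size
    size = net_size * min_face_size / net_size
    return (x, y, size, size)


# Point at row 6, column 3 of the output map, with min_face_size = 40.
print(point_to_box(6, 3, min_face_size=40))
```

With a different pyramid level the same output point maps to a larger block, since min_face_size in the formula is replaced by the face size that level targets.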

The paper says the image pyramid is resized by 12/F. Does that mean w x 12/F and h x 12/F?
Yes.

I used 10,000 face images and 10,000 background images without faces to train the 12-net. Is that enough?
Yes, I think this is enough.

