
Comments (3)

anson0910 commented on June 2, 2024

Hi,

  1. The 2 corresponds to the pooling layer of the 12-net: the pooling layer downscales its input by a factor of 2.

Yes, shrinking the image means finding larger faces. If you want to find smaller faces, just decrease the min_face_size argument of the detect_faces_net function.
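A minimal sketch of how the pyramid levels relate to face size. It assumes the first level is scaled by 12/min_face_size, as described in this thread, and that each further level shrinks by a hypothetical scale_step of 0.7 (not taken from the repository) until the image is smaller than the 12 × 12 net input.

```python
def pyramid_scales(width, height, min_face_size, scale_step=0.7, net_size=12):
    """List (scale, face_size) pairs, one per pyramid level.

    scale_step is a hypothetical value chosen for illustration; it is not
    taken from the repository.
    """
    levels = []
    scale = net_size / min_face_size  # first level: a 12x12 window covers min_face_size px
    while min(width, height) * scale >= net_size:  # stop once the image is smaller than the net input
        # At this scale a 12x12 window covers net_size / scale pixels of the original image.
        levels.append((scale, net_size / scale))
        scale *= scale_step
    return levels

for scale, face in pyramid_scales(800, 600, min_face_size=40):
    print(f"scale {scale:.4f}: faces of about {face:.0f} px")
```

Each smaller scale shrinks the image further, so the fixed 12 × 12 window corresponds to a larger face. Lowering min_face_size raises the first scale, which enlarges the image and adds levels for smaller faces.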

The number of detection windows generated can be calculated as follows:
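We wish to find 40 × 40 faces, so we first downscale the original 800 × 600 image by a factor of 12/40, which gives a 240 × 180 image. In this resized image a 40 × 40 face occupies 12 × 12 pixels, so scanning 12 × 12 windows with 4-pixel spacing generates ((240 - 12) / 4 + 1) × ((180 - 12) / 4 + 1) = 58 × 43 = 2494 windows. This covers only the first pyramid level; the total number of detection windows depends on the resizing factor used to build the rest of the pyramid.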
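The per-level count can be checked with a short sketch. It assumes 12 × 12 windows at a 4-pixel stride inside the resized image (after resizing by 12/40, a 40 × 40 face occupies 12 × 12 pixels) and counts only positions where the window fits completely; count_windows is a hypothetical helper, not part of the repository.

```python
def count_windows(width, height, window=12, stride=4):
    """Number of positions for a window x window box slid with the given stride."""
    cols = (width - window) // stride + 1
    rows = (height - window) // stride + 1
    return cols * rows

# 800 x 600 image, minimum face size 40: resize by 12/40 to 240 x 180.
w, h = 800 * 12 // 40, 600 * 12 // 40
print(w, h, count_windows(w, h))  # 240 180 2494
```

Summing count_windows over every pyramid level gives the total number of windows for the whole image.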

In 12-net, the paper says 12 × 12 detection windows. Is that because the net input is 12 × 12, so the window is 12?
Yes.

Does the 4-pixel spacing correspond to something in train_val.prototxt, and how is the 4 calculated?
The spacing can be any value: in the original paper, crops are taken out of the image and fed into the 12-net one at a time.
However, I have converted the 12-net into a fully convolutional network, which saves a lot of redundant computation.
You can take a look at this link if you're interested.
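To see where the savings come from, here is a small NumPy sketch (not the repository's code). It uses a single random 3 × 3 filter on a grayscale image as a stand-in for the 12-net's first convolution (the real layer has 3 input channels and several filters) and checks that the features of a 12 × 12 crop are just a slice of the features computed once over the whole image. It assumes NumPy 1.20 or later for sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3x3_valid(image, kernel):
    """'Valid' 2-D cross-correlation (what Caffe calls convolution) with a 3x3 kernel."""
    windows = sliding_window_view(image, kernel.shape)  # shape (H-2, W-2, 3, 3)
    return np.einsum("ijkl,kl->ij", windows, kernel)

rng = np.random.default_rng(0)
image = rng.standard_normal((180, 240))  # stand-in for one resized pyramid level
kernel = rng.standard_normal((3, 3))     # stand-in for one learned 3x3 filter

# Fully convolutional: run the layer once over the whole image.
full = conv3x3_valid(image, kernel)

# Crop by crop: run the same layer on one 12x12 crop at (y, x).
y, x = 8, 20
crop_features = conv3x3_valid(image[y:y + 12, x:x + 12], kernel)

# The crop's 10x10 feature map is a slice of the full map, so overlapping crops
# would recompute the same values; the full pass computes each value once.
assert np.allclose(crop_features, full[y:y + 10, x:x + 10])
print(full.shape, crop_features.shape)  # (178, 238) (10, 10)
```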

You're welcome, I'm from Taiwan!

from cnn_face_detection.

tangyudi commented on June 2, 2024

Thank you. Why is the factor calculated as 12/40, and what does the factor mean?
I fed a 466 × 699 image to the network. After resize_image it is 139 × 209, and the output (out = net_12c_full_conv.blobs['prob'].data[0][1, :, :]) has out.shape 64 × 99. Does each confidence value in the 64 × 99 map give the probability of a face? If so, why can a single point represent a rectangle?
The paper says the image pyramid is resized by 12/F. Does that mean w × 12/F and h × 12/F?
I used 10,000 face images and 10,000 background images without faces to train the 12-net. Is that enough?


anson0910 commented on June 2, 2024

The factor resizes the image so that each 12 × 12 block in the resized image corresponds to a min_face_size × min_face_size block in the original image. With min_face_size = 40, the factor is 12/40.
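The arithmetic can be sketched with exact fractions. This assumes resize_image floors the scaled size, which matches the 466 × 699 → 139 × 209 example above; pyramid_factor and resized_size are hypothetical names, not the repository's functions.

```python
from fractions import Fraction

def pyramid_factor(min_face_size, net_size=12):
    """Scale factor such that a net_size x net_size block covers min_face_size pixels."""
    return Fraction(net_size, min_face_size)

def resized_size(width, height, min_face_size, net_size=12):
    """(w * 12/F, h * 12/F), floored to whole pixels (the flooring is an assumption)."""
    factor = pyramid_factor(min_face_size, net_size)
    return int(width * factor), int(height * factor)

factor = pyramid_factor(40)
print(factor)                      # 3/10
print(resized_size(466, 699, 40))  # (139, 209)
print(12 / factor)                 # 40: a 12x12 block spans 40x40 original pixels
```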

Why can a point represent a rectangle?
Each point in the last output feature map represents a 12 × 12 block in the resized image, which in turn corresponds to a min_face_size × min_face_size block in the original image.
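A sketch of both halves of this answer, under stated assumptions. The layer shapes (3 × 3 conv, 3 × 3 max-pool with stride 2 and Caffe's round-up, then a 5 × 5 conv standing in for the fully connected layer) are an assumption about the fully convolutional 12-net, chosen because they reproduce the 64 × 99 map reported for a 139 × 209 input. The total stride of 2 comes from the pooling layer mentioned in the first reply. Both function names are hypothetical.

```python
import math

def fcn12_output_size(n):
    """Length of the output map along one axis for an input of length n.

    Assumed layers: 3x3 conv (stride 1) -> 3x3 max-pool, stride 2, with Caffe's
    round-up -> 5x5 conv standing in for the fully connected layer.
    """
    n = n - 3 + 1                   # 3x3 convolution, no padding
    n = math.ceil((n - 3) / 2) + 1  # 3x3 max-pool, stride 2, rounded up as Caffe does
    return n - 5 + 1                # 5x5 convolution replacing the fully connected layer

def point_to_box(row, col, min_face_size, stride=2, net_size=12):
    """Map output point (row, col) to (x, y, side) in original-image pixels."""
    # One step in the output map moves the 12x12 window by `stride` resized pixels,
    # which is stride * min_face_size / net_size pixels in the original image.
    step = stride * min_face_size / net_size
    return col * step, row * step, min_face_size

print(fcn12_output_size(12))                           # 1: one 12x12 window, one point
print(fcn12_output_size(139), fcn12_output_size(209))  # 64 99
print(point_to_box(3, 6, min_face_size=40))
```

Each value in the probability map is thus a face score for one min_face_size square, and neighbouring values are a fixed step apart in the original image.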

The paper says the image pyramid is resized by 12/F. Does that mean w × 12/F and h × 12/F?
Yes.

I used 10,000 face images and 10,000 background images without faces to train the 12-net. Is that enough?
Yes, I think this is enough.

