anson0910 / cnn_face_detection

Implementation based on the paper Li et al., “A Convolutional Neural Network Cascade for Face Detection,” CVPR 2015

cnn_face_detection's Introduction

A few modifications to the paper:

  1. Multi-resolution is not used, for simplicity; you can add it in the .prototxt files under CNN_face_detection_models.
  2. The 12-net is turned into a fully convolutional network to reduce computation.
  3. I took the normalization layers out of the deploy.prototxt files for the 48-net and 48-calibration-net, because that made them easier for me to implement in hardware; you can simply add them back as in the corresponding train_val.prototxt files.

In order to test CNN Cascade:

Detection scripts are stored under the CNN_face_detection/face_detection directory, and models can be found in the CNN_face_detection_models repository.

To test a single image, use the script face_cascade_fullconv_single_crop_single_image.py
To benchmark on FDDB, use the script face_cascade_fullconv_fddb.py

If you're not familiar with caffe's flow yet, dennis-chen's reply here gives a great picture.

In order to train CNN Cascade:

  1. First download all faces from the AFLW dataset, and at least 3000 images without any faces (negative images).
  2. Create negative patches by running face_preprocess_10kUS/create_negative.py with data_base_dir set to the folder containing the negative images.
  3. Create positive patches by running face_preprocess_10kUS/aflw.py
  4. Run face_preprocess_10kUS/shuffle_write_positives.py and face_preprocess_10kUS/shuffle_write_negatives.py to shuffle the images and write their positions and labels to file.
  5. Run face_preprocess_10kUS/write_train_val.py to create train.txt and val.txt, and to move the images to the corresponding folders as Caffe requires.
  6. Use the scripts in CNN_face_detection_models/create_lmdb_scripts/ to create the LMDB files Caffe requires.
  7. Start training with a command like the following in a terminal:
    ./build/tools/caffe train --solver=models/face_12c/solver.prototxt

The 24-net and 48-net can be trained in a similar way; however, their negative images should be created by running face_preprocess_10kUS/create_negative_24c.py and face_preprocess_10kUS/create_negative_48c.py

Calibration nets are trained similarly; the scripts can be found in face_calibration/
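
Steps 4 and 5 of the training list come down to shuffling labeled patch paths and writing them as "path label" lines, which is the list format Caffe's image-list tooling reads. A minimal sketch, assuming label 1 for faces, label 0 for non-faces, and a 90/10 split; the function name, the file names, and the split ratio are illustrative and are not taken from the repository's scripts:

```python
import random

def write_train_val(positives, negatives, train_path, val_path,
                    val_fraction=0.1, seed=0):
    """Shuffle labeled patch paths and write 'path label' list files."""
    # Assumed label convention: 1 = face patch, 0 = non-face patch.
    samples = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
    random.Random(seed).shuffle(samples)

    # The first val_fraction of the shuffled samples becomes the validation set.
    n_val = int(len(samples) * val_fraction)
    val, train = samples[:n_val], samples[n_val:]

    for path, subset in ((train_path, train), (val_path, val)):
        with open(path, "w") as f:
            for name, label in subset:
                f.write(f"{name} {label}\n")
    return len(train), len(val)
```

Shuffling positives and negatives together before the split keeps both classes mixed in each list.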

cnn_face_detection's People

Contributors

anson0910, rac-taipeisouth

cnn_face_detection's Issues

about the negative data

Sorry to trouble you. Why should the 24 and 48 detection nets be trained with negative data from the previous detection net (the rectangles in its results), rather than directly with the same negative data as the 12-net?
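
For background: in the paper, the negatives for each later detection net are windows that the earlier stages failed to reject, so each net trains on the mistakes the cascade still makes. A minimal sketch of that mining step, assuming detections are (x1, y1, x2, y2, score) tuples from the previous stage and that a window counts as a false positive when its IoU with every annotated face is below 0.5; the function names and both thresholds are illustrative, not taken from the repository:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def mine_hard_negatives(detections, faces, score_threshold=0.5, max_iou=0.5):
    """Keep windows the previous stage accepted that do not overlap any face."""
    hard = []
    for x1, y1, x2, y2, score in detections:
        if score < score_threshold:
            continue  # the previous stage already rejects this window
        box = (x1, y1, x2, y2)
        if all(iou(box, face) < max_iou for face in faces):
            hard.append(box)  # accepted by the previous stage, yet not a face
    return hard
```

On face-free background images the `faces` list is empty, so every surviving detection becomes a negative sample.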

About the training step

Dear anson:
I have run your code before, and now I'd like to train my own Caffe model.
But I don't understand the training steps. For example, I don't know what training set I should use. As I understand it, I should use 12*12 pixel images when training the 12-net. But in your code create_negative.py I see that you crop images to different sizes. I don't understand why you do so.
I hope you can answer my question if convenient. Thank you very much!
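
One possible reading of the multi-size crops, inferred from these threads and not confirmed by the author: background windows are sampled at several sizes and resized to the net's input size later, when the LMDB is built, so the 12-net sees negatives at many scales. A minimal sketch of such sampling, where the crop sizes and the count per size are hypothetical and are not the values used in create_negative.py:

```python
import random

def random_negative_boxes(img_w, img_h, sizes=(12, 24, 48, 96),
                          per_size=5, seed=0):
    """Sample square crop boxes (x1, y1, x2, y2) of several sizes."""
    rng = random.Random(seed)
    boxes = []
    for size in sizes:
        if size > img_w or size > img_h:
            continue  # this crop size does not fit inside the image
        for _ in range(per_size):
            x = rng.randint(0, img_w - size)  # randint is inclusive at both ends
            y = rng.randint(0, img_h - size)
            boxes.append((x, y, x + size, y + size))
    return boxes
```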

can't find the .prototxt files

hi!

Thanks for your implementation of the cascade CNN. You are doing a great job, and I have learned a lot from it.
However, I can't find some files you mention in the README, such as CNN_face_detection_models/create_lmdb_scripts/ and models/face_12c/solver.prototxt.

Can you please tell me what happened? Can I get those files?

Thanks a lot!

Best regards
Xing Wang

About the quantizeBitNum and stochasticRoundedParams

You have helped me so much; if you come to Shanghai, remember to call me and I will treat you to dinner.
1. In your code I see these two things (I don't know what to call them). What are they used for? Is there any difference between them?
2. What is the difference between the following two choices?
    if loadNet:
        net_12_cal = caffe.Net(MODEL_FILE, PRETRAINED, caffe.TEST)
    else:
        net_12_cal = caffe.Classifier(MODEL_FILE, PRETRAINED,
                                      mean=np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1),
                                      channel_swap=(2,1,0),
                                      raw_scale=255,
                                      image_dims=(15, 15))
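
As far as the pycaffe wrappers go, `caffe.Net` runs whatever array is placed in its input blob, while `caffe.Classifier` first preprocesses each image using the `mean`, `channel_swap`, `raw_scale` and `image_dims` arguments. A NumPy sketch of that preprocessing order (channels first, swap channels, scale, subtract the mean), assuming an H x W x 3 RGB image with values in [0, 1] that has already been resized, and a per-channel mean given in the swapped (BGR) order; the function name is hypothetical and this approximates the behavior, it is not pycaffe's own code:

```python
import numpy as np

def preprocess_like_classifier(image, mean, channel_swap=(2, 1, 0), raw_scale=255.0):
    """Approximate the per-image preprocessing configured in the snippet above."""
    chw = image.astype(np.float32).transpose(2, 0, 1)  # H x W x C -> C x H x W
    chw = chw[list(channel_swap), :, :]                # RGB -> BGR
    chw = chw * raw_scale                              # [0, 1] -> [0, 255]
    # Subtract one mean value per channel, broadcast over height and width.
    chw = chw - np.asarray(mean, dtype=np.float32)[:, None, None]
    return chw
```

With `caffe.Net`, any such preprocessing has to be done by the caller before the data is copied into the input blob.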

AFLW new website can't find AFLW_Faces.txt

So, I emailed the AFLW guys, and they sent me the password, and I downloaded the stuff. However, in your aflw.py file where we are supposed to train with the AFLW dataset, at the top of the file we have: read_file_name_faces = "/home/anson/face_pictures/AFLW/AFLW_Faces.txt" I looked through the downloads and couldn't find this AFLW_Faces.txt file anywhere. The same holds for AFLW_Rect.txt and AFLW_sex.txt. I was having a lot of trouble installing and using sqliteman and the other things that AFLW suggested, and was hoping to forgo that stuff and just go straight to taking the files I downloaded, putting them in the correct directory, and training the CNN. So my question is basically: is it vital that I get sqliteman working and do all that stuff, or should the files I named above be somewhere in the tar folders I downloaded from their website?

calibrate-12 can not converge when training, while net-12 converged very easily

hi, @anson0910
I collected about 20000 faces from AFLW and 60000 background patches for net-12 training. It converged quickly, after about 10000 iterations.
However, calibrate-12 does not converge after 100000 iterations using about 2000 faces. Each face generates 45 training patches, following the original paper, so there are about 2000 * 45 = 90000 training samples for calibrate-12.
I have no clue what the problem is. Can you give me some advice?

About the test result

Sorry to bother you. I have run your test code, but the output image has many useless boxes, even though I have increased the threshold and confidence. Please help me. Thanks a lot!

About training

hi,
I'm training the 6 nets using Caffe, and I find that my nets converge very fast. The 12-net, for example, starts at loss 0.69 and accuracy 0.68; after 1000 iterations the loss reaches 0.01 and the accuracy 0.994, and then they change only within a very small range.
Does this mean the net is overfitting?
When I tested my net, I found that it doesn't filter out non-face regions as well as expected.
Why does this happen?
Thanks

many false face

Hello, dear anson. I used the code from this repository to detect faces in my test images, but it produces many false face rectangles. Can you help me deal with this? Thanks.

Failed to test using deploy.prototxt of net-12

Hi, @anson0910
I trained the net-12 model using CNN_face_detection_models/face_12c/train_val.prototxt, and I loaded this model with CNN_face_detection_models/face_12c/deploy.prototxt.
After that, I called detect_face_12c_net in CNN_face_detection/face_detection/face_detection_functions.py. It threw an error like "inner_product_layer.cpp:64] Check failed: K_ == new_K (400 vs. 605472) Input size incompatible with inner product parameters."
I think it was caused by the input image size.
In face_detection_functions.py, the original test images are resized to multiple scales, which are larger than 3*12*12, but deploy.prototxt requires a 3*12*12 image as input. It seems that Caffe doesn't do any sliding-window job automatically.
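
The 400 in the error is the input size the inner-product layer was trained with; a larger image yields a larger feature map, which that layer cannot accept. Reshaping the inner-product weights into a convolution kernel of the same spatial size gives a layer that slides over any larger feature map, which is the idea behind a fully convolutional 12-net. A NumPy sketch of the equivalence, assuming a 16 x 5 x 5 feature map (16 * 5 * 5 = 400) and 16 outputs; these shapes and the random weights are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, OUT = 16, 5, 5, 16          # assumed shapes: 16 * 5 * 5 = 400 inputs

# Inner-product (fully connected) parameters, stored for a fixed-size input.
fc_w = rng.standard_normal((OUT, C * H * W))
fc_b = rng.standard_normal(OUT)

def fc_forward(feat):
    """Fully connected layer: only accepts a feature map of exactly C x H x W."""
    return fc_w @ feat.reshape(-1) + fc_b

# The same weights viewed as OUT convolution kernels of size C x H x W.
conv_w = fc_w.reshape(OUT, C, H, W)

def conv_forward(feat):
    """Stride-1 'valid' convolution: accepts any feature map of at least H x W."""
    _, h_in, w_in = feat.shape
    out = np.empty((OUT, h_in - H + 1, w_in - W + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = feat[:, i:i + H, j:j + W]
            # Contract the kernels with the patch over all of C, H and W.
            out[:, i, j] = np.tensordot(conv_w, patch, axes=3) + fc_b
    return out

big = rng.standard_normal((C, 8, 9))   # feature map from a larger input image
scores = conv_forward(big)             # one output vector per window position
```

Each spatial position of `scores` equals what the fully connected layer would return for the corresponding 5 x 5 crop of the feature map, so the sliding-window scan comes out of a single forward pass.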

create_negative.py

Hi!
When I try to run create_negative.py, I get this error:
    read_img_name = data_base_dir + '/' + file_list[current_image].strip()
    IndexError: list index out of range
Can you help me with this, please?
I have only put 340 scenery images in the directory.
By the way, I am not a pro, just a beginner.
Thanks a lot!

AFLW files

Hi
Do you have any script to extract the AFLW data from the SQL database?
I mean the following files:
AFLW_Faces.txt
AFLW_Rect.txt
AFLW_sex.txt

Second question: do you need "Sex" for face detection, or is it for another purpose?

Thanks

A few question sorry to trouble

Sorry to trouble you again. I have some questions about the code and the paper.
1. In this code:
    current_rectangle = [int(2*current_x*current_scale), int(2*current_y*current_scale),
                         int(2*current_x*current_scale + net_kind*current_scale),
                         int(2*current_y*current_scale + net_kind*current_scale),
                         confidence, current_scale]
what is the meaning of the 2 (as in 2*current_x*current_scale)?

2. The paper says an image pyramid is built to cover faces at different scales. In your code the image is only shrunk, never enlarged; is that right?

3. The paper says that densely scanning an image of size 800 × 600 for 40 × 40 faces with 4-pixel spacing generates 2,494 detection windows, and that the time reduces to 10 ms on a GPU card, most of which is overhead in data preparation. Can you tell me how the 2,494 is calculated?

4. For the 12-net, the paper mentions 12 × 12 detection windows. Is that because the net input is 12*12, so the window is 12?

5. Does the 4-pixel spacing correspond to something in train_val.prototxt, and how is the 4 calculated?
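
The 2,494 figure can be reproduced under one assumption: the 800 × 600 image is first shrunk by 12/40 so that 40 × 40 faces fit the 12 × 12 window, and the window then slides with 4-pixel spacing over the shrunken image. A short sketch of that arithmetic (the function name is illustrative):

```python
def count_windows(img_w, img_h, face_size, window=12, stride=4):
    """Count sliding-window positions after rescaling faces to the window size."""
    scale = window / face_size          # 12 / 40 = 0.3
    w = int(round(img_w * scale))       # 800 -> 240
    h = int(round(img_h * scale))       # 600 -> 180
    nx = (w - window) // stride + 1     # positions along the width
    ny = (h - window) // stride + 1     # positions along the height
    return nx * ny

print(count_windows(800, 600, 40))      # 58 * 43 = 2494
```

The rescaled image is 240 × 180, which gives 58 positions horizontally and 43 vertically.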

Thank you for helping me so much. Are you Chinese?

3000 images without any faces (negative images)

dear anson0910:
I really appreciate your work. I have a question about the 3000 images. Is there a trick to selecting these negative images? For example: 1. keep them varied; 2. use what is left of an image after removing the faces (AFLW dataset); 3. just use some scenery pictures?
I look forward to your reply.

About the result after running

Hello, I downloaded your project and ran it on my computer. Your work helps me a lot.
One problem is that my result looks like this:
[image]

Of course it's not good enough. I wonder whether this is because of a mistake on my side, or whether there are bugs in your code or model. What result do you get in your own environment? I never changed your code or your Caffe model.
I hope you can answer my question. Thanks a lot.

About the face size in create_face_12c.sh

Dear anson,
I've read your code in create_face_12c.sh. I found that you resize the image to 60 * 60, and I don't understand why you do so. By the way, why do you set RESIZE_HEIGHT and RESIZE_WIDTH both to 15?
I hope you can answer. Thank you very much!

A question about the cascade cnn

Dear anson,
I'm sorry to trouble you again. I find that in the face detection part of your code, your 24-net just uses the result of the 12-net directly. But in the paper I find that the 24-net and 12-net are only connected in the fully connected layer, and likewise the 48-net and 24-net.
So please tell me whether I have misunderstood, if convenient.
Thank you for your help!

hello,a few questions.

It's really nice of you to share your code, but I don't know where to start. Can you show me how to approach your code, or which folder to read first?

ROC report on supplied models

Hello, Anson.

First of all thank you for sharing your code. It was very helpful to me.

I ran the benchmark on FDDB and got not very impressive results:
[image]
Does this mean that the supplied models are only for demo purposes and one should train them from scratch? Can I hope to obtain results similar to #18 that way?

Thanks in advance.

Speed Problem

I am running your method using face_cascade_fullconv_fddb.py on a K80 GPU. The speed is very slow.

Processing file 1 ...
Processing image : 0
Processing image : 10
Processing image : 20
Processing image : 30
Processing image : 40
Processing image : 50
Processing image : 60
Processing image : 70
Processing image : 80
Processing image : 90
Processing image : 100
Processing image : 110
Processing image : 120
Processing image : 130
Processing image : 140
Processing image : 150
Processing image : 160
Processing image : 170
Processing image : 180
Processing image : 190
Processing image : 200
Processing image : 210
Processing image : 220
Processing image : 230
Processing image : 240
Processing image : 250
Processing image : 260
Processing image : 270
Processing image : 280
Average time spent on one image : 11.2854675207 s

Do you know the reason?

The evaluate result

Could you show the evaluation results on FDDB (the recall rate or the discontinuous ROC curve)?

Resize images when creating LMDB file

Hi anson.
When I created the LMDB files required to train all the nets, I set the image resize to 256 for both width and height. Will that change the accuracy when I test the model? If so, what changes do I need to make to get a better result?

Thanks&Regards
N G K Sai

How to train calibration nets?

Dear anson,
I've trained a detection net using my own data. Now I want to train the calibration nets, but I don't know how to do so. What training set should I use? And how should I assign the labels?
I hope you will answer my question if convenient. Thank you very much!

approximate Threshold T1 and T2

Hi, I'm trying to reproduce the results in the paper. Can you share the approximate thresholds T1 and T2 used in the training steps, and the approximate number of negative samples used for training the cali24 and cali48 nets? Also, have you tested on AFW? I tried a lot of methods, but the best AP is only about 90%. Thanks!

About the training step

Dear anson:
I have run your code before, and now I'd like to train my own Caffe model.
But I don't understand the training steps. For example, I don't know whether the six networks can be trained at the same time. If they are trained separately, where do the inputs of net_12_cal and net_24c come from? What is the difference between the models in the face_12c and face_12c2 folders? What do the *_SRquantize_*.caffemodel and *_quantize_*.caffemodel files store? After 400000 iterations of training the 12-net alone, accuracy = 0.5 and loss = 0.64 stay almost unchanged; what should the result be?
I hope you can answer my question if convenient. Thank you very much!

train_val.prototxt about FCN

@anson0910
Hi, can you provide train_val.prototxt file for training face12c-FCN?
I only found CNN_face_detection_models / face_12c / face12c_full_conv.prototxt. It's used to test.
And I don't know how to write the train_val.prototxt of FCN. Forgive me..

Thank you very much!

where is the positive_1 ~ positive_15

hi @anson0910 ,
I noticed that the positive samples are generated by aflw.py, right? However, this script only outputs positive_16 ~ positive_17, not 1-15.
So I wonder which script generates positive 1-15?

How to get the file face12c_full_conv.caffemodel

Dear Anson,
I have used face_cascade_fullconv_single_crop_single_image.py with the files face12c_full_conv.prototxt and face12c_full_conv.caffemodel to detect faces in an image, and it works OK.
Then I used face_12c/solver.prototxt and face_12c/train_val.prototxt to train the model file face_12c_train_iter_400000.caffemodel (accuracy=0.882).
But with the model file face_12c_train_iter_400000.caffemodel I detect many faces everywhere in the image. Should I derive face12c_full_conv.caffemodel from face_12c_train_iter_400000.caffemodel, and if so, how?
The file size of face12c_full_conv.caffemodel is 28038 bytes, but the file size of the face_12c_train_iter_400000.caffemodel I trained is 28248 bytes.

Thanks

Question about calibration data

hi~
In your code, when you prepare the calibration data, you just use the formula in the paper. For example, when the label is 0 (s=0.83, x=-0.17, y=-0.17), the window moves right and down.
Suppose that when you detect a face, the current window is a little to the right of and below the correct window, and you then apply the calibration net and get label 0. Obviously you cannot apply the same parameters 0.83, -0.17, -0.17, because the window would move even further right and down.
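
For reference, the paper defines 45 calibration patterns as all combinations of 5 scale changes and 3 x 3 offset changes, and adjusts a window (x, y, w, h) to (x - xn*w/sn, y - yn*h/sn, w/sn, h/sn). A sketch of that enumeration and adjustment; the ordering of the labels is an assumption chosen so that label 0 matches the values quoted above, and the names are illustrative:

```python
from itertools import product

SCALES = (0.83, 0.91, 1.0, 1.10, 1.21)
OFFSETS = (-0.17, 0.0, 0.17)

# All 5 * 3 * 3 = 45 (s, x, y) triples; label 0 is (0.83, -0.17, -0.17).
PATTERNS = list(product(SCALES, OFFSETS, OFFSETS))

def calibrate(box, label):
    """Adjust an (x, y, w, h) window according to the pattern for `label`."""
    x, y, w, h = box
    s, xn, yn = PATTERNS[label]
    return (x - xn * w / s, y - yn * h / s, w / s, h / s)
```

With label 0 the top-left corner moves right and down and the window grows by 1/0.83, which matches the behavior described in the post. Whether that is the right correction depends on how the training crop for each label was displaced relative to the annotated face.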

How to use Python Caffe to test the model

I use happynear's caffe flow and built it under VS2015, CUDA 8.0, and Anaconda2-4.3.1, without cuDNN. I have succeeded in building Caffe with the Python wrapper, but I don't know how to use the code and models here to run a face detection test, since this is my first time using Python code. Can anyone give me some pointers or tips to continue the experiment? Thanks a lot.

Question about nms algorithm

Hi, sorry to trouble you again.
I have a question about the NMS algorithm. In your face_detection_functions.py, the "globalNMS" function has a condition at line 126, "result_rectangles[cur_rect_to_compare][5] < 0.85", which means the scale should be less than 0.85. However, if you detect faces with a minimum size of 32, the scale cannot be less than 32/12, so this condition can never be true.
So what is this condition for in this function?
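
For context, a generic greedy non-maximum suppression over rectangles in the [x1, y1, x2, y2, confidence, scale] layout looks roughly like the sketch below. It is a textbook version with an assumed IoU threshold of 0.3, not the repository's globalNMS, and it ignores the scale field entirely:

```python
def greedy_nms(rects, iou_threshold=0.3):
    """Keep the most confident rectangles, dropping those that overlap a kept one."""
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    kept = []
    # Visit rectangles from highest to lowest confidence (index 4).
    for rect in sorted(rects, key=lambda r: r[4], reverse=True):
        if all(iou(rect, k) <= iou_threshold for k in kept):
            kept.append(rect)
    return kept
```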

ROC report:

see ROC
It was tested on FDDB, and trained with 20K positive samples from AFLW and 60K background images.

24-net and 48-net

In the 24-net, did you not include the 12-net's fully connected layer? And likewise, in the 48-net, did you not include the 24-net's fully connected layer?

about 48net and its deploy

Hi, why are there differences between the 48-net's train_val.prototxt and its deploy.prototxt?
For example, there is no "norm layer" in deploy, and deploy has a "dropout layer" below the "relu3 layer" that is not in train_val.prototxt.

How to train multi-resolution 24-net and 48-net?

hi @anson0910 ,
You said "Multi-resolution is not used for simplicity, you can add them in the .prototxt files under CNN_face_detection_models to do so" in the README. It sounds like this feature is easy to add. Could you please explain it more clearly?

concerning lmdb

hello
I have read the paper. The input image size of the 12-net is 12 × 12.
However, in your code in create_lmdb_scripts->face_12c->create_face_12c.sh, you resize the image to size 15. On the other hand, in the face_12c->deploy.prototxt file, the input dim is 3 × 12 × 12. Could you tell me the reason? Thank you very much.
