duinodu / mtcnn
mtcnn in python
License: MIT License
Excuse me, what thresholds do the models use for face detection?
(I mean the thresholds that generated the FDDB and WIDER FACE curves in the author's paper.)
Thanks!
In the function generateBoundingBox, why is 1 added in "bb1 = np.fix((stride * (boundingbox) + 1) / scale).T"? Thank you in advance.
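For readers following this question, here is a minimal sketch of the mapping that line performs. It assumes P-Net's stride of 2 and cell size of 12, and it assumes (this is not confirmed by the author here) that the `+ 1` carries over the 1-based pixel convention of the original MATLAB implementation. The function name `cell_to_box` is hypothetical and does not appear in demo.py.

```python
import numpy as np

def cell_to_box(cells, scale, stride=2, cellsize=12):
    """Map 0-based P-Net output cells to box corners in the original image.

    Assumption: the "+ 1" shifts 0-based indices to 1-based pixel positions.
    """
    cells = np.asarray(cells, dtype=np.float64)
    top_left = np.fix((stride * cells + 1) / scale)
    bottom_right = np.fix((stride * cells + cellsize - 1 + 1) / scale)
    return top_left, bottom_right

# Cell (0, 0) at scale 1.0 covers pixels 1..12 under the 1-based convention.
tl, br = cell_to_box([[0, 0], [3, 5]], scale=1.0)
```

Under this reading, dropping the `+ 1` would shift every box by one pixel (before dividing by the scale) relative to the MATLAB output.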
Hello. First of all, thank you for porting MTCNN to Python. While reading your code, I noticed the line PNet.blobs['data'].reshape(1,3,ws,hs). Normally, shouldn't it be reshape(1,3,hs,ws)? What is the reason for this?
Please provide training code...
How do I crop the face?
Hi @DuinoDu ,
I need a Python version of the following matlab routine, which is provided by Yandong Wen to perform a 2-D face alignment using the 5 facial points obtained by MTCNN:
% load face image, and align to 112 X 96
imgSize = [112, 96];
coord5points = [30.2946, 65.5318, 48.0252, 33.5493, 62.7299; ...
51.6963, 51.5014, 71.7366, 92.3655, 92.2041];
image = imread('path_to_image/Jennifer_Aniston_0016.jpg');
facial5points = [105.8306, 147.9323, 121.3533, 106.1169, 144.3622; ...
109.8005, 112.5533, 139.1172, 155.6359, 156.3451];
Tfm = cp2tform(facial5points', coord5points', 'similarity');
cropImg = imtransform(image, Tfm, 'XData', [1 imgSize(2)],...
'YData', [1 imgSize(1)], 'Size', imgSize);
I have already tried some code with OpenCV and skimage, using the points obtained by your code, but without success:
import numpy as np
import cv2
imgSize = (112, 96)
# given a dimension size, calculate the center coordinate
calc_center_coord = lambda x: np.float32(x-1)/2 if x % 2 == 0 else np.float32((x-1)/2)
# calculate normalized coordinates
calc_norm_coord = lambda x,center,scale: (x-center)/scale
x_ = [30.2946, 65.5318, 48.0252, 33.5493, 62.7299]
y_ = [51.6963, 51.5014, 71.7366, 92.3655, 92.2041]
xc, yc = calc_center_coord(imgSize[1]), calc_center_coord(imgSize[0])
x_norm = [calc_norm_coord(x, xc, imgSize[1]) for x in x_]
y_norm = [calc_norm_coord(y, yc, imgSize[0]) for y in y_]
src = np.array( zip(x_norm,y_norm) ).astype(np.float32).reshape(1,5,2)
w, h = img.shape[1], img.shape[0]
img_c_x, img_c_y = calc_center_coord(img.shape[1]), calc_center_coord(img.shape[0])
# there might be more than one faces, hence
# multiple sets of points
for pset in points:
    img2 = img.copy()
    pset_x = pset[0:5]
    pset_y = pset[5:10]
    pset_norm_x = [calc_norm_coord(x,img_c_x,imgSize[1]) for x in pset_x]
    pset_norm_y = [calc_norm_coord(y,img_c_y,imgSize[0]) for y in pset_y]
    dst = np.array( zip(pset_norm_x,pset_norm_y) ).astype(np.float32).reshape(1,5,2)
    transmat = cv2.estimateRigidTransform( src, dst, False )
    out = cv2.warpAffine(img2, transmat, (w, h))
    cv2.imshow("result", out)
    cv2.waitKey(0)
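One possible approach, sketched here with NumPy only, is to fit the similarity transform directly on pixel coordinates (without normalizing them) by linear least squares, which is roughly what the 'similarity' option in the MATLAB routine asks for. The helper name `estimate_similarity` is hypothetical. The point values are copied from the MATLAB snippet above, and the sketch ignores the one-pixel offset between MATLAB's 1-based and OpenCV's 0-based coordinates.

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares similarity transform (rotation, uniform scale, translation).

    src, dst: (N, 2) arrays of (x, y) points.
    Returns a 2x3 matrix M with dst ~= src @ M[:, :2].T + M[:, 2].
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    n = src.shape[0]
    # Unknowns a, b, tx, ty with x' = a*x - b*y + tx and y' = b*x + a*y + ty
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    a, b, tx, ty = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])

# Reference landmarks for a 112 x 96 crop and detected landmarks, as (x, y) rows
coord5points = np.array([[30.2946, 51.6963], [65.5318, 51.5014], [48.0252, 71.7366],
                         [33.5493, 92.3655], [62.7299, 92.2041]])
facial5points = np.array([[105.8306, 109.8005], [147.9323, 112.5533], [121.3533, 139.1172],
                          [106.1169, 155.6359], [144.3622, 156.3451]])

M = estimate_similarity(facial5points, coord5points)
aligned = facial5points @ M[:, :2].T + M[:, 2]   # landmarks after alignment
```

With OpenCV installed, the resulting matrix could be passed as `cv2.warpAffine(image, M, (96, 112))`, where the size argument is (width, height).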
I have used both the Python and MATLAB versions. Unfortunately, their results are not the same: some images yield detections with MATLAB but not with Python. Could you please help me solve this problem? Thank you very much.
As presented in the article "http://daily.zhihu.com/story/9684615?utm_campaign=in_app_share&utm_medium=iOS&utm_source=weixin", the model selects and generates 72 facial landmark points.
Is it possible to configure the O-Net model to produce them? How can it be made to detect more landmarks? Thanks in advance.
As the title says.
Just like the question, Thx !
Hi. I am running into a Gtk-CRITICAL error (see below). The demo.py code works on my Raspberry Pi 3 (Raspbian Jessie), but is failing on my Nvidia TX1. Could this be a GTK or OPENCV (3.3) version issue? Any ideas???
nvidia@tegra-ubuntu:~/cviz/mtcnn$ uname -a
Linux tegra-ubuntu 4.4.38-jetsonbot-v0.2 #4 SMP PREEMPT Thu Sep 21 17:08:48 EDT 2017 aarch64 aarch64 aarch64 GNU/Linux
nvidia@tegra-ubuntu:~/cviz/mtcnn$ ./run
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0928 15:17:58.194954 4438 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W0928 15:17:58.194993 4438 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W0928 15:17:58.195003 4438 _caffe.cpp:142] Net('./model/det1.prototxt', 1, weights='./model/det1.caffemodel')
W0928 15:17:58.199648 4438 upgrade_proto.cpp:72] Note that future Caffe releases will only support input layers and not input fields.
W0928 15:18:07.621656 4438 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W0928 15:18:07.621706 4438 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W0928 15:18:07.621713 4438 _caffe.cpp:142] Net('./model/det2.prototxt', 1, weights='./model/det2.caffemodel')
W0928 15:18:07.626457 4438 upgrade_proto.cpp:72] Note that future Caffe releases will only support input layers and not input fields.
W0928 15:18:07.734647 4438 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W0928 15:18:07.734678 4438 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W0928 15:18:07.734688 4438 _caffe.cpp:142] Net('./model/det3.prototxt', 1, weights='./model/det3.caffemodel')
W0928 15:18:07.735986 4438 upgrade_proto.cpp:72] Note that future Caffe releases will only support input layers and not input fields.
audrey2.png
[1]: 26
[2]: 26
[4]: 26
[4.5]: 26
[5]: 2
[6]: 1
[7]: 1
[8]: 1
2: (1, 5)
[9]: 1
[10]: 1
[11]: 1
3: (1, 5)
./demo.py:566: Warning: specified class size for type 'CvImageWidget' is smaller than the parent type's 'GtkWidget' class size
cv2.imshow('img', img)
(demo.py:4438): Gtk-CRITICAL **: gtk_widget_new: assertion 'g_type_is_a (type, GTK_TYPE_WIDGET)' failed
./run: line 1: 4438 Segmentation fault GLOG_minloglevel=1 ./demo.py
nnvidia@tegra-ubuntu:~/cviz/mtcnn$ dpkg -l libgtk2.0-0 libgtk-3-0
||/ Name Version Architecture Description
+++-=============================================-===========================-===========================-===============================================================================================
ii libgtk-3-0:arm64 3.18.9-1ubuntu3 arm64 GTK+ graphical user interface library
ii libgtk2.0-0:arm64 2.24.30-1ubuntu1.16.04.2 arm64 GTK+ graphical user interface library
Hello, your demo detects only one person. How can I detect multiple people?
I'm interested in this project. Would you consider adding a licence or some description of the terms of use?
Hi,
I'm trying to follow the code and understand how MTCNN works. I understand that, for each image and each scale, detections come from each of the networks. Right now I am asking about P-Net in particular.
The image is rescaled according to the scales computed earlier, and the rescaled image is fed into P-Net at the following line in the code:
https://github.com/DuinoDu/mtcnn/blob/master/demo.py#L268
For reference I have printed out the original size and the rescaled size:
ORIGINAL Height: 340
ORIGINAL Width: 151
SCALE USED (were computed before): 0.107493555074
RESCALED Height: 37
RESCALED Width: 17
The net corresponds to Pnet and in det1.prototxt (PNet) the input size should have h=12 and w=12.
# Code file: det1.prototxt
input_dim: 1
input_dim: 3
input_dim: 12
input_dim: 12
What I don't understand is how the input goes from the rescaled image size to 12x12.
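A sketch that may help with this question: P-Net has no fully connected layer, so it can be run on inputs larger than 12x12, and the demo reshapes the `data` blob before the forward pass (see the `PNet.blobs['data'].reshape(...)` call quoted in another issue on this page). The `input_dim` values in det1.prototxt then act only as the default shape. The layer sizes below are an assumption based on the published P-Net architecture (three unpadded 3x3 convolutions with one 2x2, stride-2 pooling after the first), and `pnet_output_size` is a hypothetical helper.

```python
import math

def pnet_output_size(height, width):
    """Spatial size of P-Net's output maps for an input of the given size."""
    def one_dim(n):
        n = n - 3 + 1                     # conv1: 3x3, stride 1, no padding
        n = math.ceil((n - 2) / 2) + 1    # pool1: 2x2, stride 2 (Caffe rounds up)
        n = n - 3 + 1                     # conv2: 3x3
        n = n - 3 + 1                     # conv3: 3x3
        return n                          # the final 1x1 convolutions keep the size
    return one_dim(height), one_dim(width)

print(pnet_output_size(12, 12))   # (1, 1)
print(pnet_output_size(37, 17))   # (14, 4)
```

Under these assumptions, a 12x12 input gives a single output cell, while the 37x17 rescaled image gives a 14x4 map in which each cell scores one 12x12 window of the input, with windows spaced 2 pixels apart.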
Line 447 of demo.py reads as follows:
w = total_boxes[:,3] - total_boxes[:,1] + 1
h = total_boxes[:,2] - total_boxes[:,0] + 1
w and h are computed incorrectly and should be swapped, although the result usually differs by only 1 pixel.
The original author's MATLAB code is
w=total_boxes(:,3)-total_boxes(:,1)+1;
h=total_boxes(:,4)-total_boxes(:,2)+1;
The corresponding Python code should be
w = total_boxes[:,2] - total_boxes[:,0] + 1
h = total_boxes[:,3] - total_boxes[:,1] + 1
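A small check of the two indexings, assuming each row of `total_boxes` is laid out as [x1, y1, x2, y2, score]. That layout is an assumption; if demo.py stores the columns in a different order at that point, the conclusion changes accordingly. The sample values are made up for illustration.

```python
import numpy as np

# Two illustrative boxes in the assumed layout [x1, y1, x2, y2, score]
total_boxes = np.array([[10.0, 20.0, 49.0, 99.0, 0.99],
                        [0.0, 0.0, 11.0, 11.0, 0.90]])

# Indexing proposed in the issue: width from x, height from y
w = total_boxes[:, 2] - total_boxes[:, 0] + 1   # x2 - x1 + 1
h = total_boxes[:, 3] - total_boxes[:, 1] + 1   # y2 - y1 + 1

# Indexing on the reported line, for comparison
w_reported = total_boxes[:, 3] - total_boxes[:, 1] + 1
h_reported = total_boxes[:, 2] - total_boxes[:, 0] + 1
```

For a square box the two versions agree, so the swap only shows up on boxes that are not square.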
I am trying to run MTCNN using Python. When I run the code, it gives me an error about a missing error.txt file. Where can I get this file?
solved.
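Hello! I would like to ask how the PR curves provided by the author were produced. I ran the bboxes and scores output by O-Net through the WIDER FACE eval tools, and the results are very poor... How should the bboxes and scores be selected?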
Hi,
I would like to know whether it can be used without matlab_wrapper, so that MATLAB is not necessary to install.
Thanks.
I have read your code in demo.py and noticed that you don't resize the picture to 12x12 for P-Net. When I debugged the code, I found that the input picture is not 12x12x3, so the P-Net bounding-box output is not 1x1x4. This doesn't match the paper.
Could you please provide details on how to train the whole network?
Hi @DuinoDu ,
in some cases, face bounding box has 5 components, whereas it should be 4.
For instance,
result : [ 81.67765533 68.74128327 167.86444914 183.59577323 0.99977213]
result: [ 63.16729596 66.6135435 155.45199138 191.26408356 0.99999678]
what do you think about this?
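One likely reading, offered as an assumption rather than the author's answer: the fifth value is the detection confidence, since both examples end in a number just below 1. Under that reading a row can be split like this:

```python
import numpy as np

# First example row from above; layout assumed to be [x1, y1, x2, y2, score]
result = np.array([81.67765533, 68.74128327, 167.86444914, 183.59577323, 0.99977213])

box, score = result[:4], result[4]
x1, y1, x2, y2 = box.astype(int)   # truncate to integer pixel coordinates
```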
Hello, I have tried demo.py with my test picture.
The speed is about 200 ms per picture (1200*800). Is this normal?
What can I do to speed it up so that I can detect faces in real time?
Thanks.
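One knob that usually affects speed is the number of image scales P-Net has to process. Below is a sketch of how the scale pyramid is commonly built in MTCNN implementations. The function name `pyramid_scales` is hypothetical, and the defaults `minsize=20` and `factor=0.709` are assumptions, chosen because they reproduce the scale of 0.107493555074 quoted for a 340x151 image in another issue on this page.

```python
def pyramid_scales(height, width, minsize=20, factor=0.709):
    """Scales at which the image is resized before being fed to P-Net."""
    m = 12.0 / minsize              # maps a minsize-pixel face onto P-Net's 12x12 window
    minl = min(height, width) * m   # shorter side after the first rescale
    scales = []
    while minl >= 12:               # stop once the image is smaller than one window
        scales.append(m * factor ** len(scales))
        minl *= factor
    return scales

small_faces = pyramid_scales(800, 1200, minsize=20)
large_faces = pyramid_scales(800, 1200, minsize=40)
print(len(small_faces), len(large_faces))
```

Raising the minimum face size drops the largest (most expensive) scales, at the cost of missing faces smaller than that size. Downscaling the input image before detection has a similar effect.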
The demo.py script completes in CPU mode, but I am running into a CUDNN error when the script is running in GPU mode. The first image is processed successfully, but when I close the first image so the second image can be processed, the CUDNN error is generated. Could this be a config issue?
caffe.set_mode_gpu();
caffe.set_device(0)
Thanks.
nvidia@tegra-ubuntu:~/caffe/tools$ caffe device_query
I1012 16:42:30.237367 1105 caffe.cpp:470] This is NVCaffe 0.16.4 started at Thu Oct 12 16:42:29 2017
I1012 16:42:30.243551 1105 caffe.cpp:473] CuDNN version: 6021
I1012 16:42:30.243568 1105 caffe.cpp:474] CuBLAS version: 8000
I1012 16:42:30.243578 1105 caffe.cpp:475] CUDA version: 8000
I1012 16:42:30.243589 1105 caffe.cpp:476] CUDA driver version: 8000
I1012 16:42:30.243602 1105 caffe.cpp:110] Querying GPUs
caffe-time-test.txt
nvidia@tegra-ubuntu:~/cviz/mtcnn$ cat run
GLOG_minloglevel=1 ./demo-gpu.py
nvidia@tegra-ubuntu:~/cviz/mtcnn$ ./run
W1012 16:25:42.740932 864 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields.
W1012 16:25:44.495682 864 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields.
W1012 16:25:44.516561 864 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields.
test1.jpg
[1]: 101
[2]: 101
[4]: 101
[4.5]: 101
[5]: 25
[6]: 16
[7]: 16
[8]: 16
2: (16, 5)
[9]: 13
[10]: 13
[11]: 2
3: (2, 5)
test2.jpg
F1012 16:25:46.933244 864 cudnn_conv_layer.cu:51] Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM, device 0
*** Check failure stack trace: ***
@ 0x7f81ae4718 google::LogMessage::Fail()
@ 0x7f81ae6614 google::LogMessage::SendToLog()
@ 0x7f81ae4290 google::LogMessage::Flush()
@ 0x7f81ae6eb4 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f826068bc caffe::CuDNNConvolutionLayer<>::Forward_gpu()
@ 0x7f82be8f74 caffe::Layer<>::Forward()
@ 0x7f82518eac caffe::Net::ForwardFromTo()
@ 0x7f82bdd940 boost::python::objects::caller_py_function_impl<>::operator()()
@ 0x7f81b45c04 boost::python::objects::function::call()
@ 0x7f81b45e28 (unknown)
@ 0x7f81b4d5a8 boost::python::handle_exception_impl()
@ 0x7f81b42a34 (unknown)
@ 0x458ee8 PyObject_Call
./run: line 1: 864 Aborted GLOG_minloglevel=1 ./demo-gpu.py