mtcnn's Issues

ROC on FDDB question

Could you provide the ROC curve on FDDB? The ROC curve I plotted using your project is shown below:

image

Why?

Thresholds

Excuse me, what thresholds do the models use when performing face detection?
(I mean the thresholds that generate the FDDB and WIDER FACE curves in the author's paper.)
Thanks!

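Why is the data blob reshaped to (1,3,ws,hs)?

Hello. First of all, thank you very much for porting MTCNN to Python. While reading your code, I noticed the line PNet.blobs['data'].reshape(1,3,ws,hs). Normally, shouldn't it be reshape(1,3,hs,ws)? What is the reason for this?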

Advice for face alignment

Hi @DuinoDu ,
I need a Python version of the following MATLAB routine, provided by Yandong Wen, which performs a 2-D face alignment using the 5 facial points obtained by MTCNN:

% load face image, and align to 112 X 96
imgSize = [112, 96];
coord5points = [30.2946, 65.5318, 48.0252, 33.5493, 62.7299; ...
                51.6963, 51.5014, 71.7366, 92.3655, 92.2041];

image = imread('path_to_image/Jennifer_Aniston_0016.jpg');
facial5points = [105.8306, 147.9323, 121.3533, 106.1169, 144.3622; ...
                 109.8005, 112.5533, 139.1172, 155.6359, 156.3451];

Tfm =  cp2tform(facial5points', coord5points', 'similarity');
cropImg = imtransform(image, Tfm, 'XData', [1 imgSize(2)],...
                                  'YData', [1 imgSize(1)], 'Size', imgSize);

I have already tried some code with OpenCV and skimage, using the points obtained by your code, but without success:

import numpy as np
import cv2

imgSize = (112, 96)

# given a dimension size, calculate the center coordinate
calc_center_coord = lambda x: np.float32(x-1)/2 if x % 2 == 0 else np.float32((x-1)/2)

# calculate normalized coordinates
calc_norm_coord = lambda x,center,scale: (x-center)/scale

x_ = [30.2946, 65.5318, 48.0252, 33.5493, 62.7299]
y_ = [51.6963, 51.5014, 71.7366, 92.3655, 92.2041]
xc, yc = calc_center_coord(imgSize[1]), calc_center_coord(imgSize[0])
x_norm = [calc_norm_coord(x, xc, imgSize[1]) for x in x_]
y_norm = [calc_norm_coord(y, yc, imgSize[0]) for y in y_]

src = np.array( zip(x_norm,y_norm) ).astype(np.float32).reshape(1,5,2)
 
w, h = img.shape[1], img.shape[0]
img_c_x, img_c_y = calc_center_coord(img.shape[1]), calc_center_coord(img.shape[0])

# there might be more than one faces, hence
# multiple sets of points
for pset in points:
    img2 = img.copy()

    pset_x = pset[0:5]
    pset_y = pset[5:10]

    pset_norm_x = [calc_norm_coord(x,img_c_x,imgSize[1]) for x in pset_x]
    pset_norm_y = [calc_norm_coord(y,img_c_y,imgSize[0]) for y in pset_y]

    dst = np.array( zip(pset_norm_x,pset_norm_y) ).astype(np.float32).reshape(1,5,2)
    
    transmat = cv2.estimateRigidTransform( src, dst, False )

    out = cv2.warpAffine(img2, transmat, (w, h))

    cv2.imshow("result", out)
    cv2.waitKey(0)

The output looks totally wrong:
input2output
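
One way to approach this without `cp2tform` is to fit the similarity transform (rotation, uniform scale, translation) directly by least squares on the raw pixel coordinates. The sketch below is not the repository's code or Yandong Wen's routine: `fit_similarity` is a hypothetical helper, and it assumes the landmarks are plain (x, y) pixel pairs with no centre or scale normalisation applied beforehand.

```python
def fit_similarity(src_pts, dst_pts):
    """Least-squares similarity transform mapping src_pts onto dst_pts.

    Points are (x, y) pairs. Returns a 2x3 matrix as two rows, so that
    x' = m00*x + m01*y + m02 and y' = m10*x + m11*y + m12.
    """
    if len(src_pts) != len(dst_pts) or len(src_pts) < 2:
        raise ValueError("need at least two matching point pairs")
    # Treat each 2-D point as a complex number: the transform is q = a*p + b.
    p = [complex(x, y) for x, y in src_pts]
    q = [complex(x, y) for x, y in dst_pts]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    num = sum((pi - p_mean).conjugate() * (qi - q_mean) for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    if den == 0:
        raise ValueError("source points are all identical")
    a = num / den              # rotation and scale
    b = q_mean - a * p_mean    # translation
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag]]

# Detected landmarks and the 112 x 96 template quoted above, as (x, y) pairs.
facial5points = list(zip(
    [105.8306, 147.9323, 121.3533, 106.1169, 144.3622],
    [109.8005, 112.5533, 139.1172, 155.6359, 156.3451]))
coord5points = list(zip(
    [30.2946, 65.5318, 48.0252, 33.5493, 62.7299],
    [51.6963, 51.5014, 71.7366, 92.3655, 92.2041]))

M = fit_similarity(facial5points, coord5points)
```

The returned matrix has the 2x3 layout that `cv2.warpAffine` expects, so a call such as `cv2.warpAffine(img, np.float32(M), (96, 112))` should give a 112 x 96 crop (the size argument is width first, then height). Because MATLAB coordinates are 1-based, a small offset relative to the MATLAB output is possible.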

Why are the results from Python and MATLAB different?

I have used both the Python and MATLAB versions. Unfortunately, I found that their results are not the same. I even found that faces in some images are detected by the MATLAB version but not by the Python version. Could you please help me solve this problem? Thank you very much.

Gtk-CRITICAL **: gtk_widget_new

Hi. I am running into a Gtk-CRITICAL error (see below). The demo.py code works on my Raspberry Pi 3 (Raspbian Jessie), but is failing on my Nvidia TX1. Could this be a GTK or OPENCV (3.3) version issue? Any ideas???

nvidia@tegra-ubuntu:~/cviz/mtcnn$ uname -a
Linux tegra-ubuntu 4.4.38-jetsonbot-v0.2 #4 SMP PREEMPT Thu Sep 21 17:08:48 EDT 2017 aarch64 aarch64 aarch64 GNU/Linux

nvidia@tegra-ubuntu:~/cviz/mtcnn$ ./run

WARNING: Logging before InitGoogleLogging() is written to STDERR
W0928 15:17:58.194954 4438 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W0928 15:17:58.194993 4438 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W0928 15:17:58.195003 4438 _caffe.cpp:142] Net('./model/det1.prototxt', 1, weights='./model/det1.caffemodel')
W0928 15:17:58.199648 4438 upgrade_proto.cpp:72] Note that future Caffe releases will only support input layers and not input fields.
W0928 15:18:07.621656 4438 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W0928 15:18:07.621706 4438 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W0928 15:18:07.621713 4438 _caffe.cpp:142] Net('./model/det2.prototxt', 1, weights='./model/det2.caffemodel')
W0928 15:18:07.626457 4438 upgrade_proto.cpp:72] Note that future Caffe releases will only support input layers and not input fields.
W0928 15:18:07.734647 4438 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W0928 15:18:07.734678 4438 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W0928 15:18:07.734688 4438 _caffe.cpp:142] Net('./model/det3.prototxt', 1, weights='./model/det3.caffemodel')
W0928 15:18:07.735986 4438 upgrade_proto.cpp:72] Note that future Caffe releases will only support input layers and not input fields.

audrey2.png
[1]: 26
[2]: 26
[4]: 26
[4.5]: 26
[5]: 2
[6]: 1
[7]: 1
[8]: 1
2: (1, 5)
[9]: 1
[10]: 1
[11]: 1
3: (1, 5)
./demo.py:566: Warning: specified class size for type 'CvImageWidget' is smaller than the parent type's 'GtkWidget' class size
cv2.imshow('img', img)

(demo.py:4438): Gtk-CRITICAL **: gtk_widget_new: assertion 'g_type_is_a (type, GTK_TYPE_WIDGET)' failed
./run: line 1: 4438 Segmentation fault GLOG_minloglevel=1 ./demo.py


nvidia@tegra-ubuntu:~/cviz/mtcnn$ dpkg -l libgtk2.0-0 libgtk-3-0

||/ Name Version Architecture Description
+++-=============================================-===========================-===========================-===============================================================================================
ii libgtk-3-0:arm64 3.18.9-1ubuntu3 arm64 GTK+ graphical user interface library
ii libgtk2.0-0:arm64 2.24.30-1ubuntu1.16.04.2 arm64 GTK+ graphical user interface library

Request to add a licence

I'm interested in this project. Would you consider adding a licence or some description of the terms of use?

Understanding cascading of sizes in mtcnn

Hi,
I'm trying to follow the code and understand how MTCNN works. I understand that, for each image and each scale, detections come from each of the networks. Here I am asking specifically about PNet.

The image is rescaled according to the scales computed earlier, and the rescaled image is fed into PNet at the following line in the code:

https://github.com/DuinoDu/mtcnn/blob/master/demo.py#L268

For reference I have printed out the original size and the rescaled size:
ORIGINAL Height: 340
ORIGINAL Width: 151
SCALE USED (were computed before): 0.107493555074
RESCALED Height: 37
RESCALED Width: 17

The net corresponds to Pnet and in det1.prototxt (PNet) the input size should have h=12 and w=12.

# Code file: det1.prototxt 
input_dim: 1
input_dim: 3
input_dim: 12
input_dim: 12

What I don't understand is where the image gets reduced from its rescaled size to 12x12.
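
The printed scale can be reproduced with a simple image pyramid, which suggests the whole rescaled image (not a 12x12 crop) is passed to PNet at each level. The sketch below is an illustration under stated assumptions, not a copy of demo.py: `pyramid_scales` and `rescaled_size` are hypothetical helpers, and `minsize=20` and `factor=0.709` are assumed values chosen because they reproduce the scale shown above.

```python
import math

def pyramid_scales(height, width, minsize=20, factor=0.709):
    """Scales at which the image is shrunk until its short side drops below 12.

    minsize and factor are assumptions, not values stated in the issue.
    """
    m = 12.0 / minsize                 # map the smallest face of interest to 12 px
    min_side = min(height, width) * m
    scales = []
    k = 0
    while min_side >= 12:
        scales.append(m * factor ** k)
        min_side *= factor
        k += 1
    return scales

def rescaled_size(height, width, scale):
    """Size of the image fed to the network at a given scale (rounded up)."""
    return int(math.ceil(height * scale)), int(math.ceil(width * scale))

scales = pyramid_scales(340, 151)
print(scales[-1])                          # close to the 0.1074935... printed above
print(rescaled_size(340, 151, scales[-1])) # (37, 17)
```

Under this reading, the 12x12 in det1.prototxt is the size of the window each output location covers rather than the size of the input image; a convolution-only network can be applied to larger inputs, producing one prediction per window position.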

Code error in demo.py

Line 447 of demo.py reads:
w = total_boxes[:,3] - total_boxes[:,1] + 1
h = total_boxes[:,2] - total_boxes[:,0] + 1
Here w and h are computed incorrectly and need to be swapped, although the result usually differs by only 1 pixel.

The original author's MATLAB code is
w=total_boxes(:,3)-total_boxes(:,1)+1;
h=total_boxes(:,4)-total_boxes(:,2)+1;
so the corresponding Python code should be
w = total_boxes[:,2] - total_boxes[:,0] + 1
h = total_boxes[:,3] - total_boxes[:,1] + 1
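
As a quick check of the index mapping, here is a minimal per-box sketch. It assumes each box row is laid out as [x1, y1, x2, y2, score], which matches the MATLAB column order once shifted to 0-based indices; `box_width_height` is a hypothetical helper, not a function in demo.py.

```python
def box_width_height(box):
    """Width and height of a box given as [x1, y1, x2, y2, ...] (assumed layout).

    The +1 follows the inclusive pixel convention used in the code above.
    """
    x1, y1, x2, y2 = box[:4]
    return x2 - x1 + 1, y2 - y1 + 1

# A box spanning columns 10..29 and rows 20..59 is 20 wide and 40 high.
print(box_width_height([10, 20, 29, 59, 0.99]))  # (20, 40)
```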

Missing Error.txt file

I am trying to run MTCNN using Python. When I run the code, it gives me an error about a missing error.txt file. Where can I get this file?

Add a few lines of code to show the face landmarks

In this Python/Caffe version of MTCNN, the demo does not draw the face landmarks on top of the face box. However, this can be done by adding a few lines of code to "demo.py". Here is my fix:
demo_py
With this change, the demo shows the image with annotated face landmarks.
dttest1
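
The fix itself is only available as a screenshot, so the following is a guess at what such a change could look like rather than the poster's actual code. It assumes each landmark row stores the five x coordinates followed by the five y coordinates (the layout used by the `pset[0:5]` / `pset[5:10]` slicing in the alignment issue); `landmark_pairs` and `draw_landmarks` are hypothetical names.

```python
def landmark_pairs(pset):
    """Convert [x1..x5, y1..y5] into five integer (x, y) pixel positions."""
    xs, ys = pset[0:5], pset[5:10]
    return [(int(round(x)), int(round(y))) for x, y in zip(xs, ys)]

def draw_landmarks(img, points, radius=2, color=(0, 255, 0)):
    """Draw a filled circle at every landmark of every detected face."""
    import cv2  # imported here so landmark_pairs works without OpenCV installed
    for pset in points:
        for center in landmark_pairs(pset):
            cv2.circle(img, center, radius, color, -1)  # thickness -1 fills the circle
    return img
```

Calling `draw_landmarks(img, points)` just before `cv2.imshow('img', img)` would overlay the points on the displayed image.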

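About plotting the PR curve

Hello! I would like to ask how the PR plot provided by the author was produced. I ran the bboxes and scores output by O-Net through the WIDER FACE eval tools, and the results were very poor... So how should the bboxes and scores be selected?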

Why is the input image size not 12x12 in P-Net?

I have read your code in demo.py and noticed that you don't resize the image to 12x12 before P-Net. When I debugged the code, I found that the input image is not 12x12x3, so the P-Net bounding-box output is not 1x1x4. This doesn't match the paper.

Bounding box has 5 components

Hi @DuinoDu ,
In some cases the face bounding box has 5 components, whereas it should have 4.
For instance,
aaron_peirsol_0001
result : [ 81.67765533 68.74128327 167.86444914 183.59577323 0.99977213]

andre_agassi_0007
result: [ 63.16729596 66.6135435 155.45199138 191.26408356 0.99999678]

what do you think about this?

About speed

Hello, I have tried demo.py with my test image.
The speed is about 200 ms per image (1200x800). Is this normal?
What can I do to speed it up so that I can detect faces in real time?
Thanks

demo.py script failing in GPU mode

The demo.py script completes in CPU mode, but I am running into a CUDNN error when the script is running in GPU mode. The first image is processed successfully, but when I close the first image so the second image can be processed, the CUDNN error is generated. Could this be a config issue?

caffe.set_mode_gpu();
caffe.set_device(0)

Thanks.


nvidia@tegra-ubuntu:~/caffe/tools$ caffe device_query
I1012 16:42:30.237367 1105 caffe.cpp:470] This is NVCaffe 0.16.4 started at Thu Oct 12 16:42:29 2017
I1012 16:42:30.243551 1105 caffe.cpp:473] CuDNN version: 6021
I1012 16:42:30.243568 1105 caffe.cpp:474] CuBLAS version: 8000
I1012 16:42:30.243578 1105 caffe.cpp:475] CUDA version: 8000
I1012 16:42:30.243589 1105 caffe.cpp:476] CUDA driver version: 8000
I1012 16:42:30.243602 1105 caffe.cpp:110] Querying GPUs
caffe-time-test.txt

nvidia@tegra-ubuntu:~/cviz/mtcnn$ cat run
GLOG_minloglevel=1 ./demo-gpu.py

nvidia@tegra-ubuntu:~/cviz/mtcnn$ ./run
W1012 16:25:42.740932 864 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields.
W1012 16:25:44.495682 864 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields.
W1012 16:25:44.516561 864 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields.

test1.jpg
[1]: 101
[2]: 101
[4]: 101
[4.5]: 101
[5]: 25
[6]: 16
[7]: 16
[8]: 16
2: (16, 5)
[9]: 13
[10]: 13
[11]: 2
3: (2, 5)

test2.jpg
F1012 16:25:46.933244 864 cudnn_conv_layer.cu:51] Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM, device 0
*** Check failure stack trace: ***
@ 0x7f81ae4718 google::LogMessage::Fail()
@ 0x7f81ae6614 google::LogMessage::SendToLog()
@ 0x7f81ae4290 google::LogMessage::Flush()
@ 0x7f81ae6eb4 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f826068bc caffe::CuDNNConvolutionLayer<>::Forward_gpu()
@ 0x7f82be8f74 caffe::Layer<>::Forward()
@ 0x7f82518eac caffe::Net::ForwardFromTo()
@ 0x7f82bdd940 boost::python::objects::caller_py_function_impl<>::operator()()
@ 0x7f81b45c04 boost::python::objects::function::call()
@ 0x7f81b45e28 (unknown)
@ 0x7f81b4d5a8 boost::python::handle_exception_impl()
@ 0x7f81b42a34 (unknown)
@ 0x458ee8 PyObject_Call
./run: line 1: 864 Aborted GLOG_minloglevel=1 ./demo-gpu.py
