patrikhuber / superviseddescent
C++11 implementation of the supervised descent optimisation method
Home Page: http://patrikhuber.github.io/superviseddescent/
License: Apache License 2.0
Hi Patrik,
I'm trying to run and test your project under Code::Blocks. Is that possible?
I've run the example .cpp files in the terminal, but I don't know how to run the whole project in Code::Blocks.
Can you help me on that?
Hi Patrick,
Awesome project, thanks for sharing it. I cloned the repository, and compiled it with cmake (with all the required libraries already installed). The compilation went fine, but when I try to run it, I get the following error:
Error reading the RCR model "data/rcr/face_landmarks_model_rcr_22.bin": Failed to read 8 bytes from input stream! Read 0
I am using Ubuntu 14.04. Also, I am testing it on OpenCV 3.0, but that does not seem to be the problem here, as far as I can see.
Thanks,
Avi
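For readers hitting the same "Failed to read 8 bytes" message: it usually just means the model file could not be found or is empty, typically because the program is run from a directory where the relative path data/rcr/... does not resolve. A minimal sketch (not part of the library; the function name is mine) that checks the file before handing it to the deserialiser:

```cpp
#include <fstream>
#include <string>

// Check that a model file exists and is non-empty before handing the
// stream to the deserialiser; a missing file otherwise surfaces as the
// cryptic "Failed to read 8 bytes from input stream! Read 0".
bool check_model_file(const std::string& path, std::string& error)
{
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        error = "Cannot open '" + path + "' - check the working directory.";
        return false;
    }
    if (file.tellg() == 0) {
        error = "'" + path + "' is empty.";
        return false;
    }
    return true;
}
```

Running the examples from the repository root (where data/ lives), or passing an absolute path, avoids the problem entirely.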
I'd like to support the popular me17 landmark set that's used in a lot of papers (e.g. Zhou et al. and ours). However, some landmarks are different from the ibug-68 set and above papers also use the original LFPW set (or a subset of it), and not the ibug release of LFPW.
We should thus support training in the original LFPW as well.
vl-hog has really good performance and it's hard to replace. But I don't like to include C headers and I would really like to replace it with a header-only, modern C++ solution. It's just hard to find one, and rewriting vl-hog would be a lot of work.
I managed to include vl-hog as "header-only" by #include'ing vl-hog's .c file, and I had to make some casts in its code explicit in the process to get it to compile. I hope what I did is not problematic.
Hi Patrick,
I'm trying to estimate the tracking error of different algorithms. For that, I'm using two of our own databases: one with real faces (maybe you know it) and another with synthetic faces (generated with the 3D Basel Face Model) that mimic the movement of the real one (this one is not published).
Using the supervised descent model on the synthetic database, I get tracking errors in 3 videos with high roll angles (I attach the videos with the landmarks obtained by SDM here). I think it's interesting to point out that SDM's tracking error without these problematic videos is similar to IntraFace's tracking error.
I think this tracking error may be due to the training. I have thought about doing a new training with videos with high roll angles. I also think it would be interesting to detect face landmarks using the previous frame's landmarks.
What do you think about training with videos?
If we train with synthetic faces, we have ground-truth knowledge of the landmarks' positions in every frame. But the BFM's synthetic faces don't change their expression and don't close their eyes. So, maybe train with a mix of synthetic faces generated by your 3DMM (with different expressions) and real videos?
Best regards,
Andoni.
Hi
I ran your code on VS.
I used 28 images (with 5 landmarks) for training. During testing, if the test images are included in the training images, the accuracy is very good; otherwise, the accuracy is very bad.
I then used 3283 images to train the algorithm. The accuracy is good for some images, but not for others.
The author's paper mentions b0/bk, but I cannot find the implementation of b0/bk in your code.
Did you ever test your algorithm with more images? How is the performance?
Could you please help me with the b0/bk question?
The original SDM demo was released in Intraface project (Matlab and C++ versions)
Compared to Intraface, your implementation is not as robust and stable. Of course, the reasons may be (1) descriptor type (Intraface uses a modified SIFT), (2) training database (Intraface uses movies for tracking), or (3) implementation details.
Have you compared your SDM implementation with Intraface?
At the moment, it prints "Failed to read 8 bytes from input stream! Read 0", which is not that intuitive.
Hi huber,
Thanks for the reply! I have looked at your updated code. Actually, I solved my problems by rewriting the code in many places; please forgive my changes. I also ran into a new problem: I use the Helen database with 2000 labelled pictures from the iBug website, scaled 15 times and rotated 5 times, for a total of about 30000 training images. I train on a server with 64 GB of RAM, but training consumes all of it, i.e. the training step needs a very large amount of memory. Can you optimise this? Please forgive my poor English!
Hi Patrik,
I built the code and generated executable files for landmarkdetection, poseestimation and simplefunction.
Can you please tell me the usage of poseestimation.cpp and simplefunction.cpp and how to execute the files?
Landmark Detection:
I gave the inputs and got the output with some points drawn on the image.
Pose Estimation:
When I run it, I only get the residual values.
For the simple function, I have no idea what it does.
Hi
I have a question regarding rcr-train: can I run this code on my own data, i.e. with different images and feature points?
Also, how do you calculate the mean? Is it the mean of the feature points over the images of the training set?
Regards
Hello patrikhuber,
I have a question about your code.
Firstly: Mat landmarks = (cv::Mat_<float>(1, 20) << 498.0f, 504.0f, 479.0f, 498.0f, 529.0f, 553.0f, 489.0f, 503.0f, 527.0f, 503.0f, 502.0f, 513.0f, 457.0f, 465.0f, 471.0f, 471.0f, 522.0f, 522.0f, 530.0f, 536.0f);
How are these landmarks obtained? I get landmarks another way, but cannot get the right pose estimation.
Secondly: Mat facemodel; // The point numbers are from the iBug landmarking scheme
Do I need to change this when testing on other image files?
Thanks!
Hi
Can I use superviseddescent to save facial landmark points from a video file ?
Hi,
Patrik, thank you for your SDM code. I have a problem when using landmark_detection.cpp: I trained on about 3000 images from iBug, but when I track a face using the trained model, the result looks like the mean shape, and when I close my eyes, the landmarks don't follow; the shape always stays the same. I don't understand the reason.
Waiting for your answer.
Yours sincerely
Anto
Hi Patrikhuber,
I would like to run only landmark detection with your SDM code, and I would like to make the prediction run faster, as long as the landmark accuracy remains good enough for me.
As you might know, most of the time in SDM landmark detection is spent on HOG calculation, and that time mostly depends on the size of the ROI image computed in adaptive_vlhog.hpp.
my questions are:
in your rcr-train.cpp file, you set hog_params as follows:
std::vector<rcr::HoGParam> hog_params{
{ VlHogVariant::VlHogVariantUoctti, 5, 11, 4, 1.0f },
{ VlHogVariant::VlHogVariantUoctti, 5, 10, 4, 0.7f },
{ VlHogVariant::VlHogVariantUoctti, 5, 8, 4, 0.4f },
{ VlHogVariant::VlHogVariantUoctti, 5, 6, 4, 0.25f } };
Why is num_cells 5? Shouldn't it be an even number?
To keep the accuracy while improving speed, what are the best hog_params in your experience?
In rcr-train.cpp, the regressor level is 4. Should it be smaller? What is the best regressor level in your experience?
Hi Patrik,
I have run landmark detection with your pre-trained model "face_landmarks_model_rcr_68.bin" on a synthetically generated still-face video, and I have seen that the landmarks' positions differ in each frame. I want to minimise the difference between frames.
I think detecting the landmarks several times per frame and averaging should reduce the jitter.
So, what do you think about this solution? Do you think there is a better way to minimise it?
Is it possible to train a model with less jitter?
Regards,
Andoni.
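One simple way to reduce frame-to-frame jitter on a (nearly) still face, sketched here as a suggestion rather than anything the library provides, is to exponentially smooth the detected positions over time instead of re-detecting several times per frame:

```cpp
#include <vector>

// Exponentially smooth landmark positions across frames to reduce
// jitter: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
// Lower alpha = smoother but laggier; the value is a tuning assumption.
class LandmarkSmoother
{
public:
    explicit LandmarkSmoother(double alpha) : alpha_(alpha) {}

    // 'detected' holds interleaved (x, y) coordinates for one frame.
    std::vector<double> update(const std::vector<double>& detected)
    {
        if (state_.empty()) {
            state_ = detected; // first frame: nothing to smooth against
        } else {
            for (std::size_t i = 0; i < state_.size(); ++i)
                state_[i] = alpha_ * detected[i] + (1.0 - alpha_) * state_[i];
        }
        return state_;
    }

private:
    double alpha_;
    std::vector<double> state_;
};
```

Compared with detecting x times and averaging, this costs one detection per frame and trades a small amount of lag for stability.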
Hi Patrik!
Is there any way of measuring confidence in landmark detection, without ground-truth data?
A naive approach I see so far is using the ratio of the face-detection rectangle's area to the area of a bounding box around the detected landmarks, but are there better ways?
Hello patrikhuber,
I have a question about your code. I am pretty new to this field and I am quite amazed by your implementation of SDM.
I have read articles about SDM and I am wondering how you generated the "mean_ibug_lfpw_68.txt" file. I read the question "How to calculate the mean-face?" in the closed issues, but I didn't understand the answer very well.
Can you explain it in more detail, or give me some pseudo-code or the code that generated that file?
Can you tell me the steps? For example, if I have 100 pictures and 100 files with landmark positions, what must I do to generate "mean_ibug_lfpw_68.txt"? The article I read mentions this step but doesn't explain how to do it, so if you could provide pseudo-code or the code you used, I would be very thankful :)
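As a rough sketch of one plausible procedure (my reconstruction, not necessarily the exact code that produced mean_ibug_lfpw_68.txt): normalise each training shape for translation and scale, then average. A full Procrustes mean would also remove rotation and iterate; both refinements are omitted here for brevity.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One shape = interleaved (x, y) landmark coordinates.
using Shape = std::vector<double>;

// Normalise a shape: move its centroid to the origin and scale it to
// unit norm, so translation and scale differences between training
// images do not bias the mean.
Shape normalise(Shape s)
{
    const std::size_t n = s.size() / 2;
    double cx = 0.0, cy = 0.0;
    for (std::size_t i = 0; i < n; ++i) { cx += s[2 * i]; cy += s[2 * i + 1]; }
    cx /= n; cy /= n;
    double norm = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        s[2 * i] -= cx; s[2 * i + 1] -= cy;
        norm += s[2 * i] * s[2 * i] + s[2 * i + 1] * s[2 * i + 1];
    }
    norm = std::sqrt(norm);
    for (double& v : s) v /= norm;
    return s;
}

// Mean shape = average of the normalised training shapes.
Shape mean_shape(const std::vector<Shape>& shapes)
{
    Shape mean(shapes.front().size(), 0.0);
    for (const Shape& s : shapes) {
        Shape ns = normalise(s);
        for (std::size_t i = 0; i < mean.size(); ++i) mean[i] += ns[i];
    }
    for (double& v : mean) v /= shapes.size();
    return mean;
}
```

With 100 pictures, you would load the 100 landmark files into Shape objects and write the result of mean_shape() to the text file.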
Dear Patrik
I want to use facial landmark detection in a web-based application. How can I convert the code to client-side web code? face_landmarks_model_rcr_68.bin is about 85 MB, which is heavy to load in a web application. How can I use it?
Regards.
Morteza
Honestly, I'm not deeply into research papers, but it seems to me that "Fitting 3D Morphable Models using Local Features" describes an approach where landmarks are detected through 3DMM fitting: "In essence, the proposed method can also be seen as a 3D model based landmark detection, steering the 3D model parameters to converge to the ground truth location."
At the same time, robust cascaded-regression landmark detection is described in another paper: "Random Cascaded-Regression Copse for Robust Facial Landmark Detection".
I ask you to move the implementations (bodies) of non-template functions from .hpp files to .cpp files.
Currently, building your code produces errors, for example: multiple definition of `rcr::align_mean(cv::Mat, cv::Rect_, float, float, float, float)'
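For context, this linker error appears whenever a non-template function is *defined* (not just declared) in a header that several .cpp files include: the linker then sees one definition per translation unit. Besides moving the bodies to a .cpp file, marking them inline also fixes it. A minimal illustration (the function and file name are hypothetical, not from the library):

```cpp
// rcr_align.hpp (sketch): in a header included from several translation
// units, a non-template function definition must be marked inline,
// otherwise the linker reports "multiple definition of ...". The inline
// keyword permits identical definitions in every translation unit.
#include <algorithm>

inline double clamp_to_box(double value, double lo, double hi)
{
    return std::min(std::max(value, lo), hi);
}
```

Declaring the function in the header and defining it once in a .cpp file achieves the same result without inline, at the cost of a second file.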
Hi,
I successfully adapted pose_estimation for one of our 3D models.
Now I would like to be able to train on more than one model but I am having trouble modifying the code.
How should I proceed to adapt the ModelProjection variable for training and testing?
Cheers
Hi Patrick,
I built your code successfully and tested the following:
Hello Patrik,
first I want to thank you for making available such a great piece of software.
I did a few tests on multiple images and the position of the found landmarks is usually quite good. I have however encountered a problem when using the OpenCV face detector.
Sometimes, the OpenCV V&J face detector detects things that aren't actually faces. The supervised descent code still runs and things that aren't landmarks on a face are detected.
My question is:
By looking at the way the optimisation progresses after each step, or by some other method, is there a way to tell that the function is not being optimised properly? For a face, that could be used to decide whether the landmarks were found successfully or not. In other words, when doing the supervised descent, can we look at the update step (or something else) and tell whether things are going the right way? The problem with faces is, as you mentioned, that we have no ground truth to compare our results with, because every face is different. Is there a way to tell whether we are close to or very far from the best solution?
Sorry, I am just an engineer and not a computer vision scientist but I would really appreciate if you could give me some useful pointers.
Regards,
Guillaume
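One heuristic along the lines the question suggests (an assumption to validate on real data, not an established part of SDM) is to monitor the norm of the update step at each cascade level: a healthy descent shrinks the steps level by level, so a step that grows again flags a suspect fit, e.g. a V&J false positive:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Norm of one update step (a vector of landmark displacements).
double step_norm(const std::vector<double>& update)
{
    double s = 0.0;
    for (double v : update) s += v * v;
    return std::sqrt(s);
}

// Heuristic fitting-quality check: require every step to be at most
// shrink_factor times the previous one. The factor is a tuning
// assumption to be calibrated on validation images.
bool looks_converged(const std::vector<std::vector<double>>& update_steps,
                     double shrink_factor = 0.9)
{
    for (std::size_t i = 1; i < update_steps.size(); ++i) {
        if (step_norm(update_steps[i]) >
            shrink_factor * step_norm(update_steps[i - 1]))
            return false;
    }
    return true;
}
```

This does not prove the landmarks are correct (a confident fit on a non-face is still possible), but it cheaply rejects many runs where the regressors clearly never settled.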
Hi Patrikhuber,
I would like to run pose estimation with your code. Should I run the project under examples/pose_estimation or some other one?
Thanks!
Hi Patrikhuber,
When we run the SDM test (prediction) code, the CPU time needed per image is quite high, and it varies a lot with image size.
My questions are:
Many thanks for your answer!
Hi,
Does this approach allow for real-time landmark tracking in videos? Does the program have functions for both facial landmark detection and tracking? I want to track the landmarks. What should I do? Do I need to write the tracking function myself?
Hi Patrik,
I want to ask about rcr-train and rcr-detect files.
The algorithm calculates the inter-eye distance; I understand that it's needed to better handle the eye landmarks.
But is it really required to use the inter-eye distance to train a model with rcr-train? For example, I only want to detect the nose or mouth for better speed, but I don't know whether I can easily rebuild rcr-train without the inter-eye-distance functions.
To sum up, I want to ask:
Is the inter-eye distance needed ONLY for the eye landmarks, or does the whole algorithm depend on it in some way?
Thank you in advance for your time.
I think it would be better to separate the optimisation from the actual learned model (the regressors).
Reason: What's not so nice right now is that we store the SupervisedDescentOptimiser to disk, with all its type information about the solver (e.g. LinearRegressor<PartialPivLUSolver>), and we store the regularisation as well.
Both the solver and the regularisation are only relevant at training time, and there's neither need to store them nor should we need to know the type of the solver when we load the model.
Also, it would mean a user that just wants to use the landmark detection (and not train a model) wouldn't need Eigen, because Eigen is only needed in LinearRegressor::learn() and not in predict().
Regarding the regulariser, we could just choose to exclude it from the serialisation, but I don't think it's very intuitive to only serialise half of a class. I think there must be a better solution that solves the other shortcomings as well. Maybe we can even just make SupervisedDescentOptimiser::train() a free function and get rid of the class.
A related project, tiny-cnn, doesn't separate the model from the optimisation, but I kind of feel like we should.
Hi patrikhuber,
I noticed that you used C++11 lib cereal to serialize the trained model and save it to binary file.
you use the following code to save the model:
void save_detection_model(detection_model model, std::string filename)
{
std::ofstream file(filename, std::ios::binary);
cereal::BinaryOutputArchive output_archive(file);
output_archive(model);
};
My question is:
Do you have any substitute solution for saving/loading the model to/from files that doesn't use C++11 syntax?
Can you share non-C++11 code for saving/loading the model?
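If cereal (and C++11) is off the table, one can fall back to writing the raw dimensions and data of each matrix with plain fstream calls. A sketch under that assumption, for a single float matrix (a real model would serialise every regressor this way, and the format is not portable across endianness or struct layouts):

```cpp
#include <fstream>
#include <string>
#include <vector>

// A minimal C++98-style alternative to cereal: write the matrix
// dimensions followed by the raw float data.
struct Matrix
{
    int rows, cols;
    std::vector<float> data; // row-major, rows * cols entries
};

bool save_matrix(const Matrix& m, const std::string& filename)
{
    std::ofstream file(filename.c_str(), std::ios::binary);
    if (!file) return false;
    file.write(reinterpret_cast<const char*>(&m.rows), sizeof(int));
    file.write(reinterpret_cast<const char*>(&m.cols), sizeof(int));
    file.write(reinterpret_cast<const char*>(&m.data[0]),
               m.data.size() * sizeof(float));
    return file.good();
}

bool load_matrix(Matrix& m, const std::string& filename)
{
    std::ifstream file(filename.c_str(), std::ios::binary);
    if (!file) return false;
    file.read(reinterpret_cast<char*>(&m.rows), sizeof(int));
    file.read(reinterpret_cast<char*>(&m.cols), sizeof(int));
    m.data.resize(static_cast<std::size_t>(m.rows) * m.cols);
    file.read(reinterpret_cast<char*>(&m.data[0]),
              m.data.size() * sizeof(float));
    return file.good();
}
```

This trades cereal's versioning and type safety for compiler compatibility; add a magic number and version field if the format may ever change.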
Hi Patrik,
I want to ask data/rcr/face_landmarks_model_rcr_22.bin.
The algorithm uses only 22 key points for the regression and predicts 68 key points. I have tried training on the 68 feature points directly, and the result is not very good. Can you tell me why? Is it because of the regressor's regularisation parameter lambda?
Thanks a lot.
superviseddescent-master\include\superviseddescent/utils/ThreadPool.h(49): error C2332: 'class': missing tag name
superviseddescent-master\include\superviseddescent/utils/ThreadPool.h(49): error C2011: '': 'enum' type redefinition
1> D:\Microsoft Visual Studio 11.0\VC\include\thr/xthreads.h(41): see declaration of ''
superviseddescent-master\include\superviseddescent/utils/ThreadPool.h(49): error C2143: syntax error: missing ',' before '...'
Hi Patrik,
It seems that include/verbose_solver.hpp file is missing. I am not sure whether this file is needed in the current version, but basically I cannot compile your project as it is.
Many thanks,
Doxygen has a @tparam command to document template parameters (https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdtparam). We should probably use it in the library documentation.
Hi, Patrik!
I'm fascinated by your work!
Did you see confidence score in Intraface face alignment? Is it possible to implement it here?
I'll appreciate your answer.
Thanks.
I have Visual Studio 2015 RC on Windows 10. I downloaded Eigen 3.3, OpenCV 3.2 (since 2.4.3 does not support Visual Studio 2015 RC), Boost 1.63, and the supervised descent project. I included the necessary directories in Visual Studio.
When trying to build rcr-detect.cpp I got the following:
C4244, 'initializing': conversion from 'double' to 'int', possible loss of data.
Finally, what are the proper versions of Eigen, OpenCV, and Boost?
I ported your program to a phone and found that the facial landmark detection is very slow. On a 640*480 image, the processing time is about 450 ms (not including the time to find the face). How can I speed it up? Where is the problem?
Hi Patrik,
I've tried to run rcr-train to prepare a model for tests.
Unfortunately, it's not working. It prints the residual but also gives this message:
boost::filesystem::directory_iterator::construct: No such file or directory: "home/user/superviseddescent/examples/data/ibug_lfpw_trainset/"
Given arguments are below:
--data "/home/user/superviseddescent/examples/data/ibug_lfpw_trainset/"
--mean "/home/user/superviseddescent/examples/data/mean_ibug_lfpw_68.txt"
--facedetector "/home/user/opencv_files/haarcascade_frontalface_alt.xml"
--config "/home/user/superviseddescent/apps/rcr/data/rcr_training_22.cfg"
--evaluation "/home/user/superviseddescent/apps/rcr/data/rcr_eval.cfg"
--output "/home/user/superviseddescent/apps/rcr/trained_output_model.bin"
--test-data "home/user/superviseddescent/examples/data/ibug_lfpw_trainset/"
if I choose one pic (as below), the result is the same
--test-data "home/user/superviseddescent/examples/data/ibug_lfpw_trainset/image_0002.png"
Do you know where I can find the source of this issue?
Thanks in advance for your time!
hi patrikhuber
Thank you for your code. I'm following your code and work; would you mind sending me a Visual Studio 2013 version of this project?
Hi patrikhuber, could you please release the code for calculating "mean_ibug_lfpw_68.txt"? The results are exciting! I'm just a beginner in this area.
Hello Patrik,
I am trying to use pose_estimation.cpp example code to detect yaw, pitch and roll for an input image.
What is the size of the image from which you have taken the landmarks (Mat landmarks)? Can I take an image of any size, identify the landmark points using OpenCV/dlib, and use them for pose estimation?
What do you mean by image origin (500)? How do we get the approximate focal length (1800)?
Any help is really appreciated. Thanks.
examples/landmark_detection is a first-run example and it should run fairly quickly to give the user some results. I should change it to train only 5 or 10 landmarks, so the training will be quicker than with 68 landmarks.
Hi Prof.,
I ran your code to train the model.
However, when I run the solve function in the VerbosePartialPivLUSolver class, the following line runs very slowly:
RowMajorMatrixXf AtA_Eigen = A_Eigen.transpose() * A_Eigen;
The A_Eigen matrix is 20*8801.
Computing this product takes 76588 ms, which is a very large value.
My system configuration is:
Windows 10, Intel i5-6200U CPU, 8 GB RAM.
The QR decomposition in the solve function cannot be finished at all.
Do you have any idea about this issue?
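A possible explanation, offered as an analysis rather than a confirmed diagnosis: Eigen's .transpose() is a lazy view, so the time is not spent transposing but forming the 8801 x 8801 product A^T A itself, which needs on the order of d^2 * n multiplications and roughly 310 MB of storage for the result. When there are far fewer samples (n = 20) than features (d = 8801), working with the n x n Gram matrix A A^T instead is dramatically cheaper. The cost asymmetry can be made concrete:

```cpp
#include <cstdint>

// For A with n rows (training samples) and d columns (features), forming
// A^T A costs roughly d*d*n multiplications and stores a d x d matrix,
// while A A^T costs n*n*d and stores n x n. With n = 20, d = 8801 the
// first is ~440x more work, which is why A.transpose() * A appears
// "slow": the transpose is lazy, the d x d product is not.
std::uint64_t gram_flops(std::uint64_t rows, std::uint64_t cols)
{
    return cols * cols * rows; // cost of (rows x cols)^T * (rows x cols)
}
```

Concretely, ridge regression w = (A^T A + lambda I)^{-1} A^T b can equivalently be solved in its dual form w = A^T (A A^T + lambda I)^{-1} b when n << d, which only ever factors a 20 x 20 system.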
Hi Patrik,
I have been testing your code for some time now for facial landmark training and tracking.
I modified rcr-track to use the landmark points from the previous frame as initialization, instead of the mean. Face detection is performed in the first frame. After that, I use the landmark points from the previous frame. But on doing this, the landmarks get scattered after just a few frames. This happens even if the face remains more or less stationary. I am pasting the code snippet that I used below. Am I doing something wrong?
(current_landmarks holds the landmarks from the previous frame, as also mentioned in your comments in the code)
if(!have_face) { ...
...
have_face=true;
}
else{
cv::Mat prev_landmarks = to_row(current_landmarks);
current_landmarks = rcr_model.detect(image, prev_landmarks);
rcr::draw_landmarks(image, current_landmarks);
have_face=true; // simply setting it to true for now for testing purposes.
}
I would greatly appreciate any help or insight from you.
Cheers!
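One drift guard worth trying here (my suggestion, not the library's documented behaviour): rather than feeding the raw previous-frame landmarks back in, compute their enclosing box and re-initialise from the model's mean shape placed in that box. Since the regressors were trained on mean-shape initialisations, this keeps the input in-distribution and stops scale errors from compounding across frames. The box computation might look like:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Box { double x, y, width, height; };

// Enclosing axis-aligned box of a set of landmarks given as
// interleaved (x, y) coordinates. The mean shape can then be aligned
// into this box to re-initialise the next frame's detection.
Box enclosing_box(const std::vector<double>& xy)
{
    double min_x = xy[0], max_x = xy[0], min_y = xy[1], max_y = xy[1];
    for (std::size_t i = 0; i < xy.size(); i += 2) {
        min_x = std::min(min_x, xy[i]);     max_x = std::max(max_x, xy[i]);
        min_y = std::min(min_y, xy[i + 1]); max_y = std::max(max_y, xy[i + 1]);
    }
    Box b = { min_x, min_y, max_x - min_x, max_y - min_y };
    return b;
}
```

The scattering described above is consistent with the regressors receiving initialisations they never saw during training, which this normalisation step is meant to prevent.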
Dear all,
I built the code from this GitHub repository and got CMake build errors on both Linux and Windows. Could you help me fix them? Thanks very much!
PS: I used to build the previous version successfully.
Linux Error:
Centos +Cmake 3.3.0
Windows Error:
Windows 8.1+ VS2013+Cmake3.3.0
Cmake Error
I would be very grateful for help getting through this bug.
Thanks very Much!
I am referring to issue #31 about speeding up the process.
One point is that the HOG feature extraction can be parallelised. I made a simple time measurement of that function: it took about 60 ms for 4 regressors. I am thinking of improving it using GCD on iOS.
__block std::vector<int> hogDescriptorsIdx;
dispatch_apply(LandmarkIndexs.size(), c_queue, ^(size_t k) {
int i = (int)k;
....
hogDescriptor = hogDescriptor.t(); // now a row-vector
dispatch_sync(s_queue, ^{
hogDescriptors.push_back(hogDescriptor);
hogDescriptorsIdx.push_back(i);
});
});
cv::Mat sortedHog;
for( int i = 0; i < hogDescriptorsIdx.size(); i++) {
sortedHog.push_back(hogDescriptors.row(hogDescriptorsIdx.at(i)));
}
hogDescriptors = sortedHog.reshape(0, sortedHog.cols * sortedHog.rows).t();
I only parallelise the outer loop, as I think the body of the loop is thread-safe. However, the result is wrong and I don't know where the problem is. May I have any advice?
Hi patrikhuber,
I built tracking-without-fd; it usually gives good tracking results. But sometimes the tracking rectangle becomes small and the tracking result is wrong. I also tested Intraface (http://www.humansensing.cs.cmu.edu/intraface/download_functions_cpp.html); its tracking results are more robust. How can a program detect when superviseddescent's tracking has gone wrong? Thanks a lot!
@patrikhuber, hi, I want to know how you obtained the 3D face coordinates of "Mat facemodel;" (in pose_estimation.cpp). Also, are the three angles (pitch, yaw, roll) in "Mat predicted_params" relative to the given 3D face? Is that right?