Comments (9)
Yes, I think this is a known problem which has been discussed for example here. To my knowledge it has not been really solved though.
Have you tried this approach? Does it produce better results?
Another approach would be to not do any transformation at all but instead just use a bounding box and let the CNN handle any rotations etc. within that box. Rotations can then be seen as a kind of data augmentation instead. I have tried this approach, but using MTCNN for face alignment, and when training an Inception-Resnet-v1 network on this data I can get a model with accuracy ~0.975 on LFW. I'm not sure how much of the performance improvement can be attributed to not having the shearing effect, though.
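As a rough illustration of the bounding-box-only idea, here is a minimal sketch that crops a detected box with some extra margin and applies no landmark-based warp. The function name `crop_with_margin`, the `(x1, y1, x2, y2)` box layout and the margin value are assumptions made for this example, not code taken from the repo; resizing the crop to the network input size is left out.

```python
import numpy as np

def crop_with_margin(image, box, margin):
    """Crop box = (x1, y1, x2, y2) from image, widened by margin pixels
    in total along each axis and clamped to the image borders.
    No rotation or shear is applied; the crop stays axis-aligned."""
    height, width = image.shape[:2]
    x1 = max(int(box[0] - margin / 2), 0)
    y1 = max(int(box[1] - margin / 2), 0)
    x2 = min(int(box[2] + margin / 2), width)
    y2 = min(int(box[3] + margin / 2), height)
    return image[y1:y2, x1:x2]

# Hypothetical 250x250 image and detector output, for illustration only.
image = np.zeros((250, 250, 3), dtype=np.uint8)
face = crop_with_margin(image, box=(80, 70, 180, 190), margin=32)
clamped = crop_with_margin(image, box=(0, 0, 100, 100), margin=32)
```

Because the crop keeps whatever in-plane rotation the face had, the network sees rotated faces during training, which is what lets rotation act like augmentation.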
from facenet.
I am not sure that it produces better results.
But I used the following code:
assert imgDim is not None
assert rgbImg is not None
assert landmarkIndices is not None
if bb is None:
    bb = self.getLargestFaceBoundingBox(rgbImg, skipMulti)
    if bb is None:
        return
if landmarks is None:
    landmarks = self.findLandmarks(rgbImg, bb)
npLandmarks = np.float32(landmarks)
tplLandmarks = imgDim * MINMAX_TEMPLATE*scale + imgDim*(1-scale)/2
tplLandmarks = np.transpose(tplLandmarks)
npLandmarks = np.vstack( (np.transpose(npLandmarks), np.ones(tplLandmarks.shape[1])) )
#npLandmarkIndices = np.array(landmarkIndices)
#pylint: disable=maybe-no-member
#H = cv2.getAffineTransform(npLandmarks[npLandmarkIndices],
#                           imgDim * MINMAX_TEMPLATE[npLandmarkIndices]*scale + imgDim*(1-scale)/2)
H = np.matmul(np.matmul(tplLandmarks, np.transpose(npLandmarks)),
              np.linalg.inv(np.matmul(npLandmarks, np.transpose(npLandmarks))))
thumbnail = cv2.warpAffine(rgbImg, H, (imgDim, imgDim))
return thumbnail
This method computes a single global transformation matrix from all landmarks, but I have not gotten a better result with it yet.
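For reference, the expression for H in the code above has the form of the normal-equations solution to a least-squares affine fit. The sketch below, on synthetic points rather than real landmarks, compares that form with `np.linalg.lstsq`, which solves the same problem without forming an explicit inverse. The function names and the test transform are made up for this example.

```python
import numpy as np

def fit_affine_normal_equations(src, dst):
    """src, dst: (N, 2) point arrays. Returns the 2x3 matrix H that
    minimizes the squared error between H @ [src; 1] and dst."""
    src_h = np.vstack((src.T, np.ones(src.shape[0])))  # 3 x N, homogeneous
    return dst.T @ src_h.T @ np.linalg.inv(src_h @ src_h.T)

def fit_affine_lstsq(src, dst):
    """Same fit, solved with np.linalg.lstsq."""
    src_h = np.hstack((src, np.ones((src.shape[0], 1))))  # N x 3
    solution, _, _, _ = np.linalg.lstsq(src_h, dst, rcond=None)  # 3 x 2
    return solution.T

# Synthetic data: 68 points mapped through a known affine transform.
rng = np.random.default_rng(0)
src = rng.uniform(0, 100, size=(68, 2))
true_h = np.array([[0.9, -0.2, 5.0],
                   [0.2, 0.9, -3.0]])
dst = src @ true_h[:, :2].T + true_h[:, 2]

h_normal = fit_affine_normal_equations(src, dst)
h_lstsq = fit_affine_lstsq(src, dst)
```

Unlike `cv2.getAffineTransform`, which takes exactly three point pairs, a least-squares fit uses every point, so a single badly placed landmark has less influence on the result.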
from facenet.
@davidsandberg Could you release the code for Inception-Resnet-v1 and the model with accuracy ~0.975 on LFW? Thank you very much!
from facenet.
All the code for training is already in the repo. I ran the command
python facenet_train_classifier.py --logs_base_dir /media/david/BigDrive/DeepLearning/logs/facenet/ --models_base_dir /media/david/BigDrive/DeepLearning/models/facenet/ --data_dir ~/datasets/facescrub/facescrub_mtcnnalign_182_160:~/datasets/casia/casia_maxpy_mtcnnalign_182_160 --image_size 160 --model_def models.inception_resnet_v1 --lfw_dir ~/datasets/lfw/lfw_mtcnnalign_160 --weight_decay 2e-4 --optimizer RMSPROP --learning_rate -1 --max_nrof_epochs 80 --keep_probability 0.8 --random_crop --random_flip --learning_rate_schedule_file ../data/learning_rate_schedule_classifier_long.txt
With the learning rate schedule (../data/learning_rate_schedule_classifier_long.txt)
# Learning rate schedule
# Maps an epoch number to a learning rate
0: 0.1
65: 0.01
77: 0.001
1000: 0.0001
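A minimal sketch of how a schedule in this format could be read, assuming each entry applies from its epoch until the next entry takes over. That step-wise reading and the function names below are assumptions for illustration; the training script in the repo may handle the file differently.

```python
SCHEDULE_TEXT = """
# Learning rate schedule
# Maps an epoch number to a learning rate
0: 0.1
65: 0.01
77: 0.001
1000: 0.0001
"""

def parse_schedule(text):
    """Return a sorted list of (start_epoch, learning_rate) pairs,
    skipping blank lines and '#' comments."""
    schedule = []
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        epoch, rate = line.split(':')
        schedule.append((int(epoch), float(rate)))
    return sorted(schedule)

def learning_rate_for_epoch(schedule, epoch):
    """Pick the rate of the last entry whose start epoch is <= epoch."""
    current = None
    for start_epoch, rate in schedule:
        if start_epoch <= epoch:
            current = rate
        else:
            break
    return current

schedule = parse_schedule(SCHEDULE_TEXT)
rates = [learning_rate_for_epoch(schedule, e) for e in (0, 64, 65, 79)]
```

Under this reading, an 80-epoch run trains at 0.1 for epochs 0 to 64, at 0.01 for epochs 65 to 76, and at 0.001 for the remaining epochs.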
The Inception-Resnet-v1 model improves performance significantly compared to the nn4 model, but to get to 0.975 a better alignment (MTCNN) is also needed. The code that I used to get the above result can be found here, but it requires Caffe to be installed and the MTCNN repo to be cloned, which I don't plan to describe here. Instead I'm working on an implementation of this using Python/TensorFlow, but it is not ready yet.
from facenet.
I was studying Dlib recently and found that Dlib already provides the ability to detect faces, extract landmarks, and, with one additional step, save the "face_chips" (aligned using the landmarks) as image files.
I tried this approach to process the LFW dataset and it seems to be working pretty well.
The only caveat is that the Dlib face detector is not able to detect all the faces in the LFW dataset. For these images, the image was cropped directly, without alignment using landmark features.
Since "align_dlib.py" is already using Dlib, it seems that the easiest way to do this task is to use native Dlib all the way.
There is an example in the Dlib source tree:
http://dlib.net/face_landmark_detection_ex.cpp.html
The following is an example of how to save the aligned face_chips to files.
for (unsigned long j = 0; j < face_chips.size(); ++j) {
    std::ostringstream stringStream;
    stringStream << "build/tmp/fl_chip_" << j << ".jpg";
    std::string filename = stringStream.str();
    save_jpeg(face_chips[j], filename, 100);
}
Thanks,
--Scott
from facenet.
@davidsandberg
I see that you uploaded the code using MTCNN, great!
I have some questions about it:
- Why don't you use the detected points to apply a 2D transform to the faces? Is there any reason?
- What about the speed of your implementation in TF? I measured about 25 FPS on a VGA image (on the second detection run, because the first detection is very slow). The authors claim ~100 FPS using the MATLAB implementation (do you achieve such speed using MATLAB?).
Hint: I replaced your "imResample" with "scipy.misc.imresize" and got >50% faster evaluation.
from facenet.
@melgor
Happy to see that you are looking at the MTCNN implementation.
The implementation still contains a couple of bugs which I hope to have solved after the coming weekend, but until then you should use it with care.
- The main reason is that I'm lazy ;-) and the quickest/easiest way was to just use the bounding boxes. But also, since the MTCNN is better at detecting profile faces I figured that this could cause more severe distortions to the images. But this is an interesting point for investigation that I haven't done yet.
- I haven't looked at the speed yet. When the debugging of the code is done it would be interesting to check. The impression I got (totally unscientifically) is that it was approximately the same speed as the matlab implementation. But I will have to get back to you on this one...
For the resize thing I guess it looks quite crazy :-), but the reason for using the home-brewed implementation was that while comparing the tensorflow implementation to the matlab one I needed a resample that worked identically in the two implementations. So I ended up having that same code in matlab as well, which should be exchanged for a scipy or opencv implementation when the two implementations match.
from facenet.
I tried with the following command:
python facenet_train_classifier.py --logs_base_dir logs/facenet/ --models_base_dir models/facenet/ --data_dir ./align/casia --image_size 182 --model_def models.inception_resnet_v1 --lfw_dir ./align/datasets/lfw_160 --weight_decay 2e-4 --optimizer RMSPROP --learning_rate -1 --max_nrof_epochs 80 --keep_probability 0.8 --random_crop --random_flip --learning_rate_schedule_file ../data/learning_rate_schedule_classifier_long.txt
It has the following error:
Traceback (most recent call last):
  File "facenet_train_classifier.py", line 309, in <module>
    main(parse_arguments(sys.argv[1:]))
  File "facenet_train_classifier.py", line 100, in main
    phase_train=phase_train_placeholder, weight_decay=args.weight_decay)
  File "/mnt/2TB/src/facenet/src/models/inception_resnet_v1.py", line 145, in inference
    dropout_keep_prob=keep_probability, reuse=reuse)
  File "/mnt/2TB/src/facenet/src/models/inception_resnet_v1.py", line 167, in inception_resnet_v1
    with tf.variable_scope(scope, 'InceptionResnetV1', [inputs], reuse=reuse):
  File "/usr/lib/python2.7/contextlib.py", line 84, in helper
    return GeneratorContextManager(func(*args, **kwds))
TypeError: variable_scope() got multiple values for keyword argument 'reuse'
Do you know what went wrong with my command?
My TensorFlow version is 0.10.0.
Thanks
from facenet.
I'm pretty sure I ran into the same problem but now I'm not sure how it was solved. Could have been that it was fixed when upgrading to a newer version of slim, but I'm not sure.
One option would be to upgrade to TF 0.11 and see if that fixes the problem. I will do that as well as soon as I'm done with some training.
If the problem persists, please file a new issue specifically for this.
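To illustrate how this kind of TypeError can arise from a signature mismatch between library versions, here is a small sketch in plain Python. Both functions are hypothetical stand-ins written for this example; they are not TensorFlow's actual API, and the parameter lists are assumptions chosen only to reproduce the error message.

```python
def old_variable_scope(name_or_scope, reuse=None, initializer=None):
    """Hypothetical older signature: `reuse` is the second positional parameter."""
    return name_or_scope, reuse, initializer

def new_variable_scope(name_or_scope, default_name=None, values=None, reuse=None):
    """Hypothetical newer signature: two extra positional parameters come before `reuse`."""
    return name_or_scope, default_name, values, reuse

def call_like_the_model(scope_fn):
    """Mirror the call shape from the traceback: three positional arguments plus reuse=..."""
    inputs = object()
    return scope_fn(None, 'InceptionResnetV1', [inputs], reuse=True)

# With the older signature, 'InceptionResnetV1' lands in the `reuse` slot,
# so the explicit reuse=True keyword collides with it.
try:
    call_like_the_model(old_variable_scope)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With the newer signature the same call is accepted.
accepted = call_like_the_model(new_variable_scope)
```

If the installed library expects fewer positional parameters than the calling code passes, upgrading so that both sides agree on the signature would make this particular error go away, which is consistent with the upgrade suggestion above.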
from facenet.