
Experiments on larger datasets · sphereface · OPEN · 48 comments

wy1iu avatar wy1iu commented on August 23, 2024
Experiments on larger datasets


Comments (48)

KaleidoZhouYN avatar KaleidoZhouYN commented on August 23, 2024 7

@wuqiangch I have tried the A-Softmax loss on the MS-Celeb-1M dataset.

First you should train a model with type "SINGLE" instead of "QUADRUPLE", then fine-tune this model by changing the type back to "QUADRUPLE". Take care that fine-tuning will automatically reset the parameter "iteration" to 0, so you also need to change "iteration" or "lambda" yourself.
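For reference, a rough sketch of what this two-stage setup could look like in the fc6 layer's margin_inner_product_param (field names follow the released sphereface prototxt; every numeric value below is a placeholder, not a setting reported in this thread):

# Stage 1: pre-train with m = 1
margin_inner_product_param {
  num_output: 100000   # placeholder: number of identities in your training list
  type: SINGLE         # m = 1
  base: 1000           # placeholder lambda-annealing settings
  gamma: 0.12
  power: 1
  lambda_min: 5
  iteration: 0
}

# Stage 2: fine-tune from the stage-1 caffemodel with m = 4. Because the
# layer's internal counter restarts at 0, either set "iteration" to a
# non-zero value or adjust base/gamma/lambda_min so lambda starts where
# you want it instead of restarting the whole annealing schedule.
margin_inner_product_param {
  num_output: 100000
  type: QUADRUPLE      # m = 4
  base: 1000
  gamma: 0.12
  power: 1
  lambda_min: 5
  iteration: 28000     # placeholder: e.g. the number of pre-training iterations
}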

I think the reason your training didn't converge is that the QUADRUPLE constraint is too strong for an almost 10K-class classification problem.

Good Luck!


KaleidoZhouYN avatar KaleidoZhouYN commented on August 23, 2024 6

I'm back and very happy to say that A-Softmax with a 28-layer ResFace network can reach acc = 99.63% and TAR = 99.1% @ FAR = 0.1% on LFW, and a result on MegaFace similar to what the paper reports, when trained on the MS-Celeb-1M dataset.


wy1iu avatar wy1iu commented on August 23, 2024 5

Yes, we have trained the A-Softmax loss on a much larger dataset (the MS dataset) and it definitely can work.

You should consider modifying the function of lambda or using the fine-tuning trick on a pre-trained SphereFace network.
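"Modifying the function of lambda" refers to the annealing schedule inside the margin layer. A hedged sketch of how the released Caffe implementation computes it, as far as I can tell from margin_inner_product_layer (the numbers are only illustrative):

# lambda is recomputed from the layer's internal iteration counter:
#   lambda = max(lambda_min, base * (1 + gamma * iter) ^ (-power))
# so changing the function of lambda means changing these fields in
# margin_inner_product_param:
margin_inner_product_param {
  base: 1000       # starting value of lambda
  gamma: 0.12      # how fast lambda decays
  power: 1         # shape of the decay
  lambda_min: 5    # floor; lambda never drops below this
}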


KaleidoZhouYN avatar KaleidoZhouYN commented on August 23, 2024 1

@Zhongdao Fantastic!!!! The model I trained on MS-Celeb-1M only gets around 99.17% accuracy on LFW, so I thought there might be some problem with the margin_inner_product code and then turned to the BN method... (actually the code is not exactly the same as what the paper describes...)

Seems like I need to train A-Softmax on the MS dataset again. Can you show more details about training, like the learning rate, weight_decay, and number of iterations of the pre-trained model? Thanks a lot.

By the way, I'll set up the MegaFace benchmark this week and hope to get a good result.


KaleidoZhouYN avatar KaleidoZhouYN commented on August 23, 2024 1

@ctgushiwei Save the similarities of each pair and plot them in MATLAB.


KaleidoZhouYN avatar KaleidoZhouYN commented on August 23, 2024 1

@vzhangmeng726 @ysc703 @nyyznyyz1991
please look through this web site:
https://github.com/KaleidoZhouYN/Details-on-Face-Recognition


wuqiangch avatar wuqiangch commented on August 23, 2024

@wy1iu How do you train the A-Softmax loss on the MS-Celeb-1M dataset? Can you show the details of training the net? I have tried it and it doesn't converge either. Thanks!


Zhongdao avatar Zhongdao commented on August 23, 2024

@wy1iu @KaleidoZhouYN Thanks a lot! The pre-trained model indeed helps convergence, and a proper lambda value is important. I am still tuning those hyper-parameters to get more satisfying results. By the way, have you tested your models trained on MS-1M on MegaFace?


hardegg avatar hardegg commented on August 23, 2024

@Zhongdao Can you help explain how you use the pre-trained model? I followed @KaleidoZhouYN and used "SINGLE" to get the pre-trained model. Then I used it for fine-tuning but ended up failing. I changed "iteration" to the number of iterations used during pre-training, and/or "power" to a bigger value (e.g., 100000), but neither leads to convergence.


Zhongdao avatar Zhongdao commented on August 23, 2024

@hardegg I just use the "SINGLE" method to get the pre-trained model. Please note that sometimes the A-Softmax loss seems not to converge but the model is actually getting better. I think it might be a property of A-Softmax.
In my experiment, I set lambda_min = 10 and gamma = 0.5, reaching 99.42% on LFW.


hardegg avatar hardegg commented on August 23, 2024

@Zhongdao Thanks for the reply. So you mean you did not use "QUADRUPLE"?


Zhongdao avatar Zhongdao commented on August 23, 2024

@hardegg No. First I train a model with type "SINGLE", then fine-tune this model by changing the type back to "QUADRUPLE".


wuqiangch avatar wuqiangch commented on August 23, 2024

@Zhongdao, what is the loss of your final model? When you train the "SINGLE" model, what are lambda_min and gamma? And when you fine-tune using "QUADRUPLE", what are lambda_min and gamma?


hardegg avatar hardegg commented on August 23, 2024

@Zhongdao So I guess your fine-tuning did not converge eventually? Could you paste your log? I did exactly the same thing as you (pre-trained using SINGLE, then changed lambda_min = 10, gamma = 0.5), but the softmax_loss stays at 87.3365 even after lots of iterations.


hardegg avatar hardegg commented on August 23, 2024

@wy1iu I also tried training from scratch on the MS-Celeb dataset. After some failures, I tried to make it closer to the original softmax in the beginning by making "base" bigger and "gamma" smaller (e.g., base = 10000 and gamma = 0.01). It did converge in the beginning, but after a number of iterations (20k, with lambda down to around 40) the overall loss rose and it still diverged (with softmax_loss stuck at 87.3365). I believe SphereFace can definitely work on a larger dataset. Could you give more details (a log file would be even better)?


Zhongdao avatar Zhongdao commented on August 23, 2024

@hardegg Please refer to issue #7 for details on how to pre-train with SINGLE (cosine loss).
Sometimes I also ran into rapid loss divergence, and I observed the same phenomenon when training with center loss. It happens with some probability, so I tried many times to get a converged model.
@KaleidoZhouYN Here is my solver:

net: "sphereface_model.prototxt"
base_lr: 0.01
lr_policy: "multistep"
gamma: 0.1
stepvalue: 160000
stepvalue: 240000
stepvalue: 280000
max_iter: 280000
display: 20
momentum: 0.9
weight_decay: 0.0005
snapshot: 2000
snapshot_prefix: "weights/ms_res64_lambda10"
solver_mode: GPU

Batch_size is set to 256.
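Note that "gamma: 0.1" in this solver is the learning-rate decay factor; the lambda_min = 10 and gamma = 0.5 mentioned above belong to the margin layer in sphereface_model.prototxt instead. A sketch of where they would go (my reconstruction, not Zhongdao's actual file; base and power are placeholders):

margin_inner_product_param {
  type: QUADRUPLE
  base: 1000       # placeholder
  gamma: 0.5       # lambda-decay gamma, distinct from the solver's lr gamma
  power: 1         # placeholder
  lambda_min: 10
}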


KaleidoZhouYN avatar KaleidoZhouYN commented on August 23, 2024

@hardegg A direct way to set the learning rate is to look at the backward diff of the Margin_innerproduct layer, since the norm of its parameters is always around 1.
The norm of the feature (the output of the fc5 layer) is also very important, because training only converges when the feature norm is quite large. (The paper on L2-constrained normalization is helpful on this.)


BeginnerW avatar BeginnerW commented on August 23, 2024

@wy1iu I got about the same LFW accuracy, around 99.2%, on both the large dataset and the small dataset. What might be the problem? What accuracy did you get on the large dataset (MS dataset)? Thanks!


wy1iu avatar wy1iu commented on August 23, 2024

@BeginnerW Large datasets such as MS-1M have a lot of overlapping labels with LFW, so you should not directly train on MS and test on LFW. FYI, if you train on MS-1M directly without removing the overlapping identities, you can get incredibly high accuracy, like 99.7%.


hardegg avatar hardegg commented on August 23, 2024

@wy1iu Yes, that's also what I am expecting to see. By training center-face on MS-1M, I got 99.63% accuracy on LFW with the original 27-layer network. For A-Softmax, I am expecting to see a better result. But right now I am still stuck: the network cannot converge. Any guidance? Could you explain how you trained on MS-1M? Thanks.


BeginnerW avatar BeginnerW commented on August 23, 2024

@wy1iu I trained on the clean list with fewer than 4 million samples, which was released by LightenCNN's author. I don't know why training on the large dataset cannot achieve higher accuracy than on the small dataset. Is it a problem that I used the caffemodel trained on the small dataset as the initial model for training on the large dataset? I hope you could give me some suggestions.


wy1iu avatar wy1iu commented on August 23, 2024

@hardegg The original softmax loss trained on MS-1M can also easily give you very high accuracy. For A-Softmax, you can consider three things: 1) train for more iterations; 2) try a smaller lambda (but not too small); 3) use the fine-tuning trick (that means you should first train your model on MS-1M using the original softmax loss and then fine-tune with A-Softmax).

@BeginnerW If you use the large dataset for fine-tuning, you should first pre-train your model on the same dataset instead of the small one. Then fine-tune with A-Softmax (the number of iterations should be long enough).
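One way to read suggestion 3) in prototxt terms, sketched under my own assumptions (layer and field names follow the released sphereface Caffe fork; the identity count and annealing values are placeholders): pre-train the network with a plain InnerProduct + SoftmaxWithLoss head, then swap that head for the margin layer when fine-tuning.

# Pre-training head: original softmax
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "fc5"
  top: "fc6"
  inner_product_param { num_output: 100000 }   # placeholder identity count
}
layer {
  name: "softmax_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc6"
  bottom: "label"
  top: "softmax_loss"
}

# Fine-tuning head: A-Softmax. Giving the layer a new name makes Caffe
# initialize it from scratch; keeping the old name would copy the
# pre-trained classifier weights instead. The SoftmaxWithLoss layer
# above stays the same, since this layer still produces top "fc6".
layer {
  name: "fc6_margin"
  type: "MarginInnerProduct"
  bottom: "fc5"
  bottom: "label"
  top: "fc6"
  margin_inner_product_param {
    num_output: 100000   # placeholder
    type: QUADRUPLE
    base: 1000           # placeholder annealing settings
    gamma: 0.12
    power: 1
    lambda_min: 5
  }
}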


hardegg avatar hardegg commented on August 23, 2024

@wy1iu Yes, correct. Directly using softmax + MS-1M can already reach good accuracy on LFW. However, when you test the model in real applications, the accuracy is not that good any more. But if you train models with center loss, better accuracy can be reached on LFW, plus you get much better accuracy when dealing with real applications. So I think SphereFace should give a better result on a larger training dataset.

Thanks for your suggestion. For 2), a smaller lambda: can I ask how small it can be, say 50 or 100? Do you mind sharing the training log? In fact I've tried both training from scratch and fine-tuning. For fine-tuning, I use SINGLE first and it converges very easily, but when fine-tuning it falls into softmax_loss = 87.35 after a while. For training from scratch, it cannot converge if lambda is small; if lambda is very big, say 10000, it converges, but it behaves much more like the original softmax.


KaleidoZhouYN avatar KaleidoZhouYN commented on August 23, 2024

@wy1iu Hi, below is my test on LFW. Obviously the cosine-distance peak for same-person pairs is around 0.7, while a model trained with center_loss can reach 0.8.
[image: LFW cosine-distance distribution]
I want to know whether this corresponds to your 99.7%-accuracy model?


ctgushiwei avatar ctgushiwei commented on August 23, 2024

@KaleidoZhouYN How do you draw this picture for the LFW test?


vzhangmeng726 avatar vzhangmeng726 commented on August 23, 2024

@KaleidoZhouYN Great!!!! Can you show more details on how to train the 28-layer ResFace network, like the learning rate, weight_decay, and number of iterations of the pre-trained model? Can you share model.prototxt and solver.prototxt? Thanks a lot.


ysc703 avatar ysc703 commented on August 23, 2024

@KaleidoZhouYN Can you show more details and your prototxts?
Thanks a lot.


nyyznyyz1991 avatar nyyznyyz1991 commented on August 23, 2024

@KaleidoZhouYN Amazing! Can you share the 28-layer model prototxt and solver.prototxt? I have been struggling to train a 64-layer ResNet with large datasets. And what batch_size did you set for the 28-layer ResNet? 256 × 2 GPUs, or 128 × 4 GPUs?
Thanks a lot.


Zhongdao avatar Zhongdao commented on August 23, 2024

@KaleidoZhouYN Great job! I'd be glad to discuss face recognition further on WeChat, if you want. Here is my account: 13051902595.


HaoLiuHust avatar HaoLiuHust commented on August 23, 2024

@KaleidoZhouYN Have you cleaned the MS-Celeb-1M dataset? It seems the dataset has some overlap with LFW.


XWalways avatar XWalways commented on August 23, 2024

@KaleidoZhouYN I trained the model on the MS-Celeb dataset with m = 1, base = 1000, gamma = 0.000025, power = 35, lambda_min = 0, iteration = 0, and got loss = 0.871851, accuracy = 0.8525. Then I fine-tuned with m = 4, lr_mult (in fc6) = 10, decay_mult (in fc6) = 10; what's more, I renamed "fc6" to "fc7" (which means I didn't reuse the parameters in the caffemodel), but it failed in the end. Why?


KaleidoZhouYN avatar KaleidoZhouYN commented on August 23, 2024

@XWalways A-Softmax is good for generalization, but that doesn't mean you can do whatever you want, such as setting m = 4 with lambda_min = 0.


KaleidoZhouYN avatar KaleidoZhouYN commented on August 23, 2024

@HaoLiuHust No, we didn't.


HaoLiuHust avatar HaoLiuHust commented on August 23, 2024

@KaleidoZhouYN Then the accuracy may be higher than it really is.


XWalways avatar XWalways commented on August 23, 2024

@KaleidoZhouYN m = 4 means type: "QUADRUPLE", and m = 1 means type: "SINGLE". You said that we should train with m = 1 and fine-tune with m = 4. I want to know how to change the parameters when fine-tuning. Thanks.


KaleidoZhouYN avatar KaleidoZhouYN commented on August 23, 2024

@XWalways Setting lambda_min = 0 is a bad choice.


XWalways avatar XWalways commented on August 23, 2024

@KaleidoZhouYN But how should I modify the parameters for fine-tuning? I have tried many times, but all attempts failed. Thanks a lot.


HaoLiuHust avatar HaoLiuHust commented on August 23, 2024

@KaleidoZhouYN Thanks. In your training, is the alignment method https://github.com/happynear/FaceVerification/dataset/CK/align_CK.py, or did you develop a new method?


johnnysclai avatar johnnysclai commented on August 23, 2024

@KaleidoZhouYN Great work. Did you try training on the CASIA-WebFace dataset with center loss? I am wondering what the accuracy would be.


KaleidoZhouYN avatar KaleidoZhouYN commented on August 23, 2024

@HaoLiuHust Well, on MS-Celeb-1M we use MTCNN, but on our own dataset the landmarks are different and we don't use MTCNN. If you are concerned about the alignment, please see:
https://github.com/sciencefans/RSA-for-object-detection
by SenseTime; the result is fantastic and much better than ours.


HaoLiuHust avatar HaoLiuHust commented on August 23, 2024

@KaleidoZhouYN Thank you for your warm reply. Could you point out where the alignment part is? Is it in get_rect_from_pts.m?


HaoLiuHust avatar HaoLiuHust commented on August 23, 2024

@KaleidoZhouYN found it, thanks


ctgushiwei avatar ctgushiwei commented on August 23, 2024

@Zhongdao @KaleidoZhouYN @wy1iu @wuqiangch Have you trained a res20 model with A-Softmax, and what accuracy does it get on LFW? The model I trained only achieves 99.1% with m = 2 and lambda_min = 3.


JoyLuo avatar JoyLuo commented on August 23, 2024

@KaleidoZhouYN
Do you change the iteration value manually when fine-tuning the net with QUADRUPLE? For example, if the pre-trained SINGLE model ran for 28000 iterations, should the iteration value be set to 28000 when fine-tuning with QUADRUPLE?


MengWangTHU avatar MengWangTHU commented on August 23, 2024

@KaleidoZhouYN You say "Take care that fine-tune will automatically set the parameter "iteration" to 0". Do you mean that when fine-tuning, the parameter "iteration" (whose default value is 0) should also be changed?
I know what the other parameters mean, like gamma and lambda, but I do not know what this parameter means.


wangce888 avatar wangce888 commented on August 23, 2024

@KaleidoZhouYN How should I set the parameters when fine-tuning? I always get a bad result.


yxchng avatar yxchng commented on August 23, 2024

@Zhongdao @KaleidoZhouYN Hi, your discussion seems to suggest using lambda_min = 10 and m = 4. Is that true? And will lambda_min = 5 not work?


shineway14 avatar shineway14 commented on August 23, 2024

@KaleidoZhouYN How do you change "iteration" or "lambda" when fine-tuning? Thanks.

