happynear / amsoftmax Goto Github PK

View Code? Open in Web Editor NEW

482.0 35.0 130.0 1.07 MB

A simple yet effective loss function for face verification.

License: MIT License

MATLAB 100.00%

deep-learning face-recognition loss-functions metric-learning softmax

amsoftmax's Introduction

Additive Margin Softmax for Face Verification

by Feng Wang, Weiyang Liu, Haijun Liu, Jian Cheng

The paper is available as a technical report at arXiv.

Introduction

In this work, we design a new loss function which merges the merits of both NormFace and SphereFace. It is much easier to understand and train, and outperforms the previous state-of-the-art loss function (SphereFace) by 2-5% on MegaFace.

Citation

If you find AM-Softmax useful in your research, please consider to cite:

@article{Wang_2018_amsoftmax,
  title = {Additive Margin Softmax for Face Verification},
  author = {Wang, Feng and Liu, Weiyang and Liu, Haijun and Cheng, Jian},
  journal = {arXiv preprint arXiv:1801.05599},
  year = {2018}
}

Training

Requirements: My Caffe version https://github.com/happynear/caffe-windows. This version can also be compiled in Linux.

The prototxt file is in ./prototxt. The batch size is set to 256. If your GPU's memory is not sufficient enough, you may set iter_size: 2 in face_solver.prototxt and batch_size: 128 in face_train_test.prototxt.

The dataset used for training is CASIA-Webface. We removed 59 identities that are duplicated with LFW (17) and MegaFace Set 1 (42). This is why the final inner-product layer's output is 10516. The list of the duplicated identities can be found in https://github.com/happynear/FaceDatasets.

All other settings are the same with SphereFace. Please refer to the details in SphereFace's repository.

PS: If you want to try the margin scheme described in ArcFace, you may try to transplant this layer in the experiment branch of my Caffe repository. LabelSpecificHardMarginForward() is the kernel function for cos(theta+m).

Model and Training Log

Feature normalized, s=30, m=0.35: OneDrive, Baidu Yun .

Results

See our arXiv technical report.

3rd-Party Re-implementation

TensorFlow: code by Joker316701882.
TensorFlow: code by yule-li.

amsoftmax's People

Contributors

Stargazers

Watchers

Forkers

objectdetection keyky dcnhan locussam runauto felixmonkey chrgo pustar yesyu tzhang2014 qianjide cheyuanxiaozi starstylesky nbadalls denethor1997 arasharchor xialuxi ctgushiwei farmingyard 10183308 fireae liyuanyaun houjun-data yaqilyu lyy5 gzuvisionspace alexliyang wyc2015fq baifanysu kakacynic yemenr sunxingxingtf kk52099 goodluckcwl lji72 zch-90 arestorres naiveghost fireeyesgit 1093842024 armstrongyang jackcc fishman2008 lbwang2006 yogsin madongyu nerddd superhero1991 eelva stoneyang-face wuyuanyuan1990 chaoso kevin2599 shlpu lqs19881030 yongzhengqi jimeffry suzhoushr ahuirecome clscy zhaomonica jackywang-001 tandychao mikaelhu0823 liaoheping zumbalamambo rain2008204 afcarl joeysu tinyloop dreadlord1984 lilysys limingda92 gds101054108 chanbluky yuhaoluo clhne zhengshunjie liushuan dajidali010 soccergame ieyer amena6490 zp1018 heshenghuan taotaoyuhust wencoast sunjunlishi ailihong xinxin12345 xiaoye77 dev233 leethony msnqqer tensorflow-pool onlynata normalct xysong1201 qiaoxie robosina

amsoftmax's Issues

Where is fc6 in face_train_test.prototxt?

I only see the layer name of fc6_L2 in face_train_test.prototxt.

the picture size is 96*96？？？

I put inner_product_layer and LabelSpecificAdd together,but it gets worse result.

Hi,thanks for your work. Did you ever try to put it together? I think put it together shouldn't make it worse. But , when I use it to train lenet(dataset:mnist) , it actually get worse result.

CASIA-Webface dataset download link

@happynear I can't get any download link about CASIA-Webface, neither cleaned CASIA Dataset nor dirty.
The official download link and Baiduyun links that I found on the Internet cannot be accessed now，Can you give me a download link, thank you!

target logit and its curves

hi,I read your paper and your target logit curves code ,I feel puzzled,your paper say that Wf is also called as the target logit ,but I feel that your target logit curves is not correspond with the definied Wf,it seems not to think about the f.

The way of using AMSoftmax

Dear AMSoftmax team,
Thanks for the great work, I've checked out your paper and prepare to try my owndata on the repo, however I'm a bit confused, would you mind telling me the relation between Amsoftmax and Sphereface? I noticed AMSoftmax seems to be the latest result on your experiment, but there's no many instructions of manipulation . As far as I recognized , should the only difference between Sphereface and AMSoftmax be the prototxt file? besides I can not only obtain AMSoftmax repo, but to keep Sphereface repo and follow the steps to train?

flip in face_deploy_mirror_normalize.prototxt doesn't work properly

I set an image to AMSoftmax, which net prototxt is (face_deploy_mirror_normalize.prototxt) and weight is (your pretrained weights) . after loading weights I put an image to net input and run forward() method on it. Then I wanted to explore how the flip layer works but after plot the output of flip_data blobs I see something goes wrong, the flip layer has flipped data vertically(I mean up down) !! is it Okay?
result of code:

The code is something like below:


net=caffe.Net(
        'face_deploy_mirror_normalize.prototxt',
        'face_train_test_iter_30000.caffemodel',
        caffe.TEST);

def return_layer_name(layer_name,i):
    output=net.blobs[layer_name].data[i]
    output=np.swapaxes(output,0,2)
    return output


img=caffe.io.load_image('Anthony_Hopkins_0002.jpg')
img=caffe.io.resize(img,(96,112))
img=np.expand_dims(img,0)
img=np.swapaxes(img,1,3)
net.blobs['data'].data[...]=img
net.forward()
output=net.blobs['norm1'].data[0]
out1=return_layer_name('data_input_0_split_0',0)
out2=return_layer_name('flip_data',0)
fig = plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.imshow(out1)
plt.subplot(1,2,2)
plt.imshow(out2)

Alignment of the image in realtime

I want to test caffemodel on the the real world problem, so I use mtcnn landmarks and align image like the below:
`

 Mat transform(Mat image,                 //cropped face image
                       vector<Point2f> dst)  //dst are the landmark of the face
 {
  float image_height=96.0/image.cols;
  float image_width=112.0/image.rows;
  for (int i = 0; i < dst.size(); ++i)
  {
      dst[i].y*=image_height;          //in this line I will scale the points to the new size of image(96,112)
      dst[i].x*=image_width;
  }
  cv::resize(image,image,Size(96,112));
  vector<Point2f> src;
  src.push_back(Point2f(30.2946,51.6963));
  src.push_back(Point2f(65.5318,51.5014));
  src.push_back(Point2f(48.0252,71.7366));
  src.push_back(Point2f(33.5493,92.3655));
  src.push_back(Point2f(62.7299,92.2041));


  cv::Mat R = cv::estimateRigidTransform(dst,src,false);
  Mat out;
  cv::warpAffine(image,out,R,Size(96,112));
  return out;
}

what I got is something like the below image

As you can see there is a black area in the top and rights side of the image, So I'm confused that is this normal?

about the weight norm

hi @happynear,
i read the code of innerproduct, find that, you only norm the weights in the forward pass. so why it does not need corresponding bp?

thanks.

face_train_test_wo_fn.prototxt

不好意思，打扰啦

Training without alignment

Dear @happynear, first of all thanks for your work and uploaded results! I would like to ask about alignment step, is it really important to get good performance? I have not tried your code ( will do it this week ), but tried ResNet-18 with VGG2 without alignment step. SoftMax and CenterLoss gave about 90% both, CenterLoss also provided much better localization, but surprisingly ArcFace result was 70% only, I will try CosFace this week but I expect more or less the same. Did you try to train something without alignment?
Thank you! I will post my result with CosFace

About pretrain model！

Hi thanks for you amsoftmax:
Could you tell us if the ResNet20 is a pretrained model？ And if it is pretrained with the amsoftmax！

iter_size问题

你好，对于slover配置文件iter_size参数我在网上查了相关资料，caffe官方并没有这个参数的说明，我的问题是，如果设置这个参数相当于调整了batch_size,那么迭代次数是不是应该减少呢，因为我设置了这个参数为n后，训练的速度相应的变慢了n倍，求解答

draw_sphere

您好，我想得到不同损失函数在mnist数据集上球面分布的图，自己使用不同的loss function去训练，得到的相应的模型图，但是还是无法画图？您的那个jet.mat文件是如何生成的？谢谢

How to debug margin and scale?

Hi, thanks for your great job. I wonder how I can debug the value of margin and scale to get better result.I use the default setting(m=0.35, scale=30)on my face recognition dataset, the final training loss is about 3 and it can't decrease. So I come here to ask the quesion, thank you~

Setting of deploy

Should we save the norm1 layer for deploy? or just get the output from fc5.

Compare AMSoftmax with ArcFace

I didn't find any comparison between AMSoftmax and ArcFace on the internet, could you tell me your result ?

关于预训练模型在LFW数据集上准确率的问题

我使用sphereface里的evaluation.m文件在LFW数据集上测试您提供的模型face_train_test_iter_30000.caffemodel，输出结果如下：
fold ACC

1 61.17%
2 60.33%
3 64.00%
4 60.00%
5 59.50%
6 58.83%
7 59.17%
8 58.33%
9 62.33%
10 65.67%

AVE 60.93%
是我哪里操作失误吗？准确率为什么跟您论文里描述的差了那么多呢？

using ./prototxt/face_solver.prototxt & ./prototxt/face_train_test.prototxt but not converge

As I don't have enough GPU memory, I set iter_size: 8 in face_solver.prototxt and batch_size: 32 in face_train_test.prototxt, after 30000 iter it didn't converge on CASIA-Webface. I am confused that if I have done anything wrong.

Thanks in advanced.

what is inner-product layer's output depended , How to calculate。

support caffe.binding?

Great work, appreciate it. Can I use the same caffe.binding project without any modification?

关于LabelSpecificAdd问题

没问题了

Test Problem

I train my model[Webface] in parms of s=30, m=0.35. The Result in lfw is 98.53%. I try to change parms but it get worse result. Is the parms is the best in your tests? Can data strength help me improve the effect? thanks for your advice

What is gnap prototxt for?

there is a gnap prototxt

I am not sure what is this for? Any document?

thank you

when set a mini-batch to model input the time of calculation won't decreade

Hi, first of all thanks to great job,
I compile your caffe and want to test per-trained weight so I put an image to model input's and it give me 512 float array in 0.4 second on 1080ti gpu(is this ok?) after that I want to set a mini-batch for model input but for 10 images it give me 3.6 second that just slightly faster than when we set one image.

CASIA-Webface dataset download link

Hi @xisi789
Could you please again share the Casia webface database or Replay-Attack database for Face Anti Spoofing. I couldnot download it since the share file has expired

Can you send a verson of mnist dataset for amsoftmax, I modify a verson but when s=30 and m=0.4 it not converge and loss is also 16.811

I push my code here

5 of 10 epoch

how does the loss converge

Does the AMS loss have the similar converge curve with softmax loss? In my exps, the AMS loss (set m=1) changes little during training, even after lots of iters.

how to set the m and s when there are only 1000 identities?

thanks very much for your contribution. i am using gray images to train the model with only about 1000 identities, how need i set the m and s. both of them should be set smaller?

What‘s the performance of AMSoftmax on larger datasets？ MsCeleb， vggface2

Have you ever tried batch normalization?

@happynear

I tried to add batch normalization on your modified resnet20, but the loss became 87.3365. As far as I know, BN helps learning more quickly, Is it possible to add batch normalization with amsoftmax?

Here is the prototxt

layer {
  name: "input"
  type: "Input"
  top: "data"
  input_param {
    shape {
      dim: 1
      dim: 3
      dim: 160
      dim: 160
    }
  }
}
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv1_1/bn"
  type: "BatchNorm"
  bottom: "conv1_1"
  top: "conv1_1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv1_1/scale"
  type: "Scale"
  bottom: "conv1_1"
  top: "conv1_1"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu1_1"
  type: "PReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}
layer {
  name: "conv1_2"
  type: "Convolution"
  bottom: "conv1_1"
  top: "conv1_2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv1_2/bn"
  type: "BatchNorm"
  bottom: "conv1_2"
  top: "conv1_2"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv1_2/scale"
  type: "Scale"
  bottom: "conv1_2"
  top: "conv1_2"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu1_2"
  type: "PReLU"
  bottom: "conv1_2"
  top: "conv1_2"
}
layer {
  name: "conv1_3"
  type: "Convolution"
  bottom: "conv1_2"
  top: "conv1_3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv1_3/bn"
  type: "BatchNorm"
  bottom: "conv1_3"
  top: "conv1_3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv1_3/scale"
  type: "Scale"
  bottom: "conv1_3"
  top: "conv1_3"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu1_3"
  type: "PReLU"
  bottom: "conv1_3"
  top: "conv1_3"
}
layer {
  name: "res1_3"
  type: "Eltwise"
  bottom: "conv1_1"
  bottom: "conv1_3"
  top: "res1_3"
}
layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "res1_3"
  top: "conv2_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv2_1/bn"
  type: "BatchNorm"
  bottom: "conv2_1"
  top: "conv2_1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv2_1/scale"
  type: "Scale"
  bottom: "conv2_1"
  top: "conv2_1"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu2_1"
  type: "PReLU"
  bottom: "conv2_1"
  top: "conv2_1"
}
layer {
  name: "conv2_2"
  type: "Convolution"
  bottom: "conv2_1"
  top: "conv2_2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv2_2/bn"
  type: "BatchNorm"
  bottom: "conv2_2"
  top: "conv2_2"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv2_2/scale"
  type: "Scale"
  bottom: "conv2_2"
  top: "conv2_2"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu2_2"
  type: "PReLU"
  bottom: "conv2_2"
  top: "conv2_2"
}
layer {
  name: "conv2_3"
  type: "Convolution"
  bottom: "conv2_2"
  top: "conv2_3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv2_3/bn"
  type: "BatchNorm"
  bottom: "conv2_3"
  top: "conv2_3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv2_3/scale"
  type: "Scale"
  bottom: "conv2_3"
  top: "conv2_3"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu2_3"
  type: "PReLU"
  bottom: "conv2_3"
  top: "conv2_3"
}
layer {
  name: "res2_3"
  type: "Eltwise"
  bottom: "conv2_1"
  bottom: "conv2_3"
  top: "res2_3"
}
layer {
  name: "conv2_4"
  type: "Convolution"
  bottom: "res2_3"
  top: "conv2_4"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv2_4/bn"
  type: "BatchNorm"
  bottom: "conv2_4"
  top: "conv2_4"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv2_4/scale"
  type: "Scale"
  bottom: "conv2_4"
  top: "conv2_4"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu2_4"
  type: "PReLU"
  bottom: "conv2_4"
  top: "conv2_4"
}
layer {
  name: "conv2_5"
  type: "Convolution"
  bottom: "conv2_4"
  top: "conv2_5"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv2_5/bn"
  type: "BatchNorm"
  bottom: "conv2_5"
  top: "conv2_5"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv2_5/scale"
  type: "Scale"
  bottom: "conv2_5"
  top: "conv2_5"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu2_5"
  type: "PReLU"
  bottom: "conv2_5"
  top: "conv2_5"
}
layer {
  name: "res2_5"
  type: "Eltwise"
  bottom: "res2_3"
  bottom: "conv2_5"
  top: "res2_5"
}
layer {
  name: "conv3_1"
  type: "Convolution"
  bottom: "res2_5"
  top: "conv3_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_1/bn"
  type: "BatchNorm"
  bottom: "conv3_1"
  top: "conv3_1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_1/scale"
  type: "Scale"
  bottom: "conv3_1"
  top: "conv3_1"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_1"
  type: "PReLU"
  bottom: "conv3_1"
  top: "conv3_1"
}
layer {
  name: "conv3_2"
  type: "Convolution"
  bottom: "conv3_1"
  top: "conv3_2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_2/bn"
  type: "BatchNorm"
  bottom: "conv3_2"
  top: "conv3_2"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_2/scale"
  type: "Scale"
  bottom: "conv3_2"
  top: "conv3_2"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_2"
  type: "PReLU"
  bottom: "conv3_2"
  top: "conv3_2"
}
layer {
  name: "conv3_3"
  type: "Convolution"
  bottom: "conv3_2"
  top: "conv3_3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_3/bn"
  type: "BatchNorm"
  bottom: "conv3_3"
  top: "conv3_3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_3/scale"
  type: "Scale"
  bottom: "conv3_3"
  top: "conv3_3"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_3"
  type: "PReLU"
  bottom: "conv3_3"
  top: "conv3_3"
}
layer {
  name: "res3_3"
  type: "Eltwise"
  bottom: "conv3_1"
  bottom: "conv3_3"
  top: "res3_3"
}
layer {
  name: "conv3_4"
  type: "Convolution"
  bottom: "res3_3"
  top: "conv3_4"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_4/bn"
  type: "BatchNorm"
  bottom: "conv3_4"
  top: "conv3_4"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_4/scale"
  type: "Scale"
  bottom: "conv3_4"
  top: "conv3_4"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_4"
  type: "PReLU"
  bottom: "conv3_4"
  top: "conv3_4"
}
layer {
  name: "conv3_5"
  type: "Convolution"
  bottom: "conv3_4"
  top: "conv3_5"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_5/bn"
  type: "BatchNorm"
  bottom: "conv3_5"
  top: "conv3_5"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_5/scale"
  type: "Scale"
  bottom: "conv3_5"
  top: "conv3_5"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_5"
  type: "PReLU"
  bottom: "conv3_5"
  top: "conv3_5"
}
layer {
  name: "res3_5"
  type: "Eltwise"
  bottom: "res3_3"
  bottom: "conv3_5"
  top: "res3_5"
}
layer {
  name: "conv3_6"
  type: "Convolution"
  bottom: "res3_5"
  top: "conv3_6"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_6/bn"
  type: "BatchNorm"
  bottom: "conv3_6"
  top: "conv3_6"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_6/scale"
  type: "Scale"
  bottom: "conv3_6"
  top: "conv3_6"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_6"
  type: "PReLU"
  bottom: "conv3_6"
  top: "conv3_6"
}
layer {
  name: "conv3_7"
  type: "Convolution"
  bottom: "conv3_6"
  top: "conv3_7"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_7/bn"
  type: "BatchNorm"
  bottom: "conv3_7"
  top: "conv3_7"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_7/scale"
  type: "Scale"
  bottom: "conv3_7"
  top: "conv3_7"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_7"
  type: "PReLU"
  bottom: "conv3_7"
  top: "conv3_7"
}
layer {
  name: "res3_7"
  type: "Eltwise"
  bottom: "res3_5"
  bottom: "conv3_7"
  top: "res3_7"
}
layer {
  name: "conv3_8"
  type: "Convolution"
  bottom: "res3_7"
  top: "conv3_8"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_8/bn"
  type: "BatchNorm"
  bottom: "conv3_8"
  top: "conv3_8"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_8/scale"
  type: "Scale"
  bottom: "conv3_8"
  top: "conv3_8"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_8"
  type: "PReLU"
  bottom: "conv3_8"
  top: "conv3_8"
}
layer {
  name: "conv3_9"
  type: "Convolution"
  bottom: "conv3_8"
  top: "conv3_9"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_9/bn"
  type: "BatchNorm"
  bottom: "conv3_9"
  top: "conv3_9"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_9/scale"
  type: "Scale"
  bottom: "conv3_9"
  top: "conv3_9"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_9"
  type: "PReLU"
  bottom: "conv3_9"
  top: "conv3_9"
}
layer {
  name: "res3_9"
  type: "Eltwise"
  bottom: "res3_7"
  bottom: "conv3_9"
  top: "res3_9"
}
layer {
  name: "conv4_1"
  type: "Convolution"
  bottom: "res3_9"
  top: "conv4_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv4_1/bn"
  type: "BatchNorm"
  bottom: "conv4_1"
  top: "conv4_1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv4_1/scale"
  type: "Scale"
  bottom: "conv4_1"
  top: "conv4_1"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu4_1"
  type: "PReLU"
  bottom: "conv4_1"
  top: "conv4_1"
}
layer {
  name: "conv4_2"
  type: "Convolution"
  bottom: "conv4_1"
  top: "conv4_2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv4_2/bn"
  type: "BatchNorm"
  bottom: "conv4_2"
  top: "conv4_2"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv4_2/scale"
  type: "Scale"
  bottom: "conv4_2"
  top: "conv4_2"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu4_2"
  type: "PReLU"
  bottom: "conv4_2"
  top: "conv4_2"
}
layer {
  name: "conv4_3"
  type: "Convolution"
  bottom: "conv4_2"
  top: "conv4_3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv4_3/bn"
  type: "BatchNorm"
  bottom: "conv4_3"
  top: "conv4_3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv4_3/scale"
  type: "Scale"
  bottom: "conv4_3"
  top: "conv4_3"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu4_3"
  type: "PReLU"
  bottom: "conv4_3"
  top: "conv4_3"
}
layer {
  name: "res4_3"
  type: "Eltwise"
  bottom: "conv4_1"
  bottom: "conv4_3"
  top: "res4_3"
}
layer {
  name: "fc5"
  type: "InnerProduct"
  bottom: "res4_3"
  top: "fc5"
  inner_product_param {
    num_output: 512
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

lfw准确率为99.73是如何得到的？使用的什么数据集！

您好，我有看您的实验都是使用的64层spereface结构进行训练的！那么请问，您有在别的结构上训练过么？我看arcface实用您的损失，arcface的结构能够达到99.7~99.8（仅适用vgg2或者ms）

auto margin

Could you give a short introduction to this https://github.com/happynear/AMSoftmax/tree/master/prototxt/auto, please?

high similarity between abnormal face images

Hi, I found that the model trained with AMS may get higher similarity bettem a pair of abnormal enroll and probe images (the probe is low qulaity, wrong aligned, or even not a face, the enroll is not a good id photo). The similarity may be around 0.4 or even higher while the ones trained with softmax may be just around 0. So have you ever met the same problems? Is it because that the margin push the feature space much compact than softmax?
Thanks!

how to set the m when the feature without norm?

Hi, your paper shows the result of AM-Softmax w/o FN with the m = 0.35 and 0.4.
(1).with FN : Fai = s * (cos(theta) - m) s=30, m=0.35
#prototxt
layer {
name: "fc6_l2"
type: "InnerProduct"
bottom: "norm1"
top: "fc6"
param {
lr_mult: 1
}
inner_product_param{
num_output: 10516
normalize: true
weight_filler {
type: "xavier"
}
bias_term: false
}
}
layer {
name: "label_specific_margin"
type: "LabelSpecificAdd"
bottom: "fc6"
bottom: "label"
top: "fc6_margin"
label_specific_add_param {
bias: -0.35
}
}
layer {
name: "fc6_margin_scale"
type: "Scale"
bottom: "fc6_margin"
top: "fc6_margin_scale"
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
filler{
type: "constant"
value: 30
}
}
}
layer {
name: "softmax_loss"
type: "SoftmaxWithLoss"
bottom: "fc6_margin_scale"
bottom: "label"
top: "softmax_loss"
loss_weight: 1
}

(2).w/o FN : s not needed, Fai = ||x|| * cos(theta) - m, still use m = 0.35?
#prototxt
layer {
name: "fc6_l2"
type: "InnerProduct"
bottom: "norm1"
top: "fc6"
param {
lr_mult: 1
}
inner_product_param{
num_output: 10516
normalize: false
weight_filler {
type: "xavier"
}
bias_term: false
}
}
layer {
name: "label_specific_margin"
type: "LabelSpecificAdd"
bottom: "fc6"
bottom: "label"
top: "fc6_margin"
label_specific_add_param {
bias: -0.35
}
}
layer {
name: "softmax_loss"
type: "SoftmaxWithLoss"
bottom: "fc6_margin"
bottom: "label"
top: "softmax_loss"
loss_weight: 1
}

Can you show your prototxt and trainning log? thx.