happynear / amsoftmax

A simple yet effective loss function for face verification.

License: MIT License

Language: MATLAB 100.00%
Topics: deep-learning, face-recognition, loss-functions, metric-learning, softmax

amsoftmax's Introduction

Additive Margin Softmax for Face Verification

by Feng Wang, Weiyang Liu, Haijun Liu, Jian Cheng

The paper is available as a technical report at arXiv.

Introduction

[Figure: feature visualization]

In this work, we design a new loss function that merges the merits of both NormFace and SphereFace. It is much easier to understand and to train, and it outperforms the previous state-of-the-art loss function (SphereFace) by 2-5% on MegaFace.
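
For reference, the AM-Softmax loss from the paper, with both the feature f and the class weights L2-normalized so that each logit reduces to a cosine, is

L_{AMS} = -\frac{1}{n} \sum_{i=1}^{n} \log \frac{e^{s(\cos\theta_{y_i} - m)}}{e^{s(\cos\theta_{y_i} - m)} + \sum_{j \neq y_i} e^{s \cos\theta_j}}

where s is the scale factor (30 in this repository) and m is the additive margin (0.35).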

Citation

If you find AM-Softmax useful in your research, please consider citing:

@article{Wang_2018_amsoftmax,
  title = {Additive Margin Softmax for Face Verification},
  author = {Wang, Feng and Liu, Weiyang and Liu, Haijun and Cheng, Jian},
  journal = {arXiv preprint arXiv:1801.05599},
  year = {2018}
}

Training

Requirements: my Caffe fork, https://github.com/happynear/caffe-windows. Despite the repository name, it can also be compiled on Linux.

The prototxt files are in ./prototxt. The batch size is set to 256. If your GPU memory is insufficient, you may set iter_size: 2 in face_solver.prototxt and batch_size: 128 in face_train_test.prototxt; gradients are then accumulated over two passes, keeping the effective batch size at 2 × 128 = 256.

The dataset used for training is CASIA-WebFace. We removed 59 identities that are duplicated with LFW (17) and MegaFace Set 1 (42); since CASIA-WebFace contains 10,575 identities, the final inner-product layer has 10,575 - 59 = 10,516 outputs. The list of duplicated identities can be found at https://github.com/happynear/FaceDatasets.

All other settings are the same as in SphereFace. Please refer to the SphereFace repository for details.

PS: If you want to try the margin scheme described in ArcFace, you can transplant this layer from the experiment branch of my Caffe repository. LabelSpecificHardMarginForward() is the kernel function for cos(theta + m).
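
For reference, the cos(theta + m) margin can be computed from the cosine alone via the angle-addition identity:

\cos(\theta + m) = \cos\theta \cos m - \sin\theta \sin m = \cos\theta \cos m - \sqrt{1 - \cos^2\theta}\, \sin m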

Model and Training Log

Feature-normalized, s=30, m=0.35: OneDrive, Baidu Yun.

Results

See our arXiv technical report.

3rd-Party Re-implementation

amsoftmax's People

Contributors

happynear, wy1iu


amsoftmax's Issues

CASIA-Webface dataset download link

@happynear I can't find any working download link for CASIA-WebFace, neither the cleaned dataset nor the raw one.
The official download link and the Baidu Yun links I found online can no longer be accessed. Could you give me a download link? Thank you!

Target logit and its curves

Hi, I read your paper and the code for the target-logit curves, and I'm puzzled. The paper says that W^T f is also called the target logit, but the plotted curves do not seem to correspond to the defined W^T f; they appear not to take f into account.
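
For context, the paper's definition: the target logit is the logit of the ground-truth class, W_{y_i}^T f. With both W_{y_i} and f L2-normalized it reduces to

W_{y_i}^{T} f = \|W_{y_i}\| \, \|f\| \cos\theta_{y_i} = \cos\theta_{y_i}

which AM-Softmax then replaces by s(\cos\theta_{y_i} - m).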

The way of using AMSoftmax

Dear AM-Softmax team,
Thanks for the great work. I've read your paper and am preparing to try my own data with this repo, but I'm a bit confused. Would you mind explaining the relation between AM-Softmax and SphereFace? AM-Softmax seems to be the latest result of your experiments, but there are few usage instructions. Is the prototxt file the only difference between SphereFace and AM-Softmax? That is, rather than using only the AM-Softmax repo, should I keep the SphereFace repo, follow its training steps, and just swap in this prototxt?

flip in face_deploy_mirror_normalize.prototxt doesn't work properly

I fed an image to AM-Softmax, with face_deploy_mirror_normalize.prototxt as the net definition and your pretrained weights. After loading the weights I set the image as the net input and ran forward(). Then I wanted to explore how the flip layer works, but after plotting the output of the flip_data blob I saw that something was wrong: the flip layer flipped the data vertically (upside down)! Is that expected?
Result of the code: [screenshot: selection_027]

The code is roughly as follows:

import caffe
import numpy as np
import matplotlib.pyplot as plt

net = caffe.Net(
    'face_deploy_mirror_normalize.prototxt',
    'face_train_test_iter_30000.caffemodel',
    caffe.TEST)

def blob_as_image(blob_name, i):
    # Blob data is laid out (C, H, W); swapaxes(0, 2) yields (W, H, C).
    output = net.blobs[blob_name].data[i]
    output = np.swapaxes(output, 0, 2)
    return output

img = caffe.io.load_image('Anthony_Hopkins_0002.jpg')
img = caffe.io.resize(img, (96, 112))
img = np.expand_dims(img, 0)
img = np.swapaxes(img, 1, 3)
net.blobs['data'].data[...] = img
net.forward()
output = net.blobs['norm1'].data[0]
out1 = blob_as_image('data_input_0_split_0', 0)
out2 = blob_as_image('flip_data', 0)
fig = plt.figure(figsize=(15, 15))
plt.subplot(1, 2, 1)
plt.imshow(out1)
plt.subplot(1, 2, 2)
plt.imshow(out2)
plt.show()
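
A possible explanation, not from the original thread: swapaxes(0, 2) turns a (C, H, W) blob into (W, H, C), so imshow's vertical axis is actually the blob's width axis, and a horizontal mirror then shows up as a vertical flip. A minimal sketch of an HWC conversion that avoids this, assuming a standard (C, H, W) blob:

import numpy as np

chw = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # toy (C, H, W) blob
hwc = chw.transpose(1, 2, 0)                 # (H, W, C), as plt.imshow expects
assert hwc.shape == (3, 4, 2)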

Alignment of the image in real time

I want to test the caffemodel on a real-world problem, so I use MTCNN landmarks and align the image like below:

Mat transform(Mat image,            // cropped face image
              vector<Point2f> dst)  // dst are the face landmarks
{
    // Scale factors from the original size to the 96x112 output
    // (x scales by 96/cols, y by 112/rows).
    float scale_x = 96.0f / image.cols;
    float scale_y = 112.0f / image.rows;
    for (size_t i = 0; i < dst.size(); ++i)
    {
        dst[i].x *= scale_x;
        dst[i].y *= scale_y;
    }
    cv::resize(image, image, Size(96, 112));
    // Reference landmark positions in the 96x112 template.
    vector<Point2f> src;
    src.push_back(Point2f(30.2946f, 51.6963f));
    src.push_back(Point2f(65.5318f, 51.5014f));
    src.push_back(Point2f(48.0252f, 71.7366f));
    src.push_back(Point2f(33.5493f, 92.3655f));
    src.push_back(Point2f(62.7299f, 92.2041f));

    cv::Mat R = cv::estimateRigidTransform(dst, src, false);
    Mat out;
    cv::warpAffine(image, out, R, Size(96, 112));
    return out;
}

What I got is something like the image below:

[screenshot]

As you can see, there is a black area at the top and on the right side of the image, so I'm wondering: is this normal?
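
Likely relevant here, though not stated in the thread: warpAffine fills destination pixels that map outside the source image with borderValue, which defaults to black, so dark strips are expected whenever the similarity transform pulls part of the 96x112 canvas from outside the input. A minimal Python sketch of the same behavior:

import cv2
import numpy as np

img = np.full((112, 96, 3), 255, dtype=np.uint8)  # all-white test image
M = np.float32([[1, 0, 10], [0, 1, 15]])          # shift right and down
# Pixels with no source data are filled with borderValue (default black).
out = cv2.warpAffine(img, M, (96, 112), borderValue=(0, 0, 0))
assert (out[0:15, :, :] == 0).all()               # top strip is black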

About the weight norm

Hi @happynear,
I read the inner-product code and noticed that you only normalize the weights in the forward pass. Why doesn't this need a corresponding backward pass?

Thanks.

Training without alignment

Dear @happynear, first of all thanks for your work and the uploaded results! I would like to ask about the alignment step: is it really important for good performance? I have not tried your code yet (I will this week), but I trained ResNet-18 on VGGFace2 without alignment. Softmax and center loss both gave about 90%, and center loss also provided much better localization, but surprisingly ArcFace gave only 70%. I will try CosFace this week and expect more or less the same. Did you ever train without alignment?
Thank you! I will post my results with CosFace.

About the pretrained model

Hi, thanks for AM-Softmax.
Could you tell us whether the provided ResNet-20 is a pretrained model, and if so, whether it was pretrained with AM-Softmax?

Question about iter_size

Hi, I looked online for documentation on the solver's iter_size parameter, but official Caffe does not document it. My question: if setting this parameter is effectively the same as enlarging the batch size, shouldn't the number of iterations be reduced accordingly? After I set it to n, training became n times slower. Please advise.

draw_sphere

Hello, I want to plot the spherical feature distributions of different loss functions on the MNIST dataset. I trained models with each loss function, but I still cannot produce the figure. How was your jet.mat file generated? Thanks.

How to tune margin and scale?

Hi, thanks for your great work. I wonder how I can tune the values of the margin and scale to get better results. With the default setting (m=0.35, s=30) on my face recognition dataset, the final training loss plateaus around 3 and will not decrease. So I came here to ask; thank you!

Deploy settings

Should we keep the norm1 layer in the deploy network, or just take the output from fc5?
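
One way to see why either choice can work (a sketch, not from the original thread): if verification uses cosine similarity, L2-normalizing the fc5 output at comparison time gives exactly the same scores as reading a normalized blob, because cosine similarity is invariant to the scale of each embedding. Hypothetical numpy helper:

import numpy as np

def cosine_similarity(a, b):
    # Normalize both embeddings, then dot them; any prior
    # scaling of a or b cancels out.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

f1, f2 = np.random.randn(512), np.random.randn(512)  # stand-ins for fc5 features
assert np.isclose(cosine_similarity(f1, f2),
                  cosine_similarity(3.7 * f1, 0.2 * f2))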

Question about the pretrained model's accuracy on LFW

I used the evaluation.m script from SphereFace to test the model you provided, face_train_test_iter_30000.caffemodel, on LFW. The output is as follows:

fold ACC

1 61.17%
2 60.33%
3 64.00%
4 60.00%
5 59.50%
6 58.83%
7 59.17%
8 58.33%
9 62.33%
10 65.67%

AVE 60.93%

Did I make a mistake somewhere? Why is the accuracy so much lower than reported in your paper?

support caffe.binding?

Great work, I appreciate it. Can I use the caffe.binding project as-is, without any modification?

Test Problem

I trained my model on WebFace with the parameters s=30, m=0.35. The result on LFW is 98.53%. I tried changing the parameters, but the results got worse. Are these parameters the best in your tests? Could data augmentation help me improve performance? Thanks for your advice.

Setting a mini-batch as model input does not decrease computation time

Hi, first of all thanks for the great work.
I compiled your Caffe and wanted to test the pretrained weights. Feeding a single image to the model returns a 512-float array in 0.4 s on a 1080 Ti GPU (is this normal?). I then fed a mini-batch of 10 images, which took 3.6 s, only slightly faster than running the images one at a time.
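
For reference, a minimal pycaffe sketch of batch inference, assuming a deploy net with a 'data' input and an 'fc5' output (the file names below are hypothetical); the input blob must be reshaped to the batch size before the forward pass so all images go through the GPU together:

import numpy as np
import caffe

caffe.set_mode_gpu()
net = caffe.Net('face_deploy.prototxt', 'face_model.caffemodel', caffe.TEST)

batch = np.random.randn(10, 3, 112, 96).astype(np.float32)  # stand-in for 10 preprocessed faces
net.blobs['data'].reshape(*batch.shape)  # grow the input blob to batch size 10
net.reshape()                            # propagate the new shape through the net
net.blobs['data'].data[...] = batch
net.forward()
features = net.blobs['fc5'].data.copy()  # one (10, 512) array of embeddings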

How does the loss converge?

Does the AM-Softmax loss have a convergence curve similar to the softmax loss? In my experiments, the AM-Softmax loss (with m=1) changes very little during training, even after many iterations.

Have you ever tried batch normalization?

@happynear

I tried to add batch normalization to your modified ResNet-20, but the loss became 87.3365. (In Caffe, a loss stuck at 87.3365 usually means the true-class probability has underflowed to zero, since -ln(FLT_MIN) ≈ 87.3365.) As far as I know, BN helps networks learn more quickly. Is it possible to add batch normalization with AM-Softmax?

Here is the prototxt:

layer {
  name: "input"
  type: "Input"
  top: "data"
  input_param {
    shape {
      dim: 1
      dim: 3
      dim: 160
      dim: 160
    }
  }
}
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv1_1/bn"
  type: "BatchNorm"
  bottom: "conv1_1"
  top: "conv1_1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv1_1/scale"
  type: "Scale"
  bottom: "conv1_1"
  top: "conv1_1"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu1_1"
  type: "PReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}
layer {
  name: "conv1_2"
  type: "Convolution"
  bottom: "conv1_1"
  top: "conv1_2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv1_2/bn"
  type: "BatchNorm"
  bottom: "conv1_2"
  top: "conv1_2"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv1_2/scale"
  type: "Scale"
  bottom: "conv1_2"
  top: "conv1_2"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu1_2"
  type: "PReLU"
  bottom: "conv1_2"
  top: "conv1_2"
}
layer {
  name: "conv1_3"
  type: "Convolution"
  bottom: "conv1_2"
  top: "conv1_3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv1_3/bn"
  type: "BatchNorm"
  bottom: "conv1_3"
  top: "conv1_3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv1_3/scale"
  type: "Scale"
  bottom: "conv1_3"
  top: "conv1_3"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu1_3"
  type: "PReLU"
  bottom: "conv1_3"
  top: "conv1_3"
}
layer {
  name: "res1_3"
  type: "Eltwise"
  bottom: "conv1_1"
  bottom: "conv1_3"
  top: "res1_3"
}
layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "res1_3"
  top: "conv2_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv2_1/bn"
  type: "BatchNorm"
  bottom: "conv2_1"
  top: "conv2_1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv2_1/scale"
  type: "Scale"
  bottom: "conv2_1"
  top: "conv2_1"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu2_1"
  type: "PReLU"
  bottom: "conv2_1"
  top: "conv2_1"
}
layer {
  name: "conv2_2"
  type: "Convolution"
  bottom: "conv2_1"
  top: "conv2_2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv2_2/bn"
  type: "BatchNorm"
  bottom: "conv2_2"
  top: "conv2_2"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv2_2/scale"
  type: "Scale"
  bottom: "conv2_2"
  top: "conv2_2"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu2_2"
  type: "PReLU"
  bottom: "conv2_2"
  top: "conv2_2"
}
layer {
  name: "conv2_3"
  type: "Convolution"
  bottom: "conv2_2"
  top: "conv2_3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv2_3/bn"
  type: "BatchNorm"
  bottom: "conv2_3"
  top: "conv2_3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv2_3/scale"
  type: "Scale"
  bottom: "conv2_3"
  top: "conv2_3"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu2_3"
  type: "PReLU"
  bottom: "conv2_3"
  top: "conv2_3"
}
layer {
  name: "res2_3"
  type: "Eltwise"
  bottom: "conv2_1"
  bottom: "conv2_3"
  top: "res2_3"
}
layer {
  name: "conv2_4"
  type: "Convolution"
  bottom: "res2_3"
  top: "conv2_4"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv2_4/bn"
  type: "BatchNorm"
  bottom: "conv2_4"
  top: "conv2_4"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv2_4/scale"
  type: "Scale"
  bottom: "conv2_4"
  top: "conv2_4"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu2_4"
  type: "PReLU"
  bottom: "conv2_4"
  top: "conv2_4"
}
layer {
  name: "conv2_5"
  type: "Convolution"
  bottom: "conv2_4"
  top: "conv2_5"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv2_5/bn"
  type: "BatchNorm"
  bottom: "conv2_5"
  top: "conv2_5"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv2_5/scale"
  type: "Scale"
  bottom: "conv2_5"
  top: "conv2_5"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu2_5"
  type: "PReLU"
  bottom: "conv2_5"
  top: "conv2_5"
}
layer {
  name: "res2_5"
  type: "Eltwise"
  bottom: "res2_3"
  bottom: "conv2_5"
  top: "res2_5"
}
layer {
  name: "conv3_1"
  type: "Convolution"
  bottom: "res2_5"
  top: "conv3_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_1/bn"
  type: "BatchNorm"
  bottom: "conv3_1"
  top: "conv3_1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_1/scale"
  type: "Scale"
  bottom: "conv3_1"
  top: "conv3_1"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_1"
  type: "PReLU"
  bottom: "conv3_1"
  top: "conv3_1"
}
layer {
  name: "conv3_2"
  type: "Convolution"
  bottom: "conv3_1"
  top: "conv3_2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_2/bn"
  type: "BatchNorm"
  bottom: "conv3_2"
  top: "conv3_2"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_2/scale"
  type: "Scale"
  bottom: "conv3_2"
  top: "conv3_2"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_2"
  type: "PReLU"
  bottom: "conv3_2"
  top: "conv3_2"
}
layer {
  name: "conv3_3"
  type: "Convolution"
  bottom: "conv3_2"
  top: "conv3_3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_3/bn"
  type: "BatchNorm"
  bottom: "conv3_3"
  top: "conv3_3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_3/scale"
  type: "Scale"
  bottom: "conv3_3"
  top: "conv3_3"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_3"
  type: "PReLU"
  bottom: "conv3_3"
  top: "conv3_3"
}
layer {
  name: "res3_3"
  type: "Eltwise"
  bottom: "conv3_1"
  bottom: "conv3_3"
  top: "res3_3"
}
layer {
  name: "conv3_4"
  type: "Convolution"
  bottom: "res3_3"
  top: "conv3_4"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_4/bn"
  type: "BatchNorm"
  bottom: "conv3_4"
  top: "conv3_4"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_4/scale"
  type: "Scale"
  bottom: "conv3_4"
  top: "conv3_4"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_4"
  type: "PReLU"
  bottom: "conv3_4"
  top: "conv3_4"
}
layer {
  name: "conv3_5"
  type: "Convolution"
  bottom: "conv3_4"
  top: "conv3_5"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_5/bn"
  type: "BatchNorm"
  bottom: "conv3_5"
  top: "conv3_5"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_5/scale"
  type: "Scale"
  bottom: "conv3_5"
  top: "conv3_5"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_5"
  type: "PReLU"
  bottom: "conv3_5"
  top: "conv3_5"
}
layer {
  name: "res3_5"
  type: "Eltwise"
  bottom: "res3_3"
  bottom: "conv3_5"
  top: "res3_5"
}
layer {
  name: "conv3_6"
  type: "Convolution"
  bottom: "res3_5"
  top: "conv3_6"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_6/bn"
  type: "BatchNorm"
  bottom: "conv3_6"
  top: "conv3_6"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_6/scale"
  type: "Scale"
  bottom: "conv3_6"
  top: "conv3_6"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_6"
  type: "PReLU"
  bottom: "conv3_6"
  top: "conv3_6"
}
layer {
  name: "conv3_7"
  type: "Convolution"
  bottom: "conv3_6"
  top: "conv3_7"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_7/bn"
  type: "BatchNorm"
  bottom: "conv3_7"
  top: "conv3_7"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_7/scale"
  type: "Scale"
  bottom: "conv3_7"
  top: "conv3_7"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_7"
  type: "PReLU"
  bottom: "conv3_7"
  top: "conv3_7"
}
layer {
  name: "res3_7"
  type: "Eltwise"
  bottom: "res3_5"
  bottom: "conv3_7"
  top: "res3_7"
}
layer {
  name: "conv3_8"
  type: "Convolution"
  bottom: "res3_7"
  top: "conv3_8"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_8/bn"
  type: "BatchNorm"
  bottom: "conv3_8"
  top: "conv3_8"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_8/scale"
  type: "Scale"
  bottom: "conv3_8"
  top: "conv3_8"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_8"
  type: "PReLU"
  bottom: "conv3_8"
  top: "conv3_8"
}
layer {
  name: "conv3_9"
  type: "Convolution"
  bottom: "conv3_8"
  top: "conv3_9"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv3_9/bn"
  type: "BatchNorm"
  bottom: "conv3_9"
  top: "conv3_9"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv3_9/scale"
  type: "Scale"
  bottom: "conv3_9"
  top: "conv3_9"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu3_9"
  type: "PReLU"
  bottom: "conv3_9"
  top: "conv3_9"
}
layer {
  name: "res3_9"
  type: "Eltwise"
  bottom: "res3_7"
  bottom: "conv3_9"
  top: "res3_9"
}
layer {
  name: "conv4_1"
  type: "Convolution"
  bottom: "res3_9"
  top: "conv4_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv4_1/bn"
  type: "BatchNorm"
  bottom: "conv4_1"
  top: "conv4_1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv4_1/scale"
  type: "Scale"
  bottom: "conv4_1"
  top: "conv4_1"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu4_1"
  type: "PReLU"
  bottom: "conv4_1"
  top: "conv4_1"
}
layer {
  name: "conv4_2"
  type: "Convolution"
  bottom: "conv4_1"
  top: "conv4_2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv4_2/bn"
  type: "BatchNorm"
  bottom: "conv4_2"
  top: "conv4_2"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv4_2/scale"
  type: "Scale"
  bottom: "conv4_2"
  top: "conv4_2"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu4_2"
  type: "PReLU"
  bottom: "conv4_2"
  top: "conv4_2"
}
layer {
  name: "conv4_3"
  type: "Convolution"
  bottom: "conv4_2"
  top: "conv4_3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "conv4_3/bn"
  type: "BatchNorm"
  bottom: "conv4_3"
  top: "conv4_3"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv4_3/scale"
  type: "Scale"
  bottom: "conv4_3"
  top: "conv4_3"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu4_3"
  type: "PReLU"
  bottom: "conv4_3"
  top: "conv4_3"
}
layer {
  name: "res4_3"
  type: "Eltwise"
  bottom: "conv4_1"
  bottom: "conv4_3"
  top: "res4_3"
}
layer {
  name: "fc5"
  type: "InnerProduct"
  bottom: "res4_3"
  top: "fc5"
  inner_product_param {
    num_output: 512
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

High similarity between abnormal face images

Hi, I found that a model trained with AM-Softmax may produce a higher similarity for a pair of abnormal enroll and probe images (where the probe is low quality, wrongly aligned, or even not a face, and the enroll is not a good ID photo). The similarity may be around 0.4 or even higher, while models trained with softmax give scores around 0. Have you ever seen this problem? Is it because the margin makes the feature space much more compact than softmax does?
Thanks!

How to set m when the feature is not normalized?

Hi, your paper shows results for AM-Softmax w/o FN with m = 0.35 and 0.4.
(1) With FN: psi = s * (cos(theta) - m), with s = 30 and m = 0.35:
# prototxt
layer {
  name: "fc6_l2"
  type: "InnerProduct"
  bottom: "norm1"
  top: "fc6"
  param {
    lr_mult: 1
  }
  inner_product_param {
    num_output: 10516
    normalize: true
    weight_filler {
      type: "xavier"
    }
    bias_term: false
  }
}
layer {
  name: "label_specific_margin"
  type: "LabelSpecificAdd"
  bottom: "fc6"
  bottom: "label"
  top: "fc6_margin"
  label_specific_add_param {
    bias: -0.35
  }
}
layer {
  name: "fc6_margin_scale"
  type: "Scale"
  bottom: "fc6_margin"
  top: "fc6_margin_scale"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    filler {
      type: "constant"
      value: 30
    }
  }
}
layer {
  name: "softmax_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc6_margin_scale"
  bottom: "label"
  top: "softmax_loss"
  loss_weight: 1
}

(2) Without FN: s is not needed and psi = ||x|| * cos(theta) - m. Should m still be 0.35?
# prototxt
layer {
  name: "fc6_l2"
  type: "InnerProduct"
  bottom: "norm1"
  top: "fc6"
  param {
    lr_mult: 1
  }
  inner_product_param {
    num_output: 10516
    normalize: false
    weight_filler {
      type: "xavier"
    }
    bias_term: false
  }
}
layer {
  name: "label_specific_margin"
  type: "LabelSpecificAdd"
  bottom: "fc6"
  bottom: "label"
  top: "fc6_margin"
  label_specific_add_param {
    bias: -0.35
  }
}
layer {
  name: "softmax_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc6_margin"
  bottom: "label"
  top: "softmax_loss"
  loss_weight: 1
}

Can you show your prototxt and training log? Thanks.
