xialuxi / arcface-caffe

279.0 23.0 125.0 8.92 MB

insightface-caffe

License: MIT License

C++ 45.42% Python 25.00% Cuda 29.58%

face-recognition face-detection face-pose face-landmark

arcface-caffe's Introduction

arcface-caffe

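1. Implements the ArcFace loss function in Caffe, modeled on the CosFace implementation.

2. For the Caffe project and the implementations of the other layers, see https://github.com/xialuxi/AMSoftmax

3. Build steps:

(1) Download the https://github.com/xialuxi/AMSoftmax project and edit the environment settings in Makefile.config

(2) Copy cosin_add_m_layer.hpp to the directory ./caffe/include/caffe/layers/

(3) Copy cosin_add_m_layer.cpp and cosin_add_m_layer.cu to the directory ./caffe/src/caffe/layers/

(4) Update ./caffe/src/caffe/proto/caffe.proto to match the provided proto file

(5) make -j

4. For the underlying theory, see https://github.com/deepinsight/insightface

5. Added Combined Margin Loss, following the insightface implementation.

6. Added Python code for MTCNN face detection, ported from the C++ code with no loss in accuracy. For the models and the original code, see https://github.com/blankWorld/MTCNN-Accelerate-Onet

7. In actual training, Caffe converges slowly and with difficulty, whereas MXNet is considerably faster. The cause is still unclear; for a workaround, see issue #7.

8. Added the GPU implementation of insightface.

9. Added a method for converting MXNet models with SE blocks to caffemodel, with almost no loss in accuracy.

10. Added the wing_loss loss function for facial landmark detection, along with networks and pretrained models for facial landmark and pose estimation. Paper: https://arxiv.org/abs/1711.06753v4

11. Added a loss function based on gradient harmonizing, which can replace softmax: https://github.com/xialuxi/GHMLoss-caffe

12. Corrected the backward-pass computation in cosin_add_m_layer.cu; thanks to @zhaokai5 for pointing it out.

13. Updated to new landmark detection and face pose estimation models; the model size is under 1 MB.

14. Added an implementation of SV-X-Softmax. Reference paper: "Support Vector Guided Softmax Loss for Face Recognition"

15. Added an implementation of AdaCos. Reference paper: "AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations"

16. New face detection algorithm (including landmark detection): RetinaFace. Link: https://github.com/xialuxi/insightface/tree/master/RetinaFace

17. CenterNet-based face detection and landmark detection works quite well and is simple to implement (result image, resnet18): only a landmark regression branch needs to be added. Reference: https://github.com/xingyizhou/CenterNet

18. I am currently working on a distributed deployment and training platform with Kubeflow on k8s, so face-related updates are paused. Apologies for the inconvenience.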
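Steps (2) and (3) of the build procedure in item 3 amount to copying three files into the Caffe tree. The following is a minimal sketch of that copy, assuming the layer sources sit in a local checkout directory and that `caffe_root` is the `caffe` directory of the AMSoftmax project. The function name `install_layer` and both directory arguments are hypothetical, and the `caffe.proto` edit in step (4) still has to be made by hand.

```python
import shutil
from pathlib import Path


def install_layer(src_dir, caffe_root, layer="cosin_add_m_layer"):
    """Copy <layer>.hpp/.cpp/.cu into the Caffe source tree (steps 2 and 3)."""
    src_dir, caffe_root = Path(src_dir), Path(caffe_root)
    # Header goes under include/, sources go under src/ (paths from the README).
    targets = {
        ".hpp": caffe_root / "include" / "caffe" / "layers",
        ".cpp": caffe_root / "src" / "caffe" / "layers",
        ".cu": caffe_root / "src" / "caffe" / "layers",
    }
    copied = []
    for ext, dest_dir in targets.items():
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / (layer + ext)
        shutil.copy2(src_dir / (layer + ext), dest)
        copied.append(dest)
    return copied
```

The same call with `layer="combined_margin_layer"` would cover the Combined Margin files, assuming they follow the same three-file naming.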

arcface-caffe's Issues

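Do the convolution layers use a bias (bias_term)?

Hello, in the ArcFace face recognition network from the official insightface repository, none of the convolution layers use a bias, whereas yours all do. What difference does this make? Thanks.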

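GPU implementation of cosin_add_m_layer?

Hello, have you written a cosin_add_m_layer.cu file? Does computing the loss layer on the CPU rather than the GPU have a large effect on training time?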

Error compiling Combined Margin Loss?

Hello, I just downloaded your latest Combined Margin Loss files, but compilation fails with the errors below. What could the problem be?

Error C2065: 'arccos_x': undeclared identifier (project libcaffe, file E:\AMSoftmax-master\Caffe-AM-Softmax\caffe-windows\src\caffe\layers\combined_margin_layer.cpp, line 32)
Error C2228: left of '.mutable_cpu_data' must have class/struct/union (project libcaffe, file E:\AMSoftmax-master\Caffe-AM-Softmax\caffe-windows\src\caffe\layers\combined_margin_layer.cpp, line 32)

Thank you!

Optimizer

Hi,
which optimizer (Adam, SGD) did you use for the tests, e.g. on CASIA-WebFace?

I'm asking because the gradients in the final layer look lower with AdaCos than with a fixed s=20 (98 classes). AdaCos also gets lower scores. I suspect the gradients may be too small for the model to learn.
I'm using Adam and the results are as follows (the dataset is CARS196):

Adam fixed: 0.79
Adam AdaCos: 0.745
Adam AdaCos x2 bigger lr: 0.755

In general AdaCos works worse for some reason; I'm not sure why. It may also be because the average angle between dissimilar classes is smaller than in the case of faces.
Or we may need a more adaptive learning-rate method for this problem.

Has anyone tried training with more landmarks (68, 194)?

How well does it work?

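Does pose = pose * 90.0 mean the output pose needs to be rotated?

In the script landmark_and_pose/new_model/detect.py, what does pose = pose * 90.0 mean? Does it mean that every face angle produced by this model needs to be rotated by 90 degrees?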

Could you provide the adacos_add_m_scale_layer.cu file ? thx

UMDFaces Dataset

Hi, thanks for your project. Could you share the UMDFaces Dataset with me?

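Problem with the landmark detection network

The landmark detection network uses separable convolutions, so the weight size should be filter_size x filter_size x output_channel,
but the actual weight size is input_channel x filter_size x filter_size x output_channel, as in an ordinary convolution.
I changed these separable convolutions to regular convolutions; the test program runs, but the results are wrong. Could you please check?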
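The shapes discussed in this question follow from how Caffe sizes a convolution weight blob: num_output x (input_channels / group) x kernel_h x kernel_w. The sketch below only reproduces that arithmetic; the function name is hypothetical, and whether the released model sets `group` is not something this page confirms.

```python
def caffe_conv_weight_shape(in_channels, num_output, kernel_size, group=1):
    """Shape of a Caffe Convolution weight blob for a square kernel."""
    if in_channels % group or num_output % group:
        raise ValueError("channels and num_output must be divisible by group")
    return (num_output, in_channels // group, kernel_size, kernel_size)


# Depthwise case: group equals the number of input channels.
print(caffe_conv_weight_shape(32, 32, 3, group=32))  # (32, 1, 3, 3)
# Ordinary convolution: group is 1, so every filter spans all input channels.
print(caffe_conv_weight_shape(32, 32, 3))            # (32, 32, 3, 3)
```

If the stored weights have the second shape, replacing the layer with a regular convolution keeps the program running but changes which input channels each filter sees, which would be consistent with the wrong results reported above.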

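Training with Caffe is still very slow; did the author ever find the cause?

@xialuxi, thank you for your reply. This morning I read through the implementation of the adaface paper.
Did you ever find out why training with Caffe is so slow? The forward pass seems quite fast to me.
My settings are batch size = 56, two 1080 GPUs, iter_size: 6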
12:49:24.228211 23509 solver.cpp:243] Iteration 0, loss = 24.1503
I0513 12:49:24.228235 23509 solver.cpp:259] Train net output #0: accuracy-t = 0.839286
I0513 12:49:24.228257 23509 solver.cpp:259] Train net output #1: softmax_loss = 22.5511 (* 1 = 22.5511 loss)
I0513 12:49:24.228299 23509 sgd_solver.cpp:138] Iteration 0, lr = 0.01
I0513 12:54:50.852994 23509 solver.cpp:243] Iteration 100, loss = 20.3324
I0513 12:54:50.853057 23509 solver.cpp:259] Train net output #0: accuracy-t = 0.928571
I0513 12:54:50.853081 23509 solver.cpp:259] Train net output #1: softmax_loss = 17.7896 (* 1 = 17.7896 loss)
I0513 12:54:50.923504 23509 sgd_solver.cpp:138] Iteration 100, lr = 0.01
I0513 13:00:39.458894 23509 solver.cpp:243] Iteration 200, loss = 18.8438
I0513 13:00:39.459019 23509 solver.cpp:259] Train net output #0: accuracy-t = 0.964286
I0513 13:00:39.459044 23509 solver.cpp:259] Train net output #1: softmax_loss = 16.4004 (* 1 = 16.4004 loss)
I0513 13:00:39.500185 23509 sgd_solver.cpp:138] Iteration 200, lr = 0.01
I0513 13:06:26.364652 23509 solver.cpp:243] Iteration 300, loss = 18.461
I0513 13:06:26.364759 23509 solver.cpp:259] Train net output #0: accuracy-t = 0.946429
I0513 13:06:26.364783 23509 solver.cpp:259] Train net output #1: softmax_loss = 16.8125 (* 1 = 16.8125 loss)

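Loss does not decrease?

I train the network with a batch size of 128 and a learning rate decayed from 0.001 to 0.00001. After 20,000 iterations the train loss keeps oscillating between 3 and 4 and will not go lower. Changing the initial learning rate makes no difference. How did you set your training parameters?

I use softmax loss with 10,177 IDs and 200,000 images. The loss starts at around 50, drops to around 19 after 10 epochs, and then keeps oscillating. How low does the loss get when you train on a dataset this large? The accuracy does not improve either.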

The .cu file for SV-X-Softmax

Thank you very much for the reimplementation. Is there a .cu file for SV-X-Softmax? I would like to try reproducing the results.

Could you share a wingloss example?

Could you share an example of using wingloss in a train prototxt, like this EuclideanLoss one?

layer {
  name: "loss"    
  type: "EuclideanLoss"
  bottom: "fc2"
  bottom: "label"
  top: "loss"
  loss_weight: 100
}
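For readers comparing the two losses, here is a minimal Python sketch of the wing loss formula from the paper linked in the README (arXiv:1711.06753): w * ln(1 + |x| / epsilon) for |x| < w, and |x| - C otherwise, with C = w - w * ln(1 + w / epsilon). The function name, the mean reduction and the default values of `w` and `epsilon` are illustrative assumptions; this is not the repository's Caffe layer, and its prototxt parameter names are not shown on this page.

```python
import math


def wing_loss(pred, target, w=10.0, epsilon=2.0):
    """Mean wing loss over paired landmark coordinates."""
    # C makes the logarithmic and linear pieces meet at |x| = w.
    c = w - w * math.log(1.0 + w / epsilon)
    total = 0.0
    for p, t in zip(pred, target):
        x = abs(p - t)
        if x < w:
            total += w * math.log(1.0 + x / epsilon)  # amplifies small errors
        else:
            total += x - c                            # L1-like for large errors
    return total / len(pred)
```

Compared with EuclideanLoss, the logarithmic piece gives small and medium errors a larger gradient, which is the motivation given in the paper.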

NormL2 not found

arcface-caffe/AdaCos/adacos_add_m_scale.prototxt contains a NormL2 layer, which does not seem to exist in the project. Where can I find it? Thanks.

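Discussion: the forward pass normalizes w, but the backward pass computes the gradient for updating w as if w were not normalized. Could this make convergence difficult?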

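A question about AdaCos

In your implementation, theta_med is the mean angle over all samples in the batch and all classes, whereas the paper says it is "the median of all corresponding classes' angles". My understanding is that this means something like each sample's angle to its label class. Is that right?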
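To make the two readings concrete, here is a small sketch of the dynamic scale update as I read the AdaCos paper: theta_med is the median of the angles between each sample and its own label class, B_avg averages the non-label exponentials over the batch, and the new scale is log(B_avg) / cos(min(pi/4, theta_med)). The function names are hypothetical and this is not the repository's layer.

```python
import math
from statistics import median


def adacos_initial_scale(num_classes):
    """Fixed starting scale: sqrt(2) * log(C - 1)."""
    return math.sqrt(2.0) * math.log(num_classes - 1)


def adacos_update_scale(cos_theta, labels, prev_scale):
    """cos_theta: N rows of C cosine logits; labels: N class indices."""
    n = len(cos_theta)
    # Angle of each sample to its own label class only.
    target_angles = [
        math.acos(max(-1.0, min(1.0, row[y])))
        for row, y in zip(cos_theta, labels)
    ]
    theta_med = median(target_angles)
    # Sum of exp(s * cos) over the non-label classes, averaged over the batch.
    b_avg = sum(
        math.exp(prev_scale * c)
        for row, y in zip(cos_theta, labels)
        for k, c in enumerate(row)
        if k != y
    ) / n
    return math.log(b_avg) / math.cos(min(math.pi / 4.0, theta_med))
```

Averaging the angle over all classes instead would be dominated by the non-label classes, whose angles stay close to pi/2 when there are many classes.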

InnerProduct with normalize

Hi,

It seems that you forgot to upload the InnerProduct layer with feature normalization?

https://github.com/xialuxi/arcface-caffe/blob/master/cosin_add_m.prototxt#L828

Hello, is there a .cu file?

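What is the difference between the old and the current cosin_add_m scale implementations?

Previously there were two layers: one adding the angular margin m, and one applying a scale of 64 or 128.
The new ArcFace version merges them into a single layer with no scale parameter to set. How does it differ from the two-layer implementation?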
layer {
name: "cosin_add_m"
type: "CosinAddm"
bottom: "temp_fc6"
bottom: "label"
top: "fc6_margin"
cosin_add_m_param {
m: 0.5
}
}
layer {
name: "fc6_margin_scale"
type: "Scale"
bottom: "fc6_margin"
top: "fc6_margin_scale"
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
filler{
type: "constant"
value: 64
}
}
}

The modified add-m layer:
layer {
name: "adacos_add_m_scale"
type: "AdaCosAddmScale"
bottom: "fc6"
bottom: "label"
top: "fc6_margin_scale"
adacos_add_m_scale_param {
m: 0.5
num_classes: 10575
}
}

@xialuxi

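Softmax computation

Hi xialuxi,
Hello!
The softmax in the ArcFace loss is computed differently from an ordinary softmax:

The denominator treats y_i and j separately. Where is this part of the computation written in the code? And where is the derivative?
Thank you very much!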
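One way to see why no special softmax code is needed: the separate y_i term in the denominator appears automatically if the label-class logit is replaced by s * cos(theta + m) and an ordinary softmax cross-entropy is applied afterwards. The sketch below illustrates this for a single sample. The function name is hypothetical, and the threshold branch used by the actual layer for large angles is left out.

```python
import math


def arcface_loss(cos_theta, label, m=0.5, s=64.0):
    """Cross-entropy for one sample after adding margin m to the label angle."""
    logits = []
    for j, c in enumerate(cos_theta):
        if j == label:
            theta = math.acos(max(-1.0, min(1.0, c)))
            c = math.cos(theta + m)  # only the y_i term is changed
        logits.append(s * c)
    # Ordinary, numerically stable softmax cross-entropy on the modified logits.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]
```

In the prototxt examples on this page this corresponds to the CosinAddm layer (margin), the Scale layer (s) and SoftmaxWithLoss (softmax and its derivative), so the margin layer only needs to backpropagate through its own transformation.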

The loss stays constant during training; I would appreciate some guidance

The model definition is as follows:
name: "ArcFace"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
resize_param {
prob: 1
resize_mode: WARP
height: 128
width: 128
interp_mode: LINEAR
interp_mode: AREA
interp_mode: CUBIC
interp_mode: LANCZOS4
}
mirror: True
crop_h: 128
crop_w: 128
#distort_param {
# brightness_prob: 0.5
# brightness_delta: 32
# contrast_prob: 0.5
# contrast_lower: 0.5
# contrast_upper: 1.5
# hue_prob: 0.5
# hue_delta: 18
# saturation_prob: 0.5
# saturation_lower: 0.5
# saturation_upper: 1.5
# random_order_prob: 0.
#}
}
data_param {
source: "/media/zz/7c333a37-0503-4f81-8103-0ef7e776f6fb/Face_Data/casia_extract_aligned_train_9204cls_lmdb"
batch_size: 512
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
resize_param {
prob: 1
resize_mode: WARP
height: 128
width: 128
interp_mode: LINEAR
}
crop_h: 128
crop_w: 128
}
data_param {
source: "/media/zz/7c333a37-0503-4f81-8103-0ef7e776f6fb/Face_Data/casia_extract_aligned_test_9204cls_lmdb"
batch_size: 2
backend: LMDB
}
}
############## CNN Architecture ###############
layer {
name: "data/bias"
type: "Bias"
bottom: "data"
top: "data/bias"
param {
lr_mult: 0
decay_mult: 0
}
bias_param {
filler {
type: "constant"
value: -128
}
}
}
################################################
layer {
name: "conv1"
type: "Convolution"
bottom: "data/bias"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 7
pad: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv1_bn"
type: "BatchNorm"
bottom: "conv1"
top: "conv1"
}
layer {
name: "conv1_scale"
type: "Scale"
bottom: "conv1"
top: "conv1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv1_relu"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "pool1_1"
type: "Pooling"
bottom: "pool1"
top: "pool1_1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1_1"
top: "conv2_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv2_1_bn"
type: "BatchNorm"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_1_scale"
type: "Scale"
bottom: "conv2_1"
top: "conv2_1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv2_1_relu"
type: "ReLU"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_2"
type: "Convolution"
bottom: "conv2_1"
top: "conv2_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv2_2_bn"
type: "BatchNorm"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "conv2_2_scale"
type: "Scale"
bottom: "conv2_2"
top: "conv2_2"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv2_2_relu"
type: "ReLU"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2_2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
##############################################
layer {
name: "conv3_1"
type: "Convolution"
bottom: "pool2"
top: "conv3_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
pad: 0
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv3_1_bn"
type: "BatchNorm"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_1_scale"
type: "Scale"
bottom: "conv3_1"
top: "conv3_1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv3_1_relu"
type: "ReLU"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_2"
type: "Convolution"
bottom: "conv3_1"
top: "conv3_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv3_2_bn"
type: "BatchNorm"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv3_2_scale"
type: "Scale"
bottom: "conv3_2"
top: "conv3_2"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv3_2_relu"
type: "ReLU"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv4_1"
type: "Convolution"
bottom: "conv3_2"
top: "conv4_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
pad: 0
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv4_1_bn"
type: "BatchNorm"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_1_scale"
type: "Scale"
bottom: "conv4_1"
top: "conv4_1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv4_1_relu"
type: "ReLU"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_2"
type: "Convolution"
bottom: "conv4_1"
top: "conv4_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv4_2_bn"
type: "BatchNorm"
bottom: "conv4_2"
top: "conv4_2"
}
layer {
name: "conv4_2_scale"
type: "Scale"
bottom: "conv4_2"
top: "conv4_2"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv4_2_relu"
type: "ReLU"
bottom: "conv4_2"
top: "conv4_2"
}
################################################
layer {
name: "conv5_1"
type: "Convolution"
bottom: "conv4_2"
top: "conv5_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
pad: 0
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv5_1_bn"
type: "BatchNorm"
bottom: "conv5_1"
top: "conv5_1"
}
layer {
name: "conv5_1_scale"
type: "Scale"
bottom: "conv5_1"
top: "conv5_1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv5_1_relu"
type: "ReLU"
bottom: "conv5_1"
top: "conv5_1"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv5_1"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
#########################################
#########################################
layer {
name: "fc1"
type: "InnerProduct"
bottom: "pool3"
top: "fc1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1024
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "fc1_bn"
type: "BatchNorm"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc1_scale"
type: "Scale"
bottom: "fc1"
top: "fc1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "fc1_relu"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 128
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "fc2_norm"
type: "NormalizeJin"
bottom: "fc2"
top: "fc2_norm"
norm_jin_param {
across_spatial: true
scale_filler {
type: "constant"
value: 1.0
}
channel_shared: true
}
}
############### Arc-Softmax Loss ##############

layer {
name: "fc6_changed"
type: "InnerProduct"
bottom: "fc2_norm"
top: "fc6"
inner_product_param {
num_output: 9204
normalize: true
weight_filler {
type: "xavier"
}
bias_term: false
}
}
####################################################
layer {
name: "cosin_add_m"
type: "CosinAddm"
bottom: "fc6"
bottom: "label"
top: "fc6_margin"
cosin_add_m_param {
m: 0.1
}
include {
phase: TRAIN
}
}

layer {
name: "fc6_margin_scale"
type: "Scale"
bottom: "fc6_margin"
top: "fc6_margin_scale"
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
filler{
type: "constant"
value: 64
}
}
include {
phase: TRAIN
}
}

######################################################
layer {
name: "softmax_loss"
type: "SoftmaxWithLoss"
bottom: "fc6_margin_scale"
bottom: "label"
#bottom: "label"
#bottom: "data"
top: "softmax_loss"
loss_weight: 1
include {
phase: TRAIN
}
}

layer {
name: "Accuracy"
type: "Accuracy"
bottom: "fc6"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}

The loss output is as follows:
I0627 17:38:58.567371 6757 solver.cpp:224] Iteration 450 (2.13816 iter/s, 4.67691s/10 iters), loss = 87.3365
I0627 17:38:58.567402 6757 solver.cpp:243] Train net output #0: softmax_loss = 87.3365 (* 1 = 87.3365 loss)
I0627 17:38:58.567409 6757 sgd_solver.cpp:137] Iteration 450, lr = 0.00314
I0627 17:39:03.256306 6757 solver.cpp:224] Iteration 460 (2.13288 iter/s, 4.6885s/10 iters), loss = 87.3365
I0627 17:39:03.256340 6757 solver.cpp:243] Train net output #0: softmax_loss = 87.3365 (* 1 = 87.3365 loss)
I0627 17:39:03.256347 6757 sgd_solver.cpp:137] Iteration 460, lr = 0.00314
I0627 17:39:07.941520 6757 solver.cpp:224] Iteration 470 (2.13457 iter/s, 4.68478s/10 iters), loss = 87.3365
I0627 17:39:07.941551 6757 solver.cpp:243] Train net output #0: softmax_loss = 87.3365 (* 1 = 87.3365 loss)
I0627 17:39:07.941558 6757 sgd_solver.cpp:137] Iteration 470, lr = 0.00314
I0627 17:39:12.623337 6757 solver.cpp:224] Iteration 480 (2.13612 iter/s, 4.68139s/10 iters), loss = 87.3365
I0627 17:39:12.623456 6757 solver.cpp:243] Train net output #0: softmax_loss = 87.3365 (* 1 = 87.3365 loss)
How should I change this?
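The constant 87.3365 in the log above is -ln(FLT_MIN). As far as I know, stock Caffe's SoftmaxWithLoss clamps the predicted probability at FLT_MIN before taking the logarithm, so this exact value shows up when the label-class probability has underflowed to zero or the logits have become NaN or infinite. The helper name below is hypothetical; the sketch only reproduces the number and does not diagnose this particular network.

```python
import math

# Smallest positive normalized float32, i.e. FLT_MIN in C (2 ** -126).
FLT_MIN = 1.1754943508222875e-38


def clamped_log_loss(prob):
    """Per-sample loss when the probability is clamped at FLT_MIN before the log."""
    return -math.log(max(prob, FLT_MIN))


print(round(clamped_log_loss(0.0), 4))  # 87.3365
```

A loss pinned at this value therefore says little about the margin itself; it usually points at diverged or saturated logits, for example from a large scale combined with unnormalized inputs or too high a learning rate.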

A question about cosin_add_m_layer.cpp

I could not find where the loss is computed. Does the loss function layer not appear in these files?

Compile on Ubuntu

Hi, I compiled your repository as you described. These are the steps I followed:

1.) I downloaded the https://github.com/xialuxi/AMSoftmax project
2.) In the caffe-windows directory I set up Makefile.config as described in the Caffe installation guide
2.1) cd caffe-windows
2.2) for req in $(cat python/requirements.txt); do pip install --trusted-host pypi.python.org $req; done
2.3) cp Makefile.config.example Makefile.config
2.4) gedit Makefile.config
USE_CUDNN := 1
OPENCV_VERSION := 3
PYTHON_INCLUDE := /usr/include/python2.7 \
/usr/local/lib/python2.7/dist-packages/numpy/core/include
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
3.) Copy cosin_add_m_layer.hpp to the directory: ./caffe/include/caffe/layers/
4.) Copy cosin_add_m_layer.cpp and cosin_add_m_layer.cu to the directory: ./caffe/src/caffe/layers/
5.) According to the proto file, modify the ./caffe/src/caffe/proto/caffe.proto file accordingly.
6.) I also copied combined_margin_layer.cpp, combined_margin_layer.cu and combined_margin_layer.hpp from https://github.com/gehaocool/CombinedMargin-caffe into the locations described in steps 3 and 4
7.) make -j8
8.) make py
9.) make test -j8
After these steps, I ran the following command:
make runtest -j8
It failed on the tests for some layers, so I have not run your repository yet. I must be missing something; could you correct my compilation steps?

Thank you for your time.

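A question about CosinAddmLayer and combined_margin_layer

Hi xialuxi,
CosinAddmLayer appears to be the ArcFace loss (the combined margin case with m1=1, m3=0). Why, then, does CosinAddmLayer handle cos_t[i * dim + gt] > 1.0f and cos_t[i * dim + gt] <= threshold, while combined_margin_layer does not?
Which one did you use for the final training?
Thank you very much!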

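Landmark detection is inaccurate

Hello, I tried landmark detection with both models and the results are inaccurate in both cases. I then added face bounding-box detection, but it did not help. Is there something I have overlooked?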

Is it mandatory to use Nvidia GPU, can we use OpenCL instead?

How well does the trained model perform?

Hello, how well does the trained model perform? Is there any loss in accuracy compared with the MXNet code?

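accuracy_hat_arc never exceeds 0.6 and the loss stays around 1.5

Hello, I am using ArcFace to train a four-class task. The final training output:
Iteration 285400(1.51479 iter/s,132.031s/200 iters),loss=1.58242
Train net output #0:accuracy_hat=1
Train net output #1:accuracy_hat_arc=0.4375
Train net output #2:loss_hat=1.58242(*1 = 1.58242 loss)

accuracy_hat_arc never rises above 0.6,
and loss_hat stays a little above 1.5 without converging.

Has anyone run into this problem or have ideas for solving it? It is urgent; thanks in advance.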

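Training data

Hello, was your LMDB data built from cropped and aligned images plus labels? What do the labels contain? Is it one folder per person with multiple images, so that the number of classes equals the number of person folders? How is the data prepared?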

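Is there a trained model file that matches arcface-caffe/mxnet_to_caffe/face.prototxt?

Hello, I found a trained model in the https://github.com/gehaocool/CombinedMargin-caffe project, but its input is a 112x96 image, whereas your model takes 112x112 input. Does that mean it cannot be used directly?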

Does not work when training a two-class classification model

Hello guys,
First of all, thanks to the author for his excellent work.
I use this ArcFace loss to fine-tune a classification model with two classes, but I don't know why it does not work. The loss does not decrease and the accuracy jumps around. The parameters are m = 0.5, s = 64. I have tried some other parameters, but the result is always the same.
Has anyone encountered a similar problem? Thanks.

About m in AdaCos

The paper removes m, doesn't it? Why is it added back here?

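About the implementation of cosin_add_m_layer

Thank you for sharing.
I have a question about the implementation of cosin_add_m_layer: what does the following code in Forward_cpu() mean? The ArcFace paper seems to mention only the else branch below. Could you explain the purpose of the cos_t[i * dim + gt] <= threshold check and the corresponding handling? Thanks!

if(cos_t[i * dim + gt] <= threshold)
{
    top_data[i * dim + gt] = cos_t[i * dim + gt] - sin(M_PI - m_) * m_;
    tpflag[i * dim + gt] = 1.0f;
}
else
    top_data[i * dim + gt] = cos_t[i * dim + gt] * cos_m - sin_theta * sin_m;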
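A plausible reading of this branch, sketched in Python: cos(theta + m) only keeps decreasing in theta while theta + m <= pi, so once cos(theta) drops to cos(pi - m) or below, the code switches to a linear penalty, cos(theta) - m * sin(pi - m), that keeps the target logit monotonic. This assumes `threshold` is cos(pi - m), which the snippet does not show but which matches the insightface training code as far as I recall. The function name is hypothetical.

```python
import math


def margin_target_logit(cos_t, m=0.5):
    """Label-class logit after the additive angular margin, with fallback."""
    threshold = math.cos(math.pi - m)  # assumed definition of `threshold`
    if cos_t <= threshold:
        # theta + m would exceed pi: use a linear penalty instead.
        return cos_t - math.sin(math.pi - m) * m
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    # cos(theta + m) expanded with the angle-addition formula.
    return cos_t * math.cos(m) - sin_t * math.sin(m)
```

Without the fallback, a sample whose angle to its label class is already close to pi would receive a larger logit as the angle gets worse, which pushes the gradient in the wrong direction.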

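Adding the ArcFace loss function

Hello,
A question about adding your loss function: so far I have only added the cosin_add_m_layer proto definitions and parameters. During training I get many lines of output like costheta >1 ************ 1.58. What might be the cause?
Also, is training with the Caffe version similar to MXNet, i.e. first train with softmax only for 120,000 steps, then add the ArcFace loss layer and fine-tune?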

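MNIST example

Hello, have you tested the Caffe version of ArcFace on an MNIST example? My training reaches loss=87.3365 after just two iterations and fails. I added the loss layer exactly as in your Caffe project, but training never works. Could you provide an MNIST example? Thanks!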

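About the gradient computation of the loss

Hello, I have a question about the gradient computation of the loss:
1. For ArcFace, the gradient in the code is cos_m + sin_m * cos_t[i * dim + gt] / sin_theta, which is just sin(theta+m)/sin(theta). My own calculation gives -sin(theta+m). Am I missing something?
2. For combined margin, the gradient in your code is m1 * pow(1 - pow(bottom_data[i * dim + gt], 2), -0.5) * sin(m1_x_m2[i * dim + gt]), which is just m1 * sin(m1 * theta+m2) / sin(theta). My own calculation gives -m1 * sin(m1 * theta+m2). How did you arrive at your result? Many thanks!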
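The missing factor is the chain rule through the layer input. The layer receives cos(theta), not theta, so d cos(theta + m) / d cos(theta) = -sin(theta + m) * d theta / d cos(theta) = -sin(theta + m) * (-1 / sin(theta)) = sin(theta + m) / sin(theta), which expands to cos_m + sin_m * cos_t / sin_theta. The sketch below checks this against a finite difference; the function names are hypothetical.

```python
import math


def target_logit(cos_t, m):
    """cos(theta + m) written as a function of the layer input cos(theta)."""
    return math.cos(math.acos(cos_t) + m)


def analytic_grad(cos_t, m):
    """Expression used in the layer: cos_m + sin_m * cos_t / sin_theta."""
    sin_t = math.sqrt(1.0 - cos_t * cos_t)
    return math.cos(m) + math.sin(m) * cos_t / sin_t


def numeric_grad(cos_t, m, h=1e-6):
    """Central finite difference with respect to cos(theta)."""
    return (target_logit(cos_t + h, m) - target_logit(cos_t - h, m)) / (2.0 * h)
```

The same argument applied to cos(m1 * theta + m2) gives m1 * sin(m1 * theta + m2) / sin(theta), and the pow(1 - x^2, -0.5) term in the combined margin code is exactly this 1 / sin(theta) factor.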

CosinAddmBackward

Hi, I compared the cpp and cu code and found a bug in how the diff is calculated in the CosinAddmBackward function: the result needs to be multiplied by bottom_diff[index * dim + gt]. The following code should be used:
bottom_diff[index * dim + gt] =bottom_diff[index * dim + gt] *(cos(bais) + sin(bais) * cos_theta / sin_theta);

arcface-caffe/cosin_add_m_layer.cu

Line 37 in 1b0aa15

bottom_diff[index * dim + gt] = cos(bais) + sin(bais) * cos_theta / sin_theta;

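Downloading the UMDFaces Dataset

The UMDFaces Dataset is no longer available for download. Do you have a copy on a cloud drive that we could download to train the landmark-pose model?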

About cos_t in cosin_add_m_layer.cpp and cos_theta > 1

Hi,
cos_t is WX from bottom_data, right?
WX = ||W|| * ||X|| * cos(theta),
which means cos_t = ||W|| * ||X|| * cos(theta), so cos_theta > 1 occurs frequently.
Does it still work despite the clipping of cos_t?

Thank you!
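The observation holds for raw inner products: the value fed to the margin layer equals cos(theta) only if both the feature and the class weight vector are L2-normalized first. A small sketch with made-up vectors (all names and numbers are illustrative):

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


w = [3.0, 4.0]  # hypothetical class weight vector
x = [6.0, 8.0]  # hypothetical feature vector

raw = dot(w, x)                                # ||W|| * ||X|| * cos(theta)
cosine = dot(l2_normalize(w), l2_normalize(x))  # cos(theta) only
print(raw, cosine)
```

Here the raw product is 50.0 while the normalized one is 1 up to rounding, so frequent cos_theta > 1 values suggest that the normalization of the feature or of the InnerProduct weights is missing from the network, rather than something clipping can repair.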

How to get the similarity between two faces?

Hello, is there a demo that uses the caffemodel to get the distance and similarity between two faces, like deploy/test.py in the original insightface? Thanks, waiting for a reply.
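Independently of how the features are extracted, the comparison step can be sketched as follows: normalize both embeddings, then take the squared Euclidean distance and the dot product, which is the pair of measures I recall deploy/test.py computing. The function names are hypothetical, and the embeddings are assumed to be plain lists already obtained from the network's feature layer.

```python
import math


def normalize(v):
    """Scale an embedding to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def compare(f1, f2):
    """Return (squared L2 distance, cosine similarity) of two embeddings."""
    a, b = normalize(f1), normalize(f2)
    sim = sum(x * y for x, y in zip(a, b))
    dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return dist, sim
```

For unit-length embeddings the two measures carry the same information, since dist = 2 - 2 * sim; the decision threshold has to be chosen on a validation set and is not given on this page.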

Focal loss

I noticed that you use Focal loss as a second loss. What's the purpose?

Did anyone try AdaCos?

I will in a couple of days, and I will compare it with ArcFace using MegaFace and other tests and present the results. But I'm a bit confused about the M parameter.

Why is the loss 87.33 and the accuracy 1?

After switching the loss to arcloss, why is the loss 87.33 and the accuracy 1?
layer {
name: "cosin_add_m"
type: "CosinAddm"
bottom: "concat_fc"
bottom: "label"
top: "fc6_margin"
cosin_add_m_param {
m: 0.5
}
}

layer {
name: "fc6_margin_scale"
type: "Scale"
bottom: "fc6_margin"
top: "fc6_margin_scale"
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
filler{
type: "constant"
value: 64
}
}
}

layer {
name: "concat_loss"
type: "SoftmaxWithLoss"
bottom: "fc6_margin_scale"
bottom: "label"
top: "concat_loss"
}

If I use this directly instead:
layer {
name: "concat_loss"
type: "SoftmaxWithLoss"
bottom: "concat_fc"
bottom: "label"
top: "concat_loss"
}
it converges. I cannot figure out why.
