xialuxi / arcface-caffe
insightface-caffe
License: MIT License
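The paper removes m, doesn't it? Why is it added back here?
arcface-caffe/AdaCos/adacos_add_m_scale.prototxt contains a NormL2 layer, but this layer does not seem to be in the project. Where can I find it? Thanks.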
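Hello, thank you for your work.
I recently applied this loss function to another classification problem. During training, the accuracy on the test set reaches 99%, but when I test afterwards on the same test set the accuracy is only 94%.
For testing I do not use the test tool from caffe's tools; I read the images myself, and the test prototxt takes its output directly from fc2. Is there a problem here, and does something need to be modified?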
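@xialuxi, thank you for your reply. This morning I read through the implementation of the adaface paper.
Did the author ever find out why training with caffe is so slow? Forward propagation seems quite fast to me.
My settings are batch size=56, two 1080 GPUs, iter_size: 6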
12:49:24.228211 23509 solver.cpp:243] Iteration 0, loss = 24.1503
I0513 12:49:24.228235 23509 solver.cpp:259] Train net output #0: accuracy-t = 0.839286
I0513 12:49:24.228257 23509 solver.cpp:259] Train net output #1: softmax_loss = 22.5511 (* 1 = 22.5511 loss)
I0513 12:49:24.228299 23509 sgd_solver.cpp:138] Iteration 0, lr = 0.01
I0513 12:54:50.852994 23509 solver.cpp:243] Iteration 100, loss = 20.3324
I0513 12:54:50.853057 23509 solver.cpp:259] Train net output #0: accuracy-t = 0.928571
I0513 12:54:50.853081 23509 solver.cpp:259] Train net output #1: softmax_loss = 17.7896 (* 1 = 17.7896 loss)
I0513 12:54:50.923504 23509 sgd_solver.cpp:138] Iteration 100, lr = 0.01
I0513 13:00:39.458894 23509 solver.cpp:243] Iteration 200, loss = 18.8438
I0513 13:00:39.459019 23509 solver.cpp:259] Train net output #0: accuracy-t = 0.964286
I0513 13:00:39.459044 23509 solver.cpp:259] Train net output #1: softmax_loss = 16.4004 (* 1 = 16.4004 loss)
I0513 13:00:39.500185 23509 sgd_solver.cpp:138] Iteration 200, lr = 0.01
I0513 13:06:26.364652 23509 solver.cpp:243] Iteration 300, loss = 18.461
I0513 13:06:26.364759 23509 solver.cpp:259] Train net output #0: accuracy-t = 0.946429
I0513 13:06:26.364783 23509 solver.cpp:259] Train net output #1: softmax_loss = 16.8125 (* 1 = 16.8125 loss)
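As a rough aid for reading the log above, the sketch below turns the two timestamps for iterations 0 and 100 into seconds per iteration and images per iteration. It assumes that `batch_size` is applied per GPU and that `iter_size` multiplies the number of forward/backward passes per solver iteration; the helper name `seconds_between` is hypothetical.

```python
from datetime import datetime

def seconds_between(t0, t1):
    """Elapsed seconds between two glog-style HH:MM:SS.ffffff timestamps."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()

# Timestamps copied from the "Iteration 0" and "Iteration 100" log lines.
elapsed = seconds_between("12:49:24.228211", "12:54:50.852994")
iterations = 100

# Settings quoted in the question (assumed: batch_size is per GPU).
batch_size, iter_size, num_gpus = 56, 6, 2

sec_per_iter = elapsed / iterations
images_per_iter = batch_size * iter_size * num_gpus
images_per_sec = images_per_iter / sec_per_iter

print(f"{sec_per_iter:.2f} s/iter, {images_per_iter} images/iter, "
      f"{images_per_sec:.1f} images/s")
```

Under these assumptions each logged iteration covers several hundred images, so a few seconds per iteration does not by itself show that the loss layer is the bottleneck.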
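I train the network with a batch size of 128 and a learning rate that decays from 0.001 to 0.00001, for 20000 iterations. The train loss keeps oscillating between 3 and 4 and will not go down. Changing the initial learning rate gives the same result. How did you set your training parameters?
Hello author, was your LMDB data built from cropped and aligned images plus labels? What do the labels contain? Is it one folder per person with multiple images, so that the number of classes equals the number of person folders? How should the data be prepared?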
I will try it in a couple of days, compare it with ArcFace using MegaFace and other tests, and present the results. But I'm a bit confused about the M parameter.
Hi,
It seems that you forgot to upload the InnerProduct layer with feature normalization?
https://github.com/xialuxi/arcface-caffe/blob/master/cosin_add_m.prototxt#L828
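Thank you for sharing.
I have a question about the implementation of cosin_add_m_layer: what does the following code in Forward_cpu() mean? The ArcFace paper seems to mention only the else branch below. Could you explain the purpose of checking cos_t[i * dim + gt] <= threshold and of the corresponding handling? Thanks!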
if(cos_t[i * dim + gt] <= threshold)
{
top_data[i * dim + gt] = cos_t[i * dim + gt] - sin(M_PI - m_) * m_;
tpflag[i * dim + gt] = 1.0f;
}
else
top_data[i * dim + gt] = cos_t[i * dim + gt] * cos_m - sin_theta * sin_m;
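One possible reading of the two branches, sketched in Python. It assumes `threshold` is defined as cos(pi - m), which the snippet above does not show; the function name `arcface_target_logit` is hypothetical. Under that assumption, the first branch covers angles where theta + m would exceed pi, so cos(theta + m) would start increasing again, and a fixed penalty is subtracted from cos(theta) instead.

```python
import math

def arcface_target_logit(cos_t, m):
    """Margin-adjusted logit for the ground-truth class (illustrative sketch)."""
    threshold = math.cos(math.pi - m)  # assumed definition, not shown in the snippet
    if cos_t <= threshold:
        # theta + m > pi: cos(theta + m) is no longer decreasing in theta,
        # so subtract a constant penalty from cos(theta), as in the if branch.
        return cos_t - math.sin(math.pi - m) * m
    # Regular case: cos(theta + m) expanded with the angle-addition identity.
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    return cos_t * math.cos(m) - sin_theta * math.sin(m)

for c in (0.5, -0.95):
    print(c, arcface_target_logit(c, 0.5))
```

With m = 0.5 the assumed threshold is about -0.88, so 0.5 takes the else branch and -0.95 takes the fallback branch.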
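Hello, have you written a cosin_add_m_layer.cu file? Does computing the loss layer on the CPU rather than the GPU have a large effect on network training time?
In your implementation, theta_med is computed as the mean angle over all samples in the batch across all classes, whereas the paper says it is "the median of all corresponding classes' angles". My understanding is that this means something like each sample's angle to its label class. Is that correct?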
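To make the reading proposed in the question concrete, here is a minimal sketch of the dynamic scale update as described in the AdaCos paper, with theta_med taken as the median of each sample's angle to its own label class. It is not the repository's code; the function name `adacos_dynamic_scale` and the list-of-rows input format are assumptions for illustration.

```python
import math
from statistics import median

def adacos_dynamic_scale(cos_logits, labels, s_prev):
    """cos_logits[i][k] = cos(theta_ik); labels[i] = ground-truth class of sample i."""
    n = len(labels)
    # Angle between each sample and its own label class only.
    gt_angles = [
        math.acos(max(-1.0, min(1.0, row[y])))
        for row, y in zip(cos_logits, labels)
    ]
    theta_med = median(gt_angles)
    # Batch average of the summed non-target exponentials, using the previous scale.
    b_avg = sum(
        sum(math.exp(s_prev * c) for k, c in enumerate(row) if k != y)
        for row, y in zip(cos_logits, labels)
    ) / n
    return math.log(b_avg) / math.cos(min(math.pi / 4, theta_med))

batch = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(adacos_dynamic_scale(batch, [0, 1], s_prev=10.0))
```

Replacing the median over label-class angles with a mean over all classes, as the question describes, would change `theta_med` and therefore the resulting scale.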
The UMDFaces dataset is no longer available for download. Do you have a copy on a cloud drive that we could download to train the landmark-pose model?
Hi, I compiled your repository as you described. I did the following:
1) I downloaded the https://github.com/xialuxi/AMSoftmax project.
2) In the caffe-windows directory I edited Makefile.config as described in the Caffe installation instructions:
2.1) cd caffe-windows
2.2) for req in $(cat python/requirements.txt); do pip install --trusted-host pypi.python.org $req; done
2.3) cp Makefile.config.example Makefile.config
2.4) gedit Makefile.config
USE_CUDNN := 1
OPENCV_VERSION := 3
PYTHON_INCLUDE := /usr/include/python2.7 \
		/usr/local/lib/python2.7/dist-packages/numpy/core/include
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
3) Copy cosin_add_m_layer.hpp to the directory: ./caffe/include/caffe/layers/
4) Copy cosin_add_m_layer.cpp and cosin_add_m_layer.cu to the directory: ./caffe/src/caffe/layers/
5) According to the proto file, modify the ./caffe/src/caffe/proto/caffe.proto file accordingly.
6) I also copied combined_margin_layer.cpp, combined_margin_layer.cu and combined_margin_layer.hpp from https://github.com/gehaocool/CombinedMargin-caffe into the locations described in steps 3 and 4.
7) make -j8
8) make py
9) make test -j8
But after these steps, when I run the following command:
make runtest -j8
it fails for some of the layer tests, so I have not been able to run your repository. Am I missing something? Could you correct my compilation steps?
Thank you for your time.
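Hello author,
Regarding adding your loss function: so far I have only added the proto and parameters related to cosin_add_m_layer. During training I get output like "costheta >1 ************ 1.58", and a lot of it. What might be the cause?
Also, is training with the caffe version similar to mxnet? That is, do you first train with softmax only for 120,000 steps, then add the arcface loss layer and fine-tune?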
Hi, I compared the .cpp and .cu code and found a bug in how the diff is computed in the CosinAddmBackward function: the result needs to be multiplied by bottom_diff[index * dim + gt]. The following code should be used:
bottom_diff[index * dim + gt] = bottom_diff[index * dim + gt] * (cos(bais) + sin(bais) * cos_theta / sin_theta);
arcface-caffe/cosin_add_m_layer.cu
Line 37 in 1b0aa15
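In the script landmark_and_pose/new_model/detect.py, what does pose = pose * 90.0 mean? Does it mean that the face angles produced by this model all need to be rotated by 90 degrees?
Hello, I tried landmark detection with both models and neither gave accurate results. I then added face bounding-box detection, but that did not improve things. Is there something I have overlooked?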
I noticed that you use Focal loss as a second loss. What's the purpose?
Hello author, how well does the trained model perform? Compared with the mxnet code, is there any loss in model accuracy?
How well does it work?
Hi,
cos_t is WX from bottom_data, right?
WX = ||W|| * ||X|| * cos(theta),
which means cos_t = ||W|| * ||X|| * cos(theta), so the value is frequently greater than 1.
Does it still work despite the clipping of cos_t?
Thank you!
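The point raised in the question can be checked numerically. The sketch below assumes that both the weight vector and the feature are L2-normalized before the inner product; whether a given prototxt actually does this depends on its normalization layers. The helper names `l2_normalize` and `dot` are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

w = [3.0, 4.0]   # one row of the classifier weights
x = [6.0, 8.0]   # one feature vector pointing in the same direction

raw = dot(w, x)                                # ||w|| * ||x|| * cos(theta)
cos_t = dot(l2_normalize(w), l2_normalize(x))  # cos(theta) only

print(raw, cos_t)
```

Without normalization the inner product carries both norms and easily exceeds 1; with both sides normalized it stays within [-1, 1] up to rounding, which is the range the margin layer expects.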
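Hello, have you tested the caffe version of arcface on an MNIST example? My training reaches loss=87.3365 after only two iterations and fails. I added the loss layer exactly as in your caffe project, but it never trains. Could you provide an MNIST example? Thanks!
The model deploy file is as follows: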
name: "ArcFace"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
resize_param {
prob: 1
resize_mode: WARP
height: 128
width: 128
interp_mode: LINEAR
interp_mode: AREA
interp_mode: CUBIC
interp_mode: LANCZOS4
}
mirror: True
crop_h: 128
crop_w: 128
#distort_param {
# brightness_prob: 0.5
# brightness_delta: 32
# contrast_prob: 0.5
# contrast_lower: 0.5
# contrast_upper: 1.5
# hue_prob: 0.5
# hue_delta: 18
# saturation_prob: 0.5
# saturation_lower: 0.5
# saturation_upper: 1.5
# random_order_prob: 0.
#}
}
data_param {
source: "/media/zz/7c333a37-0503-4f81-8103-0ef7e776f6fb/Face_Data/casia_extract_aligned_train_9204cls_lmdb"
batch_size: 512
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
resize_param {
prob: 1
resize_mode: WARP
height: 128
width: 128
interp_mode: LINEAR
}
crop_h: 128
crop_w: 128
}
data_param {
source: "/media/zz/7c333a37-0503-4f81-8103-0ef7e776f6fb/Face_Data/casia_extract_aligned_test_9204cls_lmdb"
batch_size: 2
backend: LMDB
}
}
############## CNN Architecture ###############
layer {
name: "data/bias"
type: "Bias"
bottom: "data"
top: "data/bias"
param {
lr_mult: 0
decay_mult: 0
}
bias_param {
filler {
type: "constant"
value: -128
}
}
}
################################################
layer {
name: "conv1"
type: "Convolution"
bottom: "data/bias"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 7
pad: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv1_bn"
type: "BatchNorm"
bottom: "conv1"
top: "conv1"
}
layer {
name: "conv1_scale"
type: "Scale"
bottom: "conv1"
top: "conv1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv1_relu"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "pool1_1"
type: "Pooling"
bottom: "pool1"
top: "pool1_1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1_1"
top: "conv2_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv2_1_bn"
type: "BatchNorm"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_1_scale"
type: "Scale"
bottom: "conv2_1"
top: "conv2_1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv2_1_relu"
type: "ReLU"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_2"
type: "Convolution"
bottom: "conv2_1"
top: "conv2_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv2_2_bn"
type: "BatchNorm"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "conv2_2_scale"
type: "Scale"
bottom: "conv2_2"
top: "conv2_2"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv2_2_relu"
type: "ReLU"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2_2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
##############################################
layer {
name: "conv3_1"
type: "Convolution"
bottom: "pool2"
top: "conv3_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
pad: 0
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv3_1_bn"
type: "BatchNorm"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_1_scale"
type: "Scale"
bottom: "conv3_1"
top: "conv3_1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv3_1_relu"
type: "ReLU"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_2"
type: "Convolution"
bottom: "conv3_1"
top: "conv3_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv3_2_bn"
type: "BatchNorm"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv3_2_scale"
type: "Scale"
bottom: "conv3_2"
top: "conv3_2"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv3_2_relu"
type: "ReLU"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv4_1"
type: "Convolution"
bottom: "conv3_2"
top: "conv4_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
pad: 0
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv4_1_bn"
type: "BatchNorm"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_1_scale"
type: "Scale"
bottom: "conv4_1"
top: "conv4_1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv4_1_relu"
type: "ReLU"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_2"
type: "Convolution"
bottom: "conv4_1"
top: "conv4_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv4_2_bn"
type: "BatchNorm"
bottom: "conv4_2"
top: "conv4_2"
}
layer {
name: "conv4_2_scale"
type: "Scale"
bottom: "conv4_2"
top: "conv4_2"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv4_2_relu"
type: "ReLU"
bottom: "conv4_2"
top: "conv4_2"
}
################################################
layer {
name: "conv5_1"
type: "Convolution"
bottom: "conv4_2"
top: "conv5_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
pad: 0
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv5_1_bn"
type: "BatchNorm"
bottom: "conv5_1"
top: "conv5_1"
}
layer {
name: "conv5_1_scale"
type: "Scale"
bottom: "conv5_1"
top: "conv5_1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv5_1_relu"
type: "ReLU"
bottom: "conv5_1"
top: "conv5_1"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv5_1"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
#########################################
#########################################
layer {
name: "fc1"
type: "InnerProduct"
bottom: "pool3"
top: "fc1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1024
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "fc1_bn"
type: "BatchNorm"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc1_scale"
type: "Scale"
bottom: "fc1"
top: "fc1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "fc1_relu"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 128
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "fc2_norm"
type: "NormalizeJin"
bottom: "fc2"
top: "fc2_norm"
norm_jin_param {
across_spatial: true
scale_filler {
type: "constant"
value: 1.0
}
channel_shared: true
}
}
############### Arc-Softmax Loss ##############
layer {
name: "fc6_changed"
type: "InnerProduct"
bottom: "fc2_norm"
top: "fc6"
inner_product_param {
num_output: 9204
normalize: true
weight_filler {
type: "xavier"
}
bias_term: false
}
}
####################################################
layer {
name: "cosin_add_m"
type: "CosinAddm"
bottom: "fc6"
bottom: "label"
top: "fc6_margin"
cosin_add_m_param {
m: 0.1
}
include {
phase: TRAIN
}
}
layer {
name: "fc6_margin_scale"
type: "Scale"
bottom: "fc6_margin"
top: "fc6_margin_scale"
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
filler{
type: "constant"
value: 64
}
}
include {
phase: TRAIN
}
}
######################################################
layer {
name: "softmax_loss"
type: "SoftmaxWithLoss"
bottom: "fc6_margin_scale"
bottom: "label"
#bottom: "label"
#bottom: "data"
top: "softmax_loss"
loss_weight: 1
include {
phase: TRAIN
}
}
layer {
name: "Accuracy"
type: "Accuracy"
bottom: "fc6"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
The loss output is as follows:
I0627 17:38:58.567371 6757 solver.cpp:224] Iteration 450 (2.13816 iter/s, 4.67691s/10 iters), loss = 87.3365
I0627 17:38:58.567402 6757 solver.cpp:243] Train net output #0: softmax_loss = 87.3365 (* 1 = 87.3365 loss)
I0627 17:38:58.567409 6757 sgd_solver.cpp:137] Iteration 450, lr = 0.00314
I0627 17:39:03.256306 6757 solver.cpp:224] Iteration 460 (2.13288 iter/s, 4.6885s/10 iters), loss = 87.3365
I0627 17:39:03.256340 6757 solver.cpp:243] Train net output #0: softmax_loss = 87.3365 (* 1 = 87.3365 loss)
I0627 17:39:03.256347 6757 sgd_solver.cpp:137] Iteration 460, lr = 0.00314
I0627 17:39:07.941520 6757 solver.cpp:224] Iteration 470 (2.13457 iter/s, 4.68478s/10 iters), loss = 87.3365
I0627 17:39:07.941551 6757 solver.cpp:243] Train net output #0: softmax_loss = 87.3365 (* 1 = 87.3365 loss)
I0627 17:39:07.941558 6757 sgd_solver.cpp:137] Iteration 470, lr = 0.00314
I0627 17:39:12.623337 6757 solver.cpp:224] Iteration 480 (2.13612 iter/s, 4.68139s/10 iters), loss = 87.3365
I0627 17:39:12.623456 6757 solver.cpp:243] Train net output #0: softmax_loss = 87.3365 (* 1 = 87.3365 loss)
How should I modify this?
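A commonly cited explanation for the constant 87.3365 in the log above is that Caffe's SoftmaxWithLoss clamps the predicted probability to FLT_MIN before taking the logarithm, so a true-class probability of 0 produces exactly this value. The sketch below only reproduces that arithmetic; the function name `clamped_nll` is hypothetical, and it does not diagnose why the probability collapsed in this particular network.

```python
import math

FLT_MIN = 1.17549435e-38  # smallest positive normalized 32-bit float

def clamped_nll(prob_true_class):
    """Negative log-likelihood with the probability clamped from below."""
    # A probability of 0 gives a large finite loss instead of infinity.
    return -math.log(max(prob_true_class, FLT_MIN))

saturated = clamped_nll(0.0)
print(round(saturated, 4))
```

A loss stuck at this value therefore suggests that the softmax input has diverged, for example through very large or invalid logits, rather than that training is merely slow.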
Hello guys,
First, thanks to the author for his excellent work.
I use this arcface loss to fine-tune a classification model with two classes, but I don't know why it does not work. The loss does not decrease and the accuracy jumps around. The parameters are m = 0.5, s = 64. I have tried some other parameters, but the result is always the same.
Has anyone encountered a similar problem? Thanks.
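The earlier version used two layers: one that adds the angular margin m, and one that applies the scale of 64 or 128.
The new arcface merges them into a single layer with no scale parameter to set. What is the difference between this and the implementation above?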
layer {
name: "cosin_add_m"
type: "CosinAddm"
bottom: "temp_fc6"
bottom: "label"
top: "fc6_margin"
cosin_add_m_param {
m: 0.5
}
}
layer {
name: "fc6_margin_scale"
type: "Scale"
bottom: "fc6_margin"
top: "fc6_margin_scale"
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
filler{
type: "constant"
value: 64
}
}
}
The modified addm layer:
layer {
name: "adacos_add_m_scale"
type: "AdaCosAddmScale"
bottom: "fc6"
bottom: "label"
top: "fc6_margin_scale"
adacos_add_m_scale_param {
m: 0.5
num_classes: 10575
}
}
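One difference the AdaCos paper describes is that the scale is derived from the number of classes rather than set by hand, which would explain why the merged layer takes `num_classes` but no scale value. The sketch below computes the paper's fixed starting scale, sqrt(2) * log(C - 1); whether AdaCosAddmScale uses exactly this formula is an assumption, and the function name `adacos_fixed_scale` is hypothetical.

```python
import math

def adacos_fixed_scale(num_classes):
    """Fixed scale from the AdaCos paper: sqrt(2) * log(C - 1)."""
    return math.sqrt(2.0) * math.log(num_classes - 1)

s0 = adacos_fixed_scale(10575)  # num_classes taken from the layer definition above
print(s0)
```

For 10575 classes this is far smaller than the hand-set 64 used in the two-layer version, so the two configurations are not expected to produce comparable loss values.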
Hello, I just downloaded your latest Combined Margin Loss files, but I get errors when compiling. Could you see what the problem is?
Severity Code Description Project File Line Suppression State
Error C2065 'arccos_x': undeclared identifier libcaffe E:\AMSoftmax-master\Caffe-AM-Softmax\caffe-windows\src\caffe\layers\combined_margin_layer.cpp 32
Severity Code Description Project File Line Suppression State
Error C2228 left of '.mutable_cpu_data' must have class/struct/union libcaffe E:\AMSoftmax-master\Caffe-AM-Softmax\caffe-windows\src\caffe\layers\combined_margin_layer.cpp 32
Thank you!!
Hi, thanks for your project. Can you share the UMDFaces dataset with me?
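The landmark detection network uses separable convolutions, so the weight size should be filter_size x filter_size x output_channel,
but the actual weight size is input_channel x filter_size x filter_size x output_channel, like an ordinary convolution.
I changed these separable convolutions to regular convolutions; the test program runs, but the results are wrong. Please confirm.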
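The shape mismatch described above can be illustrated with the layout Caffe uses for convolution weight blobs, (num_output, in_channels / group, kernel_h, kernel_w). This is a sketch with a hypothetical helper `conv_weight_shape`; it assumes a depthwise layer is expressed with `group` equal to the number of input channels, which may differ from how the repository's model was exported.

```python
def conv_weight_shape(num_output, in_channels, kernel_size, group=1):
    """Weight blob shape for a square-kernel convolution in Caffe's layout."""
    if in_channels % group or num_output % group:
        raise ValueError("channel counts must be divisible by group")
    return (num_output, in_channels // group, kernel_size, kernel_size)

standard = conv_weight_shape(64, 64, 3)             # ordinary convolution
depthwise = conv_weight_shape(64, 64, 3, group=64)  # one filter per channel

print(standard, depthwise)
```

If the stored weights have the first shape, loading them into a layer that expects the second (or the reverse) can run without error in some setups yet produce wrong landmark positions.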
When I run testlandmark.py under the landmark_and_pose path, the program runs, but the resulting point positions are wrong.
Hello, I am using arcface to train a four-class task. The final training output is:
Iteration 285400(1.51479 iter/s,132.031s/200 iters),loss=1.58242
Train net output #0:accuracy_hat=1
Train net output #1:accuracy_hat_arc=0.4375
Train net output #2:loss_hat=1.58242(*1 = 1.58242 loss)
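accuracy_hat_arc never rises above 0.6,
and loss_hat stays a little above 1.5 and does not converge.
Has anyone run into this problem or have ideas for solving it? I need help urgently, thanks.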
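Hello author, in the arcface face recognition network structure on the insightface site, none of the convolution layers use a bias, whereas all of yours do. Does this make any difference? Thanks.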
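Hello author, I have a question about the gradient computation of the loss:
1. For arcface, the gradient in the code is cos_m + sin_m * cos_t[i * dim + gt] / sin_theta, which is just sin(theta+m)/sin(theta). My own calculation gives -sin(theta+m). Am I missing something?
2. For combined margin, the gradient in your code is m1 * pow(1 - pow(bottom_data[i * dim + gt], 2), -0.5) * sin(m1_x_m2[i * dim + gt]), which is just m1 * sin(m1 * theta+m2) * sin(theta). My own calculation gives -m1 * sin(m1 * theta+m2). How did you arrive at your result? Many thanks!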
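The two expressions quoted from the code can be checked against a finite difference. The sketch assumes the layer's input is x = cos(theta) rather than theta itself, so the chain rule contributes d(theta)/dx = -1/sin(theta); that factor turns -sin(theta + m) into sin(theta + m)/sin(theta), and the combined-margin factor pow(1 - x^2, -0.5) is likewise a division by sin(theta). All function names are hypothetical.

```python
import math

def arc(x, m):
    return math.cos(math.acos(x) + m)

def arc_grad(x, m):
    # cos_m + sin_m * cos_theta / sin_theta, as quoted from the code.
    sin_theta = math.sqrt(1.0 - x * x)
    return math.cos(m) + math.sin(m) * x / sin_theta

def combined(x, m1, m2):
    return math.cos(m1 * math.acos(x) + m2)

def combined_grad(x, m1, m2):
    # m1 * (1 - x^2)^(-1/2) * sin(m1 * theta + m2), as quoted from the code.
    return m1 * (1.0 - x * x) ** -0.5 * math.sin(m1 * math.acos(x) + m2)

def numeric(f, x, h=1e-6):
    """Central finite difference of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 0.3
print(arc_grad(x, 0.5), numeric(lambda v: arc(v, 0.5), x))
print(combined_grad(x, 0.9, 0.4), numeric(lambda v: combined(v, 0.9, 0.4), x))
```

Both analytic expressions agree with the numerical derivative taken with respect to cos(theta), which suggests the difference from the hand calculation is the choice of differentiation variable.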
Where can I download the model used for testing?
Hello, is there a demo that uses a caffemodel to get the distance and similarity between two faces, like deploy/test.py in the original insightface? Thanks, waiting for your reply.
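hi xialuxi:
It looks like CosinAddmLayer is simply arcloss (corresponding to combined margin with m1=1, m3=0). So why does CosinAddmLayer handle cos_t[i * dim + gt] > 1.0f and cos_t[i * dim + gt] <= threshold, while combined_margin_layer does not need to?
Which one was used for the final training?
Thank you very much!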
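The equivalence mentioned in the question can be checked with the combined-margin form from the ArcFace paper, cos(m1 * theta + m2) - m3. The sketch below is illustrative only, with hypothetical function names; it compares that form at m1 = 1, m3 = 0 with the plain additive-angle expression and makes no claim about how either layer in the repository handles out-of-range inputs.

```python
import math

def combined_margin(cos_t, m1, m2, m3):
    """cos(m1 * theta + m2) - m3, with cos_t clipped to a valid cosine."""
    theta = math.acos(max(-1.0, min(1.0, cos_t)))
    return math.cos(m1 * theta + m2) - m3

def additive_angle(cos_t, m):
    """cos(theta + m) via the angle-addition identity, no special cases."""
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    return cos_t * math.cos(m) - sin_theta * math.sin(m)

m = 0.5
print(combined_margin(0.2, 1.0, m, 0.0), additive_angle(0.2, m))
# Past theta = pi - m the plain formula starts rising again:
print(combined_margin(math.cos(math.pi - m), 1.0, m, 0.0),
      combined_margin(-0.95, 1.0, m, 0.0))
```

The second print shows that without an extra branch the target logit is no longer monotonic in theta once theta + m exceeds pi, which is the region the threshold check in CosinAddmLayer appears to address.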
I can't seem to find the part that computes the loss. Is the loss function layer not included in these files?
Hi,
What kind of optimiser (Adam, SGD) did you use for the test on, e.g., CASIA-WebFace?
I'm asking because it looks like the gradients in the final layer using adacos are lower than with a fixed s=20 (98 classes). Also, adacos gets lower scores. I'm wondering whether the gradients provided for learning the model may be too low.
I'm using Adam and the results are as follows (the dataset is CARS196).
In general AdaCos works worse for some reason; I'm not sure why. Maybe it is also because the average angle for non-similar classes is smaller than in the case of faces.
Or we need a more adaptive LR method for this problem.
Why is it that after setting the loss to arcloss, the loss is 87.33 and the accuracy is 1?
layer {
name: "cosin_add_m"
type: "CosinAddm"
bottom: "concat_fc"
bottom: "label"
top: "fc6_margin"
cosin_add_m_param {
m: 0.5
}
}
layer {
name: "fc6_margin_scale"
type: "Scale"
bottom: "fc6_margin"
top: "fc6_margin_scale"
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
filler{
type: "constant"
value: 64
}
}
}
layer {
name: "concat_loss"
type: "SoftmaxWithLoss"
bottom: "fc6_margin_scale"
bottom: "label"
top: "concat_loss"
}
If I use this directly instead:
layer {
name: "concat_loss"
type: "SoftmaxWithLoss"
bottom: "concat_fc"
bottom: "label"
top: "concat_loss"
}
then it converges. I can't figure out why.
Could you share a train prototxt example for wingloss, like this EuclideanLoss one?
layer {
name: "loss"
type: "EuclideanLoss"
bottom: "fc2"
bottom: "label"
top: "loss"
loss_weight: 100
}
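For reference while waiting for a prototxt, the loss itself can be sketched from the Wing loss paper's definition: w * ln(1 + |x| / eps) for |x| < w, and |x| - C otherwise, with C = w - w * ln(1 + w / eps). This is not the repository's layer; the function name `wing_loss` and the defaults w = 10, eps = 2 are assumptions for illustration.

```python
import math

def wing_loss(pred, target, w=10.0, eps=2.0):
    """Sum of per-coordinate wing losses between two equal-length sequences."""
    # C makes the two pieces meet at |x| = w.
    c = w - w * math.log(1.0 + w / eps)
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        if d < w:
            total += w * math.log(1.0 + d / eps)  # logarithmic part for small errors
        else:
            total += d - c                        # linear part for large errors
    return total

print(wing_loss([1.0, 12.0], [0.0, 0.0]))
```

Compared with the EuclideanLoss layer above, small landmark errors are weighted more heavily and large ones grow only linearly.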
Thank you very much for the reimplementation. Is there a .cu file for SV-X-Softmax? I would like to try reproducing the results.