There are several lines of code in the function `SoftTripletLoss`:

```python
triple_dist = torch.stack((dist_ap, dist_an), dim=1)
triple_dist = F.log_softmax(triple_dist, dim=1)
mat_dist_ref = euclidean_dist(emb2, emb2)
dist_ap_ref = torch.gather(mat_dist_ref, 1, ap_idx.view(N,1).expand(N,N))[:,0]
dist_an_ref = torch.gather(mat_dist_ref, 1, an_idx.view(N,1).expand(N,N))[:,0]
triple_dist_ref = torch.stack((dist_ap_ref, dist_an_ref), dim=1)
triple_dist_ref = F.softmax(triple_dist_ref, dim=1).detach()
loss = (- triple_dist_ref * triple_dist).mean(0).sum()
return loss
```
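For context, here is a minimal, self-contained sketch of how this snippet might fit together. The `euclidean_dist` implementation and the batch-hard mining that produces `ap_idx`/`an_idx` are my assumptions (they are not shown in the snippet), so this is a reconstruction, not necessarily the repository's exact code:

```python
import torch
import torch.nn.functional as F

def euclidean_dist(x, y):
    # Assumed helper: pairwise Euclidean distances between rows of x and y.
    m, n = x.size(0), y.size(0)
    xx = x.pow(2).sum(1, keepdim=True).expand(m, n)
    yy = y.pow(2).sum(1, keepdim=True).expand(n, m).t()
    return (xx + yy - 2 * x @ y.t()).clamp(min=1e-12).sqrt()

def softmax_triplet_terms(emb1, emb2, labels):
    # Returns the log-softmax over (d_ap, d_an) for the current embeddings
    # and the detached softmax target computed from the reference embeddings.
    N = emb1.size(0)
    mat_dist = euclidean_dist(emb1, emb1)
    mat_sim = labels.view(N, 1).eq(labels.view(1, N)).float()
    # Assumed batch-hard mining: hardest positive = farthest same-label
    # sample; hardest negative = closest different-label sample.
    dist_ap, ap_idx = (mat_dist - 1e5 * (1 - mat_sim)).max(1)
    dist_an, an_idx = (mat_dist + 1e5 * mat_sim).min(1)
    triple_dist = F.log_softmax(torch.stack((dist_ap, dist_an), dim=1), dim=1)
    # Distances of the same mined (ap, an) pairs under the reference model.
    mat_dist_ref = euclidean_dist(emb2, emb2)
    dist_ap_ref = mat_dist_ref.gather(1, ap_idx.view(N, 1))[:, 0]
    dist_an_ref = mat_dist_ref.gather(1, an_idx.view(N, 1))[:, 0]
    triple_dist_ref = F.softmax(
        torch.stack((dist_ap_ref, dist_an_ref), dim=1), dim=1).detach()
    return triple_dist, triple_dist_ref

def soft_triplet_loss(emb1, emb2, labels):
    # The two-term version quoted above: soft cross-entropy between the
    # reference distribution and the current log-probabilities.
    triple_dist, triple_dist_ref = softmax_triplet_terms(emb1, emb2, labels)
    return (-triple_dist_ref * triple_dist).mean(0).sum()
```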
I think it should be:

```python
triple_dist = torch.stack((dist_ap, dist_an), dim=1)
triple_dist = F.log_softmax(triple_dist, dim=1)
mat_dist_ref = euclidean_dist(emb2, emb2)
dist_ap_ref = torch.gather(mat_dist_ref, 1, ap_idx.view(N,1).expand(N,N))[:,0]
dist_an_ref = torch.gather(mat_dist_ref, 1, an_idx.view(N,1).expand(N,N))[:,0]
triple_dist_ref = torch.stack((dist_ap_ref, dist_an_ref), dim=1)
triple_dist_ref = F.softmax(triple_dist_ref, dim=1).detach()
# loss = (- triple_dist_ref * triple_dist).mean(0).sum()
loss = (- triple_dist_ref[:,0] * triple_dist[:,0]).mean()
return loss
```
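As a quick toy check (using the hypothetical `softmax_triplet_terms` sketch above on random embeddings), the two variants can be computed side by side:

```python
torch.manual_seed(0)
emb1 = torch.randn(8, 16)   # current model features (toy data)
emb2 = torch.randn(8, 16)   # reference model features (toy data)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])

log_p, p_ref = softmax_triplet_terms(emb1, emb2, labels)
loss_original = (-p_ref * log_p).mean(0).sum()       # both columns
loss_modified = (-p_ref[:, 0] * log_p[:, 0]).mean()  # positive column only
print(loss_original.item(), loss_modified.item())    # the values differ
```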
Your code computes

$$-\log\frac{\exp(F(x_i)F(x_{i,p}))}{\exp(F(x_i)F(x_{i,p}))+\exp(F(x_i)F(x_{i,n}))}\;-\;\log\frac{\exp(F(x_i)F(x_{i,n}))}{\exp(F(x_i)F(x_{i,p}))+\exp(F(x_i)F(x_{i,n}))},$$

which is not consistent with the loss in your paper. My modified code computes

$$-\log\frac{\exp(F(x_i)F(x_{i,p}))}{\exp(F(x_i)F(x_{i,p}))+\exp(F(x_i)F(x_{i,n}))},$$

which is consistent with your paper. However, the performance of my modified code is worse than your original code.
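For reference, including the `triple_dist_ref` weighting (which the shorthand above leaves out), my reading of the two variants for each sample $i$ is:

$$\mathcal{T}_i=\frac{\exp(d_{i,p})}{\exp(d_{i,p})+\exp(d_{i,n})},\qquad \mathcal{T}_i^{\mathrm{ref}}=\frac{\exp(d_{i,p}^{\mathrm{ref}})}{\exp(d_{i,p}^{\mathrm{ref}})+\exp(d_{i,n}^{\mathrm{ref}})},$$

$$\mathcal{L}_{\mathrm{original}}=-\,\mathcal{T}_i^{\mathrm{ref}}\log\mathcal{T}_i-\bigl(1-\mathcal{T}_i^{\mathrm{ref}}\bigr)\log\bigl(1-\mathcal{T}_i\bigr),\qquad \mathcal{L}_{\mathrm{modified}}=-\,\mathcal{T}_i^{\mathrm{ref}}\log\mathcal{T}_i,$$

where $d_{i,p}$, $d_{i,n}$ are the mined positive/negative distances from `emb1` and the $d^{\mathrm{ref}}$ terms are the same distances from `emb2`. Since the two softmax entries sum to one, the original loss is the full soft cross-entropy between the two distributions, while the modified loss keeps only its first term.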
I can't understand why this happens.
I'm looking forward to your reply!