Thank you very much for making your code available and for your extremely insightful work on domain adaptation applied to CXRs.
I have read your publication with great interest and I have a question regarding the "Global Feature Enhancer Loss".
My understanding is that if the feature matrices are L2-normalized, the diagonal of the resulting Gram matrices will always be a vector of ones: each normalized channel vector f_i satisfies G_ii = f_i . f_i = ||f_i||^2 = 1, and therefore the GFE loss would always be 0. I must be missing something, but even when trying on some random feature tensors with the following code:
import torch
import torch.nn as nn

def generate_gram_matrix(y):
    # y: feature tensor of shape (batch, channels, height, width)
    (b, ch, h, w) = y.size()
    features = y.view(b, ch, w * h)
    # L2-normalize each channel's flattened spatial vector
    features = nn.functional.normalize(features, dim=2, eps=1e-7)
    features_t = features.transpose(1, 2)
    # batched (ch x ch) Gram matrix of channel-wise similarities
    gram_matrix = features.bmm(features_t)
    return gram_matrix
def get_gfe_loss(x):
    gfm = x.clone()
    criterion_mse = torch.nn.MSELoss()
    gm = generate_gram_matrix(gfm)
    # diagonal = self-similarity of each L2-normalized channel vector
    scores = torch.diagonal(gm, offset=0, dim1=-2, dim2=-1)
    # compare against a target of all ones
    gt = torch.ones_like(scores)
    gfe_loss = criterion_mse(scores, gt)
    return gfe_loss
M = torch.randn((4, 12, 32, 32))  # random features: batch=4, 12 channels, 32x32
print(get_gfe_loss(M))  # randn already returns a FloatTensor, so no cast is needed
I always get a loss on the order of 1e-13, so I have trouble understanding how this loss can contribute to the training.
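To double-check, the diagonal can also be inspected directly. Below is a minimal sketch using the same generate_gram_matrix and M as above (the tolerance value is my own choice, not from your code):

G = generate_gram_matrix(M)
diag = torch.diagonal(G, dim1=-2, dim2=-1)
# after normalization, each diagonal entry is f_i . f_i = ||f_i||^2 = 1
print(torch.allclose(diag, torch.ones_like(diag), atol=1e-5))  # prints True

This seems to hold for any input tensor, which is why the MSE against a target of ones comes out as floating-point noise.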