Hi, first thanks for the nice work and for releasing it open source.
I have a question that is related to PyTorch Lightning but also to your code, so I wanted to ask here if you could point me to the relevant part of the code or to other resources.
I am adding one (or more) loss functions to change the way the model learns the assembly task, since I believe there is margin for improvement in how the loss is computed.
My question is: how can I add a custom loss and include it in the computation graph?
I added the loss function in `base_model.py`, added it to the `loss_dict`, and added its weight in the configuration file (let's call our loss `custom_loss`; I added `_C.loss = CN(); _C.loss.custom_loss_w = 1.` in the config file).
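For concreteness, this is a minimal, self-contained sketch of what I mean, assuming a yacs-style config; the helper `weighted_total` and the toy tensors below are mine, not your actual API:

```python
import torch
from yacs.config import CfgNode as CN

cfg = CN()
cfg.loss = CN()
cfg.loss.custom_loss_w = 1.0   # weight for the new term

def weighted_total(loss_dict, cfg):
    """Sum every per-part loss term, scaled by its '<name>_w' weight from the config."""
    total = 0.
    for name, value in loss_dict.items():
        weight = getattr(cfg.loss, f'{name}_w', 1.0)
        total = total + weight * value.mean()
    return total

# toy example: two loss terms of shape [batch, num_parts]
loss_dict = {
    'trans_loss': torch.rand(1, 2, requires_grad=True),
    'custom_loss': torch.rand(1, 2, requires_grad=True),
}
print(weighted_total(loss_dict, cfg))   # scalar, with a grad_fn
```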
Now when I run the training it goes through without errors, but my loss keeps almost exactly the same value. One explanation could be that I just need to train longer, but I suspect that the optimization is not including my new loss.
So I debugged the code a bit, printed out the losses, and got:
loss: trans_loss, value: tensor([[0.1740, 0.0799]], device='cuda:0', grad_fn=<StackBackward0>)
loss: rot_pt_cd_loss, value: tensor([[0.0029, 0.0040]], device='cuda:0', grad_fn=<StackBackward0>)
loss: transform_pt_cd_loss, value: tensor([[0.0024, 0.0012]], device='cuda:0', grad_fn=<StackBackward0>)
loss: custom_loss, value: tensor([[0.0262, 0.0298]], device='cuda:0')
loss: rot_loss, value: tensor([[0.4134, 0.3462]], device='cuda:0', grad_fn=<StackBackward0>)
loss: rot_pt_l2_loss, value: tensor([[0.0294, 0.1065]], device='cuda:0', grad_fn=<StackBackward0>)
As you can see, `custom_loss` is missing the `grad_fn`, and as I can read here, this could mean that the loss is not connected to the computation graph, so the optimizer does not take it into account. That would explain why it never decreases.
So I swapped my custom loss (which was an exponential) for a predefined one from PyTorch (e.g. `torch.nn.L1Loss`) to check whether the problem was in the definition of the loss. (I read "A Gentle Introduction to torch.autograd" yesterday, but it did not resolve my doubts: should I set `requires_grad` somewhere? I did not see that in your code for the other loss functions either.) The result is the same (still no `grad_fn`), so I was wondering: where in the code should I look in order to include the loss in the computation graph?
Thanks a lot in advance for your time.
Context for why I am trying to add a loss:
I got the training to work and was looking at the results, and it seems to me that the fragments stay where they were placed at the beginning (at the origin). Even after a substantial number of epochs, both pieces are still there. I want to add a loss that forces them to move apart. I am starting with a simple idea based on the minimum distance between two points: assuming we have the correct transformation for the assembly of two pieces, the minimum distance between any point of piece A and any point of piece B would be zero (attained by points on the shared border of the two pieces). Of course, a minimum distance of zero does not necessarily mean that the assembly is correct, but it is a place to start; I would already be happy to see the two pieces being moved apart. More will follow later (maybe using the center of mass or more sophisticated methods), but my guess is that the estimation of the transformation needs to be guided by a meaningful loss on the assembly itself, meaning that the two pieces should complete each other rather than overlap.
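In case it helps to make the idea concrete, this is roughly the term I have in mind, a minimal differentiable sketch (the function name and shapes are my own assumptions, not taken from your code):

```python
import torch

def min_distance_loss(pts_a: torch.Tensor, pts_b: torch.Tensor) -> torch.Tensor:
    """Penalize the gap between two pieces after applying the predicted transforms.

    pts_a: (B, N, 3) points of piece A, already transformed by the predicted pose
    pts_b: (B, M, 3) points of piece B, already transformed by the predicted pose
    Returns a (B,) tensor: the smallest pairwise distance per sample, which is
    zero when the pieces touch and grows as they drift apart.
    """
    # (B, N, M) pairwise distances; cdist is differentiable, and taking the
    # min keeps the graph through the selected pair of points
    dists = torch.cdist(pts_a, pts_b)
    return dists.flatten(1).min(dim=1).values

# toy usage
pts_a = torch.randn(1, 100, 3, requires_grad=True)
pts_b = torch.randn(1, 120, 3)
loss = min_distance_loss(pts_a, pts_b)
print(loss, loss.grad_fn)
```

As long as `pts_a` and `pts_b` are the point clouds after applying the predicted transforms (and not detached copies), the result should stay connected to the computation graph.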