
Comments (4)

lhrrhl0419 commented on August 16, 2024

Thank you so much for your clarification. I trained these three networks again after receiving your reply. The loss curve for the GLOW model shows no tendency to converge, just the same as what I got a few days ago.

I apologize for the delay in addressing this issue, which was caused by disk-related problems. The GLOW training loss couldn't converge because of a bug in the training code, which has now been fixed. I've retrained GLOW with the new code, and the loss converges now.

(Image: GLOW training loss curve)

I noticed that the cmap_loss remains 0 during training. I am wondering whether the independent training of GLOW does not take the contact map as supervision. According to the paper, the translation and pose of the hand should be fed into ContactNet to generate a contact map, but in independent training the output of ContactNet would be messy, so we can't use it to supervise GLOW and IPDF. Is that right?

The cmap loss is only calculated in the joint training now. Using this loss in the independent training would probably not severely harm the training process, thanks to the supervision of the NLL loss, but it likely would not have a significant effect either, since this additional loss is already optimized in the joint training. However, we haven't tried it.
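If you want to experiment with that, the stage-dependent loss composition described above could be sketched as follows. This is only a minimal illustration; the function and stage names are hypothetical, not the repository's actual code:

```python
def training_loss(nll_loss, cmap_loss, stage, cmap_weight=1.0):
    """Combine the per-batch losses depending on the training stage.

    In independent (pre-)training only the NLL supervises GLOW; the
    contact-map loss is added during joint fine-tuning, once ContactNet
    produces meaningful maps.
    """
    if stage == "independent":
        return nll_loss
    if stage == "joint":
        return nll_loss + cmap_weight * cmap_loss
    raise ValueError(f"unknown stage: {stage!r}")
```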

So, if we don't use cmap to supervise the GLOW network, there will be no supervision other than NLL to train the normalizing flow. I'm wondering how to ensure convergence of the network if the distribution over the sample space is so sparse (only the GT data has nonzero probability, all other data has zero). When I test the GLOW model after about 200 epochs, the output is far from the GT.

Theoretically, GLOW could just memorize all inputs and outputs and predict a Dirac distribution, as you've mentioned. But just like other networks, if the dataset is big enough, it is able to generalize and figure out the underlying distribution of the dataset, and the distribution of our dataset, which is generated by DexGraspNet, is not sparse. For example, there are a lot of bottles lying on the table in the data; some of the grasping poses grasp the upper part of a bottle, while others grasp the middle or the lower part. By minimizing the NLL loss, GLOW minimizes the KL divergence between the data distribution and its output distribution, so it assigns high probability to all of those grasping poses. Additionally, since GLOW learns to model the distribution of all poses in the dataset that can grasp the object, a large distance between an output sample and the GT is acceptable.
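The NLL/KL relationship behind this argument is easy to check numerically: the expected NLL of a model q under the data distribution equals KL(p_data || q) plus the entropy of the data, which is a constant, so minimizing one minimizes the other. A tiny self-contained sketch with a discrete toy distribution (the three "modes" are purely illustrative, not taken from the dataset):

```python
import numpy as np

p_data = np.array([0.5, 0.3, 0.2])   # three grasp "modes" in the toy data

def nll(q):
    # Expected negative log-likelihood of model q under the data distribution
    return -(p_data * np.log(q)).sum()

def kl(q):
    # KL divergence from model q to the data distribution
    return (p_data * np.log(p_data / q)).sum()

entropy = -(p_data * np.log(p_data)).sum()

q = np.array([0.2, 0.5, 0.3])        # an arbitrary model distribution
assert np.isclose(nll(q), kl(q) + entropy)

# At the NLL optimum the model reproduces the data distribution, so it
# assigns high probability to every mode rather than to one GT sample.
assert np.isclose(nll(p_data), entropy)
```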

If you have any further questions or encounter any other issues, please feel free to reach out.

from unidexgrasp.

XYZ-99 commented on August 16, 2024

Thank you for your interest in our work!

Yes, your understanding is correct. They are first trained independently before being finetuned. The path in the config is more like a placeholder.

We don't have a strict number, but you should be fine once the training curves indicate the models are converging.

Sorry for the delay. Please let us know if you have further questions.


lym29 commented on August 16, 2024


Thank you so much for your clarification. I trained these three networks again after receiving your reply. The loss curve for the GLOW model shows no tendency to converge, just the same as what I got a few days ago.
(Image: GLOW training loss curve)
I noticed that the cmap_loss remains 0 during training. I am wondering whether the independent training of GLOW does not take the contact map as supervision. According to the paper, the translation and pose of the hand should be fed into ContactNet to generate a contact map, but in independent training the output of ContactNet would be messy, so we can't use it to supervise GLOW and IPDF. Is that right?

So, if we don't use cmap to supervise the GLOW network, there will be no supervision other than NLL to train the normalizing flow. I'm wondering how to ensure convergence of the network if the distribution over the sample space is so sparse (only the GT data has nonzero probability, all other data has zero). When I test the GLOW model after about 200 epochs, the output is far from the GT.

I apologize if there are any misunderstandings on my part, since I am unfamiliar with normalizing flows. Please kindly point them out. Thanks!


lym29 commented on August 16, 2024

Thanks for the explanation! I understand your method now and have learned a lot from your work :)

