
Loss explosion with outdoor data. (omni3d, open)

Hiusam commented on July 20, 2024
Loss explosion with outdoor data.

from omni3d.

Comments (5)

gkioxari commented on July 20, 2024

A few questions and comments:

  1. Did you run the config with the same batch size, learning rate, and schedule that we suggest? Deviating from the recipe we suggest will certainly change the behavior of the losses during training, as is expected.

  2. Yes, we do occasionally encounter high losses during training. This happens when an image is out of distribution or has extreme annotations, something that 3D detection suffers from more than 2D. For this reason, we provide checks and skip gradient updates in these cases. Provided you use our recipe, though, the model should have trained successfully.
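As a rough illustration of what "skip gradient updates" can mean, here is a minimal pure-Python sketch. It is not the omni3d code: the toy model `y = w * x`, the function name `train_with_skips`, and the `loss_threshold` value are all assumptions made for the example.

```python
import math


def train_with_skips(samples, lr=0.1, loss_threshold=100.0):
    """Fit y = w * x with SGD, skipping any step whose loss is non-finite
    or above loss_threshold (a hypothetical cutoff, not omni3d's value)."""
    w = 0.0
    skipped = 0
    for x, y in samples:
        err = w * x - y
        loss = err * err
        if not math.isfinite(loss) or loss > loss_threshold:
            skipped += 1  # no parameter update for this sample
            continue
        grad = 2.0 * err * x  # d(loss)/dw
        w -= lr * grad
    return w, skipped


# Well-behaved samples with one extreme annotation in the middle.
samples = [(1.0, 2.0)] * 20 + [(1.0, 2000.0)] + [(1.0, 2.0)] * 20
w, skipped = train_with_skips(samples)
print(w, skipped)
```

In this toy run the single extreme sample is skipped and `w` still converges close to 2, because the outlier never contributes a gradient step.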


Hiusam commented on July 20, 2024
  1. I ran Base_Omni3D_out.yaml as provided in your repo, without any modification, and training keeps restarting with messages like "!! Restarting training at 51028 iters. Exploding loss 2% of iters !!". Should I keep training and hope that it completes after a few restarts? :(
  2. I tried gradient clipping, but it doesn't help.
  3. Any plan to clear the dataset?


gkioxari commented on July 20, 2024
  1. Large losses during training

And to confirm: you ran with the same batch size? You should certainly keep training the model. We skip updates when losses are large to make training robust. The training should complete.

  2. Gradient clipping

Gradient clipping is another way to protect your model from large losses (and thus large gradients). We chose to skip the updates instead, whereas gradient clipping clips the gradients. Our approach of skipping updates when losses are large is certainly less aggressive than gradient clipping, which is why we prefer it.

  3. Clear the dataset

@Hiusam this is not a dataset issue. There is nothing in the dataset to clear. 3D detection is simply much, much harder than 2D detection. For instance, there are scenes with really far away objects (e.g. objects as far as 200 m away), in which case a wrong depth prediction in metric space will produce a large loss and thus large gradients. The solution is not to "clear" the dataset in any way, but to robustify training, which we do.
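To make the depth argument and the clip-versus-skip distinction concrete, here is a small worked sketch. It assumes a plain squared-error loss on metric depth, which is chosen only for illustration and is not claimed to be the loss omni3d uses. The helper names, `max_norm`, and `loss_threshold` are hypothetical.

```python
def squared_depth_loss(pred_m, target_m):
    """Squared error on metric depth (illustrative loss) and its gradient
    with respect to the prediction."""
    err = pred_m - target_m
    return err * err, 2.0 * err


def clipped_step(grad, lr, max_norm):
    """Gradient clipping: always take a step, but rescale the gradient so
    its magnitude is at most max_norm."""
    norm = abs(grad)
    scale = 1.0 if norm <= max_norm else max_norm / norm
    return -lr * grad * scale


def skipped_step(loss, grad, lr, loss_threshold):
    """Update skipping: take the unmodified step, or no step at all when
    the loss exceeds loss_threshold."""
    if loss > loss_threshold:
        return 0.0
    return -lr * grad


# Nearby object: 1 m error at 10 m -> loss 1.0, gradient -2.0
near_loss, near_grad = squared_depth_loss(9.0, 10.0)
# Far object: 50 m error at 200 m -> loss 2500.0, gradient -100.0
far_loss, far_grad = squared_depth_loss(150.0, 200.0)

for name, loss, grad in [("near", near_loss, near_grad), ("far", far_loss, far_grad)]:
    print(name, loss,
          clipped_step(grad, lr=0.01, max_norm=5.0),
          skipped_step(loss, grad, lr=0.01, loss_threshold=100.0))
```

With these made-up numbers, both strategies leave the nearby sample's step untouched. For the far sample, clipping still moves the parameter in the outlier's direction with a reduced magnitude, while skipping applies no update at all.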


chenfengxu714 commented on July 20, 2024

Hi @gkioxari, nice work! I encountered the same issue. Do you have an estimate of how many retries it usually needs? My experiments seem to have retried many times: e.g., the estimated training time is 21 hours, yet after two days it is still retrying. I also use the same Base_Omni3D_out config without any changes. Your suggestions would be very helpful!


Lizhuoling commented on July 20, 2024

I encountered the same problem. Without any modification to the code, the training loss explodes in both indoor and outdoor scenes. I have tried resuming the experiments from the saved checkpoint, but it does not help: the training loss soon explodes again.

