Comments (8)
A NaN reconstruction loss causes all the gradients to be NaN. Even if only one value over the whole diff map is NaN, you will get NaN weights at the next optimizer step, so you really want to avoid that!
This line is here to help you figure out what went wrong if you reach a NaN training loss: as soon as the loss becomes NaN, your network is basically bound to output NaN until the end of training.
How it got to be NaN depends on your problem. I advise you to identify a seed on which it appears every time and then find where the first NaN appears.
My guess is that it may happen when computing the u, v coordinates here https://github.com/ClementPinard/SfmLearner-Pytorch/blob/master/inverse_warp.py#L65 , since we divide by a Z value, and when it is 0 you get NaN.
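The two suggestions above (guard the optimizer step against a NaN loss, and hunt down the first NaN) can be sketched as follows. This is a minimal illustration, not the repo's actual code; `guarded_step` is a hypothetical helper name:

```python
import torch

# Anomaly detection makes backward() raise at the op that first produced a NaN,
# which is useful once you have a seed that reproduces the failure.
torch.autograd.set_detect_anomaly(True)

def guarded_step(loss, optimizer):
    """Skip the update when the loss is not finite; a single NaN gradient
    would poison every weight at the next optimizer step."""
    if not torch.isfinite(loss):
        optimizer.zero_grad()
        return False
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return True
```

Anomaly detection slows training down noticeably, so it is typically enabled only while debugging.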
from sfmlearner-pytorch.
I do understand that once loss goes to NaN, the whole training becomes pointless. Gradients go to NaN and there's no way of coming back.
However, I was intrigued that the NaN check is done only for the photometric reconstruction loss; the other loss functions are not checked. So I was wondering whether you encountered any particular scenario in which the photometric reconstruction loss became NaN?
The Z value in inverse_warp.py is being clamped to 1e-3, so I don't see how a division by zero could occur. The Kinect depth I'm using has a lot of zeros; if division by zero were the issue, it should have happened at every iteration. Could this be due to some overflow error?
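For reference, the clamp being discussed works like this. This is a simplified sketch of a pinhole projection, not the repo's exact code; `project_to_uv` is a hypothetical helper:

```python
import torch

def project_to_uv(cam_points, eps=1e-3):
    """Project camera-frame points (N, 3) to normalized (u, v) coordinates.
    Clamping Z keeps the division finite even when the depth is zero."""
    X, Y, Z = cam_points[:, 0], cam_points[:, 1], cam_points[:, 2]
    Z = Z.clamp(min=eps)  # a zero here would otherwise give inf/NaN
    return X / Z, Y / Z
```

Note that a zero depth clamped to `eps` does not produce NaN, but it does warp the pixel by a huge amount, which is the point made in the next comment.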
The other loss functions are actually much simpler since their target value is fixed (smooth loss and explainability loss), but you can check them for NaN too.
You can also try discarding the 0 values in your Kinect depth, because they can cause a very large warp displacement for a given translation.
The other potential source of NaN is the Adam optimizer, which has a second-order term that can diverge if your learning rate is too high; you should check the weight values after the optimizer step too.
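Checking the weights after the optimizer step, as suggested above, could look like this (a minimal sketch; `weights_finite` is a hypothetical helper name):

```python
import torch

def weights_finite(model):
    """Return True iff every parameter of the model is finite (no NaN/Inf),
    e.g. to detect a divergent Adam step right after optimizer.step()."""
    return all(torch.isfinite(p).all().item() for p in model.parameters())
```

Calling this after each step and dumping a checkpoint on the first failure makes it much easier to localize when the divergence happens.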
The 0 values in the Kinect depth occur either because objects are very far away or because they don't reflect the projected IR light. Currently I'm not handling that in the warping part, but I am setting the photometric loss to zero at the pixels where the depth is zero.
I have faced issues with the Adam optimizer before. However, that may not be the case here, because only the photometric loss goes to NaN; the other losses all stay within a reasonable range.
I will have a more careful look at the code and try to log things properly whenever the loss becomes NaN.
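The masking described above (zeroing the photometric loss where the depth is invalid) can be sketched like this; `masked_photometric_loss` is an illustrative name, not the repo's function:

```python
import torch

def masked_photometric_loss(diff, depth):
    """Mean absolute photometric difference over pixels with valid
    (non-zero) depth; invalid Kinect pixels contribute nothing."""
    valid = (depth > 0).float()
    # clamp the denominator so an all-invalid batch yields 0, not NaN
    return (diff.abs() * valid).sum() / valid.sum().clamp(min=1.0)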
I tried to reimplement this and also got NaNs in the photometric reconstruction loss. It only happened in the monocular case, not the stereo one. It is very annoying that training would suddenly die. I haven't tried clamping the depth; hopefully that will fix it.
Well, the depth computation is monocular here. What do you mean by the stereo case? Are the NaNs because of zero depth?
It's definitely not because of zero depth: the depth is clamped to 0.001 before the division. One possibility is an overflow/underflow somewhere.
In the LSD-SLAM paper, the authors rescale the inverse depth so that its mean is 1 at every iteration. Maybe that could solve the problem.
It's definitely very annoying that training stops abruptly and we can't figure out what's wrong.
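The LSD-SLAM-style normalization mentioned above could be sketched as follows. This is an assumption about how one might apply the idea here, not code from either repo:

```python
import torch

def normalize_inverse_depth(depth, eps=1e-3):
    """Rescale depth so that the mean inverse depth is 1 (as in LSD-SLAM),
    keeping magnitudes in a range less prone to overflow/underflow."""
    inv = 1.0 / depth.clamp(min=eps)
    inv = inv / inv.mean()
    return 1.0 / inv
```

Since the monocular setup only recovers depth up to scale anyway, fixing the scale this way costs nothing and keeps the numbers well-conditioned.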
I was talking about my own implementation; apologies for the confusion.
Originally I did not clamp the depth away from zero and I got NaNs, but after clamping it never happened again.