Comments (8)

koutilya-pnvr commented on May 25, 2024

@a-jahani Please correct me if I am wrong. There are two ways to obtain GT depth for the KITTI test setup (Eigen split or KITTI split, it doesn't matter): 1) calibration + Velodyne (LiDAR) data, or 2) the official uninterpolated ground-truth depth images provided by KITTI.

People so far (including this work) have followed 1), but the idea is to slowly shift towards 2)?
If I am not wrong, there is also an interpolated (completed) version of the GT depth images from the official KITTI providers; was that meant for qualitative comparisons only? Should the sparse (uninterpolated) GT depth, obtained from either 1) or 2), alone be used for quantitative evaluation, as everyone in this field has reported?
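
For reference, option 1) amounts to projecting the Velodyne scan into the image plane with the calibration matrices. A minimal sketch of that projection (variable names follow the KITTI calibration files; `R_rect` and `Tr_velo_to_cam` are assumed to already be extended to 4x4 homogeneous matrices by the caller):

```python
import numpy as np

def load_velodyne(bin_path):
    # KITTI stores each LiDAR point as (x, y, z, reflectance) in float32.
    scan = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
    return scan[:, :3]

def velo_to_depth_map(points, P_rect, R_rect, Tr_velo_to_cam, h, w):
    # LiDAR points -> rectified camera frame -> image plane.
    pts = np.hstack([points, np.ones((len(points), 1))]).T  # 4 x N homogeneous
    cam = R_rect @ Tr_velo_to_cam @ pts                     # rectified camera frame
    cam = cam[:, cam[2] > 0]                                # keep points in front of the camera
    proj = P_rect @ cam                                     # 3 x N homogeneous pixels
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    z = cam[2]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[inside], v[inside], z[inside]
    # When several points land on the same pixel, keep the nearest one
    # (descending depth sort means the smallest depth is written last).
    order = np.argsort(-z)
    depth = np.zeros((h, w), dtype=np.float32)
    depth[v[order], u[order]] = z[order]
    return depth  # sparse map: zeros mean "no measurement"
```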

ClementPinard commented on May 25, 2024

The main criterion is evaluation against the sparse ground truth. The key idea is that interpolated data is not real data, so comparing your prediction with it does not make much sense.

The interpolated version can be used for qualitative results, where you subjectively decide whether your prediction looks like the interpolated ground truth.

The problem with quantitative results on interpolated data lies at plane boundaries. Between a pixel belonging to a foreground surface (say, a car) and the next one on the background there is a discontinuity, but you don't know exactly where. Interpolation "blurs" the discontinuity, so some actually good points in your prediction may appear wrong because the interpolated value is a midpoint between foreground and background when it should not be.

Hope I was clear enough! For depth evaluation on KITTI, you can look at the first paper using the now-standard measurements: https://papers.nips.cc/paper/5539-depth-map-prediction-from-a-single-image-using-a-multi-scale-deep-network.pdf
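
For reference, those measurements are computed only at the sparse pixels where ground truth exists. A minimal sketch of the usual set (abs rel, sq rel, RMSE, log RMSE, and the δ < 1.25^k accuracies), assuming `gt` and `pred` are same-sized depth maps:

```python
import numpy as np

def eigen_metrics(gt, pred, min_depth=1e-3, max_depth=80.0):
    # Evaluate only where sparse ground truth exists (non-zero and in range);
    # the 80 m cap is the usual KITTI convention.
    mask = (gt > min_depth) & (gt < max_depth)
    gt, pred = gt[mask], pred[mask]

    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```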

sanweiliti commented on May 25, 2024

Hi, @ClementPinard
Yes, your answer is very clear, thank you very much!
Another question: some papers use the sparse depth ground truth provided by the KITTI depth benchmark, while this paper uses the depth ground truth computed from the raw LiDAR data and calibration parameters. Will there be large differences between these two kinds of depth annotations? From the visualizations, the two look very similar.

ClementPinard commented on May 25, 2024

Officially they are supposed to be exactly the same. The depth benchmark is just a ready-to-go depth image instead of LiDAR data + calibration.

Now, if you look at other KITTI benchmarks, you can see slight differences, especially with odometry, where the ground-truth pose has probably been smoothed compared to raw data + calibration.

I think it's safe to say that the evaluation is pretty much the same here, because the LiDAR and the fixed calibration are pretty reliable.
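
To illustrate the "ready-to-go" part: the depth benchmark ships 16-bit PNGs in which the stored value divided by 256 is the depth in metres, and 0 marks pixels with no measurement. A minimal loading sketch:

```python
import numpy as np
from PIL import Image

def load_kitti_depth_png(path):
    # Depth-benchmark PNGs are uint16; depth in metres = value / 256,
    # and a stored 0 means "no measurement" at that pixel.
    png = np.array(Image.open(path), dtype=np.uint16)
    depth = png.astype(np.float32) / 256.0
    valid = png > 0
    return depth, valid
```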

jahaniam commented on May 25, 2024

I do not suggest validating against interpolated points (I haven't seen anyone doing that), but you can use interpolation to train your network, and my experiments show a boost from it if your interpolation is good and free of weird artifacts.
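
As an illustration of the kind of interpolation meant here, a minimal sketch that densifies a sparse depth map with Delaunay-based linear interpolation (`scipy.interpolate.griddata` is my choice for the sketch, not necessarily what these experiments used):

```python
import numpy as np
from scipy.interpolate import griddata

def densify_depth(sparse_depth):
    # Fill in a sparse LiDAR depth map so it can serve as a dense training target.
    h, w = sparse_depth.shape
    v, u = np.nonzero(sparse_depth)          # pixels that carry a measurement
    grid_v, grid_u = np.mgrid[0:h, 0:w]
    dense = griddata((v, u), sparse_depth[v, u],
                     (grid_v, grid_u), method='linear')
    # Linear interpolation leaves NaNs outside the convex hull of the samples;
    # fall back to nearest-neighbour values there.
    nearest = griddata((v, u), sparse_depth[v, u],
                       (grid_v, grid_u), method='nearest')
    dense = np.where(np.isnan(dense), nearest, dense)
    return dense.astype(np.float32)
```

As discussed above, this blurs depth discontinuities, which is why such maps can be fine for training or visualization but not for evaluation.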

@ClementPinard I have evaluated both the "LiDAR data + calibration" and the post-processed KITTI depth data on the Eigen split. In theory, as you said, they should be exactly the same, but various noises and artifacts affect the LiDAR measurements: the raw LiDAR depth and the post-processed depth are not the same, and raw LiDAR is not a reliable measurement. Depth-estimation research on KITTI is now at the point where using raw LiDAR for evaluation should be revisited.

[Image: evaluation metrics on the Eigen split comparing raw LiDAR + calibration ground truth with the post-processed KITTI depth ground truth]

I have also shown that if you use the ground truth from the new KITTI benchmark for training, you get a huge performance boost (compare rows one and three).

[Image: results table showing the performance boost from training on the new KITTI benchmark ground truth (rows one and three)]

More info was discussed here:

mrharicot/monodepth#166 (comment)

jahaniam commented on May 25, 2024

@koutilya40192 Yes, you are right on all points. Evaluating with 1) is not good, as the ground truth itself is noisy; 2) is better, but it is still not dense, so your algorithm might predict very wrong results in the missing regions while still getting good numbers.

There is no interpolated (completed) version of the GT depth images from the official KITTI providers. Some researchers interpolate it themselves using different methods; some use the interpolated version for visualization only, some use it for training, and none (as far as I know) use it for quantitative evaluation. For quantitative evaluation it's either 1) or 2); I suggest using 2) and submitting your results to the benchmark.

ClementPinard commented on May 25, 2024

Interpolated points are not good, especially around depth discontinuities between foreground and background, where the interpolated values will be very wrong. The only way to get a dense ground truth is to interpolate the 3D point cloud into a mesh and then project the mesh into the camera frame.

However, you then need a much denser 3D point cloud, captured from different points of view, because here we only have the car's POV.

It's going to take some work to build a dataset with truly dense depth ground truth for validating these algorithms.
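
For illustration, a minimal sketch of the mesh idea in 2D: triangulate the projected points in the image plane and skip triangles that bridge a large depth gap (the `max_gap` threshold and the function itself are hypothetical; a proper pipeline would mesh in 3D and handle occlusions):

```python
import numpy as np
from scipy.spatial import Delaunay

def mesh_depth_map(uv, z, h, w, max_gap=1.0):
    # uv: (N, 2) pixel (u, v) coordinates of projected LiDAR points; z: (N,) depths.
    tri = Delaunay(uv)
    # Drop triangles whose vertices span more than max_gap metres in depth:
    # those most likely bridge a foreground/background discontinuity.
    depth_spread = z[tri.simplices].max(axis=1) - z[tri.simplices].min(axis=1)
    keep = depth_spread < max_gap

    # Locate every pixel centre inside the triangulation.
    grid = np.stack(np.meshgrid(np.arange(w), np.arange(h)), -1).reshape(-1, 2)
    simplex = tri.find_simplex(grid)
    inside = (simplex >= 0) & keep[simplex]

    # Barycentric interpolation of depth within each kept triangle.
    T = tri.transform[simplex[inside]]                  # (M, 3, 2) affine maps
    b = np.einsum('mij,mj->mi', T[:, :2], grid[inside] - T[:, 2])
    bary = np.c_[b, 1.0 - b.sum(axis=1)]                # (M, 3) vertex weights
    depth = np.zeros(h * w, dtype=np.float32)
    depth[inside] = (bary * z[tri.simplices[simplex[inside]]]).sum(axis=1)
    return depth.reshape(h, w)
```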

As for applying 2), I'll see what we can do to provide a script that does exactly that, whether that means just pointing to the data from the README or adding a brand-new test script.

koutilya-pnvr commented on May 25, 2024

Thanks for your responses @a-jahani and @ClementPinard. That clears up a lot of my doubts.
