
filippoaleotti / mobilepydnet

Stars: 248 · Watchers: 20 · Forks: 40 · Size: 70.65 MB

Pydnet on mobile devices

License: Apache License 2.0

Languages: Swift 51.82%, Metal 1.09%, Python 46.41%, Shell 0.69%
Topics: deep-learning, monocular-depth-estimation, ios, android, computer-vision


mobilepydnet's Issues

The original model before conversion

Hi,
Thanks for the great work! The newly released v2 model for iOS is in mlmodel format. Would you be willing to share the original TensorFlow SavedModel (or perhaps a Keras HDF5 file) or a checkpoint, so that I can run it on a computer?
Thank you!

Questions about training pipeline

First of all, thanks for sharing this great project! I have tried to implement your mobilePydnet network but cannot quite match the results of the pre-trained model. For that reason I have several questions about the model, the loss, the data, and the training itself.

  1. Did you initialize the weights and biases with a particular initialization strategy, or did you just use the default initialization of the convolution layers?

  2. Did you use any data augmentation like flipping, rotating, random cropping or blurring?

  3. You mentioned here in the issues section that the range of your input and output images is [0, 255]. Does that mean that during training, when you load the input image and the ground truth as float32, you do not normalize them, for example by dividing by 255 to get the range [0, 1]?

  4. The loss is described in the paper as an L1 data term plus multi-scale gradient-matching terms, i.e. L = α·L_data + Σ_s β_s·L_grad_s, where α is fixed to 1 and β_s takes the values 0.5, 0.25, 0.125 (if I understood correctly, you used just 3 different scales). Here is the Python code I use for calculating the loss, but I'm not sure whether I am missing something:
    (attached screenshot of the loss code: Screenshot from 2020-11-06 14-48-46)
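
For reference, a minimal sketch of such a loss (an L1 data term with weight 1 plus gradient-matching terms at three progressively halved scales weighted 0.5, 0.25 and 0.125) could look like the code below; this is an assumption-based illustration, not the authors' actual training code:

import tensorflow as tf

def gradient_loss(pred, gt):
    # L1 difference between horizontal and vertical gradients (NHWC tensors).
    dx_p = pred[:, :, 1:, :] - pred[:, :, :-1, :]
    dy_p = pred[:, 1:, :, :] - pred[:, :-1, :, :]
    dx_g = gt[:, :, 1:, :] - gt[:, :, :-1, :]
    dy_g = gt[:, 1:, :, :] - gt[:, :-1, :, :]
    return tf.reduce_mean(tf.abs(dx_p - dx_g)) + tf.reduce_mean(tf.abs(dy_p - dy_g))

def total_loss(pred, gt, alpha=1.0, betas=(0.5, 0.25, 0.125)):
    # Data term: plain L1 between prediction and ground truth, weighted by alpha.
    loss = alpha * tf.reduce_mean(tf.abs(pred - gt))
    # Gradient terms at three progressively halved scales, weighted by betas.
    for beta in betas:
        loss += beta * gradient_loss(pred, gt)
        pred = tf.nn.avg_pool2d(pred, ksize=2, strides=2, padding="VALID")
        gt = tf.nn.avg_pool2d(gt, ksize=2, strides=2, padding="VALID")
    return loss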

Depth image for training

Hello Filippo Aleotti, as per #20 you mentioned that mobilepydnet was trained using depth maps. But the pydnet repo suggests using the monodepth training script, which requires stereo image pairs. I would like to train pydnet on custom RGB-D images instead of stereo pairs. Could you guide me on how to use the depth map as the reference signal for training, as I am new to pydnet?

Thanks in advance.

Input image dimensions

Hi, thanks for sharing this great project.
Is it possible to train the model on images whose height and width are multiples of 8 rather than of 64 pixels?
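
For what it's worth, a possible workaround (my own sketch, not something provided by this repo) is to resize an arbitrary image up to the nearest multiple of 64 before feeding the network, and to resize the prediction back to the original size afterwards:

from PIL import Image

def resize_to_multiple(img, multiple=64):
    # Round each side up to the nearest multiple of `multiple`, then resize.
    w, h = img.size
    new_w = ((w + multiple - 1) // multiple) * multiple
    new_h = ((h + multiple - 1) // multiple) * multiple
    return img.resize((new_w, new_h), Image.BILINEAR)

# Usage: run the network on resize_to_multiple(image), then resize the
# predicted map back to (w, h) with the same bilinear interpolation.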

How did you pick the images from Microsoft COCO and OpenImages?

I have a follow-up question about issue #11:
In your paper "Real-time single image depth perception in the wild with handheld devices", you mention picking 447k images from the two datasets. Did you systematically select a fixed number of images across the different categories? What was your strategy?

Are you planning to release the WILD dataset as well as the training code?

Great work, by the way.

Tflite support

Judging from your latest commit, does the app now support TFLite?
The README still states that the app does not support TFLite.

Speedup on Android

I'm running the app on Samsung S8.
I get about 0.5 fps for the highest quality and around 2 fps for the lowest.
I'm trying to speed it up but have little experience with this.
From what I found in the documentation, the GPU delegate gives roughly a 2x speedup, but it only works with TFLite.
Does anyone intend to contribute in this direction, or can anyone offer some guidance?

Crash Unexpected failure when preparing tensor allocations - Android 10 (Mi 9T)

Crashing at startup:

2020-08-23 14:01:54.068 3006-3124/unibo.cvlab.pydnet_tflite E/AndroidRuntime: FATAL EXCEPTION: inference
    Process: unibo.cvlab.pydnet_tflite, PID: 3006
    java.lang.IllegalStateException: Internal error: Unexpected failure when preparing tensor allocations: tensorflow/lite/kernels/conv.cc:313 input->dims->size != 4 (1 != 4)
    Node number 0 (CONV_2D) failed to prepare.
    
        at org.tensorflow.lite.NativeInterpreterWrapper.allocateTensors(Native Method)
        at org.tensorflow.lite.NativeInterpreterWrapper.run(NativeInterpreterWrapper.java:149)
        at org.tensorflow.lite.Interpreter.runForMultipleInputsOutputs(Interpreter.java:343)
        at org.tensorflow.lite.Interpreter.run(Interpreter.java:304)
        at unibo.cvlab.pydnet.Model.doInference(Model.java:89)
        at unibo.cvlab.pydnet.StreamActivity.doInference(StreamActivity.java:131)
        at unibo.cvlab.pydnet.StreamActivity.access$200(StreamActivity.java:26)
        at unibo.cvlab.pydnet.StreamActivity$2.run(StreamActivity.java:122)
        at android.os.Handler.handleCallback(Handler.java:883)
        at android.os.Handler.dispatchMessage(Handler.java:100)
        at android.os.Looper.loop(Looper.java:224)
        at android.os.HandlerThread.run(HandlerThread.java:67)

How to collect training dataset

Hi @FilippoAleotti
I want to train this model myself.
I looked into Pydnet and monodepth, and both use stereo images as training data.
Did you create the depth maps with MiDaS and use them to create stereo images?

Incorrect output dimension on ios

I tried reconverting the latest pretrained model from the Pydnet repository to iOS via tfcoreml. The conversion succeeds, but the output shape has the wrong dimensions:
(1, 512, 256, 2)
I expected the last dimension to be 1 instead of 2. I know there is already a provided iOS CoreML file here, but I plan to retrain the Pydnet model on my own dataset later, which is why I am attempting the conversion myself.

import tfcoreml

mlmodel = tfcoreml.convert(
    tf_model_path='./checkpoint/IROS18/frozen_model.pb',
    mlmodel_path='./checkpoint/IROS18/pydnet.mlmodel',
    output_feature_names=['model/L0/ResizeBilinear'],
    image_input_names=['input'],
    input_name_shape_dict={'input': [1, 512, 256, 3]},
    minimum_ios_deployment_target='13',
)

@GZaccaroni, did you encounter this issue when you did the conversion for the iOS part? Due to the incorrect dimension, I am not able to transform the output into a valid image.
Here is the full conversion log from tfcoreml:

[SSAConverter] Converting function main ...
[SSAConverter] [1/143] Converting op type: 'Placeholder', name: 'input', output_shape: (1, 512, 256, 3).
[SSAConverter] [2/143] Converting op type: 'Const', name: 'model/pyramid/conv1a/mul/x'.
[SSAConverter] [3/143] Converting op type: 'Const', name: 'model/pyramid/conv1b/mul/x'.
[SSAConverter] [4/143] Converting op type: 'Const', name: 'model/pyramid/conv2a/mul/x'.
[SSAConverter] [5/143] Converting op type: 'Const', name: 'model/pyramid/conv2b/mul/x'.
[SSAConverter] [6/143] Converting op type: 'Const', name: 'model/pyramid/conv3a/mul/x'.
[SSAConverter] [7/143] Converting op type: 'Const', name: 'model/pyramid/conv3b/mul/x'.
[SSAConverter] [8/143] Converting op type: 'Const', name: 'model/pyramid/conv4a/mul/x'.
[SSAConverter] [9/143] Converting op type: 'Const', name: 'model/pyramid/conv4b/mul/x'.
[SSAConverter] [10/143] Converting op type: 'Const', name: 'model/pyramid/conv5a/mul/x'.
[SSAConverter] [11/143] Converting op type: 'Const', name: 'model/pyramid/conv5b/mul/x'.
[SSAConverter] [12/143] Converting op type: 'Const', name: 'model/pyramid/conv6a/mul/x'.
[SSAConverter] [13/143] Converting op type: 'Const', name: 'model/pyramid/conv6b/mul/x'.
[SSAConverter] [14/143] Converting op type: 'Const', name: 'model/L6/estimator/disp-3/mul/x'.
[SSAConverter] [15/143] Converting op type: 'Const', name: 'model/L6/estimator/disp-4/mul/x'.
[SSAConverter] [16/143] Converting op type: 'Const', name: 'model/L6/estimator/disp-5/mul/x'.
[SSAConverter] [17/143] Converting op type: 'Const', name: 'model/L6/upsampler/weights/read', output_shape: (2, 2, 8, 8).
[SSAConverter] [18/143] Converting op type: 'Const', name: 'model/L6/upsampler/biases/read', output_shape: (1, 8, 1, 1).
[SSAConverter] [19/143] Converting op type: 'Const', name: 'model/L6/upsampler/conv2d_transpose/output_shape', output_shape: (4,).
[SSAConverter] [20/143] Converting op type: 'Const', name: 'model/L6/upsampler/mul/x'.
[SSAConverter] [21/143] Converting op type: 'Const', name: 'model/L5/estimator/concat/axis'.
[SSAConverter] [22/143] Converting op type: 'Const', name: 'model/L5/estimator/disp-3/mul/x'.
[SSAConverter] [23/143] Converting op type: 'Const', name: 'model/L5/estimator/disp-4/mul/x'.
[SSAConverter] [24/143] Converting op type: 'Const', name: 'model/L5/estimator/disp-5/mul/x'.
[SSAConverter] [25/143] Converting op type: 'Const', name: 'model/L5/upsampler/weights/read', output_shape: (2, 2, 8, 8).
[SSAConverter] [26/143] Converting op type: 'Const', name: 'model/L5/upsampler/biases/read', output_shape: (1, 8, 1, 1).
[SSAConverter] [27/143] Converting op type: 'Const', name: 'model/L5/upsampler/conv2d_transpose/output_shape', output_shape: (4,).
[SSAConverter] [28/143] Converting op type: 'Const', name: 'model/L5/upsampler/mul/x'.
[SSAConverter] [29/143] Converting op type: 'Const', name: 'model/L4/estimator/concat/axis'.
[SSAConverter] [30/143] Converting op type: 'Const', name: 'model/L4/estimator/disp-3/mul/x'.
[SSAConverter] [31/143] Converting op type: 'Const', name: 'model/L4/estimator/disp-4/mul/x'.
[SSAConverter] [32/143] Converting op type: 'Const', name: 'model/L4/estimator/disp-5/mul/x'.
[SSAConverter] [33/143] Converting op type: 'Const', name: 'model/L4/upsampler/weights/read', output_shape: (2, 2, 8, 8).
[SSAConverter] [34/143] Converting op type: 'Const', name: 'model/L4/upsampler/biases/read', output_shape: (1, 8, 1, 1).
[SSAConverter] [35/143] Converting op type: 'Const', name: 'model/L4/upsampler/conv2d_transpose/output_shape', output_shape: (4,).
[SSAConverter] [36/143] Converting op type: 'Const', name: 'model/L4/upsampler/mul/x'.
[SSAConverter] [37/143] Converting op type: 'Const', name: 'model/L3/estimator/concat/axis'.
[SSAConverter] [38/143] Converting op type: 'Const', name: 'model/L3/estimator/disp-3/mul/x'.
[SSAConverter] [39/143] Converting op type: 'Const', name: 'model/L3/estimator/disp-4/mul/x'.
[SSAConverter] [40/143] Converting op type: 'Const', name: 'model/L3/estimator/disp-5/mul/x'.
[SSAConverter] [41/143] Converting op type: 'Const', name: 'model/L3/estimator/Slice/begin', output_shape: (4,).
[SSAConverter] [42/143] Converting op type: 'Const', name: 'model/L3/estimator/Slice/size', output_shape: (4,).
[SSAConverter] [43/143] Converting op type: 'Const', name: 'model/L3/estimator/mul/x'.
[SSAConverter] [44/143] Converting op type: 'Const', name: 'model/L0/size', output_shape: (2,).
[SSAConverter] [45/143] Converting op type: 'Transpose', name: 'input_to_nchw', output_shape: (1, 3, 512, 256).
[SSAConverter] [46/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv1a/Conv2D', output_shape: (1, 16, 256, 128).
[SSAConverter] [47/143] Converting op type: 'Mul', name: 'model/pyramid/conv1a/mul', output_shape: (1, 16, 256, 128).
[SSAConverter] [48/143] Converting op type: 'Maximum', name: 'model/pyramid/conv1a/Maximum', output_shape: (1, 16, 256, 128).
[SSAConverter] [49/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv1b/Conv2D', output_shape: (1, 16, 256, 128).
[SSAConverter] [50/143] Converting op type: 'Mul', name: 'model/pyramid/conv1b/mul', output_shape: (1, 16, 256, 128).
[SSAConverter] [51/143] Converting op type: 'Maximum', name: 'model/pyramid/conv1b/Maximum', output_shape: (1, 16, 256, 128).
[SSAConverter] [52/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv2a/Conv2D', output_shape: (1, 32, 128, 64).
[SSAConverter] [53/143] Converting op type: 'Mul', name: 'model/pyramid/conv2a/mul', output_shape: (1, 32, 128, 64).
[SSAConverter] [54/143] Converting op type: 'Maximum', name: 'model/pyramid/conv2a/Maximum', output_shape: (1, 32, 128, 64).
[SSAConverter] [55/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv2b/Conv2D', output_shape: (1, 32, 128, 64).
[SSAConverter] [56/143] Converting op type: 'Mul', name: 'model/pyramid/conv2b/mul', output_shape: (1, 32, 128, 64).
[SSAConverter] [57/143] Converting op type: 'Maximum', name: 'model/pyramid/conv2b/Maximum', output_shape: (1, 32, 128, 64).
[SSAConverter] [58/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv3a/Conv2D', output_shape: (1, 64, 64, 32).
[SSAConverter] [59/143] Converting op type: 'Mul', name: 'model/pyramid/conv3a/mul', output_shape: (1, 64, 64, 32).
[SSAConverter] [60/143] Converting op type: 'Maximum', name: 'model/pyramid/conv3a/Maximum', output_shape: (1, 64, 64, 32).
[SSAConverter] [61/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv3b/Conv2D', output_shape: (1, 64, 64, 32).
[SSAConverter] [62/143] Converting op type: 'Mul', name: 'model/pyramid/conv3b/mul', output_shape: (1, 64, 64, 32).
[SSAConverter] [63/143] Converting op type: 'Maximum', name: 'model/pyramid/conv3b/Maximum', output_shape: (1, 64, 64, 32).
[SSAConverter] [64/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv4a/Conv2D', output_shape: (1, 96, 32, 16).
[SSAConverter] [65/143] Converting op type: 'Mul', name: 'model/pyramid/conv4a/mul', output_shape: (1, 96, 32, 16).
[SSAConverter] [66/143] Converting op type: 'Maximum', name: 'model/pyramid/conv4a/Maximum', output_shape: (1, 96, 32, 16).
[SSAConverter] [67/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv4b/Conv2D', output_shape: (1, 96, 32, 16).
[SSAConverter] [68/143] Converting op type: 'Mul', name: 'model/pyramid/conv4b/mul', output_shape: (1, 96, 32, 16).
[SSAConverter] [69/143] Converting op type: 'Maximum', name: 'model/pyramid/conv4b/Maximum', output_shape: (1, 96, 32, 16).
[SSAConverter] [70/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv5a/Conv2D', output_shape: (1, 128, 16, 8).
[SSAConverter] [71/143] Converting op type: 'Mul', name: 'model/pyramid/conv5a/mul', output_shape: (1, 128, 16, 8).
[SSAConverter] [72/143] Converting op type: 'Maximum', name: 'model/pyramid/conv5a/Maximum', output_shape: (1, 128, 16, 8).
[SSAConverter] [73/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv5b/Conv2D', output_shape: (1, 128, 16, 8).
[SSAConverter] [74/143] Converting op type: 'Mul', name: 'model/pyramid/conv5b/mul', output_shape: (1, 128, 16, 8).
[SSAConverter] [75/143] Converting op type: 'Maximum', name: 'model/pyramid/conv5b/Maximum', output_shape: (1, 128, 16, 8).
[SSAConverter] [76/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv6a/Conv2D', output_shape: (1, 192, 8, 4).
[SSAConverter] [77/143] Converting op type: 'Mul', name: 'model/pyramid/conv6a/mul', output_shape: (1, 192, 8, 4).
[SSAConverter] [78/143] Converting op type: 'Maximum', name: 'model/pyramid/conv6a/Maximum', output_shape: (1, 192, 8, 4).
[SSAConverter] [79/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv6b/Conv2D', output_shape: (1, 192, 8, 4).
[SSAConverter] [80/143] Converting op type: 'Mul', name: 'model/pyramid/conv6b/mul', output_shape: (1, 192, 8, 4).
[SSAConverter] [81/143] Converting op type: 'Maximum', name: 'model/pyramid/conv6b/Maximum', output_shape: (1, 192, 8, 4).
[SSAConverter] [82/143] Converting op type: 'Conv2D', name: 'model/L6/estimator/disp-3/Conv2D', output_shape: (1, 96, 8, 4).
[SSAConverter] [83/143] Converting op type: 'Mul', name: 'model/L6/estimator/disp-3/mul', output_shape: (1, 96, 8, 4).
[SSAConverter] [84/143] Converting op type: 'Maximum', name: 'model/L6/estimator/disp-3/Maximum', output_shape: (1, 96, 8, 4).
[SSAConverter] [85/143] Converting op type: 'Conv2D', name: 'model/L6/estimator/disp-4/Conv2D', output_shape: (1, 64, 8, 4).
[SSAConverter] [86/143] Converting op type: 'Mul', name: 'model/L6/estimator/disp-4/mul', output_shape: (1, 64, 8, 4).
[SSAConverter] [87/143] Converting op type: 'Maximum', name: 'model/L6/estimator/disp-4/Maximum', output_shape: (1, 64, 8, 4).
[SSAConverter] [88/143] Converting op type: 'Conv2D', name: 'model/L6/estimator/disp-5/Conv2D', output_shape: (1, 32, 8, 4).
[SSAConverter] [89/143] Converting op type: 'Mul', name: 'model/L6/estimator/disp-5/mul', output_shape: (1, 32, 8, 4).
[SSAConverter] [90/143] Converting op type: 'Maximum', name: 'model/L6/estimator/disp-5/Maximum', output_shape: (1, 32, 8, 4).
[SSAConverter] [91/143] Converting op type: 'Conv2D', name: 'model/L6/estimator/disp-6/Conv2D', output_shape: (1, 8, 8, 4).
[SSAConverter] [92/143] Converting op type: 'Conv2DBackpropInput', name: 'model/L6/upsampler/conv2d_transpose', output_shape: (1, 8, 16, 8).
[SSAConverter] [93/143] Converting op type: 'BiasAdd', name: 'model/L6/upsampler/BiasAdd', output_shape: (1, 8, 16, 8).
[SSAConverter] [94/143] Converting op type: 'Mul', name: 'model/L6/upsampler/mul', output_shape: (1, 8, 16, 8).
[SSAConverter] [95/143] Converting op type: 'Maximum', name: 'model/L6/upsampler/Maximum', output_shape: (1, 8, 16, 8).
[SSAConverter] [96/143] Converting op type: 'ConcatV2', name: 'model/L5/estimator/concat', output_shape: (1, 136, 16, 8).
[SSAConverter] [97/143] Converting op type: 'Conv2D', name: 'model/L5/estimator/disp-3/Conv2D', output_shape: (1, 96, 16, 8).
[SSAConverter] [98/143] Converting op type: 'Mul', name: 'model/L5/estimator/disp-3/mul', output_shape: (1, 96, 16, 8).
[SSAConverter] [99/143] Converting op type: 'Maximum', name: 'model/L5/estimator/disp-3/Maximum', output_shape: (1, 96, 16, 8).
[SSAConverter] [100/143] Converting op type: 'Conv2D', name: 'model/L5/estimator/disp-4/Conv2D', output_shape: (1, 64, 16, 8).
[SSAConverter] [101/143] Converting op type: 'Mul', name: 'model/L5/estimator/disp-4/mul', output_shape: (1, 64, 16, 8).
[SSAConverter] [102/143] Converting op type: 'Maximum', name: 'model/L5/estimator/disp-4/Maximum', output_shape: (1, 64, 16, 8).
[SSAConverter] [103/143] Converting op type: 'Conv2D', name: 'model/L5/estimator/disp-5/Conv2D', output_shape: (1, 32, 16, 8).
[SSAConverter] [104/143] Converting op type: 'Mul', name: 'model/L5/estimator/disp-5/mul', output_shape: (1, 32, 16, 8).
[SSAConverter] [105/143] Converting op type: 'Maximum', name: 'model/L5/estimator/disp-5/Maximum', output_shape: (1, 32, 16, 8).
[SSAConverter] [106/143] Converting op type: 'Conv2D', name: 'model/L5/estimator/disp-6/Conv2D', output_shape: (1, 8, 16, 8).
[SSAConverter] [107/143] Converting op type: 'Conv2DBackpropInput', name: 'model/L5/upsampler/conv2d_transpose', output_shape: (1, 8, 32, 16).
[SSAConverter] [108/143] Converting op type: 'BiasAdd', name: 'model/L5/upsampler/BiasAdd', output_shape: (1, 8, 32, 16).
[SSAConverter] [109/143] Converting op type: 'Mul', name: 'model/L5/upsampler/mul', output_shape: (1, 8, 32, 16).
[SSAConverter] [110/143] Converting op type: 'Maximum', name: 'model/L5/upsampler/Maximum', output_shape: (1, 8, 32, 16).
[SSAConverter] [111/143] Converting op type: 'ConcatV2', name: 'model/L4/estimator/concat', output_shape: (1, 104, 32, 16).
[SSAConverter] [112/143] Converting op type: 'Conv2D', name: 'model/L4/estimator/disp-3/Conv2D', output_shape: (1, 96, 32, 16).
[SSAConverter] [113/143] Converting op type: 'Mul', name: 'model/L4/estimator/disp-3/mul', output_shape: (1, 96, 32, 16).
[SSAConverter] [114/143] Converting op type: 'Maximum', name: 'model/L4/estimator/disp-3/Maximum', output_shape: (1, 96, 32, 16).
[SSAConverter] [115/143] Converting op type: 'Conv2D', name: 'model/L4/estimator/disp-4/Conv2D', output_shape: (1, 64, 32, 16).
[SSAConverter] [116/143] Converting op type: 'Mul', name: 'model/L4/estimator/disp-4/mul', output_shape: (1, 64, 32, 16).
[SSAConverter] [117/143] Converting op type: 'Maximum', name: 'model/L4/estimator/disp-4/Maximum', output_shape: (1, 64, 32, 16).
[SSAConverter] [118/143] Converting op type: 'Conv2D', name: 'model/L4/estimator/disp-5/Conv2D', output_shape: (1, 32, 32, 16).
[SSAConverter] [119/143] Converting op type: 'Mul', name: 'model/L4/estimator/disp-5/mul', output_shape: (1, 32, 32, 16).
[SSAConverter] [120/143] Converting op type: 'Maximum', name: 'model/L4/estimator/disp-5/Maximum', output_shape: (1, 32, 32, 16).
[SSAConverter] [121/143] Converting op type: 'Conv2D', name: 'model/L4/estimator/disp-6/Conv2D', output_shape: (1, 8, 32, 16).
[SSAConverter] [122/143] Converting op type: 'Conv2DBackpropInput', name: 'model/L4/upsampler/conv2d_transpose', output_shape: (1, 8, 64, 32).
[SSAConverter] [123/143] Converting op type: 'BiasAdd', name: 'model/L4/upsampler/BiasAdd', output_shape: (1, 8, 64, 32).
[SSAConverter] [124/143] Converting op type: 'Mul', name: 'model/L4/upsampler/mul', output_shape: (1, 8, 64, 32).
[SSAConverter] [125/143] Converting op type: 'Maximum', name: 'model/L4/upsampler/Maximum', output_shape: (1, 8, 64, 32).
[SSAConverter] [126/143] Converting op type: 'ConcatV2', name: 'model/L3/estimator/concat', output_shape: (1, 72, 64, 32).
[SSAConverter] [127/143] Converting op type: 'Conv2D', name: 'model/L3/estimator/disp-3/Conv2D', output_shape: (1, 96, 64, 32).
[SSAConverter] [128/143] Converting op type: 'Mul', name: 'model/L3/estimator/disp-3/mul', output_shape: (1, 96, 64, 32).
[SSAConverter] [129/143] Converting op type: 'Maximum', name: 'model/L3/estimator/disp-3/Maximum', output_shape: (1, 96, 64, 32).
[SSAConverter] [130/143] Converting op type: 'Conv2D', name: 'model/L3/estimator/disp-4/Conv2D', output_shape: (1, 64, 64, 32).
[SSAConverter] [131/143] Converting op type: 'Mul', name: 'model/L3/estimator/disp-4/mul', output_shape: (1, 64, 64, 32).
[SSAConverter] [132/143] Converting op type: 'Maximum', name: 'model/L3/estimator/disp-4/Maximum', output_shape: (1, 64, 64, 32).
[SSAConverter] [133/143] Converting op type: 'Conv2D', name: 'model/L3/estimator/disp-5/Conv2D', output_shape: (1, 32, 64, 32).
[SSAConverter] [134/143] Converting op type: 'Mul', name: 'model/L3/estimator/disp-5/mul', output_shape: (1, 32, 64, 32).
[SSAConverter] [135/143] Converting op type: 'Maximum', name: 'model/L3/estimator/disp-5/Maximum', output_shape: (1, 32, 64, 32).
[SSAConverter] [136/143] Converting op type: 'Conv2D', name: 'model/L3/estimator/disp-6/Conv2D', output_shape: (1, 8, 64, 32).
[SSAConverter] [137/143] Converting op type: 'Transpose', name: 'model/L3/estimator/disp-6/Conv2D_to_nhwc', output_shape: (1, 64, 32, 8).
[SSAConverter] [138/143] Converting op type: 'Slice', name: 'model/L3/estimator/Slice', output_shape: (1, 64, 32, 2).
[SSAConverter] [139/143] Converting op type: 'Sigmoid', name: 'model/L3/estimator/Sigmoid', output_shape: (1, 64, 32, 2).
[SSAConverter] [140/143] Converting op type: 'Mul', name: 'model/L3/estimator/mul', output_shape: (1, 64, 32, 2).
[SSAConverter] [141/143] Converting op type: 'Transpose', name: 'model/L3/estimator/mul_to_nchw', output_shape: (1, 2, 64, 32).
[SSAConverter] [142/143] Converting op type: 'ResizeBilinear', name: 'model/L0/ResizeBilinear_orig0', output_shape: (1, 2, 512, 256).
[SSAConverter] [143/143] Converting op type: 'Transpose', name: 'model/L0/ResizeBilinear', output_shape: (1, 512, 256, 2).
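
As a possible workaround, and assuming (as in monodepth-style PyDNet graphs) that the first of the two channels holds the disparity of interest, one could keep only channel 0 of the converted model's output before turning it into an image; this is just a sketch, not verified against the authors' converter:

import numpy as np

def to_disparity_image(output):
    # output: array of shape (1, 512, 256, 2) produced by the converted model.
    disp = output[0, :, :, 0]                              # keep the first channel only
    disp = (disp - disp.min()) / (disp.max() - disp.min() + 1e-8)
    return (disp * 255.0).astype(np.uint8)                 # 8-bit grayscale image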

What is the range you used for the MiDaS output

As the MiDaS output ranges from 0 to a few thousand, I tried to use it as supervision but found that the loss does not decrease at all. If I scale the output to [0, 1], the training converges to an all-zero output very quickly. Could you tell me what range you used in your training? Thanks.
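
For context, one common way to bring MiDaS-style outputs into a fixed range (a sketch based on my own assumption of per-image min-max normalization, not the authors' recipe) is:

import numpy as np

def normalize_midas(depth, out_max=255.0, eps=1e-6):
    # Per-image min-max normalization, then rescaling to [0, out_max].
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + eps)
    return d * out_max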

" Data loss: not an sstable" error when running provided pretrained inference

Hello there,

When using your provided model from Google Drive, inference.py throws this error when attempting to read/restore the model:

Data loss: Unable to open table file ckpt/pydnet/pydnet.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

This happens both when pointing at the checkpoint file directly and when pointing at its folder. It may have to do with the missing ".meta" file, but I am unsure. Thanks.

About training loss

Hi,

I'm trying to train the model myself. In the paper, the loss is defined as an L1 loss plus the gradient loss from “Learning the Depths of Moving People by Watching Frozen People”. However, I found that they compute the gradient loss in log space. Do you use log depth for the prediction and the ground truth as well? Also, for the L1 loss, |D_s − D_gt|, are the output and the ground truth both in the range [0, 255]?

Thanks.

Android Version

Hello,

this is more of a request than an issue. Will you be open-sourcing the Android code anytime soon?

Thank you so much for this amazing work!

Problems with TFLite model quality

Hi @FilippoAleotti,

I am a high school student trying to use your team's PyDNet model for a project I am working on. Specifically, I'm trying to run your .tflite model on a Raspberry Pi, but I am having trouble getting a depth map of quality similar to the one shown in the demos.

I noticed that #1 mentions this issue, but #3, which introduced "optimized_pydnet++.tflite", came a few months after the last comment on #1, which suggests that you did not face this model-quality issue.

AFAICT I am performing the same steps as the Android sample on the master branch (scaling the input and dividing the pixel values by 255), but the output still looks wrong. This Colab demonstrates the issues I am having.

For some reason, the model is consistently outputting a ridiculously high depth value for that particular spot in all images.
(attached output image showing the anomalous high-value spot)
This is possibly a TensorFlow bug, so I worked around it by cropping the output depth to a region that excludes that point, but then I stumbled across another problem.

The cropped depth map appears to be of rather low quality, unlike the demos shown. Here is an example:
(attached example of the low-quality cropped depth map)
The input I'm providing is 640x448 pixels, and from what I can tell there is no option to configure the resolution of the tflite model.
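
For reference, this is roughly the preprocessing and inference pipeline described above, as a minimal Python sketch; the model path and image file name are assumptions, not files from this repo:

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the TFLite model (the path is an assumption).
interpreter = tf.lite.Interpreter(model_path="optimized_pydnet.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Resize to the model's expected input size and scale pixels to [0, 1].
h, w = int(inp["shape"][1]), int(inp["shape"][2])
img = Image.open("test.jpg").resize((w, h), Image.BILINEAR)
x = np.asarray(img, dtype=np.float32)[None] / 255.0

# Run inference and min-max normalize the prediction for visualization.
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
depth = interpreter.get_tensor(out["index"])[0].squeeze()
depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)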

I have been trying to get this working for a few days now and am at a loss as to what to do. I realise that you're busy with your research, but could you offer some guidance on how to move forward? I noticed that you have v2 of mobile PyDNet in the works for Android; could you provide an update on the status of a v2 tflite model?

Thank you for the work you've put into this project, it is quite incredible!

Supported resolutions

Hi @FilippoAleotti
I see these values in the Resolution enum (Android):
RES1(512, 256), RES2(640, 192), RES3(320, 96), RES4(640, 448);
Are these the only resolutions supported by the model?
I've tried to feed the model a static image and it gets cropped.
Thanks

Disparity to distance

First I want to congratulate you on the project.

I would like to know how to convert the output of the model to distance in meters.

Searching the internet, I found the following formula:
depth = baseline * focal / disparity

disparity: the output of the model, normalized to a value between 0 and 1 as (modelResult - min_result) / (max_result - min_result)
baseline: seems to depend on the training data; for the KITTI dataset I have found values of 0.54 or 0.22
focal: I think this is the focal length of the camera, but I'm not sure; I have seen a value close to 2262

I'm not sure if this is the correct way to do it, nor what parameters to use for the model you trained.
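
To make the formula concrete, here is a sketch of the conversion; the baseline and focal length below are KITTI-like placeholder values, not parameters of this model, and the pixel-scaling step assumes the monodepth convention of disparities expressed as a fraction of the image width:

import numpy as np

def disparity_to_depth(disp_norm, image_width, baseline=0.54, focal=721.0):
    # disp_norm: model output normalized to [0, 1];
    # baseline (meters) and focal length (pixels) are placeholder values.
    disp_px = disp_norm * image_width                      # disparity in pixels
    return baseline * focal / np.maximum(disp_px, 1e-6)   # depth in meters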

If I export the model with a different height and width, what parameters do I have to change?

python export.py --ckpt ckpt/pydnet \
        --arch pydnet \
        --dest "./" \
        --height 192 --width 192

The new mlmodel

Hello! Great work! I have a few questions:
Is the new mlmodel the one trained using MiDaS outputs on 450K internet images?
What is its output? It looks like an 8-bit image that would be a scaled and shifted disparity map; is that right? If I want to get the original disparity map (or at least floats rather than uint8), is there an easy way to modify the mlmodel?

Training loss weights for depth predictions at different scales

The results of my network are quite blurry. I suspect the loss is the cause.

  1. Could you tell me whether you used a multi-scale gradient loss? I saw that the MegaDepth paper used 4 scales.
  2. Could you share the weight settings for the loss computed on the depth predictions at different resolutions, i.e. prediction1, prediction2, and prediction3 in your model? Did you use the same weight for all three?

Many thanks!

Dataset release

This is really amazing work with great real-life application potential. Will you provide more details about the training dataset?

Quantized Model

Dear Filippo,

Thank you for the great work you do! In your paper you achieved great performance with quantization, yet I only see a quantized model for iOS. I tried applying post-training quantization to the frozen graph produced by the export.py script, but I haven't had much luck. Would it be possible for you to upload the quantized model as a frozen graph file? If not, could you explain how you converted your model? I followed the tutorial at https://blog.tensorflow.org/2019/06/tensorflow-integer-quantization.html using the NYU dataset, but the resulting tflite model is much slower than the regular one.
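
For reference, this is the kind of post-training quantization recipe I attempted (a sketch only: the tensor names and input shape are borrowed from the tfcoreml snippet earlier in this thread and may not match your export, and the representative dataset below uses random placeholders instead of real images):

import numpy as np
import tensorflow as tf

# TF1-style converter from a frozen graph.
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="frozen_model.pb",
    input_arrays=["input"],
    output_arrays=["model/L0/ResizeBilinear"],
    input_shapes={"input": [1, 512, 256, 3]},
)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset():
    # Yield a few preprocessed sample inputs (replace with real images).
    for _ in range(100):
        yield [np.random.rand(1, 512, 256, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset

with open("pydnet_quant.tflite", "wb") as f:
    f.write(converter.convert())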

Any help appreciated!

About training

(Figure 1: attached image of the Matterport3D up, left, and right views)

Hi @FilippoAleotti, thanks very much for sharing. I have some questions about training:
1. Your README says that you train on Matterport with supervision. As far as I know, the Matterport3D dataset has up, left, and right view images, as Figure 1 shows; are all of these included in your training set?

2. Would you mind sharing your supervised training code?

Question: choice of input resolution

I am wondering how you arrived at the input resolution of 640x384, which is a 5:3 aspect ratio. I saw that other papers work with flexible aspect ratios; for example, in MiDaS the longer side is 384 pixels and the other side's length is divisible by 32. Sometimes it's a square, or at least the model is trained on squares. What were the considerations behind your choice? And if my image is vertical, e.g. 9:16, do you think the result will be adversely affected by the resizing?

Model Output Issues

With the new tflite optimized model, I am having some issues with the output. It comes out as a 32FC2 vector, which, when split, produces "masks" that appear like this (image attached). I am not sure what part of the process is wrong here, because when I ran a tflite version of the model on the same platform without GPU optimization, I did not have this issue.
(attached image: channel 1 of the split output)

About evaluation and loss function

Hi sir,

Thank you for the great project.
I would like to improve the model based on the method of your paper.
Would it be possible to release an evaluation script so that I can reproduce the benchmark results from the paper with the pre-trained model?
That way I can check my own results and see whether something went wrong during training.

I also have another question: did you compute the multi-scale gradient loss using the outputs of the upsampling layers?
If not, may I ask how you implemented the loss calculation?

Thank you!
