filippoaleotti / mobilepydnet
Pydnet on mobile devices
License: Apache License 2.0
Hi,
Thanks for the great work! For the newly released v2 model for iOS, the model is in mlmodel format. I'm wondering if you could share the original TensorFlow saved model (Keras HDF5, maybe) or a checkpoint so that I could run it on a computer?
Thank you!
First of all, thanks for sharing this great project! I have tried to implement your mobilePydnet network but cannot fully reproduce the results of the pre-trained model. For that reason I have several questions about the model, loss, data, and the training itself.
Did you initialize weights and biases by using some particular initialization strategy or did you just use the default initialization of convolution layers?
Did you use any data augmentation like flipping, rotating, random cropping or blurring?
You said here in the issues section that the range of your input and output images is [0, 255]. Does that mean that during training, when you load the input image and ground truth as float32, you don't normalize them, for example by dividing by 255 to get the range [0, 1]?
The loss is described in the paper as L = sum_s w_s * (L1_s + gradient_s), where the gradient term's weight is fixed to 1 and w_s goes from 0.5 to 0.25 to 0.125 (if I understood correctly, you used 3 different scales). Here is the Python code for calculating the loss, but I'm not sure if I am missing something:
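A minimal numpy sketch of such a multi-scale L1-plus-gradient loss, for reference; the scale weights and the unit gradient weight come from the description above, while the use of simple forward differences is my assumption, not the repository's actual code:

```python
import numpy as np

def gradient_loss(pred, gt):
    # L1 distance between horizontal and vertical image gradients
    dx = np.abs(np.diff(pred, axis=1) - np.diff(gt, axis=1)).mean()
    dy = np.abs(np.diff(pred, axis=0) - np.diff(gt, axis=0)).mean()
    return dx + dy

def multi_scale_loss(preds, gts, weights=(0.5, 0.25, 0.125), grad_w=1.0):
    # preds/gts: lists of predicted / ground-truth depth maps, one per scale
    total = 0.0
    for w, p, g in zip(weights, preds, gts):
        total += w * (np.abs(p - g).mean() + grad_w * gradient_loss(p, g))
    return total
```

In a real training loop the same arithmetic would be written with TensorFlow ops so it stays differentiable.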
Hello Filippo Aleotti, as per #20 you mentioned that mobilepydnet was trained using depth maps. But the pydnet repo suggests using the monodepth training script, which requires stereo image pairs. I would like to train pydnet on custom RGB-D images instead of stereo image pairs. Can you guide me on how to use the depth map as the reference signal for training, as I am new to using pydnet?
Thanks in advance
Hi, thanks for sharing this great project.
How is it possible to train the model on images with height and width multiples of 8 rather than 64px?
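The multiple-of-64 constraint likely comes from the six 2x downsamplings in the pyramid (2^6 = 64); padding the input up to the next valid multiple is one common workaround. A hedged numpy sketch of the idea, not code from this repository:

```python
import numpy as np

def pad_to_multiple(img, m=64):
    """Zero-pad H and W (axes 0 and 1) up to the next multiple of m."""
    h, w = img.shape[:2]
    ph, pw = (-h) % m, (-w) % m  # amount needed to reach the next multiple
    pad = [(0, ph), (0, pw)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad)
```

After inference, the prediction can be cropped back to the original height and width.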
I have a follow-up question about issue #11:
In your paper "Real-time single image depth perception in the wild with handheld devices", you mentioned picking 447k images from the two datasets. Did you systematically select a fixed number of images across different categories? What was the strategy?
Are you planning to release the WILD dataset as well as the training code?
Great work, by the way.
Hi, do you use data augmentation like random resizing?
According to your latest changes in the commit, does your app support tflite?
The README still says that the app does not support tflite.
I'm running the app on Samsung S8.
I get about 0.5 fps at the highest quality and around 2 fps at the lowest.
I'm trying to speed it up but have little experience with this.
From what I found in the documentation, the GPU gives about a 2x speedup but only works with tflite.
Does anyone intend to contribute in this direction, or can anyone help with some guidance?
Crashing at startup:
2020-08-23 14:01:54.068 3006-3124/unibo.cvlab.pydnet_tflite E/AndroidRuntime: FATAL EXCEPTION: inference
Process: unibo.cvlab.pydnet_tflite, PID: 3006
java.lang.IllegalStateException: Internal error: Unexpected failure when preparing tensor allocations: tensorflow/lite/kernels/conv.cc:313 input->dims->size != 4 (1 != 4)
Node number 0 (CONV_2D) failed to prepare.
at org.tensorflow.lite.NativeInterpreterWrapper.allocateTensors(Native Method)
at org.tensorflow.lite.NativeInterpreterWrapper.run(NativeInterpreterWrapper.java:149)
at org.tensorflow.lite.Interpreter.runForMultipleInputsOutputs(Interpreter.java:343)
at org.tensorflow.lite.Interpreter.run(Interpreter.java:304)
at unibo.cvlab.pydnet.Model.doInference(Model.java:89)
at unibo.cvlab.pydnet.StreamActivity.doInference(StreamActivity.java:131)
at unibo.cvlab.pydnet.StreamActivity.access$200(StreamActivity.java:26)
at unibo.cvlab.pydnet.StreamActivity$2.run(StreamActivity.java:122)
at android.os.Handler.handleCallback(Handler.java:883)
at android.os.Handler.dispatchMessage(Handler.java:100)
at android.os.Looper.loop(Looper.java:224)
at android.os.HandlerThread.run(HandlerThread.java:67)
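The message `input->dims->size != 4 (1 != 4)` says the first CONV_2D node received a rank-1 buffer where a 4-D NHWC tensor was expected, so the flat pixel buffer likely needs reshaping before invoking the interpreter. A hedged numpy sketch of the idea; the 448x640x3 shape in the test is just an example, not necessarily what this build uses:

```python
import numpy as np

def to_nhwc(flat_pixels, h, w, c=3):
    # TFLite CONV_2D expects a rank-4 (batch, height, width, channels) tensor
    return np.asarray(flat_pixels, dtype=np.float32).reshape(1, h, w, c)
```

The resulting array would then be passed to the interpreter instead of the flat buffer.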
Hi @FilippoAleotti
I want to train this model myself.
I looked into Pydnet and monodepth, and both use stereo images as training data.
Did you create the depth map with MiDaS and use it to create a stereo image?
I didn't modify the code at all. I'm trying to run this on a Vivo NEX S phone.
I tried reconverting the latest pretrained model from the Pydnet repository to iOS via tfcoreml. The conversion succeeds, but the output shape has the wrong dimensions:
(1, 512, 256, 2)
I expected the last dimension to be 1 instead of 2. I know there is already a provided iOS Core ML file here, but I plan to retrain the Pydnet model on my own dataset later. That's why I am attempting the conversion myself.
mlmodel = tfcoreml.convert(
    tf_model_path='./checkpoint/IROS18/frozen_model.pb',
    mlmodel_path='./checkpoint/IROS18/pydnet.mlmodel',
    output_feature_names=['model/L0/ResizeBilinear'],
    image_input_names=['input'],
    input_name_shape_dict={'input': [1, 512, 256, 3]},
    minimum_ios_deployment_target='13'
)
@GZaccaroni did you encounter this issue when you did the conversion for the iOS part? Due to the incorrect dimensions, I am not able to transform the output into a valid image.
Here is the full conversion log from tfcoreml:
[SSAConverter] Converting function main ...
[SSAConverter] [1/143] Converting op type: 'Placeholder', name: 'input', output_shape: (1, 512, 256, 3).
[SSAConverter] [2/143] Converting op type: 'Const', name: 'model/pyramid/conv1a/mul/x'.
[SSAConverter] [3/143] Converting op type: 'Const', name: 'model/pyramid/conv1b/mul/x'.
[SSAConverter] [4/143] Converting op type: 'Const', name: 'model/pyramid/conv2a/mul/x'.
[SSAConverter] [5/143] Converting op type: 'Const', name: 'model/pyramid/conv2b/mul/x'.
[SSAConverter] [6/143] Converting op type: 'Const', name: 'model/pyramid/conv3a/mul/x'.
[SSAConverter] [7/143] Converting op type: 'Const', name: 'model/pyramid/conv3b/mul/x'.
[SSAConverter] [8/143] Converting op type: 'Const', name: 'model/pyramid/conv4a/mul/x'.
[SSAConverter] [9/143] Converting op type: 'Const', name: 'model/pyramid/conv4b/mul/x'.
[SSAConverter] [10/143] Converting op type: 'Const', name: 'model/pyramid/conv5a/mul/x'.
[SSAConverter] [11/143] Converting op type: 'Const', name: 'model/pyramid/conv5b/mul/x'.
[SSAConverter] [12/143] Converting op type: 'Const', name: 'model/pyramid/conv6a/mul/x'.
[SSAConverter] [13/143] Converting op type: 'Const', name: 'model/pyramid/conv6b/mul/x'.
[SSAConverter] [14/143] Converting op type: 'Const', name: 'model/L6/estimator/disp-3/mul/x'.
[SSAConverter] [15/143] Converting op type: 'Const', name: 'model/L6/estimator/disp-4/mul/x'.
[SSAConverter] [16/143] Converting op type: 'Const', name: 'model/L6/estimator/disp-5/mul/x'.
[SSAConverter] [17/143] Converting op type: 'Const', name: 'model/L6/upsampler/weights/read', output_shape: (2, 2, 8, 8).
[SSAConverter] [18/143] Converting op type: 'Const', name: 'model/L6/upsampler/biases/read', output_shape: (1, 8, 1, 1).
[SSAConverter] [19/143] Converting op type: 'Const', name: 'model/L6/upsampler/conv2d_transpose/output_shape', output_shape: (4,).
[SSAConverter] [20/143] Converting op type: 'Const', name: 'model/L6/upsampler/mul/x'.
[SSAConverter] [21/143] Converting op type: 'Const', name: 'model/L5/estimator/concat/axis'.
[SSAConverter] [22/143] Converting op type: 'Const', name: 'model/L5/estimator/disp-3/mul/x'.
[SSAConverter] [23/143] Converting op type: 'Const', name: 'model/L5/estimator/disp-4/mul/x'.
[SSAConverter] [24/143] Converting op type: 'Const', name: 'model/L5/estimator/disp-5/mul/x'.
[SSAConverter] [25/143] Converting op type: 'Const', name: 'model/L5/upsampler/weights/read', output_shape: (2, 2, 8, 8).
[SSAConverter] [26/143] Converting op type: 'Const', name: 'model/L5/upsampler/biases/read', output_shape: (1, 8, 1, 1).
[SSAConverter] [27/143] Converting op type: 'Const', name: 'model/L5/upsampler/conv2d_transpose/output_shape', output_shape: (4,).
[SSAConverter] [28/143] Converting op type: 'Const', name: 'model/L5/upsampler/mul/x'.
[SSAConverter] [29/143] Converting op type: 'Const', name: 'model/L4/estimator/concat/axis'.
[SSAConverter] [30/143] Converting op type: 'Const', name: 'model/L4/estimator/disp-3/mul/x'.
[SSAConverter] [31/143] Converting op type: 'Const', name: 'model/L4/estimator/disp-4/mul/x'.
[SSAConverter] [32/143] Converting op type: 'Const', name: 'model/L4/estimator/disp-5/mul/x'.
[SSAConverter] [33/143] Converting op type: 'Const', name: 'model/L4/upsampler/weights/read', output_shape: (2, 2, 8, 8).
[SSAConverter] [34/143] Converting op type: 'Const', name: 'model/L4/upsampler/biases/read', output_shape: (1, 8, 1, 1).
[SSAConverter] [35/143] Converting op type: 'Const', name: 'model/L4/upsampler/conv2d_transpose/output_shape', output_shape: (4,).
[SSAConverter] [36/143] Converting op type: 'Const', name: 'model/L4/upsampler/mul/x'.
[SSAConverter] [37/143] Converting op type: 'Const', name: 'model/L3/estimator/concat/axis'.
[SSAConverter] [38/143] Converting op type: 'Const', name: 'model/L3/estimator/disp-3/mul/x'.
[SSAConverter] [39/143] Converting op type: 'Const', name: 'model/L3/estimator/disp-4/mul/x'.
[SSAConverter] [40/143] Converting op type: 'Const', name: 'model/L3/estimator/disp-5/mul/x'.
[SSAConverter] [41/143] Converting op type: 'Const', name: 'model/L3/estimator/Slice/begin', output_shape: (4,).
[SSAConverter] [42/143] Converting op type: 'Const', name: 'model/L3/estimator/Slice/size', output_shape: (4,).
[SSAConverter] [43/143] Converting op type: 'Const', name: 'model/L3/estimator/mul/x'.
[SSAConverter] [44/143] Converting op type: 'Const', name: 'model/L0/size', output_shape: (2,).
[SSAConverter] [45/143] Converting op type: 'Transpose', name: 'input_to_nchw', output_shape: (1, 3, 512, 256).
[SSAConverter] [46/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv1a/Conv2D', output_shape: (1, 16, 256, 128).
[SSAConverter] [47/143] Converting op type: 'Mul', name: 'model/pyramid/conv1a/mul', output_shape: (1, 16, 256, 128).
[SSAConverter] [48/143] Converting op type: 'Maximum', name: 'model/pyramid/conv1a/Maximum', output_shape: (1, 16, 256, 128).
[SSAConverter] [49/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv1b/Conv2D', output_shape: (1, 16, 256, 128).
[SSAConverter] [50/143] Converting op type: 'Mul', name: 'model/pyramid/conv1b/mul', output_shape: (1, 16, 256, 128).
[SSAConverter] [51/143] Converting op type: 'Maximum', name: 'model/pyramid/conv1b/Maximum', output_shape: (1, 16, 256, 128).
[SSAConverter] [52/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv2a/Conv2D', output_shape: (1, 32, 128, 64).
[SSAConverter] [53/143] Converting op type: 'Mul', name: 'model/pyramid/conv2a/mul', output_shape: (1, 32, 128, 64).
[SSAConverter] [54/143] Converting op type: 'Maximum', name: 'model/pyramid/conv2a/Maximum', output_shape: (1, 32, 128, 64).
[SSAConverter] [55/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv2b/Conv2D', output_shape: (1, 32, 128, 64).
[SSAConverter] [56/143] Converting op type: 'Mul', name: 'model/pyramid/conv2b/mul', output_shape: (1, 32, 128, 64).
[SSAConverter] [57/143] Converting op type: 'Maximum', name: 'model/pyramid/conv2b/Maximum', output_shape: (1, 32, 128, 64).
[SSAConverter] [58/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv3a/Conv2D', output_shape: (1, 64, 64, 32).
[SSAConverter] [59/143] Converting op type: 'Mul', name: 'model/pyramid/conv3a/mul', output_shape: (1, 64, 64, 32).
[SSAConverter] [60/143] Converting op type: 'Maximum', name: 'model/pyramid/conv3a/Maximum', output_shape: (1, 64, 64, 32).
[SSAConverter] [61/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv3b/Conv2D', output_shape: (1, 64, 64, 32).
[SSAConverter] [62/143] Converting op type: 'Mul', name: 'model/pyramid/conv3b/mul', output_shape: (1, 64, 64, 32).
[SSAConverter] [63/143] Converting op type: 'Maximum', name: 'model/pyramid/conv3b/Maximum', output_shape: (1, 64, 64, 32).
[SSAConverter] [64/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv4a/Conv2D', output_shape: (1, 96, 32, 16).
[SSAConverter] [65/143] Converting op type: 'Mul', name: 'model/pyramid/conv4a/mul', output_shape: (1, 96, 32, 16).
[SSAConverter] [66/143] Converting op type: 'Maximum', name: 'model/pyramid/conv4a/Maximum', output_shape: (1, 96, 32, 16).
[SSAConverter] [67/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv4b/Conv2D', output_shape: (1, 96, 32, 16).
[SSAConverter] [68/143] Converting op type: 'Mul', name: 'model/pyramid/conv4b/mul', output_shape: (1, 96, 32, 16).
[SSAConverter] [69/143] Converting op type: 'Maximum', name: 'model/pyramid/conv4b/Maximum', output_shape: (1, 96, 32, 16).
[SSAConverter] [70/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv5a/Conv2D', output_shape: (1, 128, 16, 8).
[SSAConverter] [71/143] Converting op type: 'Mul', name: 'model/pyramid/conv5a/mul', output_shape: (1, 128, 16, 8).
[SSAConverter] [72/143] Converting op type: 'Maximum', name: 'model/pyramid/conv5a/Maximum', output_shape: (1, 128, 16, 8).
[SSAConverter] [73/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv5b/Conv2D', output_shape: (1, 128, 16, 8).
[SSAConverter] [74/143] Converting op type: 'Mul', name: 'model/pyramid/conv5b/mul', output_shape: (1, 128, 16, 8).
[SSAConverter] [75/143] Converting op type: 'Maximum', name: 'model/pyramid/conv5b/Maximum', output_shape: (1, 128, 16, 8).
[SSAConverter] [76/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv6a/Conv2D', output_shape: (1, 192, 8, 4).
[SSAConverter] [77/143] Converting op type: 'Mul', name: 'model/pyramid/conv6a/mul', output_shape: (1, 192, 8, 4).
[SSAConverter] [78/143] Converting op type: 'Maximum', name: 'model/pyramid/conv6a/Maximum', output_shape: (1, 192, 8, 4).
[SSAConverter] [79/143] Converting op type: 'Conv2D', name: 'model/pyramid/conv6b/Conv2D', output_shape: (1, 192, 8, 4).
[SSAConverter] [80/143] Converting op type: 'Mul', name: 'model/pyramid/conv6b/mul', output_shape: (1, 192, 8, 4).
[SSAConverter] [81/143] Converting op type: 'Maximum', name: 'model/pyramid/conv6b/Maximum', output_shape: (1, 192, 8, 4).
[SSAConverter] [82/143] Converting op type: 'Conv2D', name: 'model/L6/estimator/disp-3/Conv2D', output_shape: (1, 96, 8, 4).
[SSAConverter] [83/143] Converting op type: 'Mul', name: 'model/L6/estimator/disp-3/mul', output_shape: (1, 96, 8, 4).
[SSAConverter] [84/143] Converting op type: 'Maximum', name: 'model/L6/estimator/disp-3/Maximum', output_shape: (1, 96, 8, 4).
[SSAConverter] [85/143] Converting op type: 'Conv2D', name: 'model/L6/estimator/disp-4/Conv2D', output_shape: (1, 64, 8, 4).
[SSAConverter] [86/143] Converting op type: 'Mul', name: 'model/L6/estimator/disp-4/mul', output_shape: (1, 64, 8, 4).
[SSAConverter] [87/143] Converting op type: 'Maximum', name: 'model/L6/estimator/disp-4/Maximum', output_shape: (1, 64, 8, 4).
[SSAConverter] [88/143] Converting op type: 'Conv2D', name: 'model/L6/estimator/disp-5/Conv2D', output_shape: (1, 32, 8, 4).
[SSAConverter] [89/143] Converting op type: 'Mul', name: 'model/L6/estimator/disp-5/mul', output_shape: (1, 32, 8, 4).
[SSAConverter] [90/143] Converting op type: 'Maximum', name: 'model/L6/estimator/disp-5/Maximum', output_shape: (1, 32, 8, 4).
[SSAConverter] [91/143] Converting op type: 'Conv2D', name: 'model/L6/estimator/disp-6/Conv2D', output_shape: (1, 8, 8, 4).
[SSAConverter] [92/143] Converting op type: 'Conv2DBackpropInput', name: 'model/L6/upsampler/conv2d_transpose', output_shape: (1, 8, 16, 8).
[SSAConverter] [93/143] Converting op type: 'BiasAdd', name: 'model/L6/upsampler/BiasAdd', output_shape: (1, 8, 16, 8).
[SSAConverter] [94/143] Converting op type: 'Mul', name: 'model/L6/upsampler/mul', output_shape: (1, 8, 16, 8).
[SSAConverter] [95/143] Converting op type: 'Maximum', name: 'model/L6/upsampler/Maximum', output_shape: (1, 8, 16, 8).
[SSAConverter] [96/143] Converting op type: 'ConcatV2', name: 'model/L5/estimator/concat', output_shape: (1, 136, 16, 8).
[SSAConverter] [97/143] Converting op type: 'Conv2D', name: 'model/L5/estimator/disp-3/Conv2D', output_shape: (1, 96, 16, 8).
[SSAConverter] [98/143] Converting op type: 'Mul', name: 'model/L5/estimator/disp-3/mul', output_shape: (1, 96, 16, 8).
[SSAConverter] [99/143] Converting op type: 'Maximum', name: 'model/L5/estimator/disp-3/Maximum', output_shape: (1, 96, 16, 8).
[SSAConverter] [100/143] Converting op type: 'Conv2D', name: 'model/L5/estimator/disp-4/Conv2D', output_shape: (1, 64, 16, 8).
[SSAConverter] [101/143] Converting op type: 'Mul', name: 'model/L5/estimator/disp-4/mul', output_shape: (1, 64, 16, 8).
[SSAConverter] [102/143] Converting op type: 'Maximum', name: 'model/L5/estimator/disp-4/Maximum', output_shape: (1, 64, 16, 8).
[SSAConverter] [103/143] Converting op type: 'Conv2D', name: 'model/L5/estimator/disp-5/Conv2D', output_shape: (1, 32, 16, 8).
[SSAConverter] [104/143] Converting op type: 'Mul', name: 'model/L5/estimator/disp-5/mul', output_shape: (1, 32, 16, 8).
[SSAConverter] [105/143] Converting op type: 'Maximum', name: 'model/L5/estimator/disp-5/Maximum', output_shape: (1, 32, 16, 8).
[SSAConverter] [106/143] Converting op type: 'Conv2D', name: 'model/L5/estimator/disp-6/Conv2D', output_shape: (1, 8, 16, 8).
[SSAConverter] [107/143] Converting op type: 'Conv2DBackpropInput', name: 'model/L5/upsampler/conv2d_transpose', output_shape: (1, 8, 32, 16).
[SSAConverter] [108/143] Converting op type: 'BiasAdd', name: 'model/L5/upsampler/BiasAdd', output_shape: (1, 8, 32, 16).
[SSAConverter] [109/143] Converting op type: 'Mul', name: 'model/L5/upsampler/mul', output_shape: (1, 8, 32, 16).
[SSAConverter] [110/143] Converting op type: 'Maximum', name: 'model/L5/upsampler/Maximum', output_shape: (1, 8, 32, 16).
[SSAConverter] [111/143] Converting op type: 'ConcatV2', name: 'model/L4/estimator/concat', output_shape: (1, 104, 32, 16).
[SSAConverter] [112/143] Converting op type: 'Conv2D', name: 'model/L4/estimator/disp-3/Conv2D', output_shape: (1, 96, 32, 16).
[SSAConverter] [113/143] Converting op type: 'Mul', name: 'model/L4/estimator/disp-3/mul', output_shape: (1, 96, 32, 16).
[SSAConverter] [114/143] Converting op type: 'Maximum', name: 'model/L4/estimator/disp-3/Maximum', output_shape: (1, 96, 32, 16).
[SSAConverter] [115/143] Converting op type: 'Conv2D', name: 'model/L4/estimator/disp-4/Conv2D', output_shape: (1, 64, 32, 16).
[SSAConverter] [116/143] Converting op type: 'Mul', name: 'model/L4/estimator/disp-4/mul', output_shape: (1, 64, 32, 16).
[SSAConverter] [117/143] Converting op type: 'Maximum', name: 'model/L4/estimator/disp-4/Maximum', output_shape: (1, 64, 32, 16).
[SSAConverter] [118/143] Converting op type: 'Conv2D', name: 'model/L4/estimator/disp-5/Conv2D', output_shape: (1, 32, 32, 16).
[SSAConverter] [119/143] Converting op type: 'Mul', name: 'model/L4/estimator/disp-5/mul', output_shape: (1, 32, 32, 16).
[SSAConverter] [120/143] Converting op type: 'Maximum', name: 'model/L4/estimator/disp-5/Maximum', output_shape: (1, 32, 32, 16).
[SSAConverter] [121/143] Converting op type: 'Conv2D', name: 'model/L4/estimator/disp-6/Conv2D', output_shape: (1, 8, 32, 16).
[SSAConverter] [122/143] Converting op type: 'Conv2DBackpropInput', name: 'model/L4/upsampler/conv2d_transpose', output_shape: (1, 8, 64, 32).
[SSAConverter] [123/143] Converting op type: 'BiasAdd', name: 'model/L4/upsampler/BiasAdd', output_shape: (1, 8, 64, 32).
[SSAConverter] [124/143] Converting op type: 'Mul', name: 'model/L4/upsampler/mul', output_shape: (1, 8, 64, 32).
[SSAConverter] [125/143] Converting op type: 'Maximum', name: 'model/L4/upsampler/Maximum', output_shape: (1, 8, 64, 32).
[SSAConverter] [126/143] Converting op type: 'ConcatV2', name: 'model/L3/estimator/concat', output_shape: (1, 72, 64, 32).
[SSAConverter] [127/143] Converting op type: 'Conv2D', name: 'model/L3/estimator/disp-3/Conv2D', output_shape: (1, 96, 64, 32).
[SSAConverter] [128/143] Converting op type: 'Mul', name: 'model/L3/estimator/disp-3/mul', output_shape: (1, 96, 64, 32).
[SSAConverter] [129/143] Converting op type: 'Maximum', name: 'model/L3/estimator/disp-3/Maximum', output_shape: (1, 96, 64, 32).
[SSAConverter] [130/143] Converting op type: 'Conv2D', name: 'model/L3/estimator/disp-4/Conv2D', output_shape: (1, 64, 64, 32).
[SSAConverter] [131/143] Converting op type: 'Mul', name: 'model/L3/estimator/disp-4/mul', output_shape: (1, 64, 64, 32).
[SSAConverter] [132/143] Converting op type: 'Maximum', name: 'model/L3/estimator/disp-4/Maximum', output_shape: (1, 64, 64, 32).
[SSAConverter] [133/143] Converting op type: 'Conv2D', name: 'model/L3/estimator/disp-5/Conv2D', output_shape: (1, 32, 64, 32).
[SSAConverter] [134/143] Converting op type: 'Mul', name: 'model/L3/estimator/disp-5/mul', output_shape: (1, 32, 64, 32).
[SSAConverter] [135/143] Converting op type: 'Maximum', name: 'model/L3/estimator/disp-5/Maximum', output_shape: (1, 32, 64, 32).
[SSAConverter] [136/143] Converting op type: 'Conv2D', name: 'model/L3/estimator/disp-6/Conv2D', output_shape: (1, 8, 64, 32).
[SSAConverter] [137/143] Converting op type: 'Transpose', name: 'model/L3/estimator/disp-6/Conv2D_to_nhwc', output_shape: (1, 64, 32, 8).
[SSAConverter] [138/143] Converting op type: 'Slice', name: 'model/L3/estimator/Slice', output_shape: (1, 64, 32, 2).
[SSAConverter] [139/143] Converting op type: 'Sigmoid', name: 'model/L3/estimator/Sigmoid', output_shape: (1, 64, 32, 2).
[SSAConverter] [140/143] Converting op type: 'Mul', name: 'model/L3/estimator/mul', output_shape: (1, 64, 32, 2).
[SSAConverter] [141/143] Converting op type: 'Transpose', name: 'model/L3/estimator/mul_to_nchw', output_shape: (1, 2, 64, 32).
[SSAConverter] [142/143] Converting op type: 'ResizeBilinear', name: 'model/L0/ResizeBilinear_orig0', output_shape: (1, 2, 512, 256).
[SSAConverter] [143/143] Converting op type: 'Transpose', name: 'model/L0/ResizeBilinear', output_shape: (1, 512, 256, 2).
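The log confirms that the exported graph's final `ResizeBilinear` carries two channels. One possible workaround, sketched in numpy rather than verified against the actual converted model, is to keep only the first channel before turning the output into an image:

```python
import numpy as np

def first_channel(model_out):
    # model_out has shape (1, H, W, 2), as reported by the converter log;
    # keep channel 0 as the depth/disparity estimate
    return model_out[..., 0]  # -> (1, H, W)
```

Whether channel 0 or channel 1 is the desired map is an assumption worth checking against the original TensorFlow graph.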
Hi, Can you share the training code?
As the MiDaS output range is from 0 to a few thousand, I tried to use this output as supervision but found the loss cannot decrease at all. If I scale the output to [0, 1], the training converges to an all-zero output very quickly. Can I ask what range you used in your training? Thanks
Hello there,
When using your provided model from Google Drive, inference.py throws this error when attempting to read/restore the model:
Data loss: Unable to open table file ckpt/pydnet/pydnet.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
This happens both when importing the model file directly and when importing just the folder. It may have to do with the missing ".meta" file, but I am unsure. Thanks
Hi,
I'm trying to train the model myself. In the paper, the loss is defined as an L1 loss plus a gradient loss, as in "Learning the depths of moving people by watching frozen people". But I found that they calculate the gradient loss in log space. Do you use log depth for both the prediction and the ground truth as well? Besides, for the L1 loss, |D_s - D_gt|, are the output and ground truth both in the range [0, 255]?
Thanks.
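For reference, a minimal numpy sketch of a gradient loss computed in log space, in the spirit of "Learning the depths of moving people by watching frozen people"; the epsilon and the use of simple forward differences are my assumptions, not the paper's exact recipe:

```python
import numpy as np

def log_gradient_loss(pred, gt, eps=1e-6):
    # compare image gradients of log-depth rather than raw depth;
    # this makes the loss invariant to a global scale factor
    lp, lg = np.log(pred + eps), np.log(gt + eps)
    dx = np.abs(np.diff(lp, axis=1) - np.diff(lg, axis=1)).mean()
    dy = np.abs(np.diff(lp, axis=0) - np.diff(lg, axis=0)).mean()
    return dx + dy
```

Because the logarithm turns a multiplicative scale into an additive constant, the gradient differences cancel it out.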
Hello,
this is more of a request than an issue. Will you be open-sourcing the Android code anytime soon?
Thank you so much for this amazing work!
Hi @FilippoAleotti,
I am a high school student trying to use your team's PyDNet model for a project I am working on. Specifically, I'm trying to run your .tflite model on a Raspberry Pi, but I am having trouble getting a depth map of similar quality to the one shown in the demos.
I notice #1 mentions this issue, but I also noticed that #3, which introduced "optimized_pydnet++.tflite", came a few months after the last comment on #1, which suggests that you did not face this issue with model quality.
AFAICT I am performing the same procedure as the Android sample on the master branch (scaling the input and dividing the pixel values by 255), but the output still looks wrong. This Colab I have here demonstrates the issues I am having.
For some reason, the model is consistently outputting a ridiculously high depth value for that particular spot in all images.
This is possibly a TensorFlow bug, so I worked around it by simply cropping the output depth to a region where that point is not in the image, but then I stumbled across another problem.
The cropped depth map looks to be of a rather low quality unlike the demos shown. Here is an example:
The input I'm providing is 640x448 pixels and from what I can tell there is no option to configure the resolution on the tflite model.
I have been trying to get this working for a few days now and am at a loss on what to do. I realise that you're busy with your research, but could you offer me some guidance on how to move forward? I notice that you have v2 of PyDNet mobile in the works for Android; could you provide an update on the status of a v2 tflite model?
Thank you for the work you've put into this project, it is quite incredible!
Hi @FilippoAleotti
I see in Resolution Enum (Android) these values:
RES1(512, 256), RES2(640, 192), RES3(320, 96), RES4(640, 448);
Are these the only resolutions supported by the model?
I've tried to feed the model with a static image and it gets cropped.
thanks
First I want to congratulate you on the project.
I would like to know how to convert the model's output to distance in meters.
Searching on the internet, I found the following formula:
depth = baseline * focal / disparity
disparity: the model's output, a number between 0 and 1, normalized as (modelResult - min_result) / (max_result - min_result)
baseline: seems to depend on the training data; for the KITTI dataset I have found values of 0.54 or 0.22
focal: I think this is the focal length of the camera, but I'm not sure; I think a value close to 2262
I'm not sure if this is the correct way to do it, nor which parameters to use for the model you trained.
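Under those assumptions, a hedged numpy sketch of the whole pipeline might look like the following. The min-max normalization and the scaling of normalized disparity by image width follow common stereo-evaluation practice; the default baseline and focal values are the KITTI-ish guesses from above, not values confirmed for this model:

```python
import numpy as np

def model_output_to_depth(model_out, image_width,
                          baseline=0.54, focal=2262.0, eps=1e-6):
    # min-max normalize the raw model output to [0, 1]
    d = (model_out - model_out.min()) / (model_out.max() - model_out.min() + eps)
    # normalized disparity -> pixel disparity (assumed scaled by image width)
    disparity_px = np.maximum(d * image_width, eps)
    # classic stereo relation: depth = baseline * focal / disparity
    return baseline * focal / disparity_px
```

Note that without knowing the true baseline and focal length used in training, the result is at best a relative depth map up to an unknown scale.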
When exporting the model with a different height and width, what parameters do I have to change?
python export.py --ckpt ckpt/pydnet \
    --arch pydnet \
    --dest "./" \
    --height 192 --width 192
Hello! Great work! I have a few questions:
Is the new mlmodel the one trained using MiDaS outputs on 450K internet images?
What is its output? It looks like an 8-bit image that would be a scaled and shifted disparity map. Is that right? And if I want to get the original disparity map (or at least floats rather than uint8), is there an easy way to modify the mlmodel?
The results of my network are quite blurry. I feel it is related to the loss.
Many thanks!
This is really amazing work with great real-life application potential. Will you provide more details about the training dataset?
Hi,
How was the WILD dataset made? Is the ground truth of the WILD dataset generated by MiDaS? Thank you.
You have mentioned in the paper that transposed convolutions have been replaced by upsampling and convolution blocks. What else has been changed?
Dear Filippo,
Thank you for the great work you do! In your paper, you achieved great performance with quantization, yet I only see a quantized model for iOS. I tried converting the frozen graph produced by the export.py script with post-training quantization, but I haven't had much luck. Would it be possible for you to upload the quantized model as a frozen graph file? If not, could you specify how you converted your model? I followed the tutorial from https://blog.tensorflow.org/2019/06/tensorflow-integer-quantization.html using the NYU dataset, but the resulting tflite is much slower than the regular one.
Any help appreciated!
Hi @FilippoAleotti, really, thanks for sharing. I have some questions about training:
1. Your README shows that you train on Matterport with supervision. As far as I know, the Matterport3D dataset has up, left, and right view images, as Figure 1 shows; are all of these in your train set?
2. Do you mind sharing your training code with supervision?
I am wondering how you arrived at the input resolution of 640x384, which is a 5:3 aspect ratio. I saw that other papers work with flexible aspect ratios; for example, in MiDaS the longer side's length is 384 and the other side's length is divisible by 32. Sometimes it's a square, or at least the model is trained on squares. What were your considerations for this choice? What if my image is vertical, e.g. 9:16? Do you think the result will be adversely affected by the resizing?
With the new tflite optimized model, I am having some issues with the output. It comes out as a 32FC2 vector, which, when split, produces "masks" that appear like this (image attached). I am not sure what part of the process is wrong here, because when I ran a tflite version of the model on the same platform without GPU optimization, I did not have this issue.
I want to test it on a K210, but I cannot convert the model:
1. Import graph...
Fatal: Shapes must be same, but got [1] and []
Also, the K210 only supports input feature maps smaller than or equal to 320x240 (WxH):
https://github.com/kendryte/nncase/blob/master/docs/FAQ_EN.md
Hi sir,
Thank you for the great project.
I would like to improve the model based on the method in your paper.
Would it be possible to release an evaluation script so that I can reproduce the benchmark results from the paper with the pre-trained model?
That way I can examine my results to check whether I got something wrong during the training process.
I also have another question: did you calculate the multi-scale gradient loss using the outputs of the upsampling layers?
If not, may I know how you implemented the loss calculation?
Thank you!