
spacewalk01 / depth-anything-tensorrt


Unofficial C++ and Python implementation of the Depth Anything model using the TensorRT API.

Home Page: https://depth-anything.github.io/

License: MIT License

C++ 26.91% Python 71.00% CMake 2.09%
cpp depth-anything depth-camera depth-estimation depth-image image-depth-estimation monocular-depth-estimation python tensorrt video-depth


depth-anything-tensorrt's People

Contributors

lbq779660843, lwq2edu, spacewalk01


depth-anything-tensorrt's Issues

Uncaught exception detected: Unable to open library: nvinfer_plugin.dll

Hello,

I'm trying to run the command to convert the .onnx model to the .engine format, but I'm running into an issue where nvinfer_plugin.dll cannot be opened.

I triple-checked things and the DLL is on the correct PATH (I tried both adding it to the PATH and copying the files to the CUDA bin and lib folders), but I always get the same error message.

Any guidance would be greatly appreciated.

I tried with both TensorRT-8.6.1.6.Windows10.x86_64.cuda-12.0 and TensorRT-8.6.0.12.Windows10.x86_64.cuda-12.0, as well as CUDA 12.0 and CUDA 12.4.
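For anyone hitting the same message, a minimal diagnostic (purely illustrative, not part of this repo) is to ask Windows to resolve the DLL by name from Python with ctypes; if this fails even though nvinfer_plugin.dll itself is on the PATH, the missing piece is usually one of the plugin's own dependencies (for example the matching nvinfer, cuDNN, or CUDA runtime DLLs):

import ctypes

try:
    # LoadLibrary searches the PATH much like TensorRT's own loader does.
    ctypes.WinDLL("nvinfer_plugin.dll")
    print("nvinfer_plugin.dll resolved successfully")
except OSError as err:
    # A failure here typically means the DLL, or one of its dependencies,
    # cannot be found or loaded from the current environment.
    print("Could not load nvinfer_plugin.dll:", err)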

A couple of extra questions:

  1. Would it be possible to provide the .engine model in the repo?
  2. Do you think it's possible to use and load the ONNX model directly? (See the sketch below.)

Thank you.
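Regarding question 2: a .engine file is tied to the GPU and TensorRT version it was built with, so shipping one in the repo generally would not help, but the ONNX model can be parsed and built into an engine at runtime with the TensorRT Python API. A minimal sketch, assuming TensorRT 8.6 and an already exported ONNX file (the file names here are illustrative, not from the repo):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the exported ONNX model directly instead of loading a prebuilt engine.
with open("depth_anything_vitb14.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX file")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GB workspace

# Serialize the engine once and cache it so later runs can skip the build step.
engine_bytes = builder.build_serialized_network(network, config)
with open("depth_anything_vitb14.engine", "wb") as f:
    f.write(engine_bytes)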

depth-anything-tensorrt-simplified.exe - System Error

Hi, thank you for this amazing project!

I faced an issue while running the .exe:

The code execution cannot proceed because opencv_world490d.dll was not found. Reinstalling the program may fix the problem

The path I added in CMakeLists.txt for OpenCV is:
PATH\opencv\build\x64\vc16\lib

EDIT:
Found a workaround: copying the missing DLL directly into the folder with the .exe, and it worked perfectly!

[E] Error[4]: If_1249_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape.

[01/30/2024-18:29:22] [E] Error[4]: If_1249_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape.
[01/30/2024-18:29:22] [E] [TRT] ModelImporter.cpp:773: While parsing node number 1249 [If -> "output"]:
[01/30/2024-18:29:22] [E] [TRT] ModelImporter.cpp:774: --- Begin node ---
[01/30/2024-18:29:22] [E] [TRT] ModelImporter.cpp:775: input: "1593"
output: "output"
name: "If_1249"
op_type: "If"
attribute {
name: "then_branch"
g {
node {
input: "1588"
output: "1595"
name: "Squeeze_1250"
op_type: "Squeeze"
attribute {
name: "axes"
ints: 1
type: INTS
}
}
name: "torch-jit-export1"
output {
name: "1595"
type {
tensor_type {
elem_type: 1
shape {
dim {
dim_param: "Squeeze1595_dim_0"
}
dim {
dim_param: "Squeeze1595_dim_1"
}
dim {
dim_param: "Squeeze1595_dim_2"
}
}
}
}
}
}
type: GRAPH
}
attribute {
name: "else_branch"
g {
node {
input: "1588"
output: "1596"
name: "Identity_1251"
op_type: "Identity"
}
name: "torch-jit-export2"
output {
name: "1596"
type {
tensor_type {
elem_type: 1
shape {
dim {
dim_param: "Squeeze1595_dim_0"
}
dim {
dim_param: "Identity1596_dim_1"
}
dim {
dim_param: "Squeeze1595_dim_1"
}
dim {
dim_param: "Squeeze1595_dim_2"
}
}
}
}
}
}
type: GRAPH
}

[01/30/2024-18:29:22] [E] [TRT] ModelImporter.cpp:776: --- End node ---
[01/30/2024-18:29:22] [E] [TRT] ModelImporter.cpp:779: ERROR: ModelImporter.cpp:180 In function parseGraph:
[6] Invalid Node - If_1249
If_1249_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape.
[01/30/2024-18:29:22] [E] Failed to parse onnx file
[01/30/2024-18:29:22] [I] Finish parsing network model
[01/30/2024-18:29:22] [E] Parsing model failed
[01/30/2024-18:29:22] [E] Failed to create engine from model or file.
[01/30/2024-18:29:22] [E] Engine set up failed

The above error was reported when exporting the ONNX model to a TensorRT engine. Have you ever encountered it?
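This failure comes from the If_1249 conditional, whose then/else branches produce tensors of different shapes, which TensorRT's ONNX parser rejects. A workaround often suggested for this class of error (not confirmed by the repo here) is to export with a static input shape and run the model through onnx-simplifier so the conditional can be constant-folded away before building the engine. A minimal sketch, assuming onnx and onnxsim are installed; the file name is illustrative:

import onnx
from onnxsim import simplify

model = onnx.load("depth_anything.onnx")

# Constant folding usually collapses the If node to a single branch
# when the input shape is already static.
simplified, ok = simplify(model)
if not ok:
    raise RuntimeError("onnxsim could not validate the simplified model")

onnx.save(simplified, "depth_anything_sim.onnx")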

Jetson support

Does it support Jetson devices? My device environment is as follows:
deepstream-app version 6.0.1
DeepStreamSDK 6.0.1
CUDA Driver Version: 10.2
CUDA Runtime Version: 10.2
TensorRT Version: 8.0
cuDNN Version: 8.2

How to export the large model?

Thank you for your work on this repo. This is amazing, and I was able to run inference on my AGX Orin with the TensorRT model. However, the base model didn't fit my needs, and I want to see whether the larger model is more capable.

I downloaded the large PyTorch model and ran this:

python export.py --encoder vitb --load_from depth_anything_vitb14.pth --image_shape 3 518 518
It appears there is a size mismatch.
Does the export also work for the large weights?
It results in:
raceback (most recent call last): File "/home/ubuntu/Depth-Anything/export.py", line 63, in <module> main() File "/home/ubuntu/Depth-Anything/export.py", line 60, in main export_model(args.encoder, args.load_from, tuple(args.image_shape)) File "/home/ubuntu/Depth-Anything/export.py", line 35, in export_model depth_anything.load_state_dict(torch.load(load_from, map_location='cpu'), strict=True) File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for DPT_DINOv2: Unexpected key(s) in state_dict: "pretrained.blocks.12.norm1.weight", "pretrained.blocks.12.norm1.bias", "pretrained.blocks.12.attn.qkv.weight", "pretrained.blocks.12.attn.qkv.bias", "pretrained.blocks.12.attn.proj.weight", "pretrained.blocks.12.attn.proj.bias", "pretrained.blocks.12.ls1.gamma", "pretrained.blocks.12.norm2.weight", "pretrained.blocks.12.norm2.bias", "pretrained.blocks.12.mlp.fc1.weight", "pretrained.blocks.12.mlp.fc1.bias", "pretrained.blocks.12.mlp.fc2.weight", "pretrained.blocks.12.mlp.fc2.bias", "pretrained.blocks.12.ls2.gamma", "pretrained.blocks.13.norm1.weight", "pretrained.blocks.13.norm1.bias", "pretrained.blocks.13.attn.qkv.weight", "pretrained.blocks.13.attn.qkv.bias", "pretrained.blocks.13.attn.proj.weight", "pretrained.blocks.13.attn.proj.bias", "pretrained.blocks.13.ls1.gamma", "pretrained.blocks.13.norm2.weight", "pretrained.blocks.13.norm2.bias", "pretrained.blocks.13.mlp.fc1.weight", "pretrained.blocks.13.mlp.fc1.bias", "pretrained.blocks.13.mlp.fc2.weight", "pretrained.blocks.13.mlp.fc2.bias", "pretrained.blocks.13.ls2.gamma", "pretrained.blocks.14.norm1.weight", "pretrained.blocks.14.norm1.bias", "pretrained.blocks.14.attn.qkv.weight", "pretrained.blocks.14.attn.qkv.bias", "pretrained.blocks.14.attn.proj.weight", "pretrained.blocks.14.attn.proj.bias", "pretrained.blocks.14.ls1.gamma", "pretrained.blocks.14.norm2.weight", "pretrained.blocks.14.norm2.bias", "pretrained.blocks.14.mlp.fc1.weight", "pretrained.blocks.14.mlp.fc1.bias", "pretrained.blocks.14.mlp.fc2.weight", "pretrained.blocks.14.mlp.fc2.bias", "pretrained.blocks.14.ls2.gamma", "pretrained.blocks.15.norm1.weight", "pretrained.blocks.15.norm1.bias", "pretrained.blocks.15.attn.qkv.weight", "pretrained.blocks.15.attn.qkv.bias", "pretrained.blocks.15.attn.proj.weight", "pretrained.blocks.15.attn.proj.bias", "pretrained.blocks.15.ls1.gamma", "pretrained.blocks.15.norm2.weight", "pretrained.blocks.15.norm2.bias", "pretrained.blocks.15.mlp.fc1.weight", "pretrained.blocks.15.mlp.fc1.bias", "pretrained.blocks.15.mlp.fc2.weight", "pretrained.blocks.15.mlp.fc2.bias", "pretrained.blocks.15.ls2.gamma", "pretrained.blocks.16.norm1.weight", "pretrained.blocks.16.norm1.bias", "pretrained.blocks.16.attn.qkv.weight", "pretrained.blocks.16.attn.qkv.bias", "pretrained.blocks.16.attn.proj.weight", "pretrained.blocks.16.attn.proj.bias", "pretrained.blocks.16.ls1.gamma", "pretrained.blocks.16.norm2.weight", "pretrained.blocks.16.norm2.bias", "pretrained.blocks.16.mlp.fc1.weight", "pretrained.blocks.16.mlp.fc1.bias", "pretrained.blocks.16.mlp.fc2.weight", "pretrained.blocks.16.mlp.fc2.bias", "pretrained.blocks.16.ls2.gamma", "pretrained.blocks.17.norm1.weight", "pretrained.blocks.17.norm1.bias", "pretrained.blocks.17.attn.qkv.weight", "pretrained.blocks.17.attn.qkv.bias", "pretrained.blocks.17.attn.proj.weight", "pretrained.blocks.17.attn.proj.bias", "pretrained.blocks.17.ls1.gamma", 
"pretrained.blocks.17.norm2.weight", "pretrained.blocks.17.norm2.bias", "pretrained.blocks.17.mlp.fc1.weight", "pretrained.blocks.17.mlp.fc1.bias", "pretrained.blocks.17.mlp.fc2.weight", "pretrained.blocks.17.mlp.fc2.bias", "pretrained.blocks.17.ls2.gamma", "pretrained.blocks.18.norm1.weight", "pretrained.blocks.18.norm1.bias", "pretrained.blocks.18.attn.qkv.weight", "pretrained.blocks.18.attn.qkv.bias", "pretrained.blocks.18.attn.proj.weight", "pretrained.blocks.18.attn.proj.bias", "pretrained.blocks.18.ls1.gamma", "pretrained.blocks.18.norm2.weight", "pretrained.blocks.18.norm2.bias", "pretrained.blocks.18.mlp.fc1.weight", "pretrained.blocks.18.mlp.fc1.bias", "pretrained.blocks.18.mlp.fc2.weight", "pretrained.blocks.18.mlp.fc2.bias", "pretrained.blocks.18.ls2.gamma", "pretrained.blocks.19.norm1.weight", "pretrained.blocks.19.norm1.bias", "pretrained.blocks.19.attn.qkv.weight", "pretrained.blocks.19.attn.qkv.bias", "pretrained.blocks.19.attn.proj.weight", "pretrained.blocks.19.attn.proj.bias", "pretrained.blocks.19.ls1.gamma", "pretrained.blocks.19.norm2.weight", "pretrained.blocks.19.norm2.bias", "pretrained.blocks.19.mlp.fc1.weight", "pretrained.blocks.19.mlp.fc1.bias", "pretrained.blocks.19.mlp.fc2.weight", "pretrained.blocks.19.mlp.fc2.bias", "pretrained.blocks.19.ls2.gamma", "pretrained.blocks.20.norm1.weight", "pretrained.blocks.20.norm1.bias", "pretrained.blocks.20.attn.qkv.weight", "pretrained.blocks.20.attn.qkv.bias", "pretrained.blocks.20.attn.proj.weight", "pretrained.blocks.20.attn.proj.bias", "pretrained.blocks.20.ls1.gamma", "pretrained.blocks.20.norm2.weight", "pretrained.blocks.20.norm2.bias", "pretrained.blocks.20.mlp.fc1.weight", "pretrained.blocks.20.mlp.fc1.bias", "pretrained.blocks.20.mlp.fc2.weight", "pretrained.blocks.20.mlp.fc2.bias", "pretrained.blocks.20.ls2.gamma", "pretrained.blocks.21.norm1.weight", "pretrained.blocks.21.norm1.bias", "pretrained.blocks.21.attn.qkv.weight", "pretrained.blocks.21.attn.qkv.bias", "pretrained.blocks.21.attn.proj.weight", "pretrained.blocks.21.attn.proj.bias", "pretrained.blocks.21.ls1.gamma", "pretrained.blocks.21.norm2.weight", "pretrained.blocks.21.norm2.bias", "pretrained.blocks.21.mlp.fc1.weight", "pretrained.blocks.21.mlp.fc1.bias", "pretrained.blocks.21.mlp.fc2.weight", "pretrained.blocks.21.mlp.fc2.bias", "pretrained.blocks.21.ls2.gamma", "pretrained.blocks.22.norm1.weight", "pretrained.blocks.22.norm1.bias", "pretrained.blocks.22.attn.qkv.weight", "pretrained.blocks.22.attn.qkv.bias", "pretrained.blocks.22.attn.proj.weight", "pretrained.blocks.22.attn.proj.bias", "pretrained.blocks.22.ls1.gamma", "pretrained.blocks.22.norm2.weight", "pretrained.blocks.22.norm2.bias", "pretrained.blocks.22.mlp.fc1.weight", "pretrained.blocks.22.mlp.fc1.bias", "pretrained.blocks.22.mlp.fc2.weight", "pretrained.blocks.22.mlp.fc2.bias", "pretrained.blocks.22.ls2.gamma", "pretrained.blocks.23.norm1.weight", "pretrained.blocks.23.norm1.bias", "pretrained.blocks.23.attn.qkv.weight", "pretrained.blocks.23.attn.qkv.bias", "pretrained.blocks.23.attn.proj.weight", "pretrained.blocks.23.attn.proj.bias", "pretrained.blocks.23.ls1.gamma", "pretrained.blocks.23.norm2.weight", "pretrained.blocks.23.norm2.bias", "pretrained.blocks.23.mlp.fc1.weight", "pretrained.blocks.23.mlp.fc1.bias", "pretrained.blocks.23.mlp.fc2.weight", "pretrained.blocks.23.mlp.fc2.bias", "pretrained.blocks.23.ls2.gamma". size mismatch for pretrained.cls_token: copying a param with shape torch.Size([1, 1, 1024]) from checkpoint, the shape in current model is torch.Size([1, 1, 768]). 
size mismatch for pretrained.pos_embed: copying a param with shape torch.Size([1, 1370, 1024]) from checkpoint, the shape in current model is torch.Size([1, 1370, 768]). size mismatch for pretrained.mask_token: copying a param with shape torch.Size([1, 1024]) from checkpoint, the shape in current model is torch.Size([1, 768]). size mismatch for pretrained.patch_embed.proj.weight: copying a param with shape torch.Size([1024, 3, 14, 14]) from checkpoint, the shape in current model is torch.Size([768, 3, 14, 14]). size mismatch for pretrained.patch_embed.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.0.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.0.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.0.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]). size mismatch for pretrained.blocks.0.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]). size mismatch for pretrained.blocks.0.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for pretrained.blocks.0.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.0.ls1.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.0.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.0.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for pretrained.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for pretrained.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for pretrained.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.0.ls2.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.1.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.1.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.1.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]). 
size mismatch for pretrained.blocks.1.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]). size mismatch for pretrained.blocks.1.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for pretrained.blocks.1.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.1.ls1.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.1.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.1.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for pretrained.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for pretrained.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for pretrained.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.1.ls2.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.2.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.2.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.2.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]). size mismatch for pretrained.blocks.2.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]). size mismatch for pretrained.blocks.2.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for pretrained.blocks.2.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.2.ls1.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.2.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.2.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.2.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). 
size mismatch for pretrained.blocks.2.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for pretrained.blocks.2.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for pretrained.blocks.2.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.2.ls2.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.3.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.3.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.3.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]). size mismatch for pretrained.blocks.3.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]). size mismatch for pretrained.blocks.3.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for pretrained.blocks.3.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.3.ls1.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.3.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.3.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.3.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for pretrained.blocks.3.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for pretrained.blocks.3.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for pretrained.blocks.3.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.3.ls2.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.4.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.4.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.4.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]). 
size mismatch for pretrained.blocks.4.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]). size mismatch for pretrained.blocks.4.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for pretrained.blocks.4.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.4.ls1.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.4.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.4.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.4.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for pretrained.blocks.4.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for pretrained.blocks.4.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for pretrained.blocks.4.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.4.ls2.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.5.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.5.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.5.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]). size mismatch for pretrained.blocks.5.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]). size mismatch for pretrained.blocks.5.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for pretrained.blocks.5.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.5.ls1.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.5.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.5.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.5.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). 
size mismatch for pretrained.blocks.5.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for pretrained.blocks.5.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for pretrained.blocks.5.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.5.ls2.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.6.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.6.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.6.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]). size mismatch for pretrained.blocks.6.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]). size mismatch for pretrained.blocks.6.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for pretrained.blocks.6.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.6.ls1.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.6.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.6.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.6.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for pretrained.blocks.6.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for pretrained.blocks.6.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for pretrained.blocks.6.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.6.ls2.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.7.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.7.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.7.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]). 
size mismatch for pretrained.blocks.7.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]). size mismatch for pretrained.blocks.7.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for pretrained.blocks.7.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.7.ls1.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.7.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.7.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.7.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for pretrained.blocks.7.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for pretrained.blocks.7.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for pretrained.blocks.7.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.7.ls2.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.8.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.8.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.8.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]). size mismatch for pretrained.blocks.8.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]). size mismatch for pretrained.blocks.8.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for pretrained.blocks.8.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.8.ls1.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.8.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.8.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.8.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). 
size mismatch for pretrained.blocks.8.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for pretrained.blocks.8.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for pretrained.blocks.8.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.8.ls2.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.9.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.9.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.9.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]). size mismatch for pretrained.blocks.9.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]). size mismatch for pretrained.blocks.9.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for pretrained.blocks.9.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.9.ls1.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.9.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.9.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.9.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for pretrained.blocks.9.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for pretrained.blocks.9.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for pretrained.blocks.9.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.9.ls2.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.10.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.10.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.10.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]). 
size mismatch for pretrained.blocks.10.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]). size mismatch for pretrained.blocks.10.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for pretrained.blocks.10.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.10.ls1.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.10.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.10.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.10.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). size mismatch for pretrained.blocks.10.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for pretrained.blocks.10.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for pretrained.blocks.10.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.10.ls2.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.11.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.11.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.11.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]). size mismatch for pretrained.blocks.11.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]). size mismatch for pretrained.blocks.11.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]). size mismatch for pretrained.blocks.11.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.11.ls1.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.11.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.11.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.11.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]). 
size mismatch for pretrained.blocks.11.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for pretrained.blocks.11.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]). size mismatch for pretrained.blocks.11.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.blocks.11.ls2.gamma: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for pretrained.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for depth_head.projects.0.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([96, 768, 1, 1]). size mismatch for depth_head.projects.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for depth_head.projects.1.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([192, 768, 1, 1]). size mismatch for depth_head.projects.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for depth_head.projects.2.weight: copying a param with shape torch.Size([1024, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 768, 1, 1]). size mismatch for depth_head.projects.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for depth_head.projects.3.weight: copying a param with shape torch.Size([1024, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([768, 768, 1, 1]). size mismatch for depth_head.projects.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for depth_head.resize_layers.0.weight: copying a param with shape torch.Size([256, 256, 4, 4]) from checkpoint, the shape in current model is torch.Size([96, 96, 4, 4]). size mismatch for depth_head.resize_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([96]). size mismatch for depth_head.resize_layers.1.weight: copying a param with shape torch.Size([512, 512, 2, 2]) from checkpoint, the shape in current model is torch.Size([192, 192, 2, 2]). size mismatch for depth_head.resize_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for depth_head.resize_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([768, 768, 3, 3]). size mismatch for depth_head.resize_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). 
size mismatch for depth_head.scratch.layer1_rn.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 96, 3, 3]). size mismatch for depth_head.scratch.layer2_rn.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 192, 3, 3]). size mismatch for depth_head.scratch.layer3_rn.weight: copying a param with shape torch.Size([256, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 384, 3, 3]). size mismatch for depth_head.scratch.layer4_rn.weight: copying a param with shape torch.Size([256, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 768, 3, 3]). size mismatch for depth_head.scratch.refinenet1.out_conv.weight: copying a param with shape torch.Size([256, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 128, 1, 1]). size mismatch for depth_head.scratch.refinenet1.out_conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet1.resConfUnit1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet1.resConfUnit1.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet1.resConfUnit1.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet1.resConfUnit1.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet1.resConfUnit2.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet1.resConfUnit2.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet1.resConfUnit2.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet1.resConfUnit2.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet2.out_conv.weight: copying a param with shape torch.Size([256, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 128, 1, 1]). size mismatch for depth_head.scratch.refinenet2.out_conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet2.resConfUnit1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet2.resConfUnit1.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). 
size mismatch for depth_head.scratch.refinenet2.resConfUnit1.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet2.resConfUnit1.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet2.resConfUnit2.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet2.resConfUnit2.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet2.resConfUnit2.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet2.resConfUnit2.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet3.out_conv.weight: copying a param with shape torch.Size([256, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 128, 1, 1]). size mismatch for depth_head.scratch.refinenet3.out_conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet3.resConfUnit1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet3.resConfUnit1.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet3.resConfUnit1.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet3.resConfUnit1.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet3.resConfUnit2.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet3.resConfUnit2.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet3.resConfUnit2.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet3.resConfUnit2.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet4.out_conv.weight: copying a param with shape torch.Size([256, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 128, 1, 1]). size mismatch for depth_head.scratch.refinenet4.out_conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). 
size mismatch for depth_head.scratch.refinenet4.resConfUnit1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet4.resConfUnit1.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet4.resConfUnit1.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet4.resConfUnit1.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet4.resConfUnit2.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet4.resConfUnit2.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.refinenet4.resConfUnit2.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for depth_head.scratch.refinenet4.resConfUnit2.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for depth_head.scratch.output_conv1.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 128, 3, 3]). size mismatch for depth_head.scratch.output_conv1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]). size mismatch for depth_head.scratch.output_conv2.0.weight: copying a param with shape torch.Size([32, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 64, 3, 3]).
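For what it's worth, the pasted log points to an encoder/checkpoint mismatch rather than a problem with the export script itself: the unexpected keys pretrained.blocks.12 through pretrained.blocks.23 and the 1024-dimensional weights belong to the vitl checkpoint, while --encoder vitb builds a 12-block, 768-dimensional model. Assuming the same export script, the large weights would presumably be exported with matching arguments, for example:

python export.py --encoder vitl --load_from depth_anything_vitl14.pth --image_shape 3 518 518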

A tip for small-memory computers that fail to run export_to_onnx.py when encoder = 'vitl'

My computer has 16 GB of memory. When I run python export_to_onnx.py (encoder = 'vitl'), the command-line output looks like this:

username@username:~/depth-anything-tensorrt$ python export_to_onnx.py 
xFormers not available
xFormers not available
Total parameters: 335.32M
/home/username/depth-anything-tensorrt/torchhub/facebookresearch_dinov2_main/dinov2/layers/patch_embed.py:73: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert H % patch_H == 0, f"Input image height {H} is not a multiple of patch height {patch_H}"
/home/username/depth-anything-tensorrt/torchhub/facebookresearch_dinov2_main/dinov2/layers/patch_embed.py:74: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert W % patch_W == 0, f"Input image width {W} is not a multiple of patch width: {patch_W}"
/home/username/depth-anything-tensorrt/torchhub/facebookresearch_dinov2_main/vision_transformer.py:183: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if npatch == N and w == h:
/home/username/depth-anything-tensorrt/depth_anything/dpt.py:131: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  out = F.interpolate(out, (int(patch_h * 14), int(patch_w * 14)), mode="bilinear", align_corners=True)
Killed

When I checked my computer's resource usage while the code was running, I found that the memory was full.
So, if your computer has an NVIDIA GPU, you can try this:

import os
import torch
import torch.onnx

from depth_anything.dpt import DPT_DINOv2
from depth_anything.util.transform import Resize, NormalizeImage, PrepareForNet

encoder = 'vitl'
#Download from https://huggingface.co/spaces/LiheYoung/Depth-Anything/tree/main/checkpoints
load_from = './checkpoints/depth_anything_vitl14.pth'
image_shape = (3, 518, 518)

# Initializing model
assert encoder in ['vits', 'vitb', 'vitl']
if encoder == 'vits':
    depth_anything = DPT_DINOv2(encoder='vits', features=64, out_channels=[48, 96, 192, 384], localhub='localhub')
elif encoder == 'vitb':
    depth_anything = DPT_DINOv2(encoder='vitb', features=128, out_channels=[96, 192, 384, 768], localhub='localhub')
else:
    depth_anything = DPT_DINOv2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024], localhub='localhub')

total_params = sum(param.numel() for param in depth_anything.parameters())
print('Total parameters: {:.2f}M'.format(total_params / 1e6))

# Loading model weight
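# The two added lines (depth_anything.to('cuda') here and dummy_input.to('cuda')
# below) run the tracing on the GPU so the export does not exhaust host RAM.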
depth_anything = depth_anything.to('cuda')

depth_anything.load_state_dict(torch.load(load_from, map_location='cpu'), strict=True)

depth_anything.eval()

# Define dummy input data
dummy_input = torch.ones(image_shape).unsqueeze(0)

dummy_input = dummy_input.to('cuda')

# Provide an example input to the model, this is necessary for exporting to ONNX
example_output = depth_anything(dummy_input)

onnx_path = load_from.split('/')[-1].split('.pth')[0] + '.onnx'

# Export the PyTorch model to ONNX format

torch.onnx.export(depth_anything, dummy_input, onnx_path, opset_version=11, input_names=["input"], output_names=["output"], verbose=True)

print(f"Model exported to {onnx_path}")

Compared to the source code, I added the following two lines:

depth_anything = depth_anything.to('cuda')
dummy_input = dummy_input.to('cuda')

cannot export to onnx

python export_to_onnx.py

Total parameters: 97.47M
Traceback (most recent call last):
  File "/ComfyUI/Depth-Anything/export_to_onnx.py", line 33, in <module>
    example_output = depth_anything(dummy_input)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ComfyUI/Depth-Anything/depth_anything/dpt.py", line 156, in forward
    features = self.pretrained.get_intermediate_layers(x, 4, return_class_token=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ComfyUI/Depth-Anything/torchhub/facebookresearch_dinov2_main/vision_transformer.py", line 308, in get_intermediate_layers
    outputs = self._get_intermediate_layers_not_chunked(x, n)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ComfyUI/Depth-Anything/torchhub/facebookresearch_dinov2_main/vision_transformer.py", line 277, in _get_intermediate_layers_not_chunked
    x = blk(x)
        ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ComfyUI/Depth-Anything/torchhub/facebookresearch_dinov2_main/dinov2/layers/block.py", line 247, in forward
    return super().forward(x_or_x_list)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ComfyUI/Depth-Anything/torchhub/facebookresearch_dinov2_main/dinov2/layers/block.py", line 105, in forward
    x = x + attn_residual_func(x)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/ComfyUI/Depth-Anything/torchhub/facebookresearch_dinov2_main/dinov2/layers/block.py", line 84, in attn_residual_func
    return self.ls1(self.attn(self.norm1(x)))
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ComfyUI/Depth-Anything/torchhub/facebookresearch_dinov2_main/dinov2/layers/attention.py", line 76, in forward
    x = memory_efficient_attention(q, k, v, attn_bias=attn_bias)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py", line 223, in memory_efficient_attention
    return _memory_efficient_attention(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py", line 326, in _memory_efficient_attention
    return _fMHA.apply(
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py", line 42, in forward
    out, op_ctx = _memory_efficient_attention_forward_requires_grad(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py", line 351, in _memory_efficient_attention_forward_requires_grad
    op = _dispatch_fw(inp, True)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xformers/ops/fmha/dispatch.py", line 120, in _dispatch_fw
    return _run_priority_list(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xformers/ops/fmha/dispatch.py", line 63, in _run_priority_list
    raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 1370, 12, 64) (torch.float32)
     key         : shape=(1, 1370, 12, 64) (torch.float32)
     value       : shape=(1, 1370, 12, 64) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`[email protected]` is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
`tritonflashattF` is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
    operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
    Only work on pre-MLIR triton for now
`cutlassF` is not supported because:
    device=cpu (supported: {'cuda'})
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    device=cpu (supported: {'cuda'})
    unsupported embed per head: 64

Fail to create engine file with metric_depth_vits model

Hello! Thank you for your kind words!
However, I've encountered a problem. I exported the depth_anything_metric_depth_outdoor_vits.onnx file and executed the following command to create an engine file.

./depth-anything-tensorrt depth_anything_metric_depth_outdoor_vits.onnx testvideo.mp4 

I got the following error:
Loading model from depth_anything_metric_depth_outdoor_vits.onnx...
CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
/core/core/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape. Shapes are [1,392,518] and [1,1,392,518].
ModelImporter.cpp:771: While parsing node number 924 [If -> "/core/core/If_output_0"]:
ModelImporter.cpp:772: --- Begin node ---
ModelImporter.cpp:773: input: "/core/core/Equal_1_output_0"
output: "/core/core/If_output_0"
name: "/core/core/If"
op_type: "If"
attribute {
name: "then_branch"
g {
node {
input: "/core/core/Relu_output_0"
output: "/core/core/Squeeze_output_0"
name: "/core/core/Squeeze"
op_type: "Squeeze"
attribute {
name: "axes"
ints: 1
type: INTS
}
doc_string: ""
}
name: "torch_jit1"
output {
name: "/core/core/Squeeze_output_0"
type {
tensor_type {
elem_type: 1
shape {
dim {
dim_value: 1
}
dim {
dim_value: 392
}
dim {
dim_value: 518
}
}
}
}
}
}
type: GRAPH
}
attribute {
name: "else_branch"
g {
node {
input: "/core/core/Relu_output_0"
output: "/core/core/Identity_output_0"
name: "/core/core/Identity"
op_type: "Identity"
doc_string: ""
}
name: "torch_jit2"
output {
name: "/core/core/Identity_output_0"
type {
tensor_type {
elem_type: 1
shape {
dim {
dim_value: 1
}
dim {
dim_value: 1
}
dim {
dim_value: 392
}
dim {
dim_value: 518
}
}
}
}
}
}
type: GRAPH
}
doc_string: "/home/hsu/Desktop/Depth-Anything/metric_depth/zoedepth/models/base_models/dpt_dinov2/dpt.py(158): forward\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/hsu/Desktop/Depth-Anything/metric_depth/zoedepth/models/base_models/depth_anything.py(273): forward\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/hsu/Desktop/Depth-Anything/metric_depth/zoedepth/models/zoedepth/zoedepth_v1.py(149): forward\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/nn/modules/module.py(1182): _slow_forward\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/jit/_trace.py(118): wrapper\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/jit/_trace.py(127): forward\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/nn/modules/module.py(1194): _call_impl\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/jit/_trace.py(1184): _get_trace_graph\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/onnx/utils.py(891): _trace_and_get_graph_from_model\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/onnx/utils.py(987): _create_jit_graph\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/onnx/utils.py(1111): _model_to_graph\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/onnx/utils.py(1529): _export\n/home/hsu/anaconda3/envs/metric_depth/lib/python3.9/site-packages/torch/onnx/utils.py(504): export\n/home/hsu/Desktop/Depth-Anything/metric_depth/export_to_onnx.py(93): \n"

ModelImporter.cpp:774: --- End node ---
ModelImporter.cpp:777: ERROR: ModelImporter.cpp:195 In function parseGraph:
[6] Invalid Node - /core/core/If
/core/core/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape. Shapes are [1,392,518] and [1,1,392,518].
4: [network.cpp::validate::2882] Error Code 4: Internal Error (Network must have at least one output)
Segmentation fault (core dumped)

About Jetson Orin

Thank you very much for your work. I successfully exported the ONNX model on the Jetson Orin platform following your process. However, when I used this ONNX model for depth estimation, I got the following warnings:

"Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
Tactic Device request: 19043MB Available: 15974MB. Device memory is insufficient to use tactic.
Skipping tactic 3 due to insufficient memory on requested size of 19043 detected for tactic 0x00000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit()."

May I ask what I should do?
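
For reference, here is a minimal C++ sketch of capping the workspace pool when building the engine. This assumes TensorRT 8.4+; the 2 GiB value, the Logger class, and the overall setup are only illustrative, not the repo's actual builder code:

#include <NvInfer.h>
#include <iostream>

// Minimal logger required by the TensorRT builder API.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

int main() {
    Logger logger;
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();

    // Cap the workspace memory pool (here: 2 GiB). On memory-limited boards
    // such as Jetson Orin this avoids the "Device memory is insufficient to
    // use tactic" warnings, at the cost of possibly slower tactics being chosen.
    config->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, 2ULL << 30);

    // ... parse the ONNX model into a network and call
    //     builder->buildSerializedNetwork(*network, *config) as usual ...

    delete config;
    delete builder;
    return 0;
}

With trtexec, a similar cap can typically be set with --memPoolSize=workspace:<size in MiB> on recent releases (older releases used --workspace=<MiB>).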

Can't reach higher fps

Hello, I am running the TensorRT version of Depth Anything in a C++ environment; however, I get a per-frame time of 13 ms (after the timing stabilizes) with the following command:
./depth-anything-tensorrt depth_anything_vits14_518x518.engine testvideo.mp4

My hardware is an NVIDIA RTX 4090 with CUDA 11.6 and TensorRT 8.6.0.12, and no errors occurred during building or inference.
Do you have any suggestions for resolving this? I am seeing 13 ms versus the 3 ms reported in this repo.

Thank you
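
One way to narrow down where the 13 ms goes is to time only the TensorRT enqueue with CUDA events, so video decoding and CPU pre/post-processing are excluded. A minimal sketch, assuming an execution context, device bindings, and CUDA stream taken from the existing pipeline (this is not the repo's actual code):

#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Returns the GPU time of a single inference enqueue in milliseconds.
// `context`, `bindings` and `stream` are assumed to come from the existing
// pipeline (engine deserialization and buffer allocation not shown).
float timedEnqueue(nvinfer1::IExecutionContext& context,
                   void* const* bindings, cudaStream_t stream) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);
    context.enqueueV2(bindings, stream, nullptr);   // inference only
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

If the enqueue alone is already near 3 ms, the remaining time is likely spent on preprocessing, host-device copies, and visualization, which can be moved off the timed path or overlapped on a CUDA stream; building the engine with --fp16 also typically helps on an RTX 4090.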

Depth-Anything TensorRT Python

Hi, I have just implemented TensorRT inference in Python based on this project and the Depth-Anything project, and by modifying the code in Depth-Anything I have also built a TensorRT inference Gradio demo. May I submit these modifications as a PR to this project?

Cuda Runtime (an illegal memory access was encountered)

Hello,

I face the following error when I run it:

1: [reformat.cu::lambda [](dim3, dim3)->auto::operator()::1685] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
3: [executionContext.cpp::nvinfer1::rt::ExecutionContext::enqueueInternal::795] Error Code 3: API Usage Error (Parameter check failed at: executionContext.cpp::nvinfer1::rt::ExecutionContext::enqueueInternal::795, condition: bindings[x] || nullBindingOK

Thanks!

-Scott
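
The second error ("condition: bindings[x] || nullBindingOK") indicates that at least one binding pointer passed to enqueue is null or sized for a different engine. A hedged debugging sketch that dumps every binding before enqueueV2 (it assumes a static-shape engine and the buffers array from the existing code; API names are from TensorRT 8.x):

#include <NvInfer.h>
#include <iostream>

// Prints each binding's name, element count and device pointer so a null or
// undersized buffer can be spotted before calling enqueueV2.
void checkBindings(const nvinfer1::ICudaEngine& engine, void* const* buffers) {
    for (int i = 0; i < engine.getNbBindings(); ++i) {
        nvinfer1::Dims dims = engine.getBindingDimensions(i);
        size_t count = 1;
        for (int j = 0; j < dims.nbDims; ++j) count *= static_cast<size_t>(dims.d[j]);
        std::cout << (engine.bindingIsInput(i) ? "input  " : "output ")
                  << engine.getBindingName(i) << ": " << count
                  << " elements, ptr=" << buffers[i] << std::endl;
    }
}

If a pointer prints as 0, or the element count does not match the allocated buffer, the illegal memory access in reformat.cu is explained; this often happens when an engine built for one input size is reused with buffers allocated for another.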

Unable to create engine from onnx model

I've prepared the baseline ONNX model; however, whenever I try to run depth-anything-tensorrt.exe, it outputs:

Loading model from models/depth_anything_vitb14.onnx...

and then, the program exits without any errors at all.

This is the command I'm using:

build/Release/depth-anything-tensorrt.exe models/depth_anything_vitb14.onnx video/davis_dolphins.mp4

I'm assuming that I am exporting the onnx model properly, however, just in case, I'll upload a log. Here's the command I used

python export.py --encoder vitb --load_from depth_anything_vitb14.pth --image_shape 3 518 518

onnxmodelexportlog.txt

Anyone have any idea why this isn't working for me?
Thanks.

CUDA: 11.6
TensorRT: 8.6.1.6
Windows: 11

Installation Guide

Installation

  1. Download the pretrained model and install Depth-Anything:

    git clone https://github.com/LiheYoung/Depth-Anything
    cd Depth-Anything
    pip install -r requirements.txt
  2. Copy and paste dpt.py from this repo into the <depth_anything_installpath>/depth_anything folder. Note that I've only removed a squeeze operation at the end of the model's forward function in dpt.py to avoid conflicts with TensorRT.

  3. Export the model to ONNX format using export_to_onnx.py; you will get an ONNX file named depth_anything_vit{}14.onnx, such as depth_anything_vitb14.onnx.

  4. Install TensorRT using TensorRT official guidance.

    Click here for Windows guide
    1. Download the TensorRT zip file that matches the Windows version you are using.
    2. Choose where you want to install TensorRT. The zip file will install everything into a subdirectory called TensorRT-8.x.x.x. This new subdirectory will be referred to as <installpath> in the steps below.
    3. Unzip the TensorRT-8.x.x.x.Windows10.x86_64.cuda-x.x.zip file to the location that you chose. Where:
    • 8.x.x.x is your TensorRT version
    • cuda-x.x is CUDA version 11.6, 11.8 or 12.0
    4. Add the TensorRT library files to your system PATH. To do so, copy the DLL files from <installpath>/lib to your CUDA installation directory, for example, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.Y\bin, where vX.Y is your CUDA version. The CUDA installer should have already added the CUDA path to your system PATH.

    Click here for installing tensorrt on Linux.

  5. Find trtexec and then export the ONNX model to a TensorRT engine.

    trtexec --onnx=depth_anything_vitb14.onnx --saveEngine=depth_anything_vitb14.engine
    

    Add --fp16 if you want to enable fp16 precision

    trtexec --onnx=depth_anything_vitb14.onnx --saveEngine=depth_anything_vitb14.engine --fp16
    
  6. Download and install any recent OpenCV for Windows.

  7. Modify TensorRT and OpenCV paths in CMakelists.txt:

    # Find and include OpenCV
    set(OpenCV_DIR "your path to OpenCV")
    find_package(OpenCV REQUIRED)
    include_directories(${OpenCV_INCLUDE_DIRS})
    
    # Set TensorRT path if not set in environment variables
    set(TENSORRT_DIR "your path to TensorRT")
    
  8. Build the project using the following commands or cmake-gui (Windows).

    1. Windows:
    mkdir build
    cd build
    cmake ..
    cmake --build . --config Release
    2. Linux (not tested):
    mkdir build
    cd build && mkdir out_dir
    cmake ..
    make

Tested Environment

  • TensorRT 8.6
  • CUDA 11.6
  • Windows 10

BUG

(screenshot of the relevant code)
When the depth_data array is created here, input_h and input_w need to be integers (int); otherwise an error is raised.
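
A hedged illustration of the suggested fix (variable names follow the issue; the surrounding code shown in the screenshot is assumed, so this is only a sketch):

#include <vector>

// Sketch of the suggested fix: if input_h / input_w arrive as floating-point
// values (e.g. after a resize-ratio computation), cast them to int before
// sizing the depth buffer, otherwise the allocation/indexing code fails.
std::vector<float> makeDepthBuffer(float input_h, float input_w) {
    const int h = static_cast<int>(input_h);
    const int w = static_cast<int>(input_w);
    return std::vector<float>(static_cast<size_t>(h) * static_cast<size_t>(w));
}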

export works for vitb, but not for vitl

Hi, and thank you for making this available!

Exporting with the depth_anything_vitl14.pth model gives me this error:

size mismatch for depth_head.scratch.output_conv2.0.weight: copying a param with shape torch.Size([32, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 64, 3, 3]).

depth_anything_vitb14.pth works perfectly!

Also:
Can we export a pointcloud / depth mesh from this repo?
Do you support the metric depth models?
Can I swap to greyscale depth maps?

Thanks again!

Fail to create engine

I have exported the ONNX file with export_to_onnx.py. However, I encountered errors when executing:
trtexec --onnx=depth_anything_vits14.onnx --saveEngine=depth_anything_vits14.engine

Here is the log:
&&&& RUNNING TensorRT.trtexec [TensorRT v8003] # trtexec --onnx=depth_anything_vits14.onnx --saveEngine=depth_anything_vits14.engine
[01/29/2024-18:01:10] [I] === Model Options ===
[01/29/2024-18:01:10] [I] Format: ONNX
[01/29/2024-18:01:10] [I] Model: depth_anything_vits14.onnx
[01/29/2024-18:01:10] [I] Output:
[01/29/2024-18:01:10] [I] === Build Options ===
[01/29/2024-18:01:10] [I] Max batch: explicit
[01/29/2024-18:01:10] [I] Workspace: 16 MiB
[01/29/2024-18:01:10] [I] minTiming: 1
[01/29/2024-18:01:10] [I] avgTiming: 8
[01/29/2024-18:01:10] [I] Precision: FP32
[01/29/2024-18:01:10] [I] Calibration:
[01/29/2024-18:01:10] [I] Refit: Disabled
[01/29/2024-18:01:10] [I] Sparsity: Disabled
[01/29/2024-18:01:10] [I] Safe mode: Disabled
[01/29/2024-18:01:10] [I] Restricted mode: Disabled
[01/29/2024-18:01:10] [I] Save engine: depth_anything_vits14.engine
[01/29/2024-18:01:10] [I] Load engine:
[01/29/2024-18:01:10] [I] NVTX verbosity: 0
[01/29/2024-18:01:10] [I] Tactic sources: Using default tactic sources
[01/29/2024-18:01:10] [I] timingCacheMode: local
[01/29/2024-18:01:10] [I] timingCacheFile:
[01/29/2024-18:01:10] [I] Input(s)s format: fp32:CHW
[01/29/2024-18:01:10] [I] Output(s)s format: fp32:CHW
[01/29/2024-18:01:10] [I] Input build shapes: model
[01/29/2024-18:01:10] [I] Input calibration shapes: model
[01/29/2024-18:01:10] [I] === System Options ===
[01/29/2024-18:01:10] [I] Device: 0
[01/29/2024-18:01:10] [I] DLACore:
[01/29/2024-18:01:10] [I] Plugins:
[01/29/2024-18:01:10] [I] === Inference Options ===
[01/29/2024-18:01:10] [I] Batch: Explicit
[01/29/2024-18:01:10] [I] Input inference shapes: model
[01/29/2024-18:01:10] [I] Iterations: 10
[01/29/2024-18:01:10] [I] Duration: 3s (+ 200ms warm up)
[01/29/2024-18:01:10] [I] Sleep time: 0ms
[01/29/2024-18:01:10] [I] Streams: 1
[01/29/2024-18:01:10] [I] ExposeDMA: Disabled
[01/29/2024-18:01:10] [I] Data transfers: Enabled
[01/29/2024-18:01:10] [I] Spin-wait: Disabled
[01/29/2024-18:01:10] [I] Multithreading: Disabled
[01/29/2024-18:01:10] [I] CUDA Graph: Disabled
[01/29/2024-18:01:10] [I] Separate profiling: Disabled
[01/29/2024-18:01:10] [I] Time Deserialize: Disabled
[01/29/2024-18:01:10] [I] Time Refit: Disabled
[01/29/2024-18:01:10] [I] Skip inference: Disabled
[01/29/2024-18:01:10] [I] Inputs:
[01/29/2024-18:01:10] [I] === Reporting Options ===
[01/29/2024-18:01:10] [I] Verbose: Disabled
[01/29/2024-18:01:10] [I] Averages: 10 inferences
[01/29/2024-18:01:10] [I] Percentile: 99
[01/29/2024-18:01:10] [I] Dump refittable layers:Disabled
[01/29/2024-18:01:10] [I] Dump output: Disabled
[01/29/2024-18:01:10] [I] Profile: Disabled
[01/29/2024-18:01:10] [I] Export timing to JSON file:
[01/29/2024-18:01:10] [I] Export output to JSON file:
[01/29/2024-18:01:10] [I] Export profile to JSON file:
[01/29/2024-18:01:10] [I]
[01/29/2024-18:01:10] [I] === Device Information ===
[01/29/2024-18:01:10] [I] Selected Device: NVIDIA GeForce RTX 4060 Laptop GPU
[01/29/2024-18:01:10] [I] Compute Capability: 8.9
[01/29/2024-18:01:10] [I] SMs: 24
[01/29/2024-18:01:10] [I] Compute Clock Rate: 2.25 GHz
[01/29/2024-18:01:10] [I] Device Global Memory: 7931 MiB
[01/29/2024-18:01:10] [I] Shared Memory per SM: 100 KiB
[01/29/2024-18:01:10] [I] Memory Bus Width: 128 bits (ECC disabled)
[01/29/2024-18:01:10] [I] Memory Clock Rate: 8.001 GHz
[01/29/2024-18:01:10] [I]
[01/29/2024-18:01:10] [I] TensorRT version: 8003
[01/29/2024-18:01:10] [I] [TRT] [MemUsageChange] Init CUDA: CPU +837, GPU +0, now: CPU 844, GPU 624 (MiB)
[01/29/2024-18:01:10] [I] Start parsing network model
[01/29/2024-18:01:10] [I] [TRT] ----------------------------------------------------------------
[01/29/2024-18:01:10] [I] [TRT] Input filename: depth_anything_vits14.onnx
[01/29/2024-18:01:10] [I] [TRT] ONNX IR version: 0.0.6
[01/29/2024-18:01:10] [I] [TRT] Opset version: 11
[01/29/2024-18:01:10] [I] [TRT] Producer name: pytorch
[01/29/2024-18:01:10] [I] [TRT] Producer version: 1.12.1
[01/29/2024-18:01:10] [I] [TRT] Domain:
[01/29/2024-18:01:10] [I] [TRT] Model version: 0
[01/29/2024-18:01:10] [I] [TRT] Doc string:
[01/29/2024-18:01:10] [I] [TRT] ----------------------------------------------------------------
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[01/29/2024-18:01:11] [W] [TRT] Output type must be INT32 for shape outputs
[01/29/2024-18:01:11] [I] Finish parsing network model
[01/29/2024-18:01:11] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 944, GPU 624 (MiB)
[01/29/2024-18:01:11] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 944 MiB, GPU 624 MiB
[01/29/2024-18:01:12] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1140, GPU +278, now: CPU 2085, GPU 902 (MiB)
[01/29/2024-18:01:12] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +973, GPU +194, now: CPU 3058, GPU 1096 (MiB)
[01/29/2024-18:01:12] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[01/29/2024-18:01:13] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 4351, GPU 1476 (MiB)
[01/29/2024-18:01:13] [E] Error[1]: [caskUtils.cpp::trtSmToCask::114] Error Code 1: Internal Error (Unsupported SM: 0x809)
[01/29/2024-18:01:13] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
