
A project demonstrating lidar-related AI solutions, including three GPU-accelerated lidar/camera DL networks (PointPillars, CenterPoint, BEVFusion) and related libraries (cuPCL, 3D SparseConvolution, YUV2RGB, cuOSD).

License: Other


Lidar AI Solution

This repository provides highly optimized solutions for self-driving 3D-lidar perception, significantly speeding up sparse convolution, CenterPoint, BEVFusion, on-screen display (OSD), and image-format conversion.

Pipeline overview

[Pipeline diagram]

Get Started

$ git clone --recursive https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution
$ cd Lidar_AI_Solution
  • For each specific task, please refer to the README in the corresponding sub-folder.

3D Sparse Convolution

A tiny inference engine for 3D sparse convolutional networks using INT8/FP16 (a rulebook sketch follows the feature list below).

  • Tiny Engine: Tiny Lidar-Backbone inference engine independent of TensorRT.
  • Flexible: Build execution graph from ONNX.
  • Easy To Use: Simple interface and onnx export solution.
  • High Fidelity: Low accuracy drop on nuScenes validation.
  • Low Memory: 422MB@SCN FP16, 426MB@SCN INT8.
  • Compact: Built on custom CUDA kernels, independent of cutlass.
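
Engines of this kind typically drive their gather/scatter through a "rulebook" built over the active voxel coordinates (see also the rulebook question in the Issues below). A purely illustrative NumPy sketch of submanifold rulebook construction, using our own names and not the library's actual code:

import numpy as np

def build_rulebook(coords, kernel_size=3):
    """Submanifold rulebook: for each active input voxel and kernel offset,
    record which active output site receives the contribution."""
    lut = {tuple(c): i for i, c in enumerate(coords)}  # hash table of active sites
    r = kernel_size // 2
    offsets = [(dx, dy, dz)
               for dx in range(-r, r + 1)
               for dy in range(-r, r + 1)
               for dz in range(-r, r + 1)]
    rules = []  # (input_index, kernel_offset_index, output_index) triplets
    for i, c in enumerate(coords):
        for k, off in enumerate(offsets):
            out = (c[0] + off[0], c[1] + off[1], c[2] + off[2])
            if out in lut:  # submanifold: the output site must itself be active
                rules.append((i, k, lut[out]))
    return rules

# Example: three active voxels along a line under a 3x3x3 kernel.
coords = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 2]])
print(len(build_rulebook(coords)))  # 7 gather/scatter pairs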

CUDA BEVFusion

CUDA & TensorRT solution for BEVFusion inference, including:

  • Camera Encoder: ResNet50 and finetuned BEV pooling with TensorRT and onnx export solution.
  • Lidar Encoder: Tiny Lidar-Backbone inference independent of TensorRT and onnx export solution.
  • Feature Fusion: Camera & Lidar feature fuser with TensorRT and onnx export solution.
  • Pre/Postprocess: Interval precomputing, lidar voxelization, and feature decoding with CUDA kernels (a BEV-pooling sketch follows this list).
  • Easy To Use: Preparation, inference, and evaluation all in one to reproduce the accuracy of the PyTorch implementation.
  • PTQ: Easy-to-understand quantization solutions for mmdet3d/spconv.
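
To illustrate the interval-precomputing idea behind BEV pooling: lifted camera features are sorted by their target BEV cell so that each cell owns a contiguous interval, which can then be reduced in a single pass. A minimal NumPy sketch, with our own names and layout rather than the repo's:

import numpy as np

def bev_pool(feats, bev_idx, num_cells):
    """Sum per-point camera features into their BEV cells using precomputed intervals."""
    order = np.argsort(bev_idx)                 # group points of the same cell together
    feats, bev_idx = feats[order], bev_idx[order]
    cells, starts = np.unique(bev_idx, return_index=True)   # interval boundaries
    pooled = np.zeros((num_cells, feats.shape[1]), feats.dtype)
    pooled[cells] = np.add.reduceat(feats, starts, axis=0)  # one sum per interval
    return pooled

# Example: 4 lifted features falling into 2 of 6 BEV cells.
feats = np.ones((4, 8), np.float32)
print(bev_pool(feats, np.array([3, 1, 3, 3]), 6)[3, 0])  # 3.0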

CUDA CenterPoint

CUDA & TensorRT solution for CenterPoint inference, including:

  • Preprocess: Voxelization with a CUDA kernel (see the voxelization sketch after this list).
  • Encoder: 3D backbone with NV spconv-scn and onnx export solution.
  • Neck & Header: RPN & CenterHead with TensorRT and onnx export solution.
  • Postprocess: Decode & NMS with CUDA kernels.
  • Easy To Use: Preparation, inference, and evaluation all in one to reproduce the accuracy of the PyTorch implementation.
  • QAT: Easy-to-understand quantization solutions for traveller59/spconv.
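
For reference, the voxelization step quantizes each point into a grid cell and groups points per occupied voxel. A minimal NumPy sketch of that logic (illustrative only; the CUDA kernel's exact bounds and layout live in the sub-folder):

import numpy as np

def voxelize(points, voxel_size, pc_range):
    """Quantize points into voxel grid indices and group them per voxel."""
    xyz = points[:, :3]
    keep = np.all((xyz >= pc_range[:3]) & (xyz < pc_range[3:]), axis=1)  # range filter
    ijk = ((xyz[keep] - pc_range[:3]) / voxel_size).astype(np.int32)     # grid coords
    voxels, inverse = np.unique(ijk, axis=0, return_inverse=True)        # point -> voxel map
    return voxels, inverse, points[keep]

# Example with assumed nuScenes-like bounds.
pc_range = np.array([-54.0, -54.0, -5.0, 54.0, 54.0, 3.0], np.float32)
pts = np.random.rand(1000, 5).astype(np.float32) * 20 - 10
voxels, inverse, kept = voxelize(pts, np.array([0.075, 0.075, 0.2]), pc_range)
print(voxels.shape, inverse.shape)  # (V, 3) (N_kept,)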

CUDA PointPillars

CUDA & TensorRT solution for PointPillars inference, including:

  • Preprocess: Voxelization & feature extension with CUDA kernels (a feature-extension sketch follows this list).
  • Detector: 2.5D backbone with TensorRT and onnx export solution.
  • Postprocess: Parse bounding boxes, class types, and directions.
  • Easy To Use: Preparation, inference, and evaluation all in one to reproduce the accuracy of the PyTorch implementation.
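
The feature-extension step augments each pillar point's [x, y, z, r] with offsets to the pillar's point mean and to the pillar cell center, as in the PointPillars paper. A minimal NumPy sketch (the deployed kernel's exact feature layout may differ):

import numpy as np

def extend_pillar_features(pts, pillar_center_xy):
    """Augment [x, y, z, r] per-point features of one pillar with offsets to the
    pillar's point mean (xc, yc, zc) and to the pillar cell center (xp, yp)."""
    to_mean = pts[:, :3] - pts[:, :3].mean(axis=0)   # xc, yc, zc
    to_center = pts[:, :2] - pillar_center_xy        # xp, yp
    return np.hstack([pts, to_mean, to_center])      # (M, 9) extended features

pillar = np.array([[1.0, 2.0, 0.5, 0.3],
                   [1.1, 2.1, 0.4, 0.7]], np.float32)
print(extend_pillar_features(pillar, np.array([1.04, 2.04])).shape)  # (2, 9)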

CUDA-V2XFusion

Training and inference solutions for V2XFusion.

  • Easy To Use: Provides easily reproducible solutions for training, quantization, and ONNX export.
  • Quantization friendly: PointPillars-based backbone with pre-normalization, which reduces quantization error.
  • Feature Fusion: Camera & Lidar feature fuser and onnx export solution.
  • PTQ: Quantization solutions for V2XFusion, easy to understand.
  • Sparsity: 2:4 structured sparsity support.
  • DeepStream sample: Sample inference using CUDA and TensorRT/Triton in NVIDIA DeepStream SDK 7.0.

cuOSD (CUDA On-Screen Display Library)

Draws all elements using a single CUDA kernel.

  • Line: Plots lines with nearest or linear interpolation.
  • RotateBox: Drawn with configurable border and fill colors.
  • Circle: Drawn with configurable border and fill colors.
  • Rectangle: Drawn with configurable border and fill colors.
  • Text: Supports stb_truetype and pango-cairo backends, allowing fonts to be loaded from TTF files or selected by font-family.
  • Arrow: Composed of three lines.
  • Point: Plots points with nearest or linear interpolation.
  • Clock: Time display built on the text support.

cuPCL (CUDA Point Cloud Library)

Provides several GPU-accelerated point cloud operations with both high accuracy and high performance: cuICP, cuFilter, cuSegmentation, cuOctree, cuCluster, cuNDT, and Voxelization (upcoming).

  • cuICP: CUDA-accelerated iterative closest point (point-to-point) registration.
  • cuFilter: CUDA-accelerated PassThrough and VoxelGrid filtering.
  • cuSegmentation: CUDA-accelerated RandomSampleConsensus segmentation with a plane model.
  • cuOctree: CUDA-accelerated approximate nearest-neighbor search and radius search.
  • cuCluster: CUDA-accelerated clustering based on the distance between points.
  • cuNDT: CUDA-accelerated 3D Normal Distributions Transform registration for point cloud data.

YUVToRGB (CUDA Conversion)

YUV to RGB conversion that fuses resize, padding, conversion, and normalization into a single kernel function (a reference sketch of the fused math follows the feature list below).

  • Most of the time, it can be bit-aligned with OpenCV.
    • It will give an exact result when the scaling factor is a rational number.
    • Better performance is usually achieved when the stride is divisible by 4.
  • Supported Input Format:
    • NV12BlockLinear
    • NV12PitchLinear
    • YUV422Packed_YUYV
  • Supported Interpolation methods:
    • Nearest
    • Bilinear
  • Supported Output Data Type:
    • Uint8
    • Float32
    • Float16
  • Supported Output Layout:
    • CHW_RGB/BGR
    • HWC_RGB/BGR
    • CHW16/32/4/RGB/BGR for DLA input
  • Supported Features:
    • Resize
    • Padding
    • Conversion
    • Normalization
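
For reference, a NumPy sketch of the math the fused kernel performs for NV12 input, assuming BT.601 full-range coefficients (the library's exact coefficients and rounding may differ, which is why results are only usually bit-aligned with OpenCV):

import numpy as np

def nv12_to_rgb_normalized(y, uv, mean, std):
    """Reference for the fused conversion: NV12 -> RGB (BT.601 full range),
    then (x - mean) / std normalization, all in one pass."""
    u = np.repeat(np.repeat(uv[:, :, 0], 2, 0), 2, 1).astype(np.float32) - 128.0
    v = np.repeat(np.repeat(uv[:, :, 1], 2, 0), 2, 1).astype(np.float32) - 128.0
    yf = y.astype(np.float32)
    r = yf + 1.402 * v                      # BT.601 full-range coefficients
    g = yf - 0.344136 * u - 0.714136 * v
    b = yf + 1.772 * u
    rgb = np.clip(np.stack([r, g, b], -1), 0.0, 255.0)
    return (rgb - mean) / std               # normalization folded into the same pass

y = np.full((4, 4), 128, np.uint8)
uv = np.full((2, 2, 2), 128, np.uint8)
print(nv12_to_rgb_normalized(y, uv, 0.0, 255.0)[0, 0])  # ~[0.5, 0.5, 0.5]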

Thanks

This project makes use of a number of awesome open source libraries, including:

  • stb_image for PNG and JPEG support
  • pybind11 for seamless C++ / Python interop
  • and others! See the dependencies folder.

Many thanks to the authors of these brilliant projects!

Contributors

byte-deve, hopef, mchi-zg, rahulk4102, sunnyqgg


Issues

Bug in Lidar_AI_Solution/CUDA-BEVFusion/qat/test-mAP-for-cuda.py

I found that the "show" option in Lidar_AI_Solution/CUDA-BEVFusion/qat/test-mAP-for-cuda.py is not effective. Could you please advise me on how to use this project to run on the nuScenes test set and visualize the results? Is there any related code available?

Export ONNX error

Thanks to the authors for this excellent repo. When exporting my own trained BEVFusion model, running export-scn.py produces a lidar.backbone.xyz.onnx as shown below:
[screenshot]

Console output from running the export-scn.py script:
[screenshot]

Segmentation fault (core dumped) when running tool/run.sh

I followed the README, prepared the models and data, and built the engines, but when running tool/run.sh the following error occurs:

==================BEVFusion===================
[⏰ [NoSt] CopyLidar]: 0.44275 ms
[⏰ [NoSt] ImageNrom]: 5.44451 ms
[⏰ Lidar Backbone]: 3.04624 ms
[⏰ Camera Depth]: 0.03789 ms
[⏰ Camera Backbone]: 2.55898 ms
[⏰ Camera Bevpool]: 0.35421 ms
[⏰ VTransform]: 0.61219 ms
[⏰ Transfusion]: 1.63635 ms
[⏰ Head BoundingBox]: 3.45373 ms
Total: 11.700 ms

==================BEVFusion===================
[⏰ [NoSt] CopyLidar]: 0.44605 ms
[⏰ [NoSt] ImageNrom]: 5.41389 ms
[⏰ Lidar Backbone]: 3.03587 ms
[⏰ Camera Depth]: 0.03789 ms
[⏰ Camera Backbone]: 2.55693 ms
[⏰ Camera Bevpool]: 0.36045 ms
[⏰ VTransform]: 0.60826 ms
[⏰ Transfusion]: 1.62582 ms
[⏰ Head BoundingBox]: 3.45293 ms
Total: 11.678 ms

tool/run.sh: line 41: 2643 Segmentation fault (core dumped) ./build/bevfusion $DEBUG_DATA $DEBUG_MODEL $DEBUG_PRECISION

Thanks for your reply!

Cuda failure: invalid configuration argument in file Lidar_AI_Solution/CUDA-CenterPoint/src/preprocess.cpp:106 error status: 9

GPU has cuda devices: 2
----device id: 0 info----
GPU : NVIDIA RTX A6000
Capbility: 8.6
Global memory: 48651MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)
----device id: 1 info----
GPU : NVIDIA RTX A6000
Capbility: 8.6
Global memory: 48685MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Total 2

<<<<<<<<<<<
load file: ../data/test/3615d82e7e8546fea5f181157f42e30b.bin
find points num: 6
Cuda failure: invalid configuration argument in file Lidar_AI_Solution/CUDA-CenterPoint/src/preprocess.cpp:106 error status: 9
Aborted (core dumped)

disable quant layer

In the CenterPoint example, why do you only disable quantization for conv_input and for conv1 of the first sparse basic block? What about the remaining convs?

SparseConvolution.cu

Thank you very much for your outstanding work.
I see that the deployment-side implementation of SparseConvolution is shipped as an .so file, and I was wondering whether its source code (.cu) will be released.

CenterPoint inference time

The official test case reports an inference time of 42 ms per frame, but when I tested on my own Orin device the inference time was 61 ms per frame. Are there any details I should consider to improve the inference speed?

Inference results issue

I have tried my own data with both CUDA-BEVFusion and BEVFusion directly, without retraining. BEVFusion gives good inference results, but CUDA-BEVFusion cannot detect persons, while its vehicle detection is good. Any clue as to why?
Best regards

CenterPoint training

Thank you very much for open-sourcing this. Which project should I use to train on my own data?

plugin for SparseConvolution

Hi,
I can see that there is a custom ONNX node "SparseConvolution" in the CenterPoint middle-encoder ONNX file. However, I can't find its plugin or how this ONNX is converted to a TensorRT engine. Can you point me to where to find it?

Question about customized dataset

Hi, thanks for sharing the great work and congratulations on the achievements!

I have a question regarding customized datasets:

This repo is mainly designed to implement model inference highly efficiently, so if I want to follow your work, the first thing to do is to train on my customized dataset with the official BEVFusion repo in PyTorch, right?

ONNX input voxel num

In the generated ONNX, the input seems to have a static shape. However, during inference the input voxel count always changes. Does the sparse convolution inference not depend on the ONNX input shape?

load_tensor to load in_features.torch.fp16.tensor

I got an error when using load_tensor from funcs to load 3DSparseConvolution/workspace/centerpoint/in_features.torch.fp16.tensor;
it shows this error:

from funcs import load_tensor
b = load_tensor("/vol_slow/Lidar_AI_Solution/libraries/3DSparseConvolution/workspace/centerpoint/in_features.torch.fp16.tensor")
print(b)

[screenshot]

When I print the data type, it shows int32; is this an error?

CenterPoint ONNX file

I can't find the script to convert a .pth to an ONNX file. If I want to use my own weights, what do I have to do? Keep the same names and input/output shapes as the ONNX file you provide?

Hello, how can I resolve this error?

/usr/bin/ld: warning: libprotobuf.so.17, needed by /home/chenyang/Lidar_AI_Solution_3/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: /home/chenyang/Lidar_AI_Solution_3/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::RegisterAllTypes(google::protobuf::Metadata const*, int)' /usr/bin/ld: /home/chenyang/Lidar_AI_Solution_3/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::io::CodedOutputStream::WriteVarint32SlowPath(unsigned int)'
/usr/bin/ld: /home/chenyang/Lidar_AI_Solution_3/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::io::CodedOutputStream::WriteVarint64SlowPath(unsigned long)' /usr/bin/ld: /home/chenyang/Lidar_AI_Solution_3/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::WireFormat::SerializeUnknownFields(google::protobuf::UnknownFieldSet const&, google::protobuf::io::CodedOutputStream*)'
/usr/bin/ld: /home/chenyang/Lidar_AI_Solution_3/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::ArenaImpl::AllocateAligned(unsigned long)' /usr/bin/ld: /home/chenyang/Lidar_AI_Solution_3/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::AssignDescriptors(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, google::protobuf::internal::MigrationSchema const*, google::protobuf::Message const* const*, unsigned int const*, google::protobuf::Metadata*, google::protobuf::EnumDescriptor const**, google::protobuf::ServiceDescriptor const**)'
/usr/bin/ld: /home/chenyang/Lidar_AI_Solution_3/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::WireFormat::SerializeUnknownFieldsToArray(google::protobuf::UnknownFieldSet const&, unsigned char*)' /usr/bin/ld: /home/chenyang/Lidar_AI_Solution_3/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::Message::ParseFromIstream(std::istream*)'
/usr/bin/ld: /home/chenyang/Lidar_AI_Solution_3/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to `google::protobuf::MessageFactory::InternalRegisterGeneratedFile(char const*, void (*)(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&))'
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/bevfusion.dir/build.make:678: bevfusion] Error 1
make[1]: *** [CMakeFiles/Makefile2:111: CMakeFiles/bevfusion.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

Generate tiny engine for spconv

Thanks for the great work. I have already tested inference on the example data. In issue #4 you mentioned that you generated a tiny engine in a standalone way for spconv. Can you elaborate on how I can generate the tiny engine if I make some changes to the spconv code in CUDA-BEVFusion?

rulebook

Does every sparse convolution have to have a rulebook?

Error when executing bash tool/run.sh

An error occurs when executing bash tool/run.sh. How can I solve it?
fatal error: stb_image.h: No such file or directory
#include <stb_image.h>
^~~~~~~~~~~~~
compilation terminated.
CMakeFiles/bevfusion.dir/build.make:504: recipe for target 'CMakeFiles/bevfusion.dir/src/main.cpp.o' failed
make[2]: *** [CMakeFiles/bevfusion.dir/src/main.cpp.o] Error 1
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/bevfusion.dir/all' failed
make[1]: *** [CMakeFiles/bevfusion.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

CenterPoint engine setup failed

hello Lidar_AI_Solution team @hopef ,

After running bash tool/build.trt.sh, the following errors occur:
[05/24/2023-16:26:52] [E] Error[10]: [optimizer.cpp::computeCosts::1855] Error Code 10: Internal Error (Could not find any implementation for node ConvTranspose_56 + BatchNormalization_57 + Relu_58.)
[05/24/2023-16:26:52] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
[05/24/2023-16:26:52] [E] Engine could not be created from network
[05/24/2023-16:26:52] [E] Building engine failed
[05/24/2023-16:26:52] [E] Failed to create engine from model or file.
[05/24/2023-16:26:52] [E] Engine set up failed
rpn_centerhead_sim.8503.log

My environment is
CUDA 11.4
TensorRT 8.5.3.1
GPU: 3060
Ubuntu 20.04

The full log is attached as well.
It seems the error comes from the onnx file.
Please help figure it out.
Thanks in advance.

Precision of SparseConvolution

Hello,

I see in your PTQ.onnx file and export-scn.py that the precision of the first three SparseConvolution layers is 'fp16'. Is this because 'int8' precision loses too much accuracy?

Thanks

Version of TensorRT

Hello

I see in the README of CUDA-CenterPoint that you use 'CUDA 11.4 + cuDNN 8.4.1 + TensorRT 8.4.12.5' on the Tesla platform, but I can't find TensorRT version 8.4.12.5. In addition, I see that you set CUDA_TOOLKIT_ROOT_DIR=/root/.kiwi/lib/cuda-11.8 and TENSORRT_ROOT=/root/.kiwi/lib/TensorRT-8.5.3.1-cuda11x in the CMakeLists of CUDA-CenterPoint. So should I follow the versions in the CMakeLists or the README?

Thanks

spconv version

Hi,
In the CenterPoint quantization you used _conv_forward as the forward function, so I assume you used spconv version 2.3.0+?

Hello, when running CUDA-BEVFusion I ran into a missing-library problem, even though the library is already installed

(lssEnv) shijie@shijie-ThinkStation-K:~/Lidar_AI_Solution/CUDA-BEVFusion$ bash tool/run.sh

|| MODEL: resnet50int8
|| PRECISION: int8
|| DATA: example-data
|| USEPython: OFF
||
|| TensorRT: /home/shijie/TensorRT-8.6.1.6/lib
|| CUDA: /usr/local/cuda-11.6/
|| CUDNN: /usr/local/cuda-11.6/lib64

Try to get the current device SM
Current CUDA SM: 86
Configuration done!
-- Configuring done
-- Generating done
-- Build files have been written to: /home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/build
[ 63%] Built target bevfusion_core
[ 68%] Linking CXX executable bevfusion
/usr/bin/ld: warning: libprotobuf.so.17, needed by /home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so, not found (try using -rpath or -rpath-link)
/home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::RegisterAllTypes(google::protobuf::Metadata const*, int)' /home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::OnShutdownRun(void ()(void const), void const*)'
/home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::Arena::OnArenaAllocation(std::type_info const*, unsigned long) const' /home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::fixed_address_empty_string[abi:cxx11]'
/home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::io::CodedOutputStream::WriteVarint64SlowPath(unsigned long)' /home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::DestroyMessage(void const*)'
/home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::ArenaImpl::AllocateAlignedAndAddCleanup(unsigned long, void (*)(void*))' /home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::ArenaImpl::AllocateAligned(unsigned long)'
/home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::WireFormatLite::UInt64Size(google::protobuf::RepeatedField<unsigned long> const&)' /home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::WireFormatLite::WriteFloatArray(float const*, int, google::protobuf::io::CodedOutputStream*)'
/home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::InitSCCImpl(google::protobuf::internal::SCCInfoBase*)' /home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::WireFormatLite::WriteDoubleArray(double const*, int, google::protobuf::io::CodedOutputStream*)'
/home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::WireFormatLite::Int32Size(google::protobuf::RepeatedField<int> const&)' /home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::internal::AssignDescriptors(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, google::protobuf::internal::MigrationSchema const*, google::protobuf::Message const* const*, unsigned int const*, google::protobuf::Metadata*, google::protobuf::EnumDescriptor const**, google::protobuf::ServiceDescriptor const**)'
/home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::Message::SpaceUsedLong() const' /home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to google::protobuf::io::CodedInputStream::SkipFallback(int, int)'
/home/shijie/Lidar_AI_Solution/CUDA-BEVFusion/../libraries/3DSparseConvolution/libspconv/lib/x86_64/libspconv.so: undefined reference to `google::protobuf::internal::WireFormatLite::Int64Size(google::protobuf::RepeatedField const&)'
collect2: error: ld returned 1 exit status
CMakeFiles/bevfusion.dir/build.make:651: recipe for target 'bevfusion' failed
make[2]: *** [bevfusion] Error 1
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/bevfusion.dir/all' failed
make[1]: *** [CMakeFiles/bevfusion.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

(lssEnv) shijie@shijie-ThinkStation-K:~/Lidar_AI_Solution/CUDA-BEVFusion$ sudo apt install libprotobuf-dev
[sudo] password for shijie:
Reading package lists... Done
Building dependency tree
Reading state information... Done
libprotobuf-dev is already the newest version (3.0.0-9.1ubuntu1.1).
The following packages were automatically installed and are no longer required:
gir1.2-goa-1.0 gir1.2-snapd-1
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 103 not upgraded.

Could you take a look at how to solve this? I have also tried installing this library from source.

Can not find any matched kernel 4 x 16

Hello, I have converted my CenterPoint model to ONNX. I only use 4 features as input while your code uses 5, and then this error occurs:

Pass argument constructor
Assert failed 💀. false in file src/spconv/implicit-gemm.cu:346, message: Can not find any matched kernel 4 x 16

Is this a bug in my code, or does the project not support other input shapes?

Jetson AGX Xavier inference error

Lidar_AI_Solution/CUDA-BEVFusion/src/common/tensor.cu(135): error: calling a device function("__half") from a host function("arange_kernel_host") is not allowed

1 error detected in the compilation of "/tmp/tmpxft_00006d55_00000000-4_tensor.cpp4.ii".
CMake Error at bevfusion_core_generated_tensor.cu.o.Release.cmake:280 (message):
Error generating file
/work/share/usr/cyy/BEV/Lidar_AI_Solution/CUDA-BEVFusion/build/CMakeFiles/bevfusion_core.dir/src/common/./bevfusion_core_generated_tensor.cu.o

Support for custom datasets

Hi,
Does your sparse convolution support inference with input shape N×4? When I try to run inference, it says:

[screenshot]

CUDA-BEVFusion build head.bbox model failed

Thanks for the great work.

In CUDA-BEVFusion, when I run "bash tool/build_trt_engine.sh" to build the engine files, the head.bbox model build fails, but there is no error info.
Could you please help me check what the problem is?

Hello, I get Detection NUM = 0 when running the CUDA-CenterPoint program

GPU has cuda devices: 1
----device id: 0 info----
  GPU : NVIDIA GeForce GTX 1080 Ti 
  Capbility: 6.1
  Global memory: 11156MB
  Const memory: 64KB
  SM in a block: 48KB
  warp size: 32
  threads in a block: 1024
  block dim: (1024,1024,64)
  grid dim: (2147483647,65535,65535)

Total 2

<<<<<<<<<<<
load file: ../data/test/291e7331922541cea98122b607d24831.bin
find points num: 239911
[TIME] Voxelization:            3.09219 ms
valid_num: 85179
[TIME] 3D Backbone:             0.35184 ms
[TIME] RPN + Head:              24.59392 ms
CUDA kernel failed : no kernel image is available for execution on the device
[TIME] Decode + NMS:            0.43117 ms
Detection NUM: 0
Saved prediction in: ../data/prediction/291e7331922541cea98122b607d24831.txt
>>>>>>>>>>>

<<<<<<<<<<<
load file: ../data/test/3615d82e7e8546fea5f181157f42e30b.bin
find points num: 267057
[TIME] Voxelization:            2.24326 ms
valid_num: 106004
[TIME] 3D Backbone:             0.20611 ms
[TIME] RPN + Head:              23.10170 ms
CUDA kernel failed : no kernel image is available for execution on the device
[TIME] Decode + NMS:            0.55421 ms
Detection NUM: 0
Saved prediction in: ../data/prediction/3615d82e7e8546fea5f181157f42e30b.txt
>>>>>>>>>>>

Perf Report: 
    Voxelization: 2.66773 ms.
    3D Backbone: 0.278976 ms.
    RPN + Head: 23.8478 ms.
    Decode + NMS: 0.492688 ms.
    Total: 27.2872 ms.

build.trt.sh code

trt_version=8406

if [ ! -f "model/rpn_centerhead_sim.plan.${trt_version}" ]; then
    echo Building the model: model/rpn_centerhead_sim.plan.${trt_version}, this will take 2 minutes. Wait a moment 🤗🤗🤗~.
    trtexec --onnx=model/rpn_centerhead_sim.onnx \
        --saveEngine=model/rpn_centerhead_sim.plan.${trt_version} \
        --workspace=4096 --fp16 --outputIOFormats=fp16:chw \
        --inputIOFormats=fp16:chw --verbose --dumpLayerInfo \
        --dumpProfile --separateProfileRun \
        --profilingVerbosity=detailed > model/rpn_centerhead_sim.${trt_version}.log 2>&1

    rm -rf model/rpn_centerhead_sim.plan
    dir=`pwd`
    ln -s ${dir}/model/rpn_centerhead_sim.plan.${trt_version} model/rpn_centerhead_sim.plan
else
    echo Model model/rpn_centerhead_sim.plan.${trt_version} already build 🙋🙋🙋.
fi

Hello, a quick question: I followed the README procedure; my TensorRT version is 8.4.0.6, and the log generated by build.trt.sh shows the conversion completed without problems.
Could the error be caused by a TensorRT version mismatch in the bash tool/build.trt.sh step?
Thanks.

Compilation error

make[2]: *** [CMakeFiles/bevfusion_core.dir/build.make:112: CMakeFiles/bevfusion_core.dir/src/bevfusion/bevfusion_core_generated_camera-vtransform.cu.o] Error 1
/usr/include/c++/11/type_traits:79:52: error: redefinition of ‘constexpr const _Tp std::integral_constant<_Tp, __v>::value’
79 | template<typename _Tp, _Tp __v>
| ^
/usr/include/c++/11/type_traits:67:29: note: ‘constexpr const _Tp value’ previously declared here
67 | static constexpr _Tp value = __v;
| ^~~~~
/usr/include/c++/11/type_traits:79:52: error: redefinition of ‘constexpr const _Tp std::integral_constant<_Tp, __v>::value’
79 | template<typename _Tp, _Tp __v>
| ^
/usr/include/c++/11/type_traits:67:29: note: ‘constexpr const _Tp value’ previously declared here
67 | static constexpr _Tp value = __v;
| ^~~~~
/usr/include/c++/11/type_traits:79:52: error: redefinition of ‘constexpr const _Tp std::integral_constant<_Tp, __v>::value’
79 | template<typename _Tp, _Tp __v>
| ^
/usr/include/c++/11/type_traits:67:29: note: ‘constexpr const _Tp value’ previously declared here
67 | static constexpr _Tp value = __v;
| ^~~~~
CMake Error at bevfusion_core_generated_lidar-voxelization.cu.o.Release.cmake:280 (message):
Error generating file
/home/chenyang/Lidar_AI_Solution_2/CUDA-BEVFusion/build/CMakeFiles/bevfusion_core.dir/src/bevfusion/./bevfusion_core_generated_lidar-voxelization.cu.o

BEVFUSION, PTX JIT compilation failed, code = cudaErrorInvalidPtx

root@fe3435b23fa0:/mnt/Lidar_AI_Solution-master/CUDA-BEVFusion# ./tool/run.sh 
==========================================================
||  MODEL: resnet50int8
||  PRECISION: int8
||  DATA: example-data
||  USEPython: OFF
||
||  TensorRT: /usr/lib/x86_64-linux-gnu/
||  CUDA: /usr/local/cuda
||  CUDNN: /usr/lib/x86_64-linux-gnu/
==========================================================
Try to get the current device SM
Current CUDA SM: 80
Configuration done!
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/Lidar_AI_Solution-master/CUDA-BEVFusion/build
Consolidate compiler generated dependencies of target bevfusion_core
[ 63%] Built target bevfusion_core
Consolidate compiler generated dependencies of target bevfusion
[ 68%] Building CXX object CMakeFiles/bevfusion.dir/src/main.cpp.o
[ 72%] Linking CXX executable bevfusion
[100%] Built target bevfusion
Create by resnet50int8, int8
CUDA Runtime error create_frustum_kernel # a PTX JIT compilation failed, code = cudaErrorInvalidPtx [ 218 ] in file /mnt/Lidar_AI_Solution-master/CUDA-BEVFusion/src/bevfusion/camera-geometry.cu:209
./tool/run.sh: line 41:   583 Aborted                 (core dumped) ./build/bevfusion $DEBUG_DATA $DEBUG_MODEL $DEBUG_PRECISION

How can I solve this?

Compilation of ".plan" file

Hi,

As you said in another issue, you implemented a tiny engine in a standalone way and built some ".plan" files from ONNX models. I want to know whether these ".plan" model files can be combined with other ".engine" or ".trt" files.

Thanks

How can I convert a lidar PCD file to points.tensor?

Thanks for this great work. With CUDA-BEVFusion I can run on Orin at 45 ms per frame, which is very impressive.
Now I want to try my own data. How can I convert a lidar PCD file to points.tensor?
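
A hedged sketch of one way to do this, assuming open3d for reading the PCD and the [x, y, z, intensity, time] float16 layout of the example data; the repo's own tensor utilities define the authoritative .tensor container, so check them before relying on a raw dump:

import numpy as np
import open3d as o3d  # assumption: open3d is available for reading .pcd files

# Read the cloud and assemble an (N, 5) float16 array in the
# [x, y, z, intensity, time] layout the example points.tensor appears to use.
pcd = o3d.io.read_point_cloud("my_scan.pcd")
xyz = np.asarray(pcd.points, dtype=np.float16)            # (N, 3)
extra = np.zeros((xyz.shape[0], 2), dtype=np.float16)     # fill intensity/time if known
points = np.hstack([xyz, extra])

# Raw dump -- NOTE: the repo's .tensor container may carry a header (shape/dtype);
# adapt this to the tensor save/load helpers shipped with the repo.
points.tofile("points.tensor")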

core dumped by CUDA Runtime error spconv::setup_hash_and_route

(base) qiu@qiu-System-Product-Name:~/Lidar_AI_Solution/CUDA-BEVFusion$ bash tool/run.sh

|| MODEL: resnet50
|| PRECISION: fp16
|| DATA: example-data
|| USEPython: OFF
||
|| TensorRT: /usr/local/TensorRT-8.5.1.7/lib
|| CUDA: /usr/local/cuda
|| CUDNN: /usr/local//cuda/lib64

Try to get the current device SM
Current CUDA SM: 75
Configuration done!
-- Configuring done
-- Generating done
-- Build files have been written to: /home/qiu/Lidar_AI_Solution/CUDA-BEVFusion/build
[ 63%] Built target bevfusion_core
[100%] Built target bevfusion
Create by resnet50, fp16

Camerea Backbone 🌱 is Static Shape model
Inputs: 2
0.img : {1 x 6 x 3 x 256 x 704} [Float16]
1.depth : {1 x 6 x 1 x 256 x 704} [Float16]
Outputs: 2
0.camera_depth_weights : {6 x 118 x 32 x 88} [Float16]
1.camera_feature : {6 x 32 x 88 x 80} [Float16]


Camerea VTransform 🌱 is Static Shape model
Inputs: 1
0.feat_in : {1 x 80 x 360 x 360} [Float16]
Outputs: 1
0.feat_out : {1 x 80 x 180 x 180} [Float16]


Transfusion 🌱 is Static Shape model
Inputs: 2
0.camera : {1 x 80 x 180 x 180} [Float16]
1.lidar : {1 x 256 x 180 x 180} [Float16]
Outputs: 1
0.middle : {1 x 512 x 180 x 180} [Float16]


BBox 🌱 is Static Shape model
Inputs: 1
0.middle : {1 x 512 x 180 x 180} [Float16]
Outputs: 6
0.reg : {1 x 2 x 200} [Float16]
1.height : {1 x 1 x 200} [Float16]
2.dim : {1 x 3 x 200} [Float16]
3.rot : {1 x 2 x 200} [Float16]
4.vel : {1 x 2 x 200} [Float16]
5.score : {1 x 10 x 200} [Float16]

==================BEVFusion===================
[⏰ [NoSt] CopyLidar]: 1.13885 ms
[⏰ [NoSt] ImageNrom]: 12.50054 ms

CUDA Runtime error spconv::setup_hash_and_route<<<(num_input + 1023) / 1024, 1024, 0, stream>>>( hash_.get(), route_mask_.ptr(), route_.ptr(), indices, num_input, prob.kv) # no kernel image is available for execution on the device, code = cudaErrorNoKernelImageForDevice [ 209 ] in file src/spconv/rulebook.cu:138
tool/run.sh: line 41: 28104 Aborted (core dumped) ./build/bevfusion $DEBUG_DATA $DEBUG_MODEL $DEBUG_PRECISION

Thanks for the reply!

[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:9: Message type "onnx2trt_onnx.ModelProto" has no field named "version".

Thanks for your great work! I hit a problem when I run "bash tool/build.trt.sh"; the error is:
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:9: Message type "onnx2trt_onnx.ModelProto" has no field named "version".

My environment:
CUDA 11.7
TensorRT 8.5.3
Ubuntu 18
protobuf 3.6.1

Does this project require a higher version of protobuf?
