
opendrivelab / vidar

198 stars, 13 forks, 36.57 MB

[CVPR 2024 Highlight] Visual Point Cloud Forecasting

Home Page: https://arxiv.org/abs/2312.17655

License: Apache License 2.0

Languages: Python 92.18%, Shell 0.27%, C++ 0.96%, Cuda 6.48%, C 0.01%, Dockerfile 0.09%
Topics: autonomous-driving, point-cloud-forecasting, pre-training, world-model

vidar's People

Contributors

hli2020, ilnehc, sephyli, tomztyang


vidar's Issues

MMCV, RuntimeError: modulated_deformable_im2col_impl: implementation for device cuda:0 not found.

Dear authors,

Thank you for your contribution!

I set up the environment according to your README and the requirements.txt you provided in previous issues. However, when I run the training script:
./tools/dist_train.sh ${CONFIG} ${GPU_NUM}
it gives the following error from the MMCV package, even though I am running in a CUDA environment:

File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 297, in forward out = _inner_forward(x) File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 274, in _inner_forward out = self.conv2(out) File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl return forward_call(*input, **kwargs) File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmcv/ops/modulated_deform_conv.py", line 251, in forward return modulated_deform_conv2d(x, offset, mask, self.weight, self.bias, File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmcv/ops/modulated_deform_conv.py", line 73, in forward ext_module.modulated_deform_conv_forward( RuntimeError: modulated_deformable_im2col_impl: implementation for device cuda:0 not found.

The full error log is available here: error_log.txt. At line 55 of the error log, it shows "MMCV CUDA Compiler: not available", which may be causing the issue. Please note that I am running the codebase on a Slurm GPU HPC, which means no GPU is attached to my login node by default and I have to request GPU resources from the scheduler. During the experiments I ran the script after the GPU resources were granted, but it still shows the above error.

Following this link, I also tried installing the CUDA build of mmcv-full with the following command:
pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu112/torch1.10/index.html
but it still gives me the same error.
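
For reference, one way to confirm whether the installed mmcv-full actually ships compiled CUDA ops is to print the same environment table that produced the "MMCV CUDA Compiler" line in the error log. This is only a diagnostic sketch using mmcv's collect_env helper, not code from the ViDAR repo:

# Diagnostic sketch (assumes mmcv 1.x): prints "MMCV CUDA Compiler" among
# other entries; "not available" means the ops were built without CUDA.
from mmcv.utils import collect_env

for name, value in collect_env().items():
    print(f"{name}: {value}")

If the compiler entry is "not available", reinstalling mmcv-full from the wheel index that matches the exact torch/CUDA pair, or rebuilding it from source on a GPU node, is usually the fix.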

Is there any way to solve this issue? Thanks!

Best regards

Regarding Table 1 in the paper

How is the Differentiable Ray-casting method that is compared against in Table 1 implemented, and is it the same as the one described in [1]?

[1] Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting

About ADE, FDE, MR

Hello! I've been deeply impressed by your model.
I'm checking whether performance metrics such as ADE, FDE, and MR can be obtained from the model for a table in a paper, but I'm having trouble combining UniAD and ViDAR. Could you please share the code for computing these motion-forecasting metrics?
Thank you very much :)

Can Vidar produce 4D binary occupancy results during inference?

Thanks for your great work! I have a question about latent rendering at inference time in your ViDAR compared with 4d-occ-forecasting, which generates query pred_pcds from a binary 4D occupancy grid. Can ViDAR produce binary 4D occupancy results during inference, or can it only generate future point clouds?

How to use all the data to fine-tune?

I looked at the configuration parameters and there don't seem to be any that control the amount of data used. How do I fine-tune on the full dataset?

`undefined symbol` ImportError with `from chamferdist import _C`

Thank you for your great work.

Issue

When I run ./dist_train.sh, I get the following error:

Traceback (most recent call last):                                                                                                                                                                                 
  File "./tools/train.py", line 263, in <module>    
    main()
  File "./tools/train.py", line 126, in main                                                                                                                                                                       
    plg_lib = importlib.import_module(_module_path)                                                                                                                                                                
  File "/root/miniconda3/envs/lib/python3.8/importlib/__init__.py", line 127, in import_module                                                                                                                     
    return _bootstrap._gcd_import(name[level:], package, level)                        
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load         
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked                                                                                                                                                
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed                                                                                                                                     
  File "/workspace/ViDAR/projects/mmdet3d_plugin/__init__.py", line 11, in <module>
    from .bevformer import *                                                                                                                                                                                       
  File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/__init__.py", line 2, in <module>
    from .dense_heads import *                                                                           
  File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/dense_heads/__init__.py", line 2, in <module>
    from .bev_head import BEVHead                                                                                                                                                                                  
  File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/dense_heads/bev_head.py", line 22, in <module> 
    from projects.mmdet3d_plugin.bevformer.modules import PerceptionTransformerBEVEncoder                                                                                                                          
  File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/modules/__init__.py", line 10, in <module>
    from .vidar_decoder import (PredictionDecoder,                                                       
  File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/modules/vidar_decoder.py", line 22, in <module>
    from .ray_operations import LatentRendering
  File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/modules/ray_operations/__init__.py", line 1, in <module>
    from .latent_rendering import LatentRendering
  File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/modules/ray_operations/latent_rendering.py", line 12, in <module>
    from ...utils import e2e_predictor_utils                                                             
  File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/utils/e2e_predictor_utils.py", line 163, in <module>
    from chamferdist import ChamferDistance
  File "/root/miniconda3/envs/lib/python3.8/site-packages/chamferdist-1.0.0-py3.8-linux-x86_64.egg/chamferdist/__init__.py", line 1, in <module>
    from .chamfer import ChamferDistance
  File "/root/miniconda3/envs/lib/python3.8/site-packages/chamferdist-1.0.0-py3.8-linux-x86_64.egg/chamferdist/chamfer.py", line 12, in <module>
    from chamferdist import _C
ImportError: /root/miniconda3/envs/lib/python3.8/site-packages/chamferdist-1.0.0-py3.8-linux-x86_64.egg/chamferdist/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7optionsEv

The missing symbol apparently corresponds to https://github.com/pytorch/pytorch/blob/302ee7bfb604ebef384602c56e3853efed262030/aten/src/ATen/core/TensorBase.h#L472

How to reproduce

I am trying to run your code in a Docker container built from the following Dockerfile:

ARG CUDA_VERSION=11.3.1
ARG OS_VERSION=20.04
# pull a prebuilt image
FROM nvidia/cuda:${CUDA_VERSION}-cudnn8-devel-ubuntu${OS_VERSION}

SHELL ["/bin/bash", "-c"]

# Required to build Ubuntu 20.04 without user prompts with DLFW container
ENV DEBIAN_FRONTEND=noninteractive

# Install required libraries
RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository ppa:ubuntu-toolchain-r/test
RUN apt-get update && apt-get install -y --no-install-recommends \
    libcurl4-openssl-dev \
    wget \
    zlib1g-dev \
    git \
    sudo \
    ssh \
    libssl-dev \
    pbzip2 \
    pv \
    bzip2 \
    unzip \
    devscripts \
    lintian \
    fakeroot \
    dh-make \
    build-essential \
    curl \
    ca-certificates \
    libx11-6 \
    nano \
    graphviz \
    libgl1-mesa-glx \
    openssh-server \
    apt-transport-https

# Install other dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgtk2.0-0 \
    libcanberra-gtk-module \
    libsm6 libxext6 libxrender-dev \
    libgtk2.0-dev pkg-config \
    libopenmpi-dev \
 && sudo rm -rf /var/lib/apt/lists/*

# Install Miniconda
RUN wget \
    https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && mkdir /root/.conda \
    && bash Miniconda3-latest-Linux-x86_64.sh -b \
    && rm -f Miniconda3-latest-Linux-x86_64.sh 

ENV CONDA_DEFAULT_ENV=${project}
ENV CONDA_PREFIX=/root/miniconda3/envs/$CONDA_DEFAULT_ENV
ENV PATH=/root/miniconda3/bin:$CONDA_PREFIX/bin:$PATH

# install python 3.8
RUN conda install python=3.8
RUN alias python='/root/miniconda3/envs/bin/python3.8'

# Set environment and working directory
ENV CUDA_HOME=/usr/local/cuda
ENV LD_LIBRARY_PATH=$CUDA_HOME/lib64:$CUDA_HOME/extras/CUPTI/lib64/:$LD_LIBRARY_PATH
ENV PATH=$CUDA_HOME/bin:$PATH
ENV CFLAGS="-I$CUDA_HOME/include $CFLAGS"
ENV FORCE_CUDA="1"
ENV PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/root/miniconda3/envs/bin:$PATH

# install pytorch
RUN pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html

# install opencv
RUN python -m pip install opencv-python==4.5.5.62

# install gcc
RUN conda install -c omgarcia gcc-6 -y

# install torchpack
RUN git clone https://github.com/zhijian-liu/torchpack.git
RUN cd torchpack && python -m pip install -e .

# install other dependencies
RUN python -m pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html
RUN python -m pip install pillow==8.4.0 \
                          tqdm \
                          mmdet==2.14.0 \
                          mmsegmentation==0.14.1 \
                          numba \
                          mpi4py \
                          nuscenes-devkit \
                          setuptools==59.5.0

# install mmdetection3d from source
ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0+PTX"
ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../"

RUN apt-get update && apt-get install -y ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/open-mmlab/mmdetection3d.git && \
    cd mmdetection3d && \
    git checkout v0.17.1 && \
    python -m pip install -r requirements/build.txt && \
    python -m pip install --no-cache-dir -e .

# install timm
RUN python -m pip install timm

# libraries path
RUN ln -s /usr/local/cuda/lib64/libcusolver.so.11 /usr/local/cuda/lib64/libcusolver.so.10

RUN pip install einops fvcore seaborn \
    iopath==0.1.9 \
    timm==0.6.13 \
    typing-extensions==4.5.0 \
    pylint \
    ipython==8.12 \
    numpy==1.19.5 \
    matplotlib==3.5.2 \
    numba==0.48.0 \
    pandas==1.4.4 \
    scikit-image==0.19.3 \
    setuptools==59.5.0
RUN python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

RUN mkdir /workspace && \
    chmod -R a+w /workspace && \
    cd /workspace

USER root
RUN ["/bin/bash"]

Inside the Docker container, I set up the chamferdist package as described in the README.

# python -c "import torch; print(torch.__version__)"
1.10.1+cu111
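
For what it's worth, my understanding is that this undefined-symbol error typically means the chamferdist C extension was compiled against a different PyTorch build than the one installed at runtime, so rebuilding chamferdist after the final torch install usually resolves it. Below is a minimal sanity check that a rebuilt extension loads and runs, following chamferdist's documented API; the tensor shapes are illustrative:

# If the C extension matches the installed torch ABI, this imports and
# computes a distance without the undefined-symbol ImportError.
import torch
from chamferdist import ChamferDistance

source = torch.rand(1, 100, 3).cuda()  # (batch, num_points, 3)
target = torch.rand(1, 128, 3).cuda()
distance = ChamferDistance()(source, target)
print("chamfer distance:", distance.item())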

A simple question of sensor coordinate.

I'd like to ask what coordinate frame the multi-frame sensor data is in. My guess is that point clouds from past and future frames are transformed into the current lidar frame, and that camera data is related to the current lidar frame via lidar2img. I think in your case you do not use the ego coordinate frame. Please correct me if I have the coordinates wrong; a sketch of the transform I have in mind follows below.
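
To make the question concrete, this is the convention I am assuming: a homogeneous 4x4 pose that maps past-frame lidar coordinates into the current lidar frame. This is my own illustration, not code from the repo:

# Illustrative only: map an (N, 3) past-frame point cloud into the current
# lidar frame with a (4, 4) homogeneous transform T_cur_from_past.
import numpy as np

def to_current_lidar_frame(points: np.ndarray, T_cur_from_past: np.ndarray) -> np.ndarray:
    homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4) homogeneous
    return (homo @ T_cur_from_past.T)[:, :3]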

Requesting requirements.txt

Hello,

Could you please share your environment dependencies as a .txt file? Despite following your setup instructions, at the last step I run into environment inconsistencies involving numpy, scikit-learn, scikit-image, etc.

Thanks

About the openscene_mini_train.pkl and val.pkl

When I used 1/8 of the mini data, my training set length was 5108, but the GitHub log prints 621 (621 × 8 = 4968), and the val data length also does not match (5554 vs. 6462).

In addition, the evaluation metrics are also different, and the chamfer_distance_inner metric is missing.

About the chamferdist

Can the chamferdist package be compiled for AMD graphics cards? My pip installation reports success, but the package does not work properly.
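
One detail that may narrow this down: chamferdist's extension can only build for the GPU backend that the installed PyTorch itself targets, so an AMD card would need a ROCm build of torch. An illustrative probe, not specific to this repo:

# torch.version.cuda is set on CUDA builds, torch.version.hip on ROCm builds;
# a chamferdist compiled against a CUDA torch cannot run on a ROCm stack.
import torch

print("CUDA build:", torch.version.cuda)
print("ROCm build:", torch.version.hip)
print("device available:", torch.cuda.is_available())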

GPU HOURS

What type of GPU, and how many GPU hours, are needed to reproduce your work?
Thank you.

Why can't I specify a GPU?

Sorry, this was my own mistake: I missed a space after CUDA_VISIBLE_DEVICES='0,1,2,3', which caused the error.

CONFIG=$1            # path to the model config file
GPUS=$2              # number of GPUs to launch processes on
PORT=${PORT:-28509}  # master port for torch.distributed

# Note the space before each trailing backslash: without it, the next line is
# glued onto the previous token and the variable assignments are mangled.
CUDA_VISIBLE_DEVICES='0,1,2,3' \
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python -m debugpy --listen 5678 --wait-for-client \
-m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
    $(dirname "$0")/train.py $CONFIG --launcher pytorch ${@:3} --deterministic

In Eq. 5, the BEV feature expectation is 2-D, but the conditional probability is 3-D. How is the final BEV feature calculated?

I am writing to express my sincere appreciation for your excellent research. Your work has been incredibly insightful and has sparked my curiosity in several areas.

I have a question regarding Equation 5 in your paper. I noticed that Equations 3 and 4 are computed in a three-dimensional space (x, y, z). Similarly, I am curious about how Equation 5 is calculated. Given that the conditional probability is three-dimensional while the Bird's Eye View (BEV) feature is two-dimensional, I am assuming there must be a method to reduce the conditional probability to two dimensions. However, I could not clearly understand the calculation method from the supplementary materials. In the section below Equation 8, there is a mention of g(i) = {xi, yi}, which appears to be in two dimensions. Could you please clarify how this computation is achieved?
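
To state my reading concretely: I am assuming the 3-D conditional probability is normalized along the height dimension and the BEV feature is the expectation of the voxel features over that dimension. The sketch below is my own interpretation with illustrative shapes, not the authors' implementation:

# My reading (not the authors' code): collapse a 3-D conditional probability
# onto the BEV plane by taking an expectation over the z dimension.
import torch

B, Z, H, W, C = 1, 16, 100, 100, 64
voxel_feat = torch.rand(B, Z, H, W, C)      # per-voxel features
occ_logits = torch.rand(B, Z, H, W)         # per-voxel occupancy logits
p_z = occ_logits.softmax(dim=1)             # conditional probability over z
bev_feat = (p_z.unsqueeze(-1) * voxel_feat).sum(dim=1)  # (B, H, W, C)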

Additionally, I am finding it challenging to interpret the meaning of a statement related to Equation 6: "The ray-wise features are shared by all grids lying in the same ray." Could you kindly provide a more detailed explanation of what this implies in the context of your research?

Additionally, the loss function you mention in Eq. 7 is not indexed by group. Is voxel occupancy calculated independently for each group?

Thank you in advance for your time and assistance in clarifying these points. Your insights will be invaluable in furthering my understanding of this subject.

Is an 8x NVIDIA RTX 3090 GPU Setup Sufficient for Training Models in the Predictive World Model 2024 Competition?

I am excited about participating in the Predictive World Model 2024 competition and have been preparing my environment accordingly. My current setup includes a system with 8 NVIDIA RTX 3090 GPUs, which I believed would be more than capable of handling the training demands of the competition's models.

However, even after adjusting the configuration settings to the minimum requirements as per the competition guidelines, I'm encountering a persistent issue where I run out of memory. The error I receive is as follows:

RuntimeError: CUDA out of memory. Tried to allocate 60.00 MiB (GPU 7; 23.70 GiB total capacity; 21.79 GiB already allocated; 18.81 MiB free; 21.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is an 8x RTX 3090 GPU setup insufficient for training the competition models, or might there be an issue with my configuration or approach?
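
In case it helps others hitting the same wall, one knob the error message itself points to is the allocator's max_split_size_mb setting, which reduces fragmentation. A hedged sketch; the value is illustrative and the variable must be set before CUDA is initialized:

# Mitigation suggested by the error message: cap allocator block splitting.
# The env var is read when the CUDA caching allocator initializes, so set it
# before the first CUDA call.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
torch.zeros(1, device="cuda")  # allocator now uses the capped split size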

Align paper results by myself

Thank you for your outstanding contributions. Will more training models and code be made public in the future? For example, a comprehensive comparison with UniAD.

I tried fine-tuning on 1/4 of the data starting from a 1/8 pre-trained model, following the config you provided, pre-training on 8x A100, but got results different from the reported ones: mAP 35.92 vs. 36.90, NDS 45.43 vs. 45.77. Is this difference within the normal range of random fluctuation?

20240317_074726.log.json

It would be helpful to provide more pre-trained models and fine-tuning code. Since I want to build on this paper, I need a reproducible baseline.


Thank you again for your excellent paper.

What about the GPU usage when finetuning?

First of all, thanks for releasing this excellent work.
In general, GPU usage during fine-tuning should be noticeably lower than during pre-training.
However, when I run fine-tuning from the pretrained model (trained with mem_efficient_vidar_1_8_nusc_3future_r50.py), the GPU usage exceeds 40 GB.
Could you share the expected GPU usage for fine-tuning?

Thanks.
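
For reference, this is how I measure peak memory on my side; it is an illustrative snippet, not part of the repo:

# Illustrative: record the peak GPU memory used by one training step.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run one forward/backward/optimizer step here ...
peak_gib = torch.cuda.max_memory_allocated() / 2**30
print(f"peak allocated: {peak_gib:.2f} GiB")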

Getting a "Wheel 'torch' is invalid" error when running 'pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html'

As the title says, when I run the pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html command, I get the following error:
ERROR: Wheel 'torch' located at /tmp/pip-unpack-n66hmakw/torch-1.10.1+cu111-cp38-cp38-linux_x86_64.whl is invalid.
I have tried pip cache purge and rerun the command, but I still get the same error. How can I fix this problem?

About the mini train.pkl and val.pkl

Could you please provide the mini-split train and val pkl files matching the log provided by OpenScene? When I trained the baseline myself, the metric (CD) was much worse than that of the baseline provided by the official pth. My understanding is that part of the official training set was split into our validation set, which is why the official pth file achieves a good metric (CD).

Error with installing scikit_image

Thank you for your great work!

Detail

I get the error below when I execute python setup.py install; the failure occurs while installing scikit_image. Could you advise me on this error?

Installed /home/acf15808yd/miniconda3/envs/vidar/lib/python3.8/site-packages/mmdet3d-0.17.1-py3.8-linux-x86_64.egg
Processing dependencies for mmdet3d==0.17.1
Searching for scikit-image
Reading https://pypi.org/simple/scikit-image/
Downloading https://files.pythonhosted.org/packages/2a/e3/ec27b0d8a63fd8a2effe78bfcea3a56480ed8b0be46e5232ada3f911512a/scikit_image-0.23.0rc0.tar.gz#sha256=8d78737020e9c173af6fcdd14ac7eca88a9169d072f3c8b24e602ba3acf65cf7
Best match: scikit-image 0.23.0rc0
Processing scikit_image-0.23.0rc0.tar.gz
error: Couldn't find a setup script in /tmp/42161730.1.gpu/easy_install-6w7_obt5/scikit_image-0.23.0rc0.tar.gz

What I did

conda create -n vidar python=3.8 -y
conda activate vidar

pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
conda install -c omgarcia gcc-6

pip install mmcv-full==1.4.0
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1

git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1
python setup.py install

Environment

  • mmdet3d version
$ git branch
* (HEAD detached at v0.17.1)
  main
  • cuda
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
