opendrivelab / vidar
[CVPR 2024 Highlight] Visual Point Cloud Forecasting
Home Page: https://arxiv.org/abs/2312.17655
License: Apache License 2.0
Dear authors,
Thank you for your contribution!
I set up the environment following your README and the requirements.txt you provided in previous issues. However, when I try to run the training script:
./tools/dist_train.sh ${CONFIG} ${GPU_NUM}
it gives me the following error from the MMCV package, even though I'm using a CUDA environment:
File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 297, in forward
out = _inner_forward(x)
File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 274, in _inner_forward
out = self.conv2(out)
File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmcv/ops/modulated_deform_conv.py", line 251, in forward
return modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
File "/scratch/hz1922/anaconda3/envs/vidar/lib/python3.8/site-packages/mmcv/ops/modulated_deform_conv.py", line 73, in forward
ext_module.modulated_deform_conv_forward(
RuntimeError: modulated_deformable_im2col_impl: implementation for device cuda:0 not found.
The full error log is available here: error_log.txt. Line 55 of the error log shows "MMCV CUDA Compiler: not available", which may be causing the issue. Please note that I'm running the codebase on a Slurm GPU HPC cluster, so the login node has no GPU by default and I need to request GPU resources from the cluster. I ran the script after being allocated GPU resources, but it still shows the above error.
By following this link, I also tried to install the CUDA build of mmcv-full using the following command:
pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu112/torch1.10/index.html
but it still gives me the same error.
Is there any way to solve this issue? Thanks!
Best regards
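A note on the find-links URL used above: as far as the mmcv installation docs describe it, the prebuilt-wheel index is laid out as {cu_version}/{torch_version}, with the torch version written as x.y.0 (wheels are built against the .0 patch release of each minor version). If pip finds no matching wheel there, it may fall back to building from source, and a source build on a node without a visible GPU can end up without CUDA ops. The sketch below derives the index URL from a torch version string; the helper name mmcv_index_url is hypothetical and the URL layout is an assumption taken from the docs, not something confirmed by the ViDAR authors.

```python
import re

# Assumed layout of the mmcv-full prebuilt wheel index (from the mmcv install docs).
MMCV_INDEX = "https://download.openmmlab.com/mmcv/dist/{cu}/torch{torch}/index.html"


def mmcv_index_url(torch_version):
    """Build the pip find-links URL for a torch version such as '1.10.1+cu111'."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.\d+\+(cu\d+)", torch_version)
    if match is None:
        raise ValueError("expected a version like '1.10.1+cu111', got %r" % torch_version)
    major, minor, cu = match.groups()
    # Wheels are published per torch minor release, so the patch number is set to 0.
    return MMCV_INDEX.format(cu=cu, torch="%s.%s.0" % (major, minor))


# In a real environment the argument would be torch.__version__.
print(mmcv_index_url("1.10.1+cu111"))
```

The CUDA tag in the URL should match the one torch itself reports, so it is worth checking torch.__version__ inside the allocated GPU job before reinstalling.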
How is the Differentiable Ray-casting compared in Table 1 implemented, and is it the same as described in [1]?
[1] Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting
Hello! I've been deeply impressed by your model.
I'd like to check whether performance metrics such as ADE, FDE, and MR can be obtained from the model for a table in a paper. However, I'm having trouble combining UniAD and ViDAR. Could you please share the code for computing motion-forecasting metrics like these?
Thank you very much :)
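For reference, the metrics named above have standard single-trajectory definitions that can be computed independently of either codebase. The following is a minimal sketch of those textbook definitions, not code from ViDAR or UniAD; the function names and the 2.0 m miss threshold are assumptions chosen for illustration.

```python
import math


def displacement_errors(pred, gt):
    """Return (ADE, FDE) for one predicted trajectory against ground truth.

    pred and gt are equal-length sequences of (x, y) waypoints in metres.
    """
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    ade = sum(dists) / len(dists)  # average displacement over all timesteps
    fde = dists[-1]                # displacement at the final timestep
    return ade, fde


def miss_rate(fdes, threshold=2.0):
    """Fraction of trajectories whose final error exceeds the threshold (assumed 2.0 m)."""
    return sum(f > threshold for f in fdes) / len(fdes)


pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
gt = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]
ade, fde = displacement_errors(pred, gt)
print(ade, fde, miss_rate([fde, 1.0]))
```

Multi-modal variants (minADE, minFDE) take the minimum of these values over the predicted modes.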
Thanks for your great work! I have a question about how latent rendering at inference differs between ViDAR and 4d-occ-forecasting. The latter generates the queried pred_pcds from a binary 4D occupancy grid. Can ViDAR produce binary 4D occupancy results during inference, or can it only generate future point clouds?
When will the data processing code be released?
I looked at the configuration parameters and could not find any that control the amount of data. How do I fine-tune with the full dataset?
Thank you for your great work.
When I run ./dist_train.sh, I get the following error:
Traceback (most recent call last):
File "./tools/train.py", line 263, in <module>
main()
File "./tools/train.py", line 126, in main
plg_lib = importlib.import_module(_module_path)
File "/root/miniconda3/envs/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 843, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/workspace/ViDAR/projects/mmdet3d_plugin/__init__.py", line 11, in <module>
from .bevformer import *
File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/__init__.py", line 2, in <module>
from .dense_heads import *
File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/dense_heads/__init__.py", line 2, in <module>
from .bev_head import BEVHead
File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/dense_heads/bev_head.py", line 22, in <module>
from projects.mmdet3d_plugin.bevformer.modules import PerceptionTransformerBEVEncoder
File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/modules/__init__.py", line 10, in <module>
from .vidar_decoder import (PredictionDecoder,
File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/modules/vidar_decoder.py", line 22, in <module>
from .ray_operations import LatentRendering
File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/modules/ray_operations/__init__.py", line 1, in <module>
from .latent_rendering import LatentRendering
File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/modules/ray_operations/latent_rendering.py", line 12, in <module>
from ...utils import e2e_predictor_utils
File "/workspace/ViDAR/projects/mmdet3d_plugin/bevformer/utils/e2e_predictor_utils.py", line 163, in <module>
from chamferdist import ChamferDistance
File "/root/miniconda3/envs/lib/python3.8/site-packages/chamferdist-1.0.0-py3.8-linux-x86_64.egg/chamferdist/__init__.py", line 1, in <module>
from .chamfer import ChamferDistance
File "/root/miniconda3/envs/lib/python3.8/site-packages/chamferdist-1.0.0-py3.8-linux-x86_64.egg/chamferdist/chamfer.py", line 12, in <module>
from chamferdist import _C
ImportError: /root/miniconda3/envs/lib/python3.8/site-packages/chamferdist-1.0.0-py3.8-linux-x86_64.egg/chamferdist/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7optionsEv
The symbol apparently comes from https://github.com/pytorch/pytorch/blob/302ee7bfb604ebef384602c56e3853efed262030/aten/src/ATen/core/TensorBase.h#L472
I am trying to run your code in a docker container which is created from a Dockerfile as follows:
ARG CUDA_VERSION=11.3.1
ARG OS_VERSION=20.04
# pull a prebuilt image
FROM nvidia/cuda:${CUDA_VERSION}-cudnn8-devel-ubuntu${OS_VERSION}
SHELL ["/bin/bash", "-c"]
# Required to build Ubuntu 20.04 without user prompts with DLFW container
ENV DEBIAN_FRONTEND=noninteractive
# Install required libraries
RUN apt-get update && apt-get install -y software-properties-common
RUN add-apt-repository ppa:ubuntu-toolchain-r/test
RUN apt-get update && apt-get install -y --no-install-recommends \
libcurl4-openssl-dev \
wget \
zlib1g-dev \
git \
sudo \
ssh \
libssl-dev \
pbzip2 \
pv \
bzip2 \
unzip \
devscripts \
lintian \
fakeroot \
dh-make \
build-essential \
curl \
ca-certificates \
libx11-6 \
nano \
graphviz \
libgl1-mesa-glx \
openssh-server \
apt-transport-https
# Install other dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
libgtk2.0-0 \
libcanberra-gtk-module \
libsm6 libxext6 libxrender-dev \
libgtk2.0-dev pkg-config \
libopenmpi-dev \
&& sudo rm -rf /var/lib/apt/lists/*
# Install Miniconda
RUN wget \
https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& mkdir /root/.conda \
&& bash Miniconda3-latest-Linux-x86_64.sh -b \
&& rm -f Miniconda3-latest-Linux-x86_64.sh
ENV CONDA_DEFAULT_ENV=${project}
ENV CONDA_PREFIX=/root/miniconda3/envs/$CONDA_DEFAULT_ENV
ENV PATH=/root/miniconda3/bin:$CONDA_PREFIX/bin:$PATH
# install python 3.8
RUN conda install python=3.8
RUN alias python='/root/miniconda3/envs/bin/python3.8'
# Set environment and working directory
ENV CUDA_HOME=/usr/local/cuda
ENV LD_LIBRARY_PATH=$CUDA_HOME/lib64:$CUDA_HOME/extras/CUPTI/lib64/:$LD_LIBRARY_PATH
ENV PATH=$CUDA_HOME/bin:$PATH
ENV CFLAGS="-I$CUDA_HOME/include $CFLAGS"
ENV FORCE_CUDA="1"
ENV PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/root/miniconda3/envs/bin:$PATH
# install pytorch
RUN pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
# install opencv
RUN python -m pip install opencv-python==4.5.5.62
# install gcc
RUN conda install -c omgarcia gcc-6 -y
# install torchpack
RUN git clone https://github.com/zhijian-liu/torchpack.git
RUN cd torchpack && python -m pip install -e .
# install other dependencies
RUN python -m pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html
RUN python -m pip install pillow==8.4.0 \
tqdm \
mmdet==2.14.0 \
mmsegmentation==0.14.1 \
numba \
mpi4py \
nuscenes-devkit \
setuptools==59.5.0
# install mmdetection3d from source
ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0+PTX"
ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../"
RUN apt-get update && apt-get install -y ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/open-mmlab/mmdetection3d.git && \
cd mmdetection3d && \
git checkout v0.17.1 && \
python -m pip install -r requirements/build.txt && \
python -m pip install --no-cache-dir -e .
# install timm
RUN python -m pip install timm
# libraries path
RUN ln -s /usr/local/cuda/lib64/libcusolver.so.11 /usr/local/cuda/lib64/libcusolver.so.10
RUN pip install einops fvcore seaborn \
iopath==0.1.9 \
timm==0.6.13 \
typing-extensions==4.5.0 \
pylint \
ipython==8.12 \
numpy==1.19.5 \
matplotlib==3.5.2 \
numba==0.48.0 \
pandas==1.4.4 \
scikit-image==0.19.3 \
setuptools==59.5.0
RUN python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
RUN mkdir /workspace && \
chmod -R a+w /workspace && \
cd /workspace
USER root
RUN ["/bin/bash"]
Inside the Docker container, I set up the chamferdist package as described in the README.
# python -c "import torch; print(torch.__version__)"
1.10.1+cu111
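The undefined symbol in the traceback above is an Itanium-ABI mangled C++ name. Decoding it shows which libtorch function the compiled chamferdist extension expects; an extension that cannot resolve such a symbol was typically built against a different PyTorch build than the one loaded at import time, though that diagnosis is an assumption here. The sketch below is a deliberately tiny decoder that handles only simple nested names with no parameters; in practice the c++filt tool does the same job.

```python
def demangle_nested(symbol):
    """Decode a simple Itanium-ABI nested name such as _ZNK2at6Tensor7optionsEv.

    Handles only '_ZN' [K] <length-prefixed names> 'E' 'v' (no parameters).
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("only _ZN... nested names are handled")
    pos = 3
    is_const = symbol[pos] == "K"  # 'K' marks a const member function
    if is_const:
        pos += 1
    parts = []
    while symbol[pos] != "E":      # 'E' terminates the nested name
        start = pos
        while symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("expected a length prefix at offset %d" % pos)
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length
    if symbol[pos + 1:] != "v":    # 'v' means the function takes no arguments
        raise ValueError("only parameterless functions are handled")
    return "::".join(parts) + "()" + (" const" if is_const else "")


print(demangle_nested("_ZNK2at6Tensor7optionsEv"))
```

The decoded name is a member function of at::Tensor, which is consistent with the TensorBase.h link given earlier in this issue.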
I'd like to ask which coordinate frame the multi-frame sensor data uses. My guess is that point clouds from past and future frames are transformed to the current lidar frame, and that camera images are related to the current lidar frame through lidar2img. I think you do not use the ego frame in your case. Please correct me if I have misunderstood the coordinates.
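To make the question concrete, "transforming a past point cloud to the current lidar frame" can be written as a chain of rigid transforms: past lidar to global, then global to current lidar. The sketch below illustrates that chain in pure Python for planar motion; it is not taken from the ViDAR code, and the poses, helper names, and the convention of x forward and y left are assumptions made for the example.

```python
import math


def pose(x, y, yaw):
    """4x4 pose of a sensor in the global frame (planar motion only)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def rigid_inverse(t):
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    rot_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(rot_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [rot_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(t, point):
    """Apply a 4x4 transform to a 3D point."""
    v = list(point) + [1.0]
    return tuple(sum(t[i][k] * v[k] for k in range(4)) for i in range(3))


# Hypothetical poses: the sensor moved 5 m along global x and turned 90 degrees left.
global_from_past = pose(0.0, 0.0, 0.0)
global_from_cur = pose(5.0, 0.0, math.pi / 2)

# Past lidar frame -> global -> current lidar frame.
cur_from_past = matmul(rigid_inverse(global_from_cur), global_from_past)
point_in_cur = transform(cur_from_past, (10.0, 0.0, 0.0))
print(point_in_cur)  # approximately (0, -5, 0): 5 m to the right of the sensor
```

If an ego frame were involved, each global_from_* pose would itself be the product of an ego pose and a lidar-to-ego calibration matrix.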
Hello,
Could you please share your environment dependencies as a .txt file? Despite following your setup instructions, in the last step, I am encountering inconsistencies in the environment regarding numpy, scikit-learn, scikit-image etc.
Thanks
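Until an official dependency file is available, one way to spot such inconsistencies is to compare the installed versions against a set of pins. The sketch below uses only the standard library; the pins are collected from install commands quoted elsewhere in these issues and are not an official list from the authors, and the helper name check_pins is hypothetical.

```python
from importlib import metadata

# Pins gathered from commands quoted in these issues (not an official list).
PINS = {
    "torch": "1.10.1+cu111",
    "mmcv-full": "1.4.0",
    "mmdet": "2.14.0",
    "mmsegmentation": "0.14.1",
    "numpy": "1.19.5",
    "scikit-image": "0.19.3",
}


def check_pins(pins):
    """Return {package: (wanted, found)} for every package that deviates from its pin.

    found is None when the package is not installed.
    """
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            report[name] = (wanted, found)
    return report


for name, (wanted, found) in check_pins(PINS).items():
    print("%s: wanted %s, found %s" % (name, wanted, found))
```

An empty report means every listed package matches its pin exactly.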
When I used 1/8 of the mini data, the training set had 5108 samples, but the log on GitHub prints 621 (621*8=4968), and the val set length also does not match (5554 vs 6462).
In addition, the evaluation metrics are different, and the chamfer_distance_inner metric is missing.
Thank you for your contribution! When will the code be available?
Google Cloud Disk shows that the file is in the owner's recycle bin.
Can the chamferdist package be compiled on AMD graphics cards? My pip installation shows success, but it is not working properly.
What type of GPU and how many GPU hours are needed to reproduce your work?
Thank you.
Sorry, this was my mistake: I missed a space after CUDA_VISIBLE_DEVICES='0,1,2,3', which caused the error.
CONFIG=$1
GPUS=$2
PORT=${PORT:-28509}
CUDA_VISIBLE_DEVICES='0,1,2,3' \
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python -m debugpy --listen 5678 --wait-for-client \
-m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
$(dirname "$0")/train.py $CONFIG --launcher pytorch ${@:3} --deterministic
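The effect of the missing space can be reproduced without a GPU. A POSIX shell deletes each backslash-newline pair before splitting words, so without a space before the backslash the next line's assignment is glued onto the quoted value. The sketch below simulates this with shlex; the /repo path and the shortened command are placeholders, and the helper name tokens is hypothetical.

```python
import shlex


def tokens(script):
    """Split a script the way a POSIX shell would after joining continued lines."""
    joined = script.replace("\\\n", "")  # backslash-newline is removed, not turned into a space
    return shlex.split(joined)


broken = "CUDA_VISIBLE_DEVICES='0,1,2,3'\\\nPYTHONPATH=/repo \\\npython train.py"
fixed = "CUDA_VISIBLE_DEVICES='0,1,2,3' \\\nPYTHONPATH=/repo \\\npython train.py"

print(tokens(broken))  # the two assignments collapse into one word
print(tokens(fixed))   # two separate assignments, then the command
```

In the broken form the device list becomes the string 0,1,2,3PYTHONPATH=/repo, and PYTHONPATH is never set for the launched process.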
I am writing to express my sincere appreciation for your excellent research. Your work has been incredibly insightful and has sparked my curiosity in several areas.
I have a question regarding Equation 5 in your paper. I noticed that Equations 3 and 4 are computed in a three-dimensional space (x, y, z). Similarly, I am curious about how Equation 5 is calculated. Given that the conditional probability is three-dimensional while the Bird's Eye View (BEV) feature is two-dimensional, I am assuming there must be a method to reduce the conditional probability to two dimensions. However, I could not clearly understand the calculation method from the supplementary materials. In the section below Equation 8, there is a mention of g(i) = {xi, yi}, which appears to be in two dimensions. Could you please clarify how this computation is achieved?
Additionally, I am finding it challenging to interpret the meaning of a statement related to Equation 6: "The ray-wise features are shared by all grids lying in the same ray." Could you kindly provide a more detailed explanation of what this implies in the context of your research?
Additionally, the loss function in Eq. 7 is not written per group. Is voxel occupancy calculated independently for each group?
Thank you in advance for your time and assistance in clarifying these points. Your insights will be invaluable in furthering my understanding of this subject.
Thanks for sharing this wonderful work. I wonder what the CD performance of the ViDAR baseline is on private_test_wm, as it would be an important reference for our comparison.
I am excited about participating in the Predictive World Model 2024 competition and have been preparing my environment accordingly. My current setup includes a system with 8 NVIDIA RTX 3090 GPUs, which I believed would be more than capable of handling the training demands of the competition's models.
However, even after adjusting the configuration settings to the minimum requirements as per the competition guidelines, I'm encountering a persistent issue where I run out of memory. The error I receive is as follows:
RuntimeError: CUDA out of memory. Tried to allocate 60.00 MiB (GPU 7; 23.70 GiB total capacity; 21.79 GiB already allocated; 18.81 MiB free; 21.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Is an 8x RTX 3090 GPU setup insufficient for training the competition models, or might there be an issue with my configuration or approach?
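The numbers in the error message already hint at which kind of fix is likely to help. The message suggests max_split_size_mb when reserved memory is much larger than allocated memory, so it is worth computing that gap first. The sketch below parses the message quoted above; the helper name parse_oom is hypothetical, and the interpretation in the comments is a rule of thumb rather than a confirmed diagnosis.

```python
import re


def parse_oom(message):
    """Extract the GiB figures from a PyTorch CUDA out-of-memory message."""
    patterns = {
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "reserved": r"([\d.]+) GiB reserved in total",
    }
    return {name: float(re.search(pattern, message).group(1))
            for name, pattern in patterns.items()}


message = ("CUDA out of memory. Tried to allocate 60.00 MiB (GPU 7; 23.70 GiB total "
           "capacity; 21.79 GiB already allocated; 18.81 MiB free; 21.97 GiB reserved "
           "in total by PyTorch)")
stats = parse_oom(message)

# Memory held by the caching allocator but not backing any live tensor.
slack = round(stats["reserved"] - stats["allocated"], 2)
print("cached but unused: %.2f GiB of %.2f GiB" % (slack, stats["total"]))
```

Here the gap is small compared with the 23.70 GiB card, so reserved memory is not much larger than allocated memory and allocator tuning through PYTORCH_CUDA_ALLOC_CONF is unlikely to recover much; live tensors appear to fill the card on their own.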
Thank you for your outstanding contributions. Will more training models and code be made public in the future? For example, a comprehensive comparison with UniAD.
I used the 1/8 pre-trained model for 1/4 fine-tuning according to the config you provided, training on 8*A100, but got results that differ from the reported ones: mAP 35.92 vs 36.90, NDS 45.43 vs 45.77. Is this within the normal range of random fluctuation?
It would be helpful to provide more pre-trained models and the fine-tuning code, because I want to build on this paper and need a reproducible baseline.
Thank you again for your excellent paper.
First of all, thx for releasing this excellent work.
In general, GPU memory usage in the fine-tuning stage is noticeably lower than in the pre-training stage.
However, when I try to run fine-tuning based on the pretrained model (trained with mem_efficient_vidar_1_8_nusc_3future_r50.py), GPU memory usage exceeds 40G.
Could you tell me the expected GPU memory usage for fine-tuning?
Thx.
As the title says, when I run the command
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
I get the following error:
ERROR: Wheel 'torch' located at /tmp/pip-unpack-n66hmakw/torch-1.10.1+cu111-cp38-cp38-linux_x86_64.whl is invalid.
I have tried pip cache purge and run the command again, but I still get the same error. How can I fix the problem?
Could you please provide the mini train/val split pkl files that correspond to the log provided by OpenScene? When I trained the baseline, the metric (CD) was much worse than that of the official checkpoint (.pth). My understanding is that part of the official training set falls into our validation set, which is why the official checkpoint achieves a good CD.
Thank you for your great work!
I got the error below when I ran python setup.py install. The error occurs while installing scikit_image. Could I get some advice on this error?
Installed /home/acf15808yd/miniconda3/envs/vidar/lib/python3.8/site-packages/mmdet3d-0.17.1-py3.8-linux-x86_64.egg
Processing dependencies for mmdet3d==0.17.1
Searching for scikit-image
Reading https://pypi.org/simple/scikit-image/
Downloading https://files.pythonhosted.org/packages/2a/e3/ec27b0d8a63fd8a2effe78bfcea3a56480ed8b0be46e5232ada3f911512a/scikit_image-0.23.0rc0.tar.gz#sha256=8d78737020e9c173af6fcdd14ac7eca88a9169d072f3c8b24e602ba3acf65cf7
Best match: scikit-image 0.23.0rc0
Processing scikit_image-0.23.0rc0.tar.gz
error: Couldn't find a setup script in /tmp/42161730.1.gpu/easy_install-6w7_obt5/scikit_image-0.23.0rc0.tar.gz
conda create -n vidar python=3.8 -y
conda activate vidar
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
conda install -c omgarcia gcc-6
pip install mmcv-full==1.4.0
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1
python setup.py install
$ git branch
* (HEAD detached at v0.17.1)
main
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0