ccinc / 3d-ml

A versatile framework for 3D machine learning built on Pytorch Lightning and Hydra [looking for contributors!]

Shell 1.79% Makefile 1.63% Python 96.58%

3d 3d-deep-learning deep-learning point-cloud pytorch s3dis segmentation

3d-ml's Issues

Add Paris-Lille-3D Dataset

Paris-Lille-3D is a dataset and benchmark for point cloud classification. The data was produced by a Mobile Laser System (MLS) in two cities in France (Paris and Lille).

The point cloud was labeled entirely by hand with 50 different classes to support research on automatic point cloud segmentation and classification algorithms.

https://npm3d.fr/paris-lille-3d

Add Pytorch Geometric S3DIS Dataset for Semantic Segmentation Testing

https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html#torch_geometric.datasets.S3DIS

Add TorchSparse/SparseConv3d models

This adds complexity because of how the datasets are collated. OpenPoints datasets are batched "densely", i.e. 16 samples of shape [2048, 3] are stacked into a single tensor of shape [16, 2048, 3] (implemented based on the original TP3D code in 3d-ml/src/utils/batch.py, line 17 at commit 70de732: def from_data_list(data_list):). Some models/backends, such as sparse convolutions, require the data to be batched differently, i.e. concatenated into a tensor of shape [2048*16, 4], where the 4th column is a "batch index". This can be done with the PyTorch Geometric collation functions (https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.Batch.from_data_list), which collate in this manner by default (they concatenate the points and store the batch index in a separate batch vector).

TorchPoints accomplishes this by setting a configuration option in the model to define whether it uses "dense" or "sparse" data. We would likely need to do the same, and have the dataloader batch according to this configuration option. Ref: https://github.com/torch-points3d/torch-points3d/blob/66e8bf22b2d98adca804c753ac3f0013ff4ec731/torch_points3d/datasets/base_dataset.py#L160-L174
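A minimal sketch of the two layouts and of a config-driven collate choice, in plain PyTorch. The names collate_dense, collate_sparse and make_collate_fn, and the "dense"/"sparse" option values, are hypothetical. Real sparse-convolution backends such as TorchSparse also quantize coordinates to integer voxels, which this sketch omits.

```python
import torch


def collate_dense(samples):
    """Stack equally sized clouds: list of B tensors [N, C] -> [B, N, C]."""
    return torch.stack(samples, dim=0)


def collate_sparse(samples):
    """Concatenate clouds and append a batch-index column -> [sum(N_i), C + 1].

    Unlike the dense layout, the clouds may have different sizes.
    """
    rows = []
    for batch_idx, pts in enumerate(samples):
        # Column holding this sample's index within the batch.
        idx = torch.full((pts.size(0), 1), batch_idx, dtype=pts.dtype)
        rows.append(torch.cat([pts, idx], dim=1))
    return torch.cat(rows, dim=0)


def make_collate_fn(mode):
    """Choose the collate function from a (hypothetical) model config option."""
    if mode == "dense":
        return collate_dense
    if mode == "sparse":
        return collate_sparse
    raise ValueError(f"unknown batching mode: {mode!r}")


# Usage: 16 clouds of 2048 xyz points each.
samples = [torch.rand(2048, 3) for _ in range(16)]
dense = make_collate_fn("dense")(samples)    # shape [16, 2048, 3]
sparse = make_collate_fn("sparse")(samples)  # shape [32768, 4]
```

The selected function would then be passed as collate_fn to torch.utils.data.DataLoader, so the dataloader batches according to the model's configuration option.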

Unit and Integration Testing

Issue for tracking how to incorporate testing within the repo.

Unit testing

Ensure datasets remain downloadable and usable
Ensure datasets are loaded in the correct torch_geometric.data.Data format
Unit testing for custom transforms
Test that models are built correctly (the PyG repository has some good examples).
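As a sketch of the dataset-format check, assuming segmentation samples expose pos ([N, 3] float) and y ([N] int64 labels in [0, num_classes)), a reusable assertion helper could look like this. The helper name and label convention are assumptions. It only uses attribute access, so it should also accept torch_geometric.data.Data objects; the example uses a SimpleNamespace stand-in so it runs without PyG.

```python
import types

import torch


def check_segmentation_sample(data, num_classes):
    """Assert that `data` looks like a point-wise segmentation sample.

    Only attribute access is used, so any object with `pos` and `y`
    tensors works (e.g. a torch_geometric.data.Data instance).
    """
    assert data.pos.dim() == 2 and data.pos.size(1) == 3, "pos must be [N, 3]"
    assert data.pos.is_floating_point(), "pos must be a float tensor"
    assert data.y.shape == (data.pos.size(0),), "y must be [N]"
    assert data.y.dtype == torch.long, "y must hold int64 labels"
    assert 0 <= int(data.y.min()) and int(data.y.max()) < num_classes, "label out of range"


def test_random_sample_passes():
    # Stand-in for a real dataset item; a real test would iterate the dataset.
    sample = types.SimpleNamespace(pos=torch.rand(100, 3), y=torch.randint(0, 13, (100,)))
    check_segmentation_sample(sample, num_classes=13)
```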

Integration testing

Test data->model pipeline end-to-end for combinations of datasets and models

Help Integrate OpenPoints into a Package Format

Provide support for guochengqian/PointNeXt#79, i.e. making OpenPoints installable as a pip and conda package.

Add Semantic-Kitti Dataset

Semantic KITTI (http://semantic-kitti.org/) is a popular dataset for semantic segmentation.

Explore DDP training

Investigate PyTorch Lightning's multi-GPU training strategies, such as DP and DDP.

How does this work with OpenPoints and torchsparse? Will this work at all with their cuda extensions?
See: https://torchmetrics.readthedocs.io/en/stable/pages/lightning.html#common-pitfalls

Refactor Generic DataModule code

Some of the DataModule code, such as the dataloader methods, can likely be refactored into shared code.
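One possible shape for the refactor, sketched under assumptions: a hypothetical SharedDataLoadersMixin owns the dataloader methods and expects each DataModule to set data_train, data_val, data_test, batch_size and num_workers (and optionally collate_fn). In the repo it would be combined with Lightning, e.g. class MyDataModule(SharedDataLoadersMixin, LightningDataModule); the toy subclass below avoids the Lightning dependency.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


class SharedDataLoadersMixin:
    """Hypothetical mixin holding the dataloader boilerplate shared by DataModules."""

    def _make_loader(self, dataset, shuffle):
        return DataLoader(
            dataset,
            batch_size=self.batch_size,
            shuffle=shuffle,
            num_workers=self.num_workers,
            # None falls back to PyTorch's default collate function.
            collate_fn=getattr(self, "collate_fn", None),
        )

    def train_dataloader(self):
        return self._make_loader(self.data_train, shuffle=True)

    def val_dataloader(self):
        return self._make_loader(self.data_val, shuffle=False)

    def test_dataloader(self):
        return self._make_loader(self.data_test, shuffle=False)


class ToyDataModule(SharedDataLoadersMixin):
    """Toy example: only dataset setup remains in the concrete class."""

    def __init__(self):
        self.batch_size = 4
        self.num_workers = 0
        ds = TensorDataset(torch.rand(10, 3), torch.randint(0, 2, (10,)))
        self.data_train = self.data_val = self.data_test = ds


dm = ToyDataModule()
batches = list(dm.val_dataloader())  # 10 samples, batch_size 4 -> 3 batches
```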

Add support for OpenPoints schedulers

OpenPoints contains many learning rate schedulers that we could support by adding Hydra config files: https://github.com/guochengqian/openpoints/tree/master/scheduler

Add DALES Aerial Lidar Dataset

We present the Dayton Annotated Laser Earth Scan (DALES) data set, a new large-scale aerial LiDAR data set with nearly a half-billion points spanning 10 square kilometers of area. DALES contains forty scenes of dense, labeled aerial data spanning multiple scene types, including urban, suburban, rural, and commercial. The data was hand-labeled by a team of expert LiDAR technicians into eight categories: ground, vegetation, cars, trucks, poles, power lines, fences, and buildings. We present the entire data set, split into testing and training, and provided in 3 different data formats. The goal of this data set is to help advance the field of deep learning within aerial LiDAR.

https://udayton.edu/engineering/research/centers/vision_lab/research/was_data_analysis_and_processing/dale.php

Add Toronto-3d Dataset

Toronto-3D is a large-scale urban outdoor point cloud dataset acquired by an MLS system in Toronto, Canada for semantic segmentation. This dataset covers approximately 1 km of road and consists of about 78.3 million points.

https://github.com/WeikaiTan/Toronto-3D

Add Semantic-3d Dataset

Semantic-3d is a terrestrially-acquired semantic segmentation dataset with 4 billion points in an urban landscape.

http://www.semantic3d.net/

Support 3d->2d Models and Workflows (e.g. SalsaNext)

Many 2d-based models have been created to perform ML tasks on 3D point clouds, such as SalsaNet/SalsaNext. Investigate supporting these workflows in this repo.
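For context, SalsaNext-style models typically start from a spherical (range-image) projection of the scan. A minimal PyTorch sketch of that projection follows. The function name, the 64x1024 image size and the vertical field of view (+3 to -25 degrees, roughly that of the 64-beam sensor used for SemanticKITTI) are assumptions for illustration; real pipelines also project intensity, xyz and a validity mask.

```python
import math

import torch


def spherical_projection(points, height=64, width=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an [N, 3] point cloud to a [height, width] range image.

    Each pixel keeps the depth of the closest point falling into it;
    empty pixels are set to -1.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = points.norm(dim=1).clamp_min(1e-8)
    fov_down = abs(math.radians(fov_down_deg))
    fov = abs(math.radians(fov_up_deg)) + fov_down

    yaw = -torch.atan2(y, x)                          # azimuth in [-pi, pi]
    pitch = torch.asin((z / depth).clamp(-1.0, 1.0))  # elevation angle

    # Normalize both angles to [0, 1] and scale to pixel coordinates.
    u = 0.5 * (yaw / math.pi + 1.0) * width
    v = (1.0 - (pitch + fov_down) / fov) * height

    # Points outside the vertical field of view are clamped to the border rows.
    col = u.floor().long().clamp(0, width - 1)
    row = v.floor().long().clamp(0, height - 1)

    # Deterministic "closest point wins": minimum depth per pixel.
    flat = torch.full((height * width,), float("inf"), dtype=depth.dtype)
    flat.scatter_reduce_(0, row * width + col, depth, reduce="amin", include_self=True)
    flat[torch.isinf(flat)] = -1.0
    return flat.view(height, width)
```

The resulting image can be fed to an ordinary 2D CNN, and per-pixel predictions mapped back to points through the same row/col indices.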

Installation issues

Hi @CCInc ,

I've tried installing this repo and encountered an error while running ./install_openpoints.sh.

My software versions:
Environment: pip in a virtual environment
Python: 3.10.8
pip: 22.3.1
nvcc: Cuda compilation tools, release 11.6, V11.6.124 (Build cuda_11.6.r11.6/compiler.31057947_0)
gcc: 7.5
OS: Ubuntu 18.04.6 LTS (Release 18.04, Codename bionic)
Kernel: 5.4.0-125-generic

Commands run:

# add --recursive, otherwise the openpoints folder is empty (it is a submodule)
git clone https://github.com/CCInc/3d-ml.git --recursive

# create a virtual env with python and activate it
cd 3d-ml
python -m virtualenv env_3d
source env_3d/bin/activate

# install pytorch with pip
pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116
# install pytorch geometric with pip, ${CUDA} = cu116
pip install pyg-lib torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-1.13.0+cu116.html
pip install torch-geometric

# install additional requirements
pip install -r requirements.txt

# install openpoints as root
sudo ./install_openpoints.sh

I receive the following errors:

cuda/emd_kernel.cu(178): error: identifier "CHECK_EQ" is undefined

cuda/emd_kernel.cu(265): error: identifier "CHECK_EQ" is undefined

cuda/emd_kernel.cu(382): error: identifier "CHECK_EQ" is undefined

I also receive some warnings:

.local/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.

.local/lib/python3.10/site-packages/setuptools/command/easy_install.py:160: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.

What is the recommended way to install this repo without conda?
I also noticed that Python <= 3.8 is not supported.

Add Data Visualization Support

Provide a means of exploring datasets before, during and after training (i.e. with ground truth labels and with predicted labels).

Maybe use the Open3d visualizer API? https://github.com/isl-org/Open3D-ML/blob/21951adc23831ee0597c4f49b47deb8f122fda07/examples/visualize.py

Alternatives include logging to TensorBoard or writing PLY or LAS files to the output directory.
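For the file-based option, here is a dependency-free sketch of an ASCII PLY writer that stores ground-truth and predicted labels as extra per-vertex properties. The function name and the property names label and pred are assumptions. Viewers such as CloudCompare generally expose extra PLY properties as scalar fields, but that is worth verifying per tool. Inputs are plain Python sequences, so call .tolist() on tensors first.

```python
from pathlib import Path


def write_labeled_ply(path, points, labels, preds):
    """Write an ASCII PLY file with per-point ground-truth and predicted labels."""
    if not (len(points) == len(labels) == len(preds)):
        raise ValueError("points, labels and preds must have the same length")
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property int label",  # ground truth
        "property int pred",   # model prediction
        "end_header",
    ]
    body = [
        f"{x} {y} {z} {int(gt)} {int(pr)}"
        for (x, y, z), gt, pr in zip(points, labels, preds)
    ]
    Path(path).write_text("\n".join(header + body) + "\n")
```

Writing one file per sample per epoch would make it possible to compare predictions before, during and after training.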

Add full S3DIS dataset

TP3D and PointNeXt can process the full S3DIS dataset, which enables sampling by room, box, cylinder, or sphere (the PyG dataset, by contrast, is already preprocessed into 1x1 m blocks).
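The sampling strategies differ mainly in the region test. Below is a hedged PyTorch sketch of box, cylinder and sphere selection around a center. The function names are hypothetical and this is not the TP3D or PointNeXt implementation. Boxes and cylinders are assumed unbounded along Z so they keep the full room height.

```python
import torch


def sample_box(pos, center_xy, size):
    """Indices of points inside a size x size square around center_xy (XY only)."""
    inside = ((pos[:, :2] - center_xy).abs() <= size / 2).all(dim=1)
    return inside.nonzero().view(-1)


def sample_cylinder(pos, center_xy, radius):
    """Indices of points within `radius` of center_xy in the XY plane."""
    dist = (pos[:, :2] - center_xy).norm(dim=1)
    return (dist <= radius).nonzero().view(-1)


def sample_sphere(pos, center, radius):
    """Indices of points within `radius` of `center` in 3D."""
    dist = (pos - center).norm(dim=1)
    return (dist <= radius).nonzero().view(-1)


def random_cylinder(pos, radius, generator=None):
    """Center a cylinder on a randomly chosen point and return its indices."""
    i = torch.randint(0, pos.size(0), (1,), generator=generator).item()
    return sample_cylinder(pos, pos[i, :2], radius)
```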

Support other tasks (Reconstruction, Registration, Instance Segmentation)

Refactor Generic Model Code by Task

Have generic base models for tasks such as classification and segmentation, which commonly share the same metrics (such as loss, accuracy, and IoU). This would likely mean moving most methods into the base class and leaving each model with only the step and forward logic, which preprocesses the data for the downstream model plugin as needed.
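A minimal sketch of the idea in plain PyTorch, assuming per-point segmentation: a hypothetical BaseSegmentationModel owns the loss, accuracy and mean IoU, and each model plugin only implements forward. In the repo this would more likely be a LightningModule whose training_step/validation_step call shared_step, with torchmetrics replacing the hand-written metrics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BaseSegmentationModel(nn.Module):
    """Hypothetical task base class: shared loss and metrics for point-wise segmentation."""

    def __init__(self, num_classes):
        super().__init__()
        self.num_classes = num_classes

    def forward(self, pos):
        # Each model plugin maps [N, 3] points to [N, num_classes] logits.
        raise NotImplementedError

    def shared_step(self, pos, target):
        logits = self(pos)
        loss = F.cross_entropy(logits, target)
        pred = logits.argmax(dim=1)
        acc = (pred == target).float().mean()
        return {"loss": loss, "acc": acc, "miou": self.mean_iou(pred, target)}

    def mean_iou(self, pred, target):
        ious = []
        for c in range(self.num_classes):
            inter = ((pred == c) & (target == c)).sum()
            union = ((pred == c) | (target == c)).sum()
            if union > 0:  # ignore classes absent from both pred and target
                ious.append(inter.float() / union.float())
        return torch.stack(ious).mean()


class ToyPointMLP(BaseSegmentationModel):
    """Example plugin: only the forward logic is model-specific."""

    def __init__(self, num_classes):
        super().__init__(num_classes)
        self.head = nn.Linear(3, num_classes)

    def forward(self, pos):
        return self.head(pos)
```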

Rewrite Modelnet2048 using PyG InMemoryDataset

Currently, custom code handles the downloading and processing of modelnet2048. It should be rewritten as a PyTorch Geometric InMemoryDataset, which has helper functions for downloading and processing the data from the h5py input format, so the custom download functions can be removed.

See:
https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html
https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/datasets/s3dis.py (the PyG S3DIS dataset also comes from an h5py source, very similar to modelnet2048)

Set up code coverage

Add Dataset Transforms Pipeline

Add support for Pytorch Geometric data transformation pipelines for train, test, and validation datasets. Reimplement some TP3D transform methods within the new repo, such as AddFeatsByKeys and GridSampling3d. Investigate using OpenPoints transforms.
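To illustrate the pipeline mechanics, here is a hedged sketch on dict-based samples. GridSampling3d and AddFeatsByKeys are named after the TP3D transforms but are simplified reimplementations, not ports: float attributes are averaged per voxel and integer attributes (e.g. labels) keep the value of the last point in the voxel. PyG itself ships torch_geometric.transforms.Compose and torch_geometric.transforms.GridSampling, which operate on Data objects and would be the natural starting point.

```python
import torch


class GridSampling3d:
    """Simplified voxel-grid subsampling for dict samples with a "pos" key.

    Float tensors ([N, C]) are averaged per voxel; integer tensors keep the
    value of the last point falling into each voxel.
    """

    def __init__(self, voxel_size):
        self.voxel_size = voxel_size

    def __call__(self, data):
        coords = torch.floor(data["pos"] / self.voxel_size).long()
        # `inverse` maps every point to the id of its occupied voxel.
        _, inverse = torch.unique(coords, dim=0, return_inverse=True)
        num_voxels = int(inverse.max()) + 1
        counts = torch.bincount(inverse, minlength=num_voxels).unsqueeze(1)
        point_ids = torch.arange(inverse.numel())
        # Index of the last point in each voxel (deterministic max reduction).
        last = torch.full((num_voxels,), -1, dtype=torch.long).scatter_reduce(
            0, inverse, point_ids, reduce="amax"
        )
        out = {}
        for key, value in data.items():
            if value.is_floating_point():
                summed = torch.zeros(num_voxels, value.size(1), dtype=value.dtype)
                summed.index_add_(0, inverse, value)
                out[key] = summed / counts
            else:
                out[key] = value[last]
        return out


class AddFeatsByKeys:
    """Concatenate the listed per-point attributes into the feature tensor "x"."""

    def __init__(self, keys):
        self.keys = keys

    def __call__(self, data):
        data = dict(data)
        data["x"] = torch.cat([data[k] for k in self.keys], dim=1)
        return data


class Compose:
    """Apply transforms in order (same idea as torch_geometric.transforms.Compose)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
        return data


train_transform = Compose([GridSampling3d(voxel_size=1.0), AddFeatsByKeys(["pos", "rgb"])])
sample = {
    "pos": torch.tensor([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.5, 0.1, 0.1]]),
    "rgb": torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]),
    "y": torch.tensor([3, 4, 5]),
}
out = train_transform(sample)  # the first two points share voxel (0, 0, 0)
```

Separate Compose instances could be built for the train, validation and test datasets, e.g. with random augmentations only in the training pipeline.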

Add ScanNet Dataset

http://www.scan-net.org/

ScanNet is one of the most widely used 3D datasets for semantic and instance segmentation.

The dataloader should support both tasks. Perhaps the instance segmentation loader can inherit from the semantic segmentation loader.

@CCInc or @leo-stan any suggestions?

Run tests in Docker image

Move CI tests to docker image
Test various CUDA versions
Test both conda and pip installation
Try to remove --user from open_points conda install
