
Project realised in collaboration with the Visimind company at the University of Warmia and Mazury in Olsztyn.

License: MIT License

Topics: czarna-magia, drone, las, lidar-data, point-cloud, visimind


Neptun's Eye

Neptun's Eye is a simple ML-powered point cloud segmentation tool. The project has been developed by students from the Czarna Magia Student Artificial Intelligence Society and students from the University of Warmia and Mazury, under the mentorship of Visimind.

Latest build

Neptun's Eye v0.1.2 Download here

Requirements

  • Windows 10 or newer OS
  • [Optional] Python 3.7.9 for pptk support

Note

The app requires OpenGL 3.3 or newer and might not work correctly on virtual machines or old computers.

Installation

We packaged our app into an easy-to-run executable. You can download and run it right away, or download some additional tools for more functionality.

Ready-to-run build

  • Download the latest build above.
  • Extract the files into a single folder located anywhere convenient for you.
  • Locate and run the main.exe file in the dist\main folder.

That's it! You're good to go!

Important

Verify that the models you want to use are present in the dist\main\_internal\resources\models folder.

Note

While our project incorporates visualization tools, they may not support every type of LiDAR data. Among them, the pptk package is the most versatile option; however, using it requires an additional Python installation, as detailed below.

More options

Click to see more installation options

Visualisation with pptk

Our app leverages pptk, a fast and efficient point cloud visualization tool for Python. To use it, you'll need to install Python 3.7.9 on your computer and install pandas and pptk for this specific version using any package management system.

You can download Python 3.7.9 from the official website here

After you install Python, you must set the Python 3.7.9 path in the app settings. See how to do that below.

Click to see how to set up Python 3.7.9 for our app
  • Locate your Python installation. The default location is %LOCALAPPDATA%/Programs/Python/Python37
  • Make sure you have installed pandas and pptk for Python 3.7.9:
%LOCALAPPDATA%/Programs/Python/Python37/python.exe -m pip install pandas
%LOCALAPPDATA%/Programs/Python/Python37/python.exe -m pip install pptk
  • Navigate to the %LOCALAPPDATA%/Programs/Python/Python37 folder in your system.
  • Copy the path from the navigation bar, e.g. C:\Users\<YourName>\AppData\Local\Programs\Python\Python37
  • Open the Neptun's Eye app, click Settings and paste the path into the python37 variable.
  • Set userprofile_path to False (the resulting settings entries are sketched below).
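
For reference, the two settings entries might end up looking like this in the app's configuration (a sketch only; the actual file name and layout follow the settings dialog):

{
    "python37": "C:\\Users\\<YourName>\\AppData\\Local\\Programs\\Python\\Python37",
    "userprofile_path": false
}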

Run with Poetry

Click to see how to run our app with Poetry

Note: This is for advanced users. We do not recommend this method.

  • Install pipx.
  • Install Poetry using pipx (do not use brew).
  • Install pyenv. Check that it is installed correctly by running pyenv --version.
  • Create a virtual environment using pyenv with Python 3.11.
  • Install Poetry. Check that it is installed correctly by running poetry --version.
  • Install dependencies using Poetry. (Example commands are sketched below.)
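
For reference, the corresponding commands might look like this (a sketch assuming Windows with pip, pipx and pyenv-win available; the version is an example):

pip install pipx
pipx install poetry
pyenv install 3.11.4
pyenv local 3.11.4
poetry install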

Installation Details

Create virtual environment:

poetry env use $(pyenv which python)

You should see something like this:

Using virtualenv: C:\Users\Admin\AppData\Local\pypoetry\Cache\virtualenvs\neptuns-eye-z6EeDWoH-py3.11

This command installs dependencies from requirements.txt using Poetry. You will probably not use it, and will instead install dependencies directly from the pyproject.toml file; it is left here only for reference.

poetry add $(cat requirements.txt)

Run

make run

Test

make test

Reference materials

Install make on Windows

  1. Install Chocolatey.
  2. Install make using choco:
choco install make

Usage

Click to see the user guide

Visualisation and classification

  • Launch the Neptun's Eye application.
  • Click the Select File button to load your point cloud.
  • If the point cloud loads successfully, select a Rendering tool. We recommend using either Polyscope or pptk.
  • Visualization in Neptun's Eye is designed just for preview purposes. Set the Rendering stride to ensure smooth rendering. We recommend generating between 500,000 and 2,000,000 points.
  • Click Render visualization and wait for the result. This process may take up to one minute, depending on the size of your point cloud.
  • To perform classification, choose a model from the Classification options section. If you're using our models, we recommend ExtraTrees or RandomForest.
  • To classify the entire point cloud, press the Run classification button and wait for the confirmation message. The duration of this process depends on the point cloud size.
  • To preview model performance, check the Use stride box in the Classification section. This option will classify points based on the Rendering stride selected in the visualization section, saving time and resources.

Research & ML

During the project, a lot of effort was invested in data preprocessing. Each dataset we worked with has been described by a Dataset Card. This was crucial for the project because it was the first time we had worked with point clouds and the .las file format.

At the beginning we researched the PointNet and PointNet++ architectures, because they are neural networks dedicated to point clouds. During the research we decided to begin with more baseline models, and finally ended up using tree models like Random Forest or Extra Trees Classifier. The PointNet architecture is planned to be implemented in the near future.

For experiment tracking we used Weights and Biases, which helped us tremendously with finding the best hyperparameters for our models. Later we also used Optuna.

Data

Click to check out how we processed data

Classified data:

WMII.las datacard

USER AREA.las datacard

Unclassified data:

kortowo.las datacard

Data dependencies

Correlation matrix of wmii.las with empty columns removed

[Figure: corelation_matrix_wmii]

Searching for the most significant columns

The impact of given columns on the accuracy of the RandomForestClassifier model

stride for validation dataset = 30, stride for training dataset = 30, n_estimators = 100

[Figure: feature_sets — comparison of five feature sets drawn from the columns X, Y, Z, red, green, blue, intensity, return_number, edge_of_flight_line, scan_angle_rank and number_of_returns]

The influence of the R, G and B columns on the accuracy of the RandomForestClassifier model

feature_columns = ['Z', 'red', 'green', 'blue', 'intensity', 'number_of_returns', 'return_number', 'edge_of_flight_line', 'scan_angle_rank'], training dataset stride = 720, validation dataset stride = 30, n_estimators = 100

[Figure: rgb]

Searching for dataset minimization

The influence of the stride parameter on the accuracy of the RandomForestClassifier model on the training dataset

Note: Stride is a sampling step: with stride = n, every n-th record is used. Stride = 2 means every other record is selected.
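
In pandas terms, stride sampling is plain positional slicing with a step; a one-line sketch (df stands for the point-cloud DataFrame):

# Keep every stride-th record of the point-cloud DataFrame
stride = 30
df_sampled = df.iloc[::stride]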

| Stride | Validation Accuracy |
| --- | --- |
| No stride | 0.7037 |
| stride = 2 | 0.7039 |
| stride = 5 | 0.7037 |
| stride = 10 | 0.7038 |
| stride = 30 | 0.7035 |
| stride = 60 | 0.7024 |
| stride = 120 | 0.7015 |

Note: Stride higher than 120 will rarely be used.

The influence of the stride parameter on the accuracy of the RandomForestClassifier model on the training and validation dataset

[Figure: stride]

The effect of data scaling on the accuracy of the RandomForestClassifier model

stride on training dataset = 720, stride on validation dataset = 30, n_estimators = 100

| | Test Accuracy | Validation Accuracy |
| --- | --- | --- |
| Raw Data | 0.931131809 | 0.709942897 |
| MinMaxScaler | 0.930849562 | 0.709571228 |
| Difference | 0.000282247 | 0.000371669 |

Impact of normalization of R, G and B columns (divide by 65025) on the accuracy of the RandomForestClassifier model

| | Test Accuracy | Validation Accuracy |
| --- | --- | --- |
| Raw RGB | 0.931131809 | 0.709942898 |
| Normalized RGB | 0.859441152 | 0.577975895 |
| Difference | 0.071690657 | 0.131966998 |

Comparison of classifiers

| Classifier | Test Accuracy | Validation Accuracy | Validation Accuracy from Optuna |
| --- | --- | --- | --- |
| AdaBoostClassifier | 0.8944 | 0.6352 | 0.7681 |
| BaggingClassifier | 0.9252 | 0.6893 | 0.7183 |
| ExtraTreesClassifier | 0.9303 | 0.7446 | 0.7655 |
| GradientBoostingClassifier | 0.9325 | 0.7183 | 0.7402 |
| HistGradientBoostingClassifier | 0.9390 | 0.7094 | 0.7995 |
| KNeighborsClassifier | 0.8913 | 0.7044 | 0.6992 |
| RandomForestClassifier | 0.9311 | 0.7099 | 0.7205 |
| StackingClassifier | 0.9385 | 0.7021 | 0.7011 |
| VotingClassifier | 0.9359 | 0.7205 | 0.7392 |

Confusion matrix of ExtraTreesClassifier

[Figure: confusion_matrix_wmii]

Models description

ExtraTreesClassifier

The ExtraTreesClassifier is an ensemble learning method provided by the scikit-learn library for classification tasks. It stands for Extremely Randomized Trees and operates by constructing a multitude of decision trees during training. Unlike traditional decision trees, ExtraTreesClassifier introduces additional randomness by selecting split points and features at random for each tree. This results in a diverse set of trees, which enhances predictive performance and robustness. The classifier is efficient, can handle large datasets, and provides feature importance scores, helping to identify the most relevant features for the classification task.

RandomForestClassifier

The RandomForestClassifier is an ensemble learning method in the scikit-learn library designed for classification tasks. It operates by constructing multiple decision trees during training and combines their outputs to determine the final class prediction. This approach improves predictive performance and controls over-fitting by averaging the results of individual trees, each trained on random subsets of the data and features. The classifier is robust, handling missing values and noisy data effectively, and can scale well with large datasets. Additionally, it provides estimates of feature importance, helping to identify which features are most influential in making predictions.

HistGradientBoostingClassifier

The HistGradientBoostingClassifier is a machine learning model provided by the scikit-learn library in Python. It is a type of gradient boosting algorithm that uses histograms to speed up the training process. This classifier is designed for supervised learning tasks, specifically classification problems. It works by building an ensemble of decision trees in a stage-wise manner and optimizing for a loss function. The histogram-based approach allows it to handle large datasets efficiently, making it faster and more scalable compared to traditional gradient boosting methods.
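
For reference, a minimal training sketch for the three models above (assumptions: X and y have already been assembled from the feature columns listed earlier, and n_estimators=100 matches the baseline from the tables, not the Optuna-tuned settings):

from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              HistGradientBoostingClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def compare_models(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    models = {
        "ExtraTreesClassifier": ExtraTreesClassifier(n_estimators=100, n_jobs=-1),
        "RandomForestClassifier": RandomForestClassifier(n_estimators=100, n_jobs=-1),
        # Histogram-based booster; the tree count is controlled by max_iter instead
        "HistGradientBoostingClassifier": HistGradientBoostingClassifier(max_iter=100),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(name, accuracy_score(y_test, model.predict(X_test)))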

Used stack

  • ML: Sklearn, Pandas, Laspy
  • Experiment Tracking: Weights and Biases, Optuna
  • GUI: customtkinter
  • Point cloud visualisation: pptk, polyscope, plotly
  • Version Control: Git & GitHub
  • Project Organization: GitHub Projects

License

This project is licensed under the MIT License.

Neptun's Eye Team

GUI & App:

ML team:

Assistant

Contributors

nexter0, perunio, ktfish, stimm147, zeusthegoddd, michalszt

Issues

Fix logo

Logo is not visible in the readme file. Probably it is ignored or not committed to the repository.

Implement more models from sklearn

RandomForestClassifier

  • somewhat optimal parameters were found

  • Model structure written

  • Trained on a small part of the data

  • Prediction done

  • Score model function

  • "Completly" trained

  • Peak accuracy : 2.5%

KNeighborsClassifier

  • somewhat optimal parameters were found

  • Model structure written

  • Trained on a small part of the data

  • Prediction done

  • Score model function

  • "Completly" trained

  • Accuracy : ?

RadiusNeighborsClassifier

  • point cloud has to be denoised

  • Model structure written

  • Trained on a small part of the data

  • Prediction done

  • Score model function

  • "Completly" trained

  • Accuracy : ?

Additional features:

  • Function that takes Model name
  • Denoise function

Hints

  • See the scripts for Random Forest; you can reuse the code for preprocessing and visualisation.
  • Check out the Hands-On Machine Learning with Scikit-Learn book.

Acceptance criteria

  • At least two models are implemented and ready to integrate with wandb.
  • Pull request is merged.

Tests for Pandas

Create tests that will check, for the Kortowo and WMII las files:

  • if the number of points is correct
  • if the columns are correct
  • if the types of columns are correct
  • ...

Use this code as reference

import pytest
import pandas as pd

@pytest.fixture
def data():
    df = pd.read_csv('/usr/local/share/games.csv')
    # Return df with the special keyword
    yield df
    # Remove all rows in df
    df.drop(df.index, inplace=True)
    # Delete the df variable
    del df

def test_type(data):
    assert isinstance(data, pd.DataFrame)

def test_shape(data):
    assert data.shape[0] == 6_000_000
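
A similar hedged sketch for the .las checks themselves (assumptions: laspy 2.x is installed; the file path and the expected point count are placeholders to be filled in from the datacards):

import pytest
import laspy

@pytest.fixture
def wmii():
    # Placeholder path; point it at the real WMII.las test file
    return laspy.read("data/WMII.las")

def test_point_count(wmii):
    # Placeholder value; the real count comes from the WMII.las datacard
    assert wmii.header.point_count == 6_000_000

def test_columns(wmii):
    dims = set(wmii.point_format.dimension_names)
    assert {"X", "Y", "Z", "intensity", "classification"} <= dims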

GUI: Data loading and management

  • Basic functionality allowing users to load .las files.
  • Display information about the loaded .las file (number of loaded points, number of classes etc.)
  • Overwrite point cloud or save as new .las file after classification

README update

  • confusion and correlation matrix
  • add datacards (to the datacards folder)
  • describe exactly how we processed the las files
  • describe the models that performed best (at most two sentences per model; approach it the way researchers do in papers, i.e. if you don't know something, read up on it)
  • Add screenshots of the application (Nikodem)

GUI: UI Project and base code

Create the base code, UI and file structure for the app using customtkinter:

  • Create a simple UI using customtkinter
  • Create classes for frames
  • Load point cloud to the app.
  • BUG: The path in the path TextBar is not erased when selecting another file; instead, the new path is appended
  • Add a help button that will explain certain functions of the app

Azure implementation

  • load .las files from Azure Blob Storage
  • make predictions on them
  • Add .joblib weights to the cloud
  • Integrate both weights and data loading from the cloud during tests run by GitHub Actions.

GUI: Classification algorithm configuration

Ability to parameterize, including selecting classes for individual objects to be classified, providing users with algorithm parameters (if applicable).

Detailed tasks for this issue are not yet known or defined. They will be defined when the task is in progress.

GUI: Fix bug - Colour of `generated_points_count_lb` will be white in light mode app

This code in visualisation_frame.py causes this issue.

    def update_generated_points_count_lb(self):
        """
        Update the label showing the count of generated points.

        Returns:
            None
        """
        if self.__las_handler.file_loaded:
            self.generated_points_count = round(self.las_handler.las.header.point_count / self.rendering_stride)
            formatted_generated_points_count = f"{self.generated_points_count:,}".replace(',', ' ')
            self.generated_points_count_lb.configure(text=f"{formatted_generated_points_count} points will"
                                                          f" be generated.")
        if not self.check_rendering_method_limit(self.generated_points_count):
            self.generated_points_count_lb.configure(text_color="orange")
            self.too_many_points = True
        else:
            # TODO: FIX
            self.generated_points_count_lb.configure(text_color="white")
            self.too_many_points = False
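
One possible fix, sketched but not verified in the app: customtkinter accepts a (light mode, dark mode) colour tuple, so the hard-coded white could be replaced with a theme-aware pair:

# Hedged sketch: customtkinter resolves the tuple per appearance mode
self.generated_points_count_lb.configure(text_color=("gray10", "white"))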

Research and implement Docker

  • Read the documentation and the official Docker course (DataCamp Course)
  • Check out how to combine it with Python
  • Find out how to use Docker with the model.
  • Find out how to use Docker with the app (consult with Nikodem).

Note: It will be more beneficial if we do it before the ensembleAI hackathon.

Docker App Deployment

Preparation

  • Ensure we know how to containerize customtkinter apps built with Poetry.
  • Ensure the project structure is aligned with Docker's needs.
  • Install Poetry in Docker. See this article for reference.

Docker work

  • Ensure you have Poetry installed globally or in your local development environment.
  • Make sure your pyproject.toml and poetry.lock files are up-to-date and reflect all your dependencies.
    Create a Dockerfile
  • Start with a base image that supports multiple Python versions, or choose one and manage multiple environments internally.
  • Install Python 3.7 (for pptk) and Python 3.11.4 (for the app) in the container.
  • Install Poetry inside the Docker container.
  • Copy your application's source code into the Docker container.
  • Use Poetry to install the project dependencies.
  • Set the appropriate CMD or ENTRYPOINT to run your application. (A sketch Dockerfile follows below.)
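
A minimal sketch of such a Dockerfile, under stated assumptions: a single Python 3.11 base image covering only the app side (no official image ships both 3.7 and 3.11), main.py as a hypothetical entry point, and an X display forwarded separately for the GUI:

FROM python:3.11.4-slim

# Install Poetry inside the container
RUN pip install --no-cache-dir poetry

WORKDIR /app

# Copy dependency manifests first to take advantage of layer caching
COPY pyproject.toml poetry.lock ./
RUN poetry config virtualenvs.create false \
 && poetry install --no-interaction --no-root

# Copy the application source code
COPY . .

# Run the application
CMD ["python", "main.py"]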

Reference materials

Acceptance criteria:

  • A short note is added to cv-knowledge-notes about how to use Docker (links to documentation and tutorials).
  • Create a demo container for some dummy code and test if it works on another computer.

Experiment tracking

  • Check if it is feasible to use wandb.
  • Split into smaller tasks: artifacts, model weights, parameters, plots, point cloud points, scripts for training and other...
  • Create template scripts for sklearn that are easy to use for Michał and Kacper (a sketch follows this list).
  • Test scripts on random forest model.
  • Add plots to weekly presentation 3.
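
A minimal template sketch for such a wandb + sklearn script (assumptions: "neptuns-eye" as a hypothetical project name; the train/validation splits are prepared elsewhere):

import wandb
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def train_and_log(X_train, y_train, X_val, y_val, n_estimators=100):
    # One wandb run per training; the config is searchable in the dashboard
    run = wandb.init(project="neptuns-eye", config={"n_estimators": n_estimators})
    model = RandomForestClassifier(n_estimators=n_estimators, n_jobs=-1)
    model.fit(X_train, y_train)
    wandb.log({"validation_accuracy": accuracy_score(y_val, model.predict(X_val))})
    run.finish()
    return model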


GUI: Point cloud 3D display using open3d and pptk

Ability to display the point cloud using open3d package in the left bottom corner of the app.

- [x] Display the point cloud with open3d within the app
Won't do: Open3D works with Python 3.11 or older.

  • Display the point cloud with pptk within the app
  • Merge 2D rendering and 3D rendering back into one tab

GUI: Think of a different way to pass data to the pptk script

The function below should be rewritten without using CSV, to optimize the rendering process. The classification also needs to be passed to the pptk script. (One hedged alternative is sketched after the code.)

    def render_pptk(self) -> None:
        """
        Renders LAS data using pptk.

        Returns:
            None
        """
        self.rendering_progress_lb.configure(text="Please wait. Rendering in progress...", text_color="red")

        local_app_data_path = os.environ.get("LOCALAPPDATA", "")
        python37_path = local_app_data_path + "\\Programs\\Python\\python37\\python.exe"
        print(python37_path)
        script_path = "script_pptk.py"
        dataframe_temp_file_path = ".tempdf.csv"

        # Dump the selected columns to a temporary CSV consumed by the pptk script
        self.save_selected_columns_to_csv(['X', 'Y', 'Z', 'red', 'green', 'blue'])

        subprocess.run([python37_path, script_path, dataframe_temp_file_path], check=True, text=True)
        self.rendering_progress_lb.configure(text="Done!", text_color="green")

GUI: Polish translation for the app

  • Translate existing strings into Polish
  • Make all the strings in the app reference localization .json files.
  • Maintain the localization file until the end of the project

GUI: Batched visualisation

  • Discover and set a limit for generating data cloud visualisation for all visualisation methods
  • Display a live-updated label showing how many points will be generated
  • If the limit is reached, automatically enable the batching option
  • Implement batching option (only a portion of the points will be generated)
  • Add a possibility for the user to select the portion of points displayed

GUI: Point cloud display 2D using matplotlib

Add the ability to display the point cloud with matplotlib in the left bottom corner of the app:

  • Display point cloud with matplotlib
  • BUG: Move the plot figure to the correct place in the app
  • BUG: Set the app size so that the plot figure fits within the window
  • Add more, faster and optimised visualisation options (for example plotly)

GUI: Toolbar in APP

Toolbar for app settings containing Settings tab and Language tab.

  • Create a toolbar in the app
  • Add an option for quick access to models.json file
  • Add an option for quick access to classes_definition.json file
  • Create and implement an app configuration file (either ini or json) that will contain user preferences like language and Python paths
  • Add an option to change between available languages

GUI: Generating reports

Generating and saving reports about the classification to a file.
A report might contain data describing how many points have been classified, how many classes have been assigned to how many points, some graphs, etc.

Reports will be generated as an event log that briefly describes and displays all the actions performed on the file, as well as some data about the classification results. The event log can later be saved to a txt file.

  • Create event log frame
  • Make event log work in the app
  • Add a possibility to save the event log into a txt file
  • (Optional) Add an auto-save function to the log and a crash log

Correlation matrix in seaborn

  • Find out which features correlate the most with the labelled class.
  • Plot the matrix and add it to documentation.
  • Write conclusions.

GUI: Classification output management

Add an ability to choose whether to overwrite an existing .las file or generate a new one.

Detailed tasks for this issue are not yet known or defined. They will be defined when the task is in progress.

GUI: German translation for the app

This will only be done if all the planned app functionalities are finished before the end of the project, because I need to send the localization file to my native-speaker friend so that he can help with the translation.

  • Translate existing strings into German

Dataset and Dataloader class

  • Make sure we understand the data and the dataset card is complete.
  • Decide what is better: torch Dataset class or torch_geometric Dataset.
  • Implement Dataset class for the point cloud.
  • Make sure the dataset is not processing withheld points.
  • Implement Dataloader (a minimal sketch of both follows this list).
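
A minimal sketch of both (assumptions: laspy 2.x, a plain torch Dataset rather than torch_geometric, a placeholder file path, and a point format that exposes the withheld flag):

import laspy
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class LasPointDataset(Dataset):
    def __init__(self, las_path):
        las = laspy.read(las_path)
        # Skip withheld points, per the task above
        keep = ~np.asarray(las.withheld, dtype=bool)
        xyz = np.stack([las.x, las.y, las.z], axis=1)[keep]
        self.points = torch.as_tensor(xyz, dtype=torch.float32)
        self.labels = torch.as_tensor(np.asarray(las.classification)[keep], dtype=torch.long)

    def __len__(self):
        return len(self.points)

    def __getitem__(self, idx):
        return self.points[idx], self.labels[idx]

# Placeholder path and example batch size
loader = DataLoader(LasPointDataset("data/WMII.las"), batch_size=4096, shuffle=True)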

Check out the following repositories for inspiration


Acceptance criteria:

  • Implemented and reviewed dataset and dataloader.
  • Short note to cv-knowledge-notes about implementing a dataset for Point Clouds.
