
Project realised in collaboration with the Visimind company at the University of Warmia and Mazury in Olsztyn.

License: MIT License

Topics: czarna-magia, drone, las, lidar-data, point-cloud, visimind


Neptun's Eye

Neptun's Eye is a simple ML-powered point cloud segmentation tool. The project has been developed by students from the Czarna Magia Student Artificial Intelligence Society and students from the University of Warmia and Mazury, under the mentorship of Visimind.

Latest build

Neptun's Eye v0.1.2 Download here

Requirements

  • Windows 10 or newer OS
  • [Optional] Python 3.7.9 for pptk support

Note

The app requires OpenGL 3.3 or newer and might not work correctly on virtual machines or old computers.

Installation

We packaged our app into an easy-to-run executable. You can download and run it right away, or download some additional tools for more functionality.

Ready-to-run build

  • Download the latest build above.
  • Extract the files into a single folder located anywhere convenient for you.
  • Locate and run the main.exe file in the dist\main folder.

That's it! You're good to go!

Important

Verify that the models you want to use are present in the dist\main\_internal\resources\models folder.

Note

While our project incorporates visualization tools, they may not support every type of LiDAR data. Among them, the pptk package is the most versatile option; however, using it requires an additional Python installation, as detailed below.

More options

Click to see more installation options

Visualisation with pptk

Our app leverages pptk, a fast and efficient point cloud visualization tool for Python. To use it, you'll need to install Python 3.7.9 on your computer and install pandas and pptk for this specific version using any package management system.

You can download Python 3.7.9 from the official website here

After you install Python, you must set the Python 3.7.9 path in the app settings. See how to do that below.

Click to see how to set up Python 3.7.9 for our app
  • Locate your Python installation. The default location is %LOCALAPPDATA%/Programs/Python/Python37
  • Make sure you have installed pandas and pptk for Python 3.7.9:
%LOCALAPPDATA%/Programs/Python/Python37/python.exe -m pip install pandas
%LOCALAPPDATA%/Programs/Python/Python37/python.exe -m pip install pptk
  • Navigate to the %LOCALAPPDATA%/Programs/Python/Python37 folder in your system.
  • Copy the path from the navigation bar, e.g. C:\Users\<YourName>\AppData\Local\Programs\Python\Python37
  • Open the Neptun's Eye app, click Settings and paste the path into the python37 variable.
  • Set userprofile_path to False (the resulting settings entries are sketched below).
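
For reference, the two settings entries might end up looking like this in the app's configuration (a sketch only; the actual file name and layout follow the settings dialog):

{
    "python37": "C:\\Users\\<YourName>\\AppData\\Local\\Programs\\Python\\Python37",
    "userprofile_path": false
}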

Run with Poetry

Click to see how to run our app with Poetry

Note: This is for advanced users. We do not recommend this method.

  • Install pipx.
  • Install Poetry using pipx (do not use brew).
  • Install pyenv. Check that it is installed correctly by running pyenv --version.
  • Create a virtual environment using pyenv with Python 3.11.
  • Install Poetry. Check that it is installed correctly by running poetry --version.
  • Install dependencies using Poetry. (Example commands are sketched below.)
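
For reference, the corresponding commands might look like this (a sketch assuming Windows with pip, pipx and pyenv-win available; the version is an example):

pip install pipx
pipx install poetry
pyenv install 3.11.4
pyenv local 3.11.4
poetry install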

Installation Details

Create virtual environment:

poetry env use $(pyenv which python)

You should see something like this:

Using virtualenv: C:\Users\Admin\AppData\Local\pypoetry\Cache\virtualenvs\neptuns-eye-z6EeDWoH-py3.11

This command installs dependencies from requirements.txt using Poetry. You will probably not use it, and will instead install dependencies directly from the pyproject.toml file; it is left here only for reference.

poetry add $(cat requirements.txt)

Run

make run

Test

make test

Reference materials

Install make on Windows

  1. Install Chocolatey.
  2. Install make using choco:
choco install make

Usage

Click to see the user guide

Visualisation and classification

  • Launch the Neptun's Eye application.
  • Click the Select File button to load your point cloud.
  • If the point cloud loads successfully, select a Rendering tool. We recommend using either Polyscope or pptk.
  • Visualization in Neptun's Eye is designed just for preview purposes. Set the Rendering stride to ensure smooth rendering. We recommend generating between 500,000 and 2,000,000 points.
  • Click Render visualization and wait for the result. This process may take up to one minute, depending on the size of your point cloud.
  • To perform classification, choose a model from the Classification options section. If you're using our models, we recommend ExtraTrees or RandomForest.
  • To classify the entire point cloud, press the Run classification button and wait for the confirmation message. The duration of this process depends on the point cloud size.
  • To preview model performance, check the Use stride box in the Classification section. This option will classify points based on the Rendering stride selected in the visualization section, saving time and resources.

Research & ML

During the project, a lot of effort was invested in data preprocessing. Each dataset we worked with has been described by a Dataset Card. This was crucial for the project because it was the first time we had worked with point clouds and the .las file format.

At the beginning we researched the PointNet and PointNet++ architectures, because they are neural networks dedicated to point clouds. During the research we decided to begin with more baseline models, and finally ended up using tree models like Random Forest or Extra Trees Classifier. The PointNet architecture is planned to be implemented in the near future.

For experiment tracking we used Weights and Biases, which helped us tremendously with finding the best hyperparameters for our models. Later we also used Optuna.

Data

Click to check out how we processed data

Classified data:

WMII.las datacard

USER AREA.las datacard

Unclassified data:

kortowo.las datacard

Data dependencies

Correlation matrix of wmii.las with empty columns removed

[Figure: corelation_matrix_wmii]

Searching for the most significant columns

The impact of given columns on the accuracy of the RandomForestClassifier model

stride for validation dataset = 30, stride for training dataset = 30, n_estimators = 100

[Figure: feature_sets — comparison of five feature sets drawn from the columns X, Y, Z, red, green, blue, intensity, return_number, edge_of_flight_line, scan_angle_rank and number_of_returns]

The influence of the R, G and B columns on the accuracy of the RandomForestClassifier model

feature_columns = ['Z', 'red', 'green', 'blue', 'intensity', 'number_of_returns', 'return_number', 'edge_of_flight_line', 'scan_angle_rank'], training dataset stride = 720, validation dataset stride = 30, n_estimators = 100

[Figure: rgb]

Searching for dataset minimization

The influence of the stride parameter on the accuracy of the RandomForestClassifier model on the training dataset

Note: Stride is a sampling step: with stride = n, every n-th record is used. Stride = 2 means every other record is selected.
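
In pandas terms, stride sampling is plain positional slicing with a step; a one-line sketch (df stands for the point-cloud DataFrame):

# Keep every stride-th record of the point-cloud DataFrame
stride = 30
df_sampled = df.iloc[::stride]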

| Stride | Validation Accuracy |
| --- | --- |
| No stride | 0.7037 |
| stride = 2 | 0.7039 |
| stride = 5 | 0.7037 |
| stride = 10 | 0.7038 |
| stride = 30 | 0.7035 |
| stride = 60 | 0.7024 |
| stride = 120 | 0.7015 |

Note: Stride higher than 120 will rarely be used.

The influence of the stride parameter on the accuracy of the RandomForestClassifier model on the training and validation dataset

[Figure: stride]

The effect of data scaling on the accuracy of the RandomForestClassifier model

stride on training dataset = 720, stride on validation dataset = 30, n_estimators = 100

| | Test Accuracy | Validation Accuracy |
| --- | --- | --- |
| Raw Data | 0.931131809 | 0.709942897 |
| MinMaxScaler | 0.930849562 | 0.709571228 |
| Difference | 0.000282247 | 0.000371669 |

Impact of normalization of R, G and B columns (divide by 65025) on the accuracy of the RandomForestClassifier model

| | Test Accuracy | Validation Accuracy |
| --- | --- | --- |
| Raw RGB | 0.931131809 | 0.709942898 |
| Normalized RGB | 0.859441152 | 0.577975895 |
| Difference | 0.071690657 | 0.131966998 |

Comparison of classifiers

| Classifier | Test Accuracy | Validation Accuracy | Validation Accuracy from Optuna |
| --- | --- | --- | --- |
| AdaBoostClassifier | 0.8944 | 0.6352 | 0.7681 |
| BaggingClassifier | 0.9252 | 0.6893 | 0.7183 |
| ExtraTreesClassifier | 0.9303 | 0.7446 | 0.7655 |
| GradientBoostingClassifier | 0.9325 | 0.7183 | 0.7402 |
| HistGradientBoostingClassifier | 0.9390 | 0.7094 | 0.7995 |
| KNeighborsClassifier | 0.8913 | 0.7044 | 0.6992 |
| RandomForestClassifier | 0.9311 | 0.7099 | 0.7205 |
| StackingClassifier | 0.9385 | 0.7021 | 0.7011 |
| VotingClassifier | 0.9359 | 0.7205 | 0.7392 |

Confusion matrix of ExtraTreesClassifier

[Figure: confusion_matrix_wmii]

Models description

ExtraTreesClassifier

The ExtraTreesClassifier is an ensemble learning method provided by the scikit-learn library for classification tasks. It stands for Extremely Randomized Trees and operates by constructing a multitude of decision trees during training. Unlike traditional decision trees, ExtraTreesClassifier introduces additional randomness by selecting split points and features at random for each tree. This results in a diverse set of trees, which enhances predictive performance and robustness. The classifier is efficient, can handle large datasets, and provides feature importance scores, helping to identify the most relevant features for the classification task.

RandomForestClassifier

The RandomForestClassifier is an ensemble learning method in the scikit-learn library designed for classification tasks. It operates by constructing multiple decision trees during training and combines their outputs to determine the final class prediction. This approach improves predictive performance and controls over-fitting by averaging the results of individual trees, each trained on random subsets of the data and features. The classifier is robust, handling missing values and noisy data effectively, and can scale well with large datasets. Additionally, it provides estimates of feature importance, helping to identify which features are most influential in making predictions.

HistGradientBoostingClassifier

The HistGradientBoostingClassifier is a machine learning model provided by the scikit-learn library in Python. It is a type of gradient boosting algorithm that uses histograms to speed up the training process. This classifier is designed for supervised learning tasks, specifically classification problems. It works by building an ensemble of decision trees in a stage-wise manner and optimizing for a loss function. The histogram-based approach allows it to handle large datasets efficiently, making it faster and more scalable compared to traditional gradient boosting methods.
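
For reference, a minimal training sketch for the three models above (assumptions: X and y have already been assembled from the feature columns listed earlier, and n_estimators=100 matches the baseline from the tables, not the Optuna-tuned settings):

from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              HistGradientBoostingClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def compare_models(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    models = {
        "ExtraTreesClassifier": ExtraTreesClassifier(n_estimators=100, n_jobs=-1),
        "RandomForestClassifier": RandomForestClassifier(n_estimators=100, n_jobs=-1),
        # Histogram-based booster; the tree count is controlled by max_iter instead
        "HistGradientBoostingClassifier": HistGradientBoostingClassifier(max_iter=100),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(name, accuracy_score(y_test, model.predict(X_test)))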

Used stack

  • ML: Sklearn, Pandas, Laspy
  • Experiment Tracking: Weights and Biases, Optuna
  • GUI: customtkinter
  • Point cloud visualisation: pptk, polyscope, plotly
  • Version Control: Git & GitHub
  • Project Organization: GitHub Projects

License

This project is licensed under the MIT License.

Neptun's Eye Team

GUI & App:

ML team:

Assistant

Contributors

nexter0, perunio, ktfish, stimm147, zeusthegoddd, michalszt

Issues

Fix logo

Logo is not visible in the readme file. Probably it is ignored or not committed to the repository.

Implement more models from sklearn

RandomForestClassifier

  • somewhat optimal parameters were found

  • Model structure written

  • Trained on a small part of the data

  • Prediction done

  • Score model function

  • "Completly" trained

  • Peak accuracy : 2.5%

KNeighborsClassifier

  • somewhat optimal parameters were found

  • Model structure written

  • Trained on a small part of the data

  • Prediction done

  • Score model function

  • "Completly" trained

  • Accuracy : ?

RadiusNeighborsClassifier

  • point cloud has to be denoised

  • Model structure written

  • Trained on a small part of the data

  • Prediction done

  • Score model function

  • "Completly" trained

  • Accuracy : ?

Additional features:

  • Function that takes Model name
  • Denoise function

Hints

  • See the scripts for Random Forest; you can reuse the code for preprocessing and visualisation.
  • Check out the Hands-On Machine Learning with Scikit-Learn book.

Acceptance criteria

  • At least two models are implemented and ready to integrate with wandb.
  • Pull request is merged.

Tests for Pandas

Create tests that will check, for the Kortowo and WMII las files:

  • if the number of points is correct
  • if the columns are correct
  • if the types of columns are correct
  • ...

Use this code as reference

import pytest
import pandas as pd

@pytest.fixture
def data():
    df = pd.read_csv('/usr/local/share/games.csv')
    # Return df with the special keyword
    yield df
    # Remove all rows in df
    df.drop(df.index, inplace=True)
    # Delete the df variable
    del df

def test_type(data):
    assert isinstance(data, pd.DataFrame)

def test_shape(data):
    assert data.shape[0] == 6_000_000
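
A similar hedged sketch for the .las checks themselves (assumptions: laspy 2.x is installed; the file path and the expected point count are placeholders to be filled in from the datacards):

import pytest
import laspy

@pytest.fixture
def wmii():
    # Placeholder path; point it at the real WMII.las test file
    return laspy.read("data/WMII.las")

def test_point_count(wmii):
    # Placeholder value; the real count comes from the WMII.las datacard
    assert wmii.header.point_count == 6_000_000

def test_columns(wmii):
    dims = set(wmii.point_format.dimension_names)
    assert {"X", "Y", "Z", "intensity", "classification"} <= dims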

GUI: Data loading and management

  • Basic functionality allowing users to load .las files.
  • Display information about the loaded .las file (number of loaded points, number of classes etc.)
  • Overwrite point cloud or save as new .las file after classification

README update

  • confusion and correlation matrix
  • add datacards (to the datacards folder)
  • describe exactly how we processed the las files
  • describe the models that performed best (at most two sentences per model; approach it the way researchers do in papers, i.e. if you don't know something, read up on it)
  • Add screenshots of the application (Nikodem)

GUI: UI Project and base code

Create the base code, UI and file structure for the app using customtkinter:

  • Create a simple UI using customtkinter
  • Create classes for frames
  • Load point cloud to the app.
  • BUG: The path in the path TextBar is not erased when selecting another file; instead, the new path is appended
  • Add a help button that will explain certain functions of the app

Azure implementation

  • load .las files from Azure Blob Storage
  • make predictions on them
  • Add .joblib weights to the cloud
  • Integrate both weights and data loading from the cloud during tests run by GitHub Actions.

GUI: Classification algorithm configuration

Ability to parameterize, including selecting classes for individual objects to be classified, providing users with algorithm parameters (if applicable).

Detailed tasks for this issue are not yet known or defined. They will be defined when the task is in progress.

GUI: Fix bug - Colour of `generated_points_count_lb` will be white in light mode app

This code in visualisation_frame.py causes this issue.

    def update_generated_points_count_lb(self):
        """
        Update the label showing the count of generated points.

        Returns:
            None
        """
        if self.__las_handler.file_loaded:
            self.generated_points_count = round(self.las_handler.las.header.point_count / self.rendering_stride)
            formatted_generated_points_count = f"{self.generated_points_count:,}".replace(',', ' ')
            self.generated_points_count_lb.configure(text=f"{formatted_generated_points_count} points will"
                                                          f" be generated.")
        if not self.check_rendering_method_limit(self.generated_points_count):
            self.generated_points_count_lb.configure(text_color="orange")
            self.too_many_points = True
        else:
            # TODO: FIX
            self.generated_points_count_lb.configure(text_color="white")
            self.too_many_points = False
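
One possible fix, sketched but not verified in the app: customtkinter accepts a (light mode, dark mode) colour tuple, so the hard-coded white could be replaced with a theme-aware pair:

# Hedged sketch: customtkinter resolves the tuple per appearance mode
self.generated_points_count_lb.configure(text_color=("gray10", "white"))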

Research and implement Docker

  • Read the documentation and the official Docker course (DataCamp Course)
  • Check out how to combine it with Python
  • Find out how to use Docker with the model.
  • Find out how to use Docker with the app (consult with Nikodem).

Note: It will be more beneficial if we do it before the ensembleAI hackathon.

Docker App Deployment

Preparation

  • Ensure we know how to containerize customtkinter apps built with Poetry.
  • Ensure the project structure is aligned with Docker's needs.
  • Install Poetry in Docker. See this article for reference.

Docker work

  • Ensure you have Poetry installed globally or in your local development environment.
  • Make sure your pyproject.toml and poetry.lock files are up-to-date and reflect all your dependencies.
    Create a Dockerfile
  • Start with a base image that supports multiple Python versions, or choose one and manage multiple environments internally.
  • Install Python 3.7 (for pptk) and Python 3.11.4 (for the app) in the container.
  • Install Poetry inside the Docker container.
  • Copy your application's source code into the Docker container.
  • Use Poetry to install the project dependencies.
  • Set the appropriate CMD or ENTRYPOINT to run your application. (A sketch Dockerfile follows below.)
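
A minimal sketch of such a Dockerfile, under stated assumptions: a single Python 3.11 base image covering only the app side (no official image ships both 3.7 and 3.11), main.py as a hypothetical entry point, and an X display forwarded separately for the GUI:

FROM python:3.11.4-slim

# Install Poetry inside the container
RUN pip install --no-cache-dir poetry

WORKDIR /app

# Copy dependency manifests first to take advantage of layer caching
COPY pyproject.toml poetry.lock ./
RUN poetry config virtualenvs.create false \
 && poetry install --no-interaction --no-root

# Copy the application source code
COPY . .

# Run the application
CMD ["python", "main.py"]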

Reference materials

Acceptance criteria:

  • A short note is added to cv-knowledge-notes about how to use Docker (links to documentation and tutorials).
  • Create a demo container for some dummy code and test if it works on another computer.

Experiment tracking

  • Check if it is feasible to use wandb.
  • Split into smaller tasks: artifacts, model weights, parameters, plots, point cloud points, scripts for training and other...
  • Create template scripts for sklearn that are easy to use for Michał and Kacper (a sketch follows this list).
  • Test scripts on random forest model.
  • Add plots to weekly presentation 3.
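
A minimal template sketch for such a wandb + sklearn script (assumptions: "neptuns-eye" as a hypothetical project name; the train/validation splits are prepared elsewhere):

import wandb
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def train_and_log(X_train, y_train, X_val, y_val, n_estimators=100):
    # One wandb run per training; the config is searchable in the dashboard
    run = wandb.init(project="neptuns-eye", config={"n_estimators": n_estimators})
    model = RandomForestClassifier(n_estimators=n_estimators, n_jobs=-1)
    model.fit(X_train, y_train)
    wandb.log({"validation_accuracy": accuracy_score(y_val, model.predict(X_val))})
    run.finish()
    return model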


GUI: Point cloud 3D display using open3d and pptk

Ability to display the point cloud using open3d package in the left bottom corner of the app.

- [x] Display the point cloud with open3d within the app
Won't do: Open3D works with Python 3.11 or older.

  • Display the point cloud with pptk within the app
  • Merge 2D rendering and 3D rendering back into one tab

GUI: Think of a different way to pass data to the pptk script

The function below should be rewritten without using CSV, to optimize the rendering process. The classification also needs to be passed to the pptk script. (One hedged alternative is sketched after the code.)

    def render_pptk(self) -> None:
        """
        Renders LAS data using pptk.

        Returns:
            None
        """
        self.rendering_progress_lb.configure(text="Please wait. Rendering in progress...", text_color="red")

        local_app_data_path = os.environ.get("LOCALAPPDATA", "")
        python37_path = local_app_data_path + "\\Programs\\Python\\python37\\python.exe"
        print(python37_path)
        script_path = "script_pptk.py"
        dataframe_temp_file_path = ".tempdf.csv"

        # Dump the selected columns to a temporary CSV consumed by the pptk script
        self.save_selected_columns_to_csv(['X', 'Y', 'Z', 'red', 'green', 'blue'])

        subprocess.run([python37_path, script_path, dataframe_temp_file_path], check=True, text=True)
        self.rendering_progress_lb.configure(text="Done!", text_color="green")

GUI: Polish translation for the app

  • Translate existing strings into Polish
  • Make all the strings in the app reference localization .json files.
  • Maintain the localization file until the end of the project

GUI: Batched visualisation

  • Discover and set a limit for generating data cloud visualisation for all visualisation methods
  • Display a live-updated label showing how many points will be generated
  • If the limit is reached, automatically enable the batching option
  • Implement batching option (only a portion of the points will be generated)
  • Add a possibility for the user to select the portion of points displayed

GUI: Point cloud display 2D using matplotlib

Add the ability to display the point cloud with matplotlib in the left bottom corner of the app:

  • Display point cloud with matplotlib
  • BUG: Move the plot figure to the correct place in the app
  • BUG: Set the app size so that the plot figure fits within the window
  • Add more, faster and optimised visualisation options (for example plotly)

GUI: Toolbar in APP

Toolbar for app settings containing Settings tab and Language tab.

  • Create a toolbar in the app
  • Add an option for quick access to models.json file
  • Add an option for quick access to classes_definition.json file
  • Create and implement an app configuration file (either ini or json) that will contain user preferences like language and Python paths
  • Add an option to change between available languages

GUI: Generating reports

Generating and saving reports about the classification to a file.
A report might contain data describing how many points have been classified, how many classes have been assigned to how many points, some graphs, etc.

Reports will be generated as an event log that briefly describes and displays all the actions performed on the file, as well as some data about the classification results. The event log can later be saved to a txt file.

  • Create event log frame
  • Make event log work in the app
  • Add a possibility to save the event log into a txt file
  • (Optional) Add an auto-save function to the log and a crash log

Correlation matrix in seaborn

  • Find out which features correlate the most with the labelled class.
  • Plot the matrix and add it to documentation.
  • Write conclusions.

GUI: Classification output management

Add an ability to choose whether to overwrite an existing .las file or generate a new one.

Detailed tasks for this issue are not yet known or defined. They will be defined when the task is in progress.

GUI: German translation for the app

This will only be done if all the planned app functionalities are finished before the end of the project, because I need to send the localization file to my native-speaker friend so that he can help with the translation.

  • Translate existing strings into German

Dataset and Dataloader class

  • Make sure we understand the data and the dataset card is complete.
  • Decide what is better: torch Dataset class or torch_geometric Dataset.
  • Implement Dataset class for the point cloud.
  • Make sure the dataset is not processing withheld points.
  • Implement Dataloader (a minimal sketch of both follows this list).
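
A minimal sketch of both (assumptions: laspy 2.x, a plain torch Dataset rather than torch_geometric, a placeholder file path, and a point format that exposes the withheld flag):

import laspy
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class LasPointDataset(Dataset):
    def __init__(self, las_path):
        las = laspy.read(las_path)
        # Skip withheld points, per the task above
        keep = ~np.asarray(las.withheld, dtype=bool)
        xyz = np.stack([las.x, las.y, las.z], axis=1)[keep]
        self.points = torch.as_tensor(xyz, dtype=torch.float32)
        self.labels = torch.as_tensor(np.asarray(las.classification)[keep], dtype=torch.long)

    def __len__(self):
        return len(self.points)

    def __getitem__(self, idx):
        return self.points[idx], self.labels[idx]

# Placeholder path and example batch size
loader = DataLoader(LasPointDataset("data/WMII.las"), batch_size=4096, shuffle=True)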

Check out the following repositories for inspiration


Acceptance criteria:

  • Implemented and reviewed dataset and dataloader.
  • Short note to cv-knowledge-notes about implementing a dataset for Point Clouds.
