santurini / aerial-view-segmentation

Binary and Multi-class Image Segmentation of High Resolution Aerial Drone Images

License: MIT License

Jupyter Notebook 99.98% Python 0.02%
deeplabv3 kaggle-dataset manet pytorch pytorch-lightning semantic-segmentation unet

Drone Images Semantic Segmentation

Python, PyTorch, PyTorch Lightning, Kaggle, Colab, VS Code

Friendly Reminder

Your support is truly appreciated. Feel free to contact me through the following links or by email:

Repository Structure

The repository is structured as follows:

  • code folder: contains the notebook for image preprocessing, Binary segmentation and Multi-class segmentation
  • plots folder: contains two subfolders binary and multiclass with the respective plots

Dataset

The dataset is the Semantic Segmentation Drone Dataset; an already processed version can be downloaded at the following link.

The original images were downscaled and the labels remapped so that both binary and multi-class segmentation could be performed. For the multi-class task, the original 24 classes were grouped into 5 macro-classes. The two mappings are:

binary_classes = {
    0: {0, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20},  # obstacles
    1: {1, 2, 3, 4, 9},  # landing zones
}

grouped_classes = {
    0: {0, 6, 10, 11, 12, 13, 14, 21, 22, 23},  # obstacles
    1: {5, 7},  # water
    2: {2, 3, 8, 19, 20},  # soft-surfaces
    3: {15, 16, 17, 18},  # moving objects
    4: {1, 4, 9},  # landable
}

[Figure: sample 594 shown as the original image, its binary mask, and its 5-class mask]
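
The class grouping above can be applied to a label mask with a lookup table. The following is a minimal sketch, assuming masks are stored as integer NumPy arrays holding the original class ids 0 to 23; the helper names `build_lookup` and `remap_mask` are hypothetical and are not taken from the repository's notebooks.

```python
import numpy as np

# Mapping copied from the README: macro-class id -> set of original class ids.
grouped_classes = {
    0: {0, 6, 10, 11, 12, 13, 14, 21, 22, 23},  # obstacles
    1: {5, 7},                                   # water
    2: {2, 3, 8, 19, 20},                        # soft-surfaces
    3: {15, 16, 17, 18},                         # moving objects
    4: {1, 4, 9},                                # landable
}


def build_lookup(groups, n_original=24):
    """Return an array where lookup[original_id] == macro_class_id.

    Hypothetical helper: raises if any original id is left unassigned.
    """
    lookup = np.full(n_original, -1, dtype=np.int64)
    for new_id, old_ids in groups.items():
        for old_id in old_ids:
            lookup[old_id] = new_id
    if (lookup < 0).any():
        missing = np.flatnonzero(lookup < 0).tolist()
        raise ValueError(f"original classes without a group: {missing}")
    return lookup


def remap_mask(mask, lookup):
    """Replace every original class id in `mask` with its macro-class id."""
    return lookup[np.asarray(mask, dtype=np.int64)]


lookup = build_lookup(grouped_classes)
example_mask = np.array([[0, 5], [9, 17]])
remapped = remap_mask(example_mask, lookup)
```

The same helper could be used with `binary_classes`, although that mapping as printed lists only ids 0 to 20, so the completeness check would report ids 21 to 23 as unassigned.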

Models and Training

Three image segmentation models were compared. Each is considered state of the art and is built on a different underlying idea, and the comparison is meant to show how those ideas affect the results.

All models used an EfficientNet-B0 backbone pretrained on ImageNet, while the decoders were trained for 25 epochs on the augmented training set. Given the limited number of images (just 400), augmentation was crucial for training the models well.

The training loss was the Dice loss (binary and multi-class), and the models were evaluated with recall, false positive rate (FPR) and image-wise IoU (in the multi-class case all metrics besides IoU were computed per class).
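
To make the training criterion concrete, here is a minimal sketch of a soft Dice loss, written in NumPy as a forward computation only. It assumes predictions are per-class probabilities and targets are one-hot encoded, both of shape `(C, H, W)`; the function name `dice_loss` and the smoothing term `eps` are illustrative choices, and the repository's actual loss implementation is not shown in this README.

```python
import numpy as np


def dice_loss(probs, target_onehot, eps=1e-7):
    """Soft Dice loss averaged over classes.

    probs:         predicted probabilities, shape (C, H, W)
    target_onehot: one-hot ground truth,    shape (C, H, W)
    eps:           smoothing term (assumed value) that avoids division by zero
    """
    probs = np.asarray(probs, dtype=np.float64)
    target_onehot = np.asarray(target_onehot, dtype=np.float64)

    spatial_axes = (1, 2)
    intersection = (probs * target_onehot).sum(axis=spatial_axes)
    cardinality = probs.sum(axis=spatial_axes) + target_onehot.sum(axis=spatial_axes)

    # Per-class Dice coefficient: 2|P ∩ G| / (|P| + |G|)
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    return float(1.0 - dice.mean())


# Two classes on a 2x2 image where every pixel belongs to class 0.
target = np.zeros((2, 2, 2))
target[0] = 1.0

perfect_loss = dice_loss(target, target)       # prediction equals ground truth
wrong_loss = dice_loss(target[::-1], target)   # every pixel assigned to class 1
```

For the binary task the same function applies with `C = 1`. A training version would be written with differentiable tensor operations instead of NumPy.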

Model       Characteristic         Paper
U-Net       Fully Convolutional    paper
DeepLabV3   Dilated Convolutions   paper
MAnet       Attention Mechanism    paper

Binary Segmentation

Below are some mask predictions and results from the binary segmentation task.

[Figure: sample images with ground truth and the U-Net, DeepLabV3 and MAnet predictions]

Models      Recall   FPR     IoU
U-Net       0.971    0.222   0.923
DeepLabV3   0.971    0.251   0.919
MAnet       0.973    0.249   0.918
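
The Recall, FPR and IoU columns follow from the pixel-level confusion counts. The sketch below is one way to compute them for a single binary mask pair, assuming class 1 (landing zones) is the positive class; the function name `binary_metrics` is hypothetical, and averaging the per-image IoU over the test set is assumed to be what "image-wise IoU" refers to.

```python
import numpy as np


def _safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return float(num) / float(den) if den else float("nan")


def binary_metrics(pred, target):
    """Recall, false positive rate and IoU for one predicted/true mask pair."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = np.sum(pred & target)    # positive pixels predicted positive
    fp = np.sum(pred & ~target)   # negative pixels predicted positive
    fn = np.sum(~pred & target)   # positive pixels predicted negative
    tn = np.sum(~pred & ~target)  # negative pixels predicted negative

    return {
        "recall": _safe_div(tp, tp + fn),
        "fpr": _safe_div(fp, fp + tn),
        "iou": _safe_div(tp, tp + fp + fn),
    }


def image_wise_iou(preds, targets):
    """Mean of the per-image IoU values (assumed meaning of image-wise IoU)."""
    return float(np.nanmean([binary_metrics(p, t)["iou"] for p, t in zip(preds, targets)]))


pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [1, 0]])
metrics = binary_metrics(pred, target)
```

A high recall combined with a non-trivial FPR, as in the table, means few true landing pixels are missed while some obstacle pixels are still labelled as landable.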

Multi-class Segmentation

These are the results for the 5-class segmentation:

[Figure: sample images with ground truth and the U-Net, DeepLabV3 and MAnet predictions]

U-Net       Obstacles   Water   Nature   Moving   Landing
Recall      0.67        0.96    0.882    0.657    0.955
FPR         0.022       0.001   0.029    0.002    0.123
IoU         0.518       0.903   0.843    0.581    0.842

DeepLabV3   Obstacles   Water   Nature   Moving   Landing
Recall      0.633       0.955   0.905    0.672    0.94
FPR         0.022       0.001   0.062    0.004    0.107
IoU         0.503       0.883   0.814    0.563    0.896

MAnet       Obstacles   Water   Nature   Moving   Landing
Recall      0.492       0.921   0.891    0.682    0.95
FPR         0.012       0.001   0.048    0.004    0.162
IoU         0.431       0.83    0.82     0.566    0.825
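
Per-class values like those in the tables above can be obtained by treating each class as positive against all the others. The following is a minimal sketch based on a confusion matrix; it assumes integer label masks and that the column order (Obstacles, Water, Nature, Moving, Landing) corresponds to macro-class ids 0 to 4, which the README does not state explicitly. The function names are hypothetical.

```python
import numpy as np

# Assumed correspondence between table columns and macro-class ids 0..4.
CLASS_NAMES = ["Obstacles", "Water", "Nature", "Moving", "Landing"]


def confusion_matrix(pred, target, n_classes):
    """Rows index the true class, columns index the predicted class."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    flat_index = target * n_classes + pred
    counts = np.bincount(flat_index, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)


def per_class_metrics(pred, target, n_classes=len(CLASS_NAMES)):
    """One-vs-rest recall, FPR and IoU for every class (NaN where undefined)."""
    cm = confusion_matrix(pred, target, n_classes).astype(np.float64)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp        # true class c, predicted as something else
    fp = cm.sum(axis=0) - tp        # predicted c, actually something else
    tn = cm.sum() - tp - fn - fp    # everything not involving class c

    with np.errstate(divide="ignore", invalid="ignore"):
        return {
            "recall": tp / (tp + fn),
            "fpr": fp / (fp + tn),
            "iou": tp / (tp + fp + fn),
        }


# Tiny 3-class example: one class-0 pixel is mislabelled as class 1.
target = np.array([[0, 0], [1, 2]])
pred = np.array([[0, 1], [1, 2]])
result = per_class_metrics(pred, target, n_classes=3)
```

Classes that are absent from both the prediction and the ground truth yield NaN rather than 0, so they can be excluded when averaging over images.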


