Code produced for the academic abstract

"Investigating the capability of UAV imagery for AI-assisted mapping of Refugee Camps in East Africa"

This GitHub repository is the code base for the Master's thesis submitted for the Master der Naturwissenschaften in Applied Earth Observation and Geoanalysis of the Living Environment (EAGLE) at the Julius-Maximilians-Universität Würzburg. For the full thesis draft, see Thesis_draft0.2.pdf. The draft is still being formatted and has not been assessed, so read it with a grain of salt.

The work in this thesis is carried out in partnership with the Humanitarian OpenStreetMap Team (HOTOSM) and supported by the German Aerospace Center / Deutsches Zentrum für Luft- und Raumfahrt (DLR).

Introduction

HOTOSM would like to develop an assisted-mapping solution that can predict buildings in refugee camps from drone imagery provided by the associated organisation OpenAerialMap. Refugee camps and informal settlements are home to some of the most vulnerable populations, the majority of which are located in Sub-Saharan East Africa (UNHCR, 2016). Many of these settlements lack the up-to-date maps that are taken for granted in developed cities. Up-to-date maps are important for assisting administration (e.g. population estimates, infrastructure development) in data-impoverished environments, and thereby encourage economic productivity (Herfort et al., 2021). Assisted mapping technology can reduce this data inequality between developed and developing areas.

To extract geospatial and imagery characteristics of dense urban environments, a combination of very-high-resolution (VHR) satellite imagery and Machine Learning (ML) is commonly used. Recent advances in Deep Learning (DL) for computer vision (CV) might help to address these mapping gaps. Convolutional Neural Networks (CNNs) are a subtype of the DL family used in CV tasks. Past studies using CNNs have shown high accuracy and transferability in small geographical settings (Kuffer et al., 2022).

The datasets provided for this project cover both highly structured, zoned newer refugee camps and chaotic, highly complex older camps. In addition, roofing materials are highly heterogeneous, especially in older sites where thatched roofs are often mixed with litter. This heterogeneity, coupled with the complex spatial autocorrelation and spatial relations that result from the lack of zoning in older sites, hinders rule-based and conventional ML-based approaches. A CNN-based approach might therefore simplify the task of selecting and testing parameters, taking advantage of VHR textural information while also learning contextual relations (Lang et al., 2022; Lehner & Blaschke, 2022).

This study is connected to a pilot project testing the capabilities of building segmentation.

Research Questions and Answers

RQ1. Do state-of-the-art models allow for accurate detection of buildings from UAV data in refugee camps?

RQ2. What is the optimal mixture of accurate and less-accurate labels, and how does that affect the segmentation output?

RQ2(a). How does the introduction of complex environments, such as heterogeneous urban morphologies, roofing materials, and UAV artefacts, affect the results?

U-Nets with shallow EfficientNet encoders performed slightly better in Precision, Dice Score, and IoU on the less complicated, accurately labelled Kalobeyei dataset.

However, they suffered a larger performance loss than classical U-Nets when complex data were introduced.

RQ3. How do existing models pre-trained on classical CV datasets and/or building datasets respond when applied to the setting of refugee camps?

Further training of the EfficientNet B1 U-Net (OCC initialised) gave the largest improvement in Recall.

Architectures with ImageNet initialisation only saw improvement with the EfficientNet B1 encoder, not with B2.

Inconclusive

Experimental setup

Pre-processing pipeline

Before running any step, please make sure you can run shell scripts and have GDAL and PyTorch installed.
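
A quick way to confirm these prerequisites is sketched below, using only the Python standard library. The lists of tools and modules are assumptions inferred from the steps named in this README (for example, that curl_warp.sh needs curl); adjust them to match your setup.

```python
import importlib.util
import shutil

# Command-line tools and Python modules the pipeline is assumed to need.
# These lists are a guess based on the steps named in this README.
REQUIRED_TOOLS = ("bash", "curl", "gdalbuildvrt", "gdal_translate")
REQUIRED_MODULES = ("torch",)


def check_prerequisites(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return a dict mapping each requirement to True if it was found."""
    found = {}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        found[tool] = shutil.which(tool) is not None
    for module in modules:
        # find_spec returns None when a top-level module is not installed.
        found[module] = importlib.util.find_spec(module) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:15s} {'found' if ok else 'MISSING'}")
```

Anything reported as MISSING should be installed before starting the pipeline.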

*1. Download, extract and reproject the OpenAerialMap WMS raster using curl_warp.sh
*2. Rasterise the available vector labels using rasterise_LBL.sh
3. Apply the 2-step normalisation (z-score --> linear scale) using labelmaker.ipynb
*4. Split the RGB bands into separate TIFFs using RGB_split.sh
**5. Create a virtual raster with 4 bands (R, G, B, Labels) using gdalbuildvrt
**6. Make a permanent raster TIFF from the VRT using gdal_translate
7. Return to labelmaker.ipynb and crop the stacked raster
8. Remove stacked and cropped tiles that have no labels or do not conform, using KBY_clean.ipynb
*9. Convert the TIFFs to PNG and delete the TIFFs using tiff2png.sh

*.sh files are Unix shell scripts. Run them with ./NAME_OF_SHELL_SCRIPT.sh in a Linux terminal. On a Windows machine, you can run them using Cygwin or WSL.

*Note that many of the shell scripts have the target reprojection set to the map projection EPSG:3857 by default. You may want to change this depending on your use case.

**GDAL is an open-source geospatial processing library. It includes many command-line programs and Python scripts that run with good memory efficiency.
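
Step 3 of the pipeline is described only as a 2-step normalisation (z-score, then linear scale). A minimal sketch of one way to do this per band is given here, assuming NumPy. The clipping at 3 standard deviations and the 0 to 255 output range are assumptions for illustration and are not taken from labelmaker.ipynb.

```python
import numpy as np


def two_step_normalise(band, clip_sigma=3.0, out_max=255.0):
    """Z-score a single band, then rescale it linearly to [0, out_max].

    clip_sigma and out_max are illustrative assumptions, not values
    taken from labelmaker.ipynb.
    """
    band = band.astype(np.float64)
    std = band.std()
    if std == 0:
        # A constant band carries no contrast; map it to all zeros.
        return np.zeros_like(band)
    # Step 1: z-score, so the band has zero mean and unit variance.
    z = (band - band.mean()) / std
    # Clip outliers so a few extreme pixels do not set the output range.
    z = np.clip(z, -clip_sigma, clip_sigma)
    # Step 2: linear scale from [z.min(), z.max()] to [0, out_max].
    z_min, z_max = z.min(), z.max()
    return (z - z_min) / (z_max - z_min) * out_max
```

Without the clipping step, a z-score followed by a min-max rescale gives the same result as a min-max rescale alone, so the clip (or statistics computed over the whole dataset rather than per tile) is what makes the first step matter.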

Training pipeline

Dataloader: dataloader.py
Training loop: Train_loop.ipynb
Some classical U-Nets are available as class objects through Networks.py
The remaining CNNs are constructed using the higher-level API segmentation-models-pytorch

Exploratory Data Analysis

See example:

Testing and Prediction

For single-image testing on various networks, see ALLtest_model.ipynb
For a custom function that parses each camp and predicts using a trained network, see PredSeg_Camp.ipynb
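
Predicting over a whole camp generally means cutting the orthomosaic into tiles of the size the network was trained on (256x256 px here) and stitching the per-tile masks back together. The sketch below illustrates that idea with NumPy only. iter_tiles, predict_camp and predict_fn are hypothetical names and are not the functions in PredSeg_Camp.ipynb.

```python
import numpy as np


def iter_tiles(image, tile=256):
    """Yield (row, col, patch) for non-overlapping tile x tile patches.

    image is an (H, W, C) array. Edge patches are zero-padded to full size.
    """
    height, width = image.shape[:2]
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            patch = image[row:row + tile, col:col + tile]
            pad_h = tile - patch.shape[0]
            pad_w = tile - patch.shape[1]
            if pad_h or pad_w:
                patch = np.pad(patch, ((0, pad_h), (0, pad_w), (0, 0)))
            yield row, col, patch


def predict_camp(image, predict_fn, tile=256):
    """Run predict_fn on every tile and stitch the masks back together.

    predict_fn is a hypothetical callable that maps a (tile, tile, C)
    patch to a (tile, tile) mask, e.g. a wrapper around a trained network.
    """
    height, width = image.shape[:2]
    mask = np.zeros((height, width), dtype=np.uint8)
    for row, col, patch in iter_tiles(image, tile):
        pred = predict_fn(patch)
        # Crop the prediction so padded borders are discarded.
        h = min(tile, height - row)
        w = min(tile, width - col)
        mask[row:row + h, col:col + w] = pred[:h, :w]
    return mask
```

Non-overlapping tiles keep the example short. Overlapping tiles with blended borders are a common refinement when tile edges show visible seams.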

Baseline training results for Kalobeyei, Kakuma (perfect dataset)

Dataset: 256x256 px. 0.15 m/px.
Training data with augmentation: 5719
Validation data with augmentation: 1224
Testing data: 272
Optimiser: Adam
Learning rate: 1e-3
Weight decay: 1e-5
Batch size: 32, 16 (OCC - 5 layer EB1-Unet)
Scheduler: Reduce Learning Rate on Plateau (min 1e-8) [Patience: 20 epochs, factor: 0.1]
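
The optimiser and scheduler settings listed above map onto PyTorch roughly as follows. This is a sketch, not the code in Train_loop.ipynb: the stand-in model, the random batch, the loss function, and the choice to monitor a loss (mode="min") are all assumptions.

```python
import torch
from torch import nn

# Stand-in model: a single conv layer, used only so the snippet runs.
# In the repository the model would be one of the U-Nets instead.
model = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, padding=1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode="min",   # assumes the monitored value is a validation loss
    factor=0.1,   # multiply the learning rate by 0.1 on a plateau
    patience=20,  # epochs without improvement before reducing
    min_lr=1e-8,  # lower bound on the learning rate
)

# One illustrative step on random data with batch size 32.
images = torch.rand(32, 3, 256, 256)
masks = torch.randint(0, 2, (32, 1, 256, 256)).float()
loss_fn = nn.BCEWithLogitsLoss()  # assumed loss, not stated in this README

optimizer.zero_grad()
loss = loss_fn(model(images), masks)
loss.backward()
optimizer.step()

# The scheduler is stepped once per epoch with the monitored metric.
scheduler.step(loss.item())
```

With a factor of 0.1 and a floor of 1e-8, the learning rate can drop at most five times from 1e-3 before it stops decreasing.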

Baseline training results for Kalobeyei + Dzaleka + Dzaleka North (full dataset)

Dataset: 256x256 px. 0.15 m/px.
Training data with augmentation: 18242
Validation data with augmentation: 3909
Testing data: 435
Optimiser: Adam
Learning rate: 1e-3
Weight decay: 1e-5
Batch size: 32, 16 (OCC - 5 layer EB1-Unet)
Scheduler: Reduce Learning Rate on Plateau (min 1e-8) [Patience: 20 epochs, factor: 0.1]

Class-based accuracy assessments

EfficientNet B2 header performance

Dataset: 256x256 px. 0.15 m/px.
Training data with augmentation: 18242
Validation data with augmentation: 3909
Testing data: 435
Optimiser: Adam
Learning rate: 1e-3
Weight decay: 1e-5
Batch size: 32
Scheduler: Reduce Learning Rate on Plateau (min 1e-8) [Patience: 20 epochs, factor: 0.1]

EfficientNet B2 header U-Net ImageNet vs No ImageNet (Vanilla) weights, where: red = ImageNet and blue = No ImageNet

Depth-wise Precision and Recall change

Dataset-wise Precision and Recall change

Weight-wise Precision and Recall change

Key takeaways

Deeper networks tend to produce fewer False Positives
Architectures initialised with ImageNet weights tend to produce fewer False Negatives
The transferability of the competition-winning network is limited
Models might have better precision than calculated because of ambiguity in the human labelling
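
For reference, the metrics used throughout (Precision, Recall, Dice Score, IoU) can be computed from a predicted and a reference binary mask as sketched below. Fewer False Positives raise Precision and fewer False Negatives raise Recall. The function name and the small eps term that guards against division by zero are illustrative choices, not taken from the repository.

```python
import numpy as np


def segmentation_scores(pred, target, eps=1e-7):
    """Precision, recall, Dice and IoU for binary masks (1 = building)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    tp = np.logical_and(pred, target).sum()   # building predicted and labelled
    fp = np.logical_and(pred, ~target).sum()  # predicted but not labelled
    fn = np.logical_and(~pred, target).sum()  # labelled but not predicted
    return {
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
    }
```

Because False Positives are counted against the human labels, a building that the model found but the labeller missed lowers the measured Precision, which is the ambiguity noted in the last takeaway.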
