38-Cloud: A Cloud Segmentation Dataset

*New: An extension to 38-Cloud dataset is released at here.

This dataset contains 38 Landsat 8 scene images and their manually extracted pixel-level ground truths for cloud detection. 38-Cloud dataset is introduced in [1], yet it is a modification of the dataset in [2].
The entire images of these scenes are cropped into multiple 384*384 patches to be proper for deep learning-based semantic segmentation algorithms. There are 8400 patches for training and 9201 patches for testing. Each patch has 4 corresponding spectral channels which are Red (band 4), Green (band 3), Blue (band 2), and Near Infrared (band 5). Unlike other computer vision images, these channels are not combined together. Instead, they are in their correspondig directories.

The directory tree of this dataset is as follows:

├──38-Cloud_training
│------------├──train_red
│------------├──train_green
│------------├──train_blue
│------------├──train_nir
│------------├──train_gt
│------------├──Natural_False_Color
│------------├──Entire_scene_gts
│------------├──training_patches_38-Cloud.csv
│------------├──training_sceneids_38-Cloud.csv
├──38-Cloud_test
│------------├──test_red
│------------├──test_green
│------------├──test_blue
│------------├──test_nir
│------------├──Natural_False_Color
│------------├──Entire_scene_gts
│------------├──test_patches_38-Cloud.csv
│------------├──test_sceneids_38-Cloud.csv

Click here for downloading the "entire dataset" through Kaggle.

Click here for downloading the "entire dataset" from another server.

Landsat 8 Spectral Ranges:

Band #	Name	Spectral Range (nm)
2	Blue	450-515
3	Green	520-600
4	Red	630-680
5	NIR	845-885

Example Images:

Below is an example of a 384*384 training patch:

Red Green Blue NIR False color Ground truth

Some Important Points:

Thin clouds (haze) are also considered as clouds (as well as thick clouds).
Natural color images are false color images used for further visualization purposes. They have not been used in the training and test phase of [1] and [2].
Some of the patches do not have useful information (0 pixel values) in them. That is because of the black margins around the Landsat 8 images. For your ease, a list in a csv file (training_patches_38-cloud_nonempty.csv) including the names of the informative patches (patches with more than 80% informative/nonzero pixels) can be found in the dataset folder or here.

4. The code for Cloud-Net model for training and test on 38-Cloud dataset can be found here.

Evaluation over 38-Cloud Dataset:

We have prepared a simple Matlab code to help researchers evaluate their results obtained by this dataset. You can find it in the "evaluation" directory. Please note that for the sake of consistency we have not provided users with ground truths of each 384*384 test patch, but with the ground truth of the entire Landsat 8 scenes. In order to generate a complete cloud mask from small patch masks and compare it with ground truths, please follow these instructions:

1- Preparing a directory for the predicted patch masks same as below:

├──preds_folder_root

│------------├──preds_folder

2- In "preds_folder" there should be all of the obtained patch masks from the test patches. Therefore, "pred_folder" consists of 9201 patches of 384*384. These should be pixel-level probabilities (for example the direct output of the sigmoid (or softmax) activation function in the last layer of a CNN model). The provided code will binarize the probabilities to generate binary masks.

3- The outputs of the mfile are an Excel file and a txt file. The reported numbers in Table 1 of [1] is from the txt file.

4- Please note that the evaluation metrics will be calculated for each "complete scene" and then averaged over 20 of the scenes in 38-Cloud test set.

5- Name of the patches play an important role to find the exact correct location of patches in a complete scene mask. Please avoid renaming test and predicted patches.

If you found this dataset useful for your research please cite these two papers:

@INPROCEEDINGS{38-cloud-1,
  author={S. {Mohajerani} and P. {Saeedi}},
  booktitle={IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium},
  title={Cloud-Net: An End-To-End Cloud Detection Algorithm for Landsat 8 Imagery},
  year={2019},
  volume={},
  number={},
  pages={1029-1032},
  doi={10.1109/IGARSS.2019.8898776},
  ISSN={2153-6996},
  month={July},
}

@INPROCEEDINGS{38-cloud-2,   
  author={S. Mohajerani and T. A. Krammer and P. Saeedi},   
  booktitle={2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)},   
  title={{"A Cloud Detection Algorithm for Remote Sensing Images Using Fully Convolutional Neural Networks"}},   
  year={2018},    
  pages={1-5},   
  doi={10.1109/MMSP.2018.8547095},   
  ISSN={2473-3628},   
  month={Aug},  
}

[1] S. Mohajerani, T. A. Krammer and P. Saeedi, "A Cloud Detection Algorithm for Remote Sensing Images Using Fully Convolutional Neural Networks," 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP), Vancouver, BC, 2018, pp. 1-5. doi: 10.1109/MMSP.2018.8547095 URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8547095&isnumber=8547039

[2] S. Mohajerani and P. Saeedi, "Cloud-Net: An End-To-End Cloud Detection Algorithm for Landsat 8 Imagery," IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 2019, pp. 1029-1032. doi: 10.1109/IGARSS.2019.8898776. Arxive URL: https://arxiv.org/pdf/1901.10077.pdf, IEEE URL: URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8898776&isnumber=8897702

grseb9s / 38-cloud-a-cloud-segmentation-dataset Goto Github PK