For a detailed overview of the dataset visit the Cityscapes website and the Cityscapes Github repository
This repository uses the Pytorch-Lightning framework for training,evaluation and inference.
Parse files which are under the following directory sctructure
<data_path> : the root directory of the Cityscapes dataset
|
├── gtFine_trainvaltest
│ └── gtFine
│ ├── test
│ │ ├── berlin
│ │ ├── bielefeld
│ │ ├── bonn
│ │ ├── leverkusen
│ │ ├── mainz
│ │ └── munich
│ ├── train
│ │ ├── aachen
│ │ ├── bochum
│ │ ├── bremen
│ │ ├── cologne
│ │ ├── darmstadt
│ │ ├── dusseldorf
│ │ ├── erfurt
│ │ ├── hamburg
│ │ ├── hanover
│ │ ├── jena
│ │ ├── krefeld
│ │ ├── monchengladbach
│ │ ├── strasbourg
│ │ ├── stuttgart
│ │ ├── tubingen
│ │ ├── ulm
│ │ ├── weimar
│ │ └── zurich
│ └── val
│ ├── frankfurt
│ ├── lindau
│ └── munster
└── leftImg8bit_trainvaltest
└── leftImg8bit
├── test
│ ├── berlin
│ ├── bielefeld
│ ├── bonn
│ ├── leverkusen
│ ├── mainz
│ └── munich
├── train
│ ├── aachen
│ ├── bochum
│ ├── bremen
│ ├── cologne
│ ├── darmstadt
│ ├── dusseldorf
│ ├── erfurt
│ ├── hamburg
│ ├── hanover
│ ├── jena
│ ├── krefeld
│ ├── monchengladbach
│ ├── strasbourg
│ ├── stuttgart
│ ├── tubingen
│ ├── ulm
│ ├── weimar
│ └── zurich
└── val
├── frankfurt
├── lindau
└── munster
Each of the train,val,test directories contain subdirectories with the name of a city. To use a whole split, subfolder='all'
must be passed to the Dataset.create()
method in order to read the images from all the subfolders. For testing purposes a smaller number of images from the dataset can be used by passing *subfolder='<CityName>'*
. For example, passing split='train'
to the Dataset() constructor, and subfolder='aachen'
to the create()
method will make the Dataset object only read the 174 images in the folder aachen and convert them into a tf.data.Dataset. You can choose either all the subfolders or one of them, but not an arbitrary combination of them. After the images (x)
and the ground truth images (y)
are read and decoded, they are combined into a single object (x, y)
.
Generally images have a shape of (batch_size, channels, height, width)
"channels_first" format
- Split the image into smaller patches with spatial resolution
(256, 256)
. Because very image has a spatial resolution of(1024, 2048)
32 patches are produced and they comprise a single batch. This means that when the patching technique is used the batch size is fixed to 32. After this operation the images have a shape of(32, 256, 256, 3)
while the the ground truth images have a shape of(32, 256, 256, 1)
. To enable patching set theuse_patches
arguement of thecreate()
method, toTrue
.
-
Perform data
Augmentation
NOTE : while all augmentations are performed on the images, only horrizontal flip is performed on the ground truth images, because changing the pixel values of the ground truth images means changing the class they belong to.
- Normalize images :
- The input pixels values are scaled between -1 and 1 as default
- If using a pretrained backbone normalize according to what the pretrained network expects at its input. To determine what type of preprocessing will be done to the images, the name of the pretrained network must be passed as the
preprocessing
arguement of the Dataset constructor. For example, if a model from the EfficientNet model family (i.e EfficientNetB0, EfficientNetB1, etc) is used as a backbone, thenpreprocessing = "EfficientNet"
must be passed.
- Preprocess ground truth images:
- Map eval ids to train ids
- Convert to
one-hot
encoding - After this operation ground truth images have a shape of
(batch_size, 1024, 2048, num_classes)
Finally the dataset which is created is comprised of elements (image, ground_truth)
with shape ((batch_size, height, width, 3)
, (batch_size, height, width, num_classes))