This Python/TensorFlow code demonstrates unsupervised snowflake classification from images obtained with the Multi-Angle Snowflake Camera using a GAN and K-medoids classification. It supports a paper "Unsupervised classification of snowflake images using a generative adversarial network and K-medoids classification" to be submitted to Atmospheric Measurement Techniques and provides all code needed to replicate the results.
You need a Python 3 environment and the following libraries:
- TensorFlow (not tested with 2.0+)
- NumPy
- SciPy
- Matplotlib
- Seaborn
- Python NetCDF4
- h5py
- imageio
- Dask
A GPU is highly recommended for training, but the experiments with pre-trained models can be run on a CPU as well. 16+ GB of RAM should be enough.
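Before downloading the data, it can help to confirm that the libraries listed above are importable. The following is a minimal sketch, not part of the repository: the helper `missing_modules` is hypothetical, and the import names (for example `netCDF4` for Python NetCDF4) are assumptions based on how these packages are usually imported.

```python
import importlib.util

# Assumed import names for the libraries listed above.
REQUIRED_MODULES = [
    "tensorflow", "numpy", "scipy", "matplotlib", "seaborn",
    "netCDF4", "h5py", "imageio", "dask",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries were found.")
```

This only checks that each module can be located; it does not import them or check their versions.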
Download the training datasets here (they are too big to include in the repository).
Save the .nc and .npy files in the data directory.
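To check that the downloaded files ended up in the right place, something like the sketch below could be used. The helper `find_data_files` is hypothetical and not part of the repository, and the directory path is passed in explicitly because the correct relative path depends on where you run it from.

```python
from pathlib import Path


def find_data_files(data_dir):
    """Map each expected extension to the sorted file names found in data_dir."""
    data_path = Path(data_dir)
    found = {".nc": [], ".npy": []}
    if not data_path.is_dir():
        # A missing directory simply yields empty lists.
        return found
    for extension in found:
        found[extension] = sorted(p.name for p in data_path.glob(f"*{extension}"))
    return found


if __name__ == "__main__":
    # "data" is an assumed path relative to the current working directory.
    for extension, names in find_data_files("data").items():
        print(extension, names if names else "no files found")
```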
If you want to use the pre-trained models, you can download them here.
Save the contents of the zip file in the models directory.
The high-level code that runs the training and evaluation needed to replicate the results can be found in replication.py. This file has a command line interface (see below), but you can also call its functions from an IPython terminal or a Jupyter notebook.
If you want to modify the training code, you should start by following the code flow in replication.training.
To evaluate the model and generate the plots shown in the paper using the downloadable datasets and the pre-trained GAN, run the following on the command line in the snow-gan-classification directory:
python replication.py experiments --model_name=../models/masc_infogan_combined
where --model_name is the name of the model you want to load (use the default for the pre-trained model). For the pre-trained model, this should replicate the results exactly. If you trained the GAN yourself, you will probably get slightly different results. The plots will be saved in the figures directory.
In practice, you may want to run the experiments one by one by copying and pasting the code from replication.experiments into a terminal.
You can run the training like this:
python replication.py train --model_save_name=../models/masc_infogan
Change the --model_save_name parameter to the name of the model you want to save. You can load a pre-existing model at the start of training using the --model_name parameter. So, for example, to load the pre-trained model and train it further:
python replication.py train --model_name=../models/masc_infogan_combined --model_save_name=../models/masc_infogan
Run the following to compute the latent variables for all snowflakes in the dataset:
python replication.py latents --model_name=../models/masc_infogan_combined --latents_file=../data/masc_latents.nc --latent_dist_file=../data/masc_latent_dist.nc
where the --latents_file and --latent_dist_file parameters control where the latents are saved.
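If you prefer to drive the commands above from a script, one option is a thin wrapper around the command line interface. This is a sketch under stated assumptions: `build_command` and `run_replication` are hypothetical helpers, not part of the repository, and they only assemble the subcommands and `--name=value` options shown in this README.

```python
import subprocess
import sys


def build_command(subcommand, **options):
    """Assemble the argument list for one replication.py invocation."""
    command = [sys.executable, "replication.py", subcommand]
    for name, value in options.items():
        # Each keyword becomes a --name=value flag, as in the examples above.
        command.append(f"--{name}={value}")
    return command


def run_replication(subcommand, cwd=".", **options):
    """Run replication.py in cwd and raise if it exits with an error."""
    return subprocess.run(build_command(subcommand, **options), cwd=cwd, check=True)


if __name__ == "__main__":
    # Print the latents command from this README without executing it.
    print(build_command(
        "latents",
        model_name="../models/masc_infogan_combined",
        latents_file="../data/masc_latents.nc",
        latent_dist_file="../data/masc_latent_dist.nc",
    ))
```

To actually run a command, call `run_replication` with `cwd` set to the snow-gan-classification directory, since the relative paths in the examples assume that working directory.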