This library contains classes and functions to generate datasets corresponding to spatial features from a time-series of satellite images. The impetus for this project was to develop an easy-to-use, high-level interface to numerous Python modules for the clustering and classification of land cover/land use (LULC) types. The initial focus is on classifying individual crop types in challenging geographies using a time-series of multi-spectral Earth observation (EO) images. A time-series of EO images better captures how the appearance of crops and other LULC classes changes through a growing season, which enables more accurate model predictions. The functions and methods provided in this library can be used to generate EO reflectance time-series datasets and models for arbitrary vector data, e.g. points or polygons.
The library is divided into several components:
- `tsmask`: provides functions to create masked numpy arrays corresponding to areas of interest, as well as a `BandTimeSeries` object initialized using the masked array. Specific functions and objects include:
  - `rasterize` uses the `osgeo` library and the underlying `gdal` functionality to rasterize vector features from a shapefile. It outputs a .tif file that shares the relevant metadata and dimensions of the reference image from which it was created. A `check_rasterize` function is also provided to confirm that the features were correctly "burned" into the raster layer. The resulting image can be characterized as a land cover "mask".
  - `mask_to_array` generates a 3D numpy array from the output of `rasterize`. Each element of the 3D array is a 2D array representing band reflectance values for a given date. Values in the 3D array that are not no-data values correspond to a land cover class burned in using `rasterize`.
  - `BandTimeSeries` objects contain information about the reflectance time-series of samples in a given land cover class, along with methods to operate on and format those time-series. `BandTimeSeries` objects are initialized using an output from the `mask_to_array` function, along with arguments specifying the land cover class of the object and the variable (band) name of the reflectance time-series. The `time_series_data_frame` method allows for interpolation of the time-series.
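The signatures of `mask_to_array` and `time_series_data_frame` are not given here, so the following is only a minimal numpy sketch of the two underlying ideas: stacking per-date band images into a 3D (dates, rows, cols) array masked to one land cover class, and linearly interpolating a single pixel's reflectance series onto a daily grid. `stack_and_mask` and `interpolate_pixel_series` are hypothetical helpers, not part of `tsmask`, and the input arrays are synthetic.

```python
import numpy as np


def stack_and_mask(band_images, class_mask, class_value):
    """Stack per-date 2D band images into a masked (dates, rows, cols) array.

    Hypothetical helper: pixels whose class ID differs from `class_value` are masked.
    """
    stack = np.stack(band_images, axis=0).astype(float)
    # Repeat the 2D "not this class" mask along the date axis (copied so it is writable).
    hide = np.broadcast_to(class_mask != class_value, stack.shape).copy()
    return np.ma.masked_array(stack, mask=hide)


def interpolate_pixel_series(dates, values, new_dates):
    """Linearly interpolate one pixel's reflectance series onto `new_dates`."""
    origin = dates[0]
    # Convert datetime64[D] values to day offsets so np.interp can use plain floats.
    x = (dates - origin).astype("timedelta64[D]").astype(float)
    x_new = (new_dates - origin).astype("timedelta64[D]").astype(float)
    return np.interp(x_new, x, values)


# Synthetic example: three acquisition dates, one 2x2 band image per date.
dates = np.array(["2019-05-01", "2019-05-11", "2019-05-21"], dtype="datetime64[D]")
images = [np.full((2, 2), v) for v in (0.1, 0.3, 0.2)]
class_mask = np.array([[1, 0], [1, 2]])          # rasterized class IDs
stack = stack_and_mask(images, class_mask, class_value=1)
class_pixels = stack[:, class_mask == 1]         # shape (3, 2): dates x class pixels
daily = np.arange(dates[0], dates[-1] + 1)       # every day from May 1 to May 21
first_pixel = interpolate_pixel_series(dates, class_pixels[:, 0].filled(np.nan), daily)
```

Linear interpolation is used here only because it is the simplest choice; the interpolation method used by `time_series_data_frame` is not specified above.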
- `tsclust`: provides a `TimeSeriesSample` class that is useful for generating a dataset from all or a subset of the data contained in a `BandTimeSeries` and formatting it for direct use in the functions and classes provided in the `tslearn` library.
  - `TimeSeriesSample` takes `n_samples` of the data in a `BandTimeSeries` and optionally smooths the time-series using a Savitzky-Golay (Savgol) filter. The `ts_dataset` method generates an object that can be used directly in the time-series clustering and classification algorithms provided in the `tslearn` library.
  - `cluster_time_series` performs either `GlobalAlignmentKernelKMeans` or `TimeSeriesKMeans` (both from the `tslearn` library) on a `TimeSeriesSample` object. The user specifies the number of clusters, as well as the distance metric if the clustering algorithm is `TimeSeriesKMeans` (dynamic time warping or soft dynamic time warping). Silhouette scores computed on the resulting clusters can optionally be returned. Alternative sets of hyperparameters for `cluster_time_series` can be tested using the `cluster_grid_search` function.
  - `cluster_mean_quantiles` and `plot_clusters` provide methods for inspecting and visualizing cluster results.
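To illustrate what `TimeSeriesSample` and `cluster_time_series` do conceptually, here is a numpy-only sketch with three hypothetical helpers (`savgol_smooth`, `sample_to_ts_dataset`, `dtw_distance`); none of them are part of `tsclust`. `scipy.signal.savgol_filter` is the usual off-the-shelf Savgol implementation; `savgol_smooth` reimplements it with least-squares polynomial fits, including the default `mode="interp"` edge handling, to avoid the extra dependency. The sampled array follows tslearn's `(n_ts, sz, d)` dataset layout. `dtw_distance` is the classic dynamic-programming DTW with squared local differences and a final square root, which matches the definition in the tslearn documentation. In the library itself the clustering is delegated to tslearn, so this only shows what the DTW metric computes.

```python
import numpy as np


def savgol_smooth(y, window_length=5, polyorder=2):
    """Savitzky-Golay smoothing of a 1D series via local polynomial least squares."""
    y = np.asarray(y, dtype=float)
    assert window_length % 2 == 1 and polyorder < window_length <= y.size
    n, h = y.size, window_length // 2
    t = np.arange(-h, h + 1)                              # window positions around the centre
    A = np.vander(t, polyorder + 1, increasing=True)      # columns t^0 .. t^polyorder
    centre_weights = np.linalg.pinv(A)[0]                 # fitted value at t = 0
    out = np.empty_like(y)
    for i in range(h, n - h):
        out[i] = centre_weights @ y[i - h:i + h + 1]
    # Edges: fit one polynomial to the first/last full window and evaluate it there.
    head = np.linalg.lstsq(A, y[:window_length], rcond=None)[0]
    tail = np.linalg.lstsq(A, y[-window_length:], rcond=None)[0]
    out[:h] = np.vander(t[:h], polyorder + 1, increasing=True) @ head
    out[n - h:] = np.vander(t[h + 1:], polyorder + 1, increasing=True) @ tail
    return out


def sample_to_ts_dataset(series, n_samples, window_length=5, polyorder=2, seed=0):
    """Draw n_samples rows of a (n_pixels, n_dates) array, smooth each row,
    and return a tslearn-style array of shape (n_samples, n_dates, 1)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(series.shape[0], size=n_samples, replace=False)
    smoothed = np.array([savgol_smooth(row, window_length, polyorder) for row in series[idx]])
    return smoothed[:, :, np.newaxis]


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1D series, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


# Synthetic usage: 100 pixels x 12 dates, sample 10, compare two samples with DTW.
pixels = np.random.default_rng(42).random((100, 12))
dataset = sample_to_ts_dataset(pixels, n_samples=10)    # shape (10, 12, 1)
d01 = dtw_distance(dataset[0, :, 0], dataset[1, :, 0])
```

Because DTW lets one series stretch in time to match another, two crops with the same phenology but slightly shifted planting dates can end up close together. That is the motivation for preferring it over plain Euclidean distance here.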
- `tstrain`: provides functions for extracting training datasets comprising band reflectance time-series at known locations (x, y numpy array indices) from satellite scenes.
  - `random_ts_samples` takes `n_samples` from .csv files containing reflectance time-series data for a given land cover class.
  - `get_training_data` reads satellite scenes into numpy arrays using functionality from `gippy`. These can be, for example, scenes corresponding to an area of interest specified with `sat-search` and downloaded and saved using the default directory structure of `sat-search load`. The output is a long-form `pandas` dataframe with columns for date, feature (band), band reflectance value, the 2D array index, and a label giving each sample's land cover class.
  - `format_training_data` takes the output of `get_training_data` and reshapes it into a 3D numpy array of shape (n_samples, n_timesteps, n_features) suitable for use in a `Keras` `Sequential` model. Both x and y (optionally one-hot encoded) are returned.
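The reshaping performed by `format_training_data` can be sketched with pandas as follows. This is not the library's implementation. The column names `sample_id`, `date`, `feature`, `value` and `label` are hypothetical stand-ins for the columns described above, and the sketch assumes every sample has a value for every (date, feature) pair. `long_to_3d` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def long_to_3d(df, label_col="label"):
    """Pivot a long-form frame into x of shape (n_samples, n_timesteps, n_features),
    plus integer labels, one-hot labels and the class names (hypothetical helper)."""
    # One row per (sample, date), one column per feature; assumes a complete grid.
    wide = df.pivot_table(index=["sample_id", "date"], columns="feature", values="value")
    wide = wide.sort_index()
    samples = wide.index.get_level_values("sample_id").unique()
    n_samples, n_features = len(samples), wide.shape[1]
    n_timesteps = wide.shape[0] // n_samples
    x = wide.to_numpy().reshape(n_samples, n_timesteps, n_features)
    # One label per sample, aligned with the sample order used for x.
    labels = df.groupby("sample_id")[label_col].first().loc[samples]
    classes, y = np.unique(labels.to_numpy(), return_inverse=True)
    y_onehot = np.eye(len(classes))[y]
    return x, y, y_onehot, classes


# Synthetic long-form data: 2 samples x 3 dates x 2 bands.
rows = []
for sample_id, label in [(0, "maize"), (1, "soy")]:
    for t, date in enumerate(["2019-05-01", "2019-05-11", "2019-05-21"]):
        for band in ["B4", "B8"]:
            value = sample_id * 10 + t + (0.5 if band == "B8" else 0.0)
            rows.append({"sample_id": sample_id, "date": date, "feature": band,
                         "value": value, "label": label})
x, y, y_onehot, classes = long_to_3d(pd.DataFrame(rows))
# x.shape == (2, 3, 2), matching Keras's (batch, timesteps, features) convention
```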
Coming soon: two Jupyter notebook tutorials showcasing the functionality of this library.