adria-synthetic-data's Introduction

ADRIA-synthetic-data

Repository for the creation of synthetic input data layers for ADRIA.

Set-up

Create the environment by running:

conda env create -f ADRIA_synth_data_env.yml

This environment can then be selected in your Python editor of choice.

Add the original data package you want to base the synthetic data on to the original_data folder.

Creating site data

Synthetic site data can be generated with the site-data-generation.py file in the examples folder. At the top of the file, set the name of your chosen original data package as orig_data_package = "name of file", and adjust the parameters N1, N2 and N3 as desired. N1 is the number of unconditioned samples to generate. N2 is the final number of spatially conditioned sites to generate. N3 is the number of nodes around which the final site positions are generated at randomised radii. Running the site data model automatically creates the synthetic data package; the package name is given in the model outputs as synth_site_data_fn.
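As a rough illustration of what placing sites at randomised radii around nodes can look like, here is a minimal sketch. It is not the code in site-data-generation.py: the function place_sites, the max_radius parameter, the uniform sampling and the unit-square coordinates are all assumptions made for this example, and the values of N2 and N3 are arbitrary.

```python
import numpy as np


def place_sites(nodes, n_sites, max_radius, seed=0):
    """Scatter n_sites points around randomly chosen nodes (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    nodes = np.asarray(nodes, dtype=float)
    # Pick a node for every site, then a random radius and direction from it.
    node_idx = rng.integers(0, len(nodes), size=n_sites)
    radii = rng.uniform(0.0, max_radius, size=n_sites)
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n_sites)
    offsets = np.column_stack((radii * np.cos(angles), radii * np.sin(angles)))
    return nodes[node_idx] + offsets


N2 = 20            # final number of sites (arbitrary example value)
N3 = 3             # number of nodes (arbitrary example value)
max_radius = 0.05  # assumed upper bound on the random radius

rng = np.random.default_rng(42)
nodes = rng.uniform(0.0, 1.0, size=(N3, 2))  # assumed planar node coordinates
sites = place_sites(nodes, N2, max_radius)
print(sites.shape)  # (20, 2)
```

Every generated site lies within max_radius of at least one node, so fewer nodes give more tightly clustered sites.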

Creating initial coral cover data

Synthetic initial coral cover data can be generated with the coral-cover-generation.py file in the examples folder. At the top of the file, set the name of your chosen original data package as orig_data_package = "name of file", and the name of the synthetic site data file you want to base the cover data on, e.g. root_site_data_synth = "synth_2023-7-24_152038.csv".

Creating environmental data

Synthetic wave and DHW data can be generated from the env-data-generation.py file in the examples folder. Several inputs at the beginning of the file can be changed to adjust the output of the data model. An example is shown below:

layer = "Ub"
rcp = "45"
root_original_file = "name of file"
root_site_data_synth = "synth_2023-7-24_152038.csv"
nsamples = 10
nreplicates = 5

The layer variable designates the type of data to generate: dhw for DHW data and Ub for wave data. rcp is the RCP scenario in the original data file to generate data from. nsamples is the number of samples to generate from each climate replicate. nreplicates is the number of climate replicates to use from the original dataset. In the example above, the final dataset will have 10*5 = 50 replicates, based on 5 replicates from the original dataset.
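The replicate arithmetic can be checked with a few lines of Python. This is only a sketch using the values from the example inputs; the name n_final_replicates is introduced here for illustration and does not appear in env-data-generation.py.

```python
# Values taken from the example inputs above.
nsamples = 10    # samples generated from each climate replicate
nreplicates = 5  # climate replicates used from the original dataset

# Each original replicate contributes nsamples synthetic replicates.
n_final_replicates = nsamples * nreplicates
print(n_final_replicates)  # 50
```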

Creating connectivity data

Synthetic connectivity data can be generated from the connectivity-generation.py file in the examples folder. Several inputs at the beginning of the file can be changed to adjust the output of the data model. An example is shown below:

root_original_file = "name of file"
root_site_data_synth = "synth_2023-7-24_152038.csv"
years = ["2015", "2016", "2017"]  # connectivity data years to use
num = ["1", "2", "3"]  # connectivity data sample number to use
model_type = "GAN"  # "GaussianCopula"

years designates which years of connectivity data to base the synthetic connectivity dataset on, with the average over those years being used to generate the final dataset. num designates which replicates to use in each year (also averaged over). model_type designates whether to use the "GAN" model from tensorflow/keras (slower but better quality) or the "GaussianCopula" model from SDV (faster but lower quality).
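The averaging over years and replicates can be sketched with NumPy as follows. The random matrices stand in for loaded connectivity data, and the dictionary layout, the n_sites value and the order of averaging (replicates first, then years) are assumptions for this example rather than a description of connectivity-generation.py.

```python
import numpy as np

years = ["2015", "2016", "2017"]
num = ["1", "2", "3"]
n_sites = 4  # hypothetical number of sites

rng = np.random.default_rng(0)
# Stand-in for the loaded connectivity matrices, keyed by (year, replicate).
conn = {(y, n): rng.random((n_sites, n_sites)) for y in years for n in num}

# Average over the replicates within each year, then over the years.
yearly_means = [np.mean([conn[(y, n)] for n in num], axis=0) for y in years]
mean_conn = np.mean(yearly_means, axis=0)
print(mean_conn.shape)  # (4, 4)
```

Because every year has the same number of replicates here, the result equals the plain mean over all nine matrices.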

Creating a synthetic JSON for the data package

A synthetic JSON file can be added to the synthetic data package using the file data-package-json-creation.py.

Creating the whole data package

The whole data package can be created by running data-package-creation.py. The same inputs need to be set in this file as in the files described above, namely the sample number parameters, the original dataset file name and so on.

adria-synthetic-data's People

Contributors

rosejoycrocker

adria-synthetic-data's Issues

New site polygons overlap

Currently, synthetic site polygons (which are created as circles around site centroids, sized to match the synthetically generated areas) may overlap. This should be fixed to avoid double counting of coral cover and site areas.
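One possible way to detect the overlap is sketched below, assuming planar centroid coordinates in the same length units as the square root of the site areas. The function names are hypothetical and are not part of the repository.

```python
import math


def circle_radius(area):
    """Radius of a circle with the given area."""
    return math.sqrt(area / math.pi)


def circles_overlap(centroid_a, area_a, centroid_b, area_b):
    """True if two circular site polygons share interior area."""
    distance = math.dist(centroid_a, centroid_b)
    return distance < circle_radius(area_a) + circle_radius(area_b)


# Two circles of radius 1 (area pi): centres 1.5 apart overlap, 3.0 apart do not.
print(circles_overlap((0.0, 0.0), math.pi, (1.5, 0.0), math.pi))  # True
print(circles_overlap((0.0, 0.0), math.pi, (3.0, 0.0), math.pi))  # False
```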

Test models for creation of synthetic wave data

  • Currently, models have been created for synthetic site data, connectivity and DHW data.
  • Wave data is also a necessary input layer for the ADRIA model.
  • First try the temporal model from the SDV package; if that is unsuccessful, try models from the Deep Echo package.

Compare current connectivity model to simple probabilistic model

Currently, the GAN model for connectivity has a reasonably long run time and is prone to mode collapse. Compare this with a simple probabilistic model: fit Gaussians over several layers of connectivity data, run for multiple iterations and take the median of the outcomes.
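A minimal sketch of such a probabilistic baseline is given below. It assumes an independent Gaussian per matrix entry, fitted across the layers, and clips negative draws to zero; the function name, the clipping and the default n_iter are illustrative choices, not part of the existing code.

```python
import numpy as np


def sample_connectivity(layers, n_iter=100, seed=0):
    """Median of Gaussian draws fitted per entry across connectivity layers.

    layers: array-like of shape (n_layers, n_sites, n_sites).
    """
    layers = np.asarray(layers, dtype=float)
    mu = layers.mean(axis=0)    # per-entry mean across layers
    sigma = layers.std(axis=0)  # per-entry standard deviation across layers
    rng = np.random.default_rng(seed)
    draws = rng.normal(mu, sigma, size=(n_iter, *mu.shape))
    draws = np.clip(draws, 0.0, None)  # assumption: connectivity is non-negative
    return np.median(draws, axis=0)


rng = np.random.default_rng(1)
example_layers = rng.random((6, 4, 4))  # stand-in for real connectivity layers
synthetic = sample_connectivity(example_layers)
print(synthetic.shape)  # (4, 4)
```

Taking the median over iterations damps the effect of extreme draws, but it also pulls the result towards the per-entry mean, so the synthetic matrix varies less than any single draw would.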

Save anonymised lats for plotting comparisons

Currently, the functions which save data for plotting comparisons (original, sampled and real, as CSVs) save the non-anonymised latitudes and longitudes. Plotting should be done with anonymised latitudes and longitudes for the synthetic data.

Add example code to ReadMe

The current ReadMe refers to example files; snippets of these should be added to the ReadMe file for clarity.

Find a way to integrate GAN and SDV environments

Currently, the environments used for the SDV models and the GAN connectivity models have unresolved conflicts (NetCDF4 conflicts with geopandas). To resolve this, check whether later or earlier netCDF versions avoid the conflict.

Resolve file type issue for synthetic site data

  • Site data in ADRIA is now in a geopackage format.
  • Synthetic site data cannot easily be created with geometries (synthetic data generation packages cannot easily create the geometry column).
  • This could perhaps be resolved with randomised placeholder geometry.

Create function which automates creation of ADRIA data package

  • Currently, DHW, connectivity and site data are created separately.
  • ADRIA requires input data in a particular datapackage format.
  • Create a function which automatically organises the input data files into this structure, given the locations of the created synthetic data files as input.
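The kind of function described in the list above could be sketched as follows. The subfolder names used here ("site_data", "connectivity", "DHWs") and the file names are placeholders chosen for the example; the layout ADRIA actually expects should be taken from the ADRIA documentation, and build_data_package is a hypothetical name.

```python
import shutil
import tempfile
from pathlib import Path


def build_data_package(package_dir, files):
    """Copy files into package_dir, one subfolder per key of `files`.

    files: mapping of subfolder name -> list of source file paths.
    """
    package_dir = Path(package_dir)
    copied = []
    for subfolder, sources in files.items():
        target = package_dir / subfolder
        target.mkdir(parents=True, exist_ok=True)
        for src in sources:
            copied.append(Path(shutil.copy2(src, target)))
    return copied


# Usage with throwaway files; all names below are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    sources = {}
    for folder, name in [("site_data", "synth_sites.csv"),
                         ("connectivity", "synth_conn.csv"),
                         ("DHWs", "synth_dhw.nc")]:
        src = tmp / name
        src.write_text("placeholder")
        sources[folder] = [src]
    pkg = tmp / "synthetic_package"
    copied = build_data_package(pkg, sources)
    relative = sorted(p.relative_to(pkg).as_posix() for p in copied)

print(relative)
```

Passing the layout in as a mapping keeps the function independent of any particular folder convention, so it can follow changes in the datapackage format without being rewritten.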
