A selection of example scripts showing some of the things you can do on JASMIN.
The scripts have been tested against the example paths given below. Because each dataset is different, you will need to modify the code to use them elsewhere.
Treat them as a starting point for your own work.
These scripts were written against a newer version of Xarray, so you will need to create a Python 3 virtual environment to run them. For convenience, the repo includes a create-env.sh script that should make this easy.
./create-env.sh
NOTE: you only need to run the above script once.
Each time you log in to a new session and want to run any of the scripts, you will need to set up the environment with:
module load jaspy
source venv/bin/activate
Pandas is a really powerful library for creating and manipulating data tables. With Pandas you can easily read in CSV files, do some processing on them and visualise them.
This example uses rainfall data from the UK Met Office Midas Open dataset.
The headers are ignored and the data is read into a Pandas DataFrame.
Example path: /badc/ukmo-midas-open/data/uk-hourly-rain-obs/dataset-version-201908/oxfordshire/00605_brize-norton/qc-version-1
python csv_pandas.py /badc/ukmo-midas-open/data/uk-hourly-rain-obs/dataset-version-201908/oxfordshire/00605_brize-norton/qc-version-1
usage: csv_pandas.py [-h] [-o OUTPUT] directory
Generate a plot of yearly, max, mean and min from a series of csv files in the midas open precipitation timeseries
positional arguments:
  directory             Directory containing csv files
optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Directory to output the graph, defaults to the run directory. Default: [.]
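As a rough illustration of the approach described above (skip the header block, load the table into a DataFrame, then summarise by year), here is a minimal sketch. It is not the code in csv_pandas.py. It runs on a small in-memory sample, and the header layout, the `data` / `end data` markers and the column names `ob_end_time` and `prcp_amt` are assumptions made for the example; check them against the real files before relying on them.

```python
import io

import pandas as pd

# Synthetic stand-in for one station file. The layout (metadata lines, a "data"
# marker, a column header row, rows, then "end data") is an assumption.
SAMPLE = """\
Conventions,G,BADC-CSV,1
title,G,example rainfall
data
ob_end_time,prcp_amt
2017-01-01 01:00:00,0.2
2017-06-01 01:00:00,1.4
2018-01-01 01:00:00,0.0
2018-06-01 01:00:00,2.6
end data
"""


def read_station_csv(handle):
    """Drop everything up to and including the 'data' marker, then parse the table."""
    lines = handle.read().splitlines()
    start = lines.index("data") + 1
    body = [line for line in lines[start:] if line != "end data"]
    return pd.read_csv(io.StringIO("\n".join(body)), parse_dates=["ob_end_time"])


df = read_station_csv(io.StringIO(SAMPLE))

# Group the hourly values by calendar year and take max, mean and min.
yearly = df.groupby(df["ob_end_time"].dt.year)["prcp_amt"].agg(["max", "mean", "min"])
print(yearly)
```

To try this on a real file, pass an open file handle to `read_station_csv` instead of the in-memory sample.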
Xarray can use Dask behind the scenes to parallelise operations and speed up the workflow. You can use Xarray to work with NetCDF files, extract specific regions and process the data.
This example uses Xarray to read a timeseries of NetCDF files, extract the UK region and calculate the annual mean temperature for each grid box. The result is then written to a new NetCDF file.
Example path: /badc/cmip5/data/cmip5/output1/BCC/bcc-csm1-1/amip/mon/atmos/Amon/r1i1p1/latest/tas
python netcdf_xarray.py /badc/cmip5/data/cmip5/output1/BCC/bcc-csm1-1/amip/mon/atmos/Amon/r1i1p1/latest/tas
usage: netcdf_xarray.py [-h] [-o OUTPUT] directory
Extract a time series of annual surface temperature over the UK
positional arguments:
  directory             Directory containing source files
optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Directory to output the netcdf file, defaults to the run directory. Default [.]
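The region extraction and annual mean described above can be sketched as follows. This is not the code in netcdf_xarray.py. It builds a small synthetic monthly field in place of the CMIP5 files so it runs anywhere, and the UK bounding box (49 to 61 degrees north, 11 degrees west to 2 degrees east), the 0 to 360 longitude convention and the coordinate names `lat` and `lon` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic global monthly field standing in for the real `tas` files.
# With real data you would typically start from something like
# xr.open_mfdataset("<directory>/*.nc") instead.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.arange(-90.0, 91.0, 10.0)
lon = np.arange(0.0, 360.0, 10.0)

# Each value equals its calendar year, so the annual means are easy to check.
years = time.year.values[:, None, None].astype(float)
values = np.broadcast_to(years, (time.size, lat.size, lon.size)).copy()
ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), values)},
    coords={"time": time, "lat": lat, "lon": lon},
)

# The UK straddles the prime meridian, so move longitudes to -180..180 first.
ds = ds.assign_coords(lon=((ds.lon + 180) % 360) - 180).sortby("lon")

# Assumed UK bounding box.
uk = ds.sel(lat=slice(49, 61), lon=slice(-11, 2))

# Annual mean for each grid box.
annual = uk.groupby("time.year").mean("time")
print(annual["tas"].sel(year=2001).values)

# annual.to_netcdf("uk_annual_tas.nc")  # needs a NetCDF backend installed
```

The longitude shift matters only when the source grid runs from 0 to 360; on a grid that already runs from -180 to 180 the `sel` call works directly.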
Xarray can also be used with matplotlib to plot data directly. This can be used to visualise the data during analysis or as an output.
This example uses Xarray to extract a region at a specific timestep from a dataset and plot the wind variable.
Example path: /badc/ecmwf-era-interim/data/wa/as/2017/04/04
python data_visualisation.py /badc/ecmwf-era-interim/data/wa/as/2017/04/04 --bbox 70 40 20 -20
usage: data_visualisation.py [-h] [--timestep TIMESTEP] [--bbox COORDINATE COORDINATE COORDINATE COORDINATE] directory
Extract and area and timestamp and plot
positional arguments:
  directory             Directory containing source files
optional arguments:
  -h, --help            show this help message and exit
  --timestep TIMESTEP   Options: 0000 0600 1200 1800
  --bbox COORDINATE COORDINATE COORDINATE COORDINATE
                        Format: N,S,E,W
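To show how a bounding box given as N S E W and a timestep might map onto Xarray selections and a plot, here is a minimal sketch on synthetic data. It is not the code in data_visualisation.py. The helper `select_bbox`, the variable name `wind`, the coordinate names and the -180 to 180 longitude grid are all hypothetical; real reanalysis files may use different names and a 0 to 360 grid.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr


def select_bbox(da, north, south, east, west):
    """Return the part of `da` inside the box, for ascending or descending latitude."""
    descending = float(da.lat[0]) > float(da.lat[-1])
    lat_slice = slice(north, south) if descending else slice(south, north)
    return da.sel(lat=lat_slice, lon=slice(west, east))


# Synthetic 6-hourly field on a coarse grid with latitude running north to south.
times = pd.to_datetime(
    ["2017-04-04T00:00", "2017-04-04T06:00", "2017-04-04T12:00", "2017-04-04T18:00"]
)
lat = np.arange(90.0, -91.0, -10.0)
lon = np.arange(-180.0, 180.0, 10.0)
data = np.arange(times.size * lat.size * lon.size, dtype=float)
data = data.reshape(times.size, lat.size, lon.size)
wind = xr.DataArray(
    data,
    coords={"time": times, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="wind",
)

# Equivalent of "--timestep 0600 --bbox 70 40 20 -20".
subset = select_bbox(wind.sel(time="2017-04-04T06:00"), 70, 40, 20, -20)

fig, ax = plt.subplots()
subset.plot(ax=ax)  # 2D DataArray: draws a colour mesh with a colourbar
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Saving to an in-memory buffer keeps the sketch free of side effects; replace `buffer` with a filename to write the image to disk.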
Python has a suite of useful filepath manipulation tools included with the standard library, such as os and glob.
On JASMIN, the directory hierarchy itself encodes useful metadata about the files at the end of it.
For example, the path /neodc/esacci/sea_ice/data/sea_ice_thickness/L2P/envisat/v2.0/NH/2012/01 contains useful metadata and is of the format:
/neodc/esacci/sea_ice/data/<variable>/L2P/envisat/v2.0/<hemisphere>/<year>/<month>/*.nc
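As a small illustration of reading metadata out of such a path, the sketch below splits the example path into its components using only the standard library. The function name `parse_sea_ice_path` and the returned field names are hypothetical, and the fixed positions assume exactly the layout shown above.

```python
from pathlib import PurePosixPath

# Leading components shared by every path in this layout.
ROOT = ("/", "neodc", "esacci", "sea_ice", "data")


def parse_sea_ice_path(path):
    """Extract variable, hemisphere, year and month from a path in the layout above."""
    parts = PurePosixPath(path).parts
    if parts[: len(ROOT)] != ROOT or len(parts) < 12:
        raise ValueError(f"unexpected path layout: {path}")
    return {
        "variable": parts[5],
        "hemisphere": parts[9],
        "year": int(parts[10]),
        "month": int(parts[11]),
    }


info = parse_sea_ice_path(
    "/neodc/esacci/sea_ice/data/sea_ice_thickness/L2P/envisat/v2.0/NH/2012/01"
)
print(info)
```

Other datasets arrange their directories differently, so the index positions would need adjusting for each one.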
This example script starts in the supplied directory, then offers a series of options for which directory to descend into next, or all of them. You can then either write the output to a file or print it to the terminal. Before outputting your files, the script displays the glob pattern you can use to get your files with a Linux command.
Example path: /neodc/esacci/sea_ice/data/
python file_listing.py /neodc/esacci/sea_ice/data/
usage: file_listing.py [-h] [-o OUTPUT] directory
Extract and area and timestamp and plot
positional arguments:
  directory             Start directory
optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output list of desired files
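The idea of turning a series of directory choices into a glob pattern can be sketched with os and glob alone. This is not how file_listing.py is implemented; it skips the interactive prompts and uses a hypothetical helper, `build_pattern`, on a temporary directory tree that is a simplified stand-in for the real archive.

```python
import glob
import os
import tempfile


def build_pattern(start, choices):
    """Join the start directory with one choice per level; '*' means 'all of them'."""
    return os.path.join(start, *choices, "*.nc")


with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: <variable>/<hemisphere>/<year>/<month>/<file>.nc
    for hemisphere in ("NH", "SH"):
        directory = os.path.join(root, "sea_ice_thickness", hemisphere, "2012", "01")
        os.makedirs(directory)
        open(os.path.join(directory, f"{hemisphere}.nc"), "w").close()

    # Pick one variable, every hemisphere, and a single year and month.
    pattern = build_pattern(root, ["sea_ice_thickness", "*", "2012", "01"])
    matches = sorted(glob.glob(pattern))
    names = [os.path.basename(match) for match in matches]

print(pattern)
print(names)
```

The printed pattern can be pasted after `ls` in a shell to list the same files.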