Using Python on JASMIN: Webinar Example Scripts

A selection of example scripts showing some of the things you can do with Python on JASMIN.

These scripts have been tested against the example paths given below and will work with them. Because each dataset is different, you will need to modify the code before using the scripts on other data.

Treat them as a starting point for your own work.

Creating the environment

These scripts were written for a newer version of Xarray, so to run them you need to create a Python 3 virtual environment. For convenience, the repo includes a create-env.sh script that does this for you:

./create-env.sh

NOTE: you only need to run the above script once.

Setting the environment

Each time you log in to a new session and want to run any of the scripts, set the environment with:

module load jaspy
source venv/bin/activate

Using Pandas to process CSV files

Pandas is a powerful library for creating and manipulating data tables. With Pandas you can read in CSV files, process them and visualise the results.

This example uses rainfall data from the UK Met Office MIDAS Open dataset.

The headers are ignored and the data is read into a Pandas DataFrame.
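As a rough illustration of the idea (not the code in csv_pandas.py), the sketch below skips the metadata block at the top of a file and computes the yearly max, mean and min. It assumes the file has metadata lines, then a line reading "data", then a column header row, the records, and a closing "end data" line. The sample text and the column names ob_end_time and prcp_amt are stand-ins chosen for the example, so check them against your own files.

```python
import io

import pandas as pd

# Hypothetical stand-in for one rainfall file: metadata lines, a "data"
# marker, a column header row, the records, then an "end data" marker.
SAMPLE = """\
Conventions,G,BADC-CSV,1
title,G,example rainfall file
data
ob_end_time,prcp_amt
2018-01-01 09:00:00,0.2
2018-06-01 09:00:00,1.4
2019-01-01 09:00:00,0.0
2019-06-01 09:00:00,3.0
end data
"""


def read_records(handle):
    """Read the records between the "data" and "end data" markers."""
    lines = handle.read().splitlines()
    start = lines.index("data") + 1   # the column header row follows the marker
    end = lines.index("end data")
    body = "\n".join(lines[start:end])
    return pd.read_csv(io.StringIO(body), parse_dates=["ob_end_time"])


df = read_records(io.StringIO(SAMPLE))

# Group the observations by calendar year and summarise each year.
yearly = df.groupby(df["ob_end_time"].dt.year)["prcp_amt"].agg(["max", "mean", "min"])
print(yearly)
```

For a real directory you could pass each open file to read_records, combine the results with pd.concat, and then call yearly.plot() to draw the three lines.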

Example path: /badc/ukmo-midas-open/data/uk-hourly-rain-obs/dataset-version-201908/oxfordshire/00605_brize-norton/qc-version-1

Usage:

python csv_pandas.py /badc/ukmo-midas-open/data/uk-hourly-rain-obs/dataset-version-201908/oxfordshire/00605_brize-norton/qc-version-1

usage: csv_pandas.py [-h] [-o OUTPUT] directory


Generate a plot of yearly, max, mean and min from a series of csv files in the midas open
precipitation timeseries


positional arguments:
  directory             Directory containing csv files


optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Directory to output the graph, defaults to the run directory. Default: [.]

Using Xarray to extract timeseries from netCDF

Xarray can use Dask behind the scenes to parallelise operations and speed up a workflow. You can use Xarray to read NetCDF files, extract specific regions and process the data.

This example uses Xarray to read a timeseries of NetCDF files, extract the UK region and calculate the annual mean temperature for each grid box. The result is then written to a new NetCDF file.
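The steps above can be sketched as follows. This is a minimal illustration on a synthetic dataset, not the code in netcdf_xarray.py. The variable name tas, the coordinate names lat and lon, the 0 to 360 longitude convention and the UK bounding box are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the monthly files: two years on a coarse grid.
# With real data you would typically start from something like
# xr.open_mfdataset("<directory>/*.nc"), which needs Dask installed.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.arange(-90.0, 91.0, 10.0)
lon = np.arange(0.0, 360.0, 10.0)

# Every grid box holds 280 K during 2000 and 282 K during 2001.
per_month = np.where(np.asarray(time.year) == 2000, 280.0, 282.0)
values = per_month[:, None, None] * np.ones((1, lat.size, lon.size))
ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), values)},
    coords={"time": time, "lat": lat, "lon": lon},
)

# The UK straddles the 0 degree meridian, so move longitudes from
# 0..360 to -180..180 and sort them before slicing.
ds = ds.assign_coords(lon=((ds.lon + 180) % 360) - 180).sortby("lon")

# Approximate UK bounding box (an assumption for this sketch).
uk = ds.sel(lat=slice(49, 61), lon=slice(-11, 2))

# Annual mean for each grid box.
annual = uk["tas"].groupby("time.year").mean("time")
print(annual)

# annual.to_netcdf("uk_annual_tas.nc") would write the result to a file,
# provided a NetCDF backend such as netCDF4 is installed.
```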

Example path: /badc/cmip5/data/cmip5/output1/BCC/bcc-csm1-1/amip/mon/atmos/Amon/r1i1p1/latest/tas

Usage:

python netcdf_xarray.py /badc/cmip5/data/cmip5/output1/BCC/bcc-csm1-1/amip/mon/atmos/Amon/r1i1p1/latest/tas

usage: netcdf_xarray.py [-h] [-o OUTPUT] directory


Extract a time series of annual surface temperature over the UK


positional arguments:
  directory             Directory containing source files


optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Directory to output the netcdf file, defaults to the run directory. Default
                        [.]

Using Xarray and matplotlib to plot data

Xarray can also be used with matplotlib to plot data directly, either to inspect the data during analysis or to produce an output figure.

This example uses Xarray to extract a region at a specific timestep from a dataset and plot the wind variable.
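A minimal sketch of that select-then-plot pattern is shown below, using a synthetic field rather than the ERA-Interim files. The variable name wind_speed, the coordinate names, the grid spacing and the output file name are all assumptions made for the example, and the bounding box matches the one in the usage line below.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # draw off-screen, since there may be no display
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Synthetic wind speed on a 5 degree grid with four timesteps.
# Latitude runs north to south, so it is descending.
lat = np.arange(90.0, -91.0, -5.0)
lon = np.arange(-180.0, 180.0, 5.0)
rng = np.random.default_rng(0)
wind = xr.DataArray(
    rng.uniform(0.0, 20.0, size=(4, lat.size, lon.size)),
    dims=("time", "latitude", "longitude"),
    coords={
        "time": ["0000", "0600", "1200", "1800"],
        "latitude": lat,
        "longitude": lon,
    },
    name="wind_speed",
)

# Bounding box given as north, south, east, west.
north, south, east, west = 70, 40, 20, -20

# Because latitude is descending, the slice runs from north to south.
subset = wind.sel(
    time="1200",
    latitude=slice(north, south),
    longitude=slice(west, east),
)

# Xarray picks a 2D plot type and adds axis labels and a colour bar.
fig, ax = plt.subplots()
subset.plot(ax=ax)
out_path = os.path.join(tempfile.mkdtemp(), "wind.png")
fig.savefig(out_path)
plt.close(fig)
```

If your data stores longitude as 0 to 360, a box that crosses the 0 degree meridian needs the longitudes converting to -180 to 180 first.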

Example path: /badc/ecmwf-era-interim/data/wa/as/2017/04/04

Usage:

python data_visualisation.py /badc/ecmwf-era-interim/data/wa/as/2017/04/04 --bbox 70 40 20 -20

usage: data_visualisation.py [-h] [--timestep TIMESTEP]
                             [--bbox COORDINATE COORDINATE COORDINATE COORDINATE]
                             directory


Extract and area and timestamp and plot


positional arguments:
  directory             Directory containing source files


optional arguments:
  -h, --help            show this help message and exit
  --timestep TIMESTEP   Options: 0000 0600 1200 1800
  --bbox COORDINATE COORDINATE COORDINATE COORDINATE
                        Format: N,S,E,W
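
An interface like the one above can be built with argparse from the standard library. The sketch below is a guess at how such a parser might look, not the script's actual code. Reading the coordinates as floats is an assumption.

```python
import argparse


def build_parser():
    """Build a parser resembling the usage message shown above."""
    parser = argparse.ArgumentParser(prog="data_visualisation.py")
    parser.add_argument("directory", help="Directory containing source files")
    parser.add_argument("--timestep", help="Options: 0000 0600 1200 1800")
    # nargs=4 collects exactly four values into a list; metavar sets the
    # placeholder name that is repeated in the usage line.
    parser.add_argument(
        "--bbox",
        nargs=4,
        type=float,
        metavar="COORDINATE",
        help="Format: N S E W",
    )
    return parser


args = build_parser().parse_args(
    ["/badc/ecmwf-era-interim/data/wa/as/2017/04/04", "--bbox", "70", "40", "20", "-20"]
)
north, south, east, west = args.bbox
print(north, south, east, west)
```

Negative values such as -20 are accepted as coordinates because this parser defines no options that look like negative numbers.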

Using Python to get a list of files which match your requirements

Python has a suite of useful file path manipulation tools included with the standard library, such as os and glob.

On JASMIN, the directory hierarchy itself encodes useful metadata about the files at the bottom of the tree. For example, the path /neodc/esacci/sea_ice/data/sea_ice_thickness/L2P/envisat/v2.0/NH/2012/01 follows the format:

/neodc/esacci/sea_ice/data/<variable>/L2P/envisat/v2.0/<hemisphere>/<year>/<month>/*.nc

This example script starts in the directory you supply, then at each level offers you a choice of which subdirectory to descend into, or all of them. You can either write the resulting file list to a file or print it to the terminal. Before outputting the files, the script displays the glob pattern you could use to get your files with a Linux command.
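
The idea of turning choices at each directory level into a glob pattern can be sketched as below. This is a simplified, non-interactive illustration rather than the code in file_listing.py. The build_pattern helper and the example file names are hypothetical, and the directory tree is created in a temporary folder so the sketch runs anywhere.

```python
import glob
import os
import tempfile

# Build a small throwaway tree that mimics the layout described above.
root = tempfile.mkdtemp()
for hemisphere in ("NH", "SH"):
    for month in ("01", "02"):
        folder = os.path.join(
            root, "sea_ice_thickness", "L2P", "envisat", "v2.0",
            hemisphere, "2012", month,
        )
        os.makedirs(folder)
        # Empty placeholder file standing in for a real NetCDF file.
        open(os.path.join(folder, f"example_{hemisphere}_{month}.nc"), "w").close()


def build_pattern(root, variable="*", hemisphere="*", year="*", month="*"):
    """Return a glob pattern; any level left as "*" matches everything."""
    return os.path.join(
        root, variable, "L2P", "envisat", "v2.0", hemisphere, year, month, "*.nc"
    )


# Choose the northern hemisphere and 2012, and take every variable and month.
pattern = build_pattern(root, hemisphere="NH", year="2012")
matches = sorted(glob.glob(pattern))

print(pattern)
for path in matches:
    print(path)
```

The same pattern can be passed to ls on the command line, which is why seeing the pattern before the file list is useful.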

Example path: /neodc/esacci/sea_ice/data/

Usage:

python file_listing.py /neodc/esacci/sea_ice/data/

usage: file_listing.py [-h] [-o OUTPUT] directory


Extract and area and timestamp and plot


positional arguments:
  directory             Start directory


optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output list of desired files
