
aaronspring / remote_climate_data


a collection of remote climate data accessed via intake cached to disk

License: MIT License

Jupyter Notebook 99.80% Python 0.20%
data-catalog shapefiles accessibility netcdf thredds-catalogs opendap observations climate-science climate-data remote

remote_climate_data's Introduction


remote_climate_data

a collection of remote climate data accessed via intake cached to disk

Usage

import intake
cat = intake.open_catalog('https://raw.githubusercontent.com/aaronspring/remote_climate_data/master/master.yaml')
cat.atmosphere.HadCRUT5.to_dask()

To explore the whole catalog, you can try:

cat.walk()

Goal

Make access to climate data easy:

  • cacheable data
  • documentation attached in metadata
  • shareable catalogs
  • quick visualizations

Contribute and extend

  • PRs for new remote climate datasets or useful geoshapes are very welcome

Relies on

Similar projects

remote_climate_data's People

Contributors

aaronspring, larsbuntemeyer, pre-commit-ci[bot]


remote_climate_data's Issues

test all parameter combinations

Currently, only the defaults are tested. For example, GISTEMP exposes several user parameters:

cat.atmosphere.GISTEMP(t_res='ltm').describe()
'user_parameters': [{'name': 's_res',
   'description': 'spatial resolution',
   'type': 'str',
   'allowed': ['1200km', '250km'],
   'default': '1200km'},
  {'name': 'realm',
   'description': 'data region including or excluding ocean',
   'type': 'str',
   'allowed': ['landonly', 'combined'],
   'default': 'combined'},
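A minimal sketch of how every parameter combination could be enumerated for testing. It uses only the two parameters visible in the (truncated) describe() output above; the helper name `parameter_combinations` is hypothetical and not part of the repository.

```python
from itertools import product

# Copied from the describe() output above; only 'name' and 'allowed' are used.
# The real source may define further parameters (e.g. t_res) not listed here.
user_parameters = [
    {"name": "s_res", "allowed": ["1200km", "250km"]},
    {"name": "realm", "allowed": ["landonly", "combined"]},
]


def parameter_combinations(params):
    """Return one kwargs dict per combination of allowed parameter values."""
    names = [p["name"] for p in params]
    choices = [p["allowed"] for p in params]
    return [dict(zip(names, combo)) for combo in product(*choices)]


combos = parameter_combinations(user_parameters)
for kwargs in combos:
    print(kwargs)
```

Each resulting dict could then be passed to a catalog entry in the same way as the call above, for example `cat.atmosphere.GISTEMP(**kwargs)`, so that a test loops over all combinations instead of the defaults only.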

setup CI

  • test all nc files, including their parameter combinations, with xr.open_dataset(url, chunks={}) to check that they are available without downloading them, since a full download would probably take too long
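One way such an availability check could look, as a sketch only. The function name `can_open` and the injectable `open_dataset` argument are assumptions made here so the check can be exercised without network access; by default it falls back to `xr.open_dataset(url, chunks={})` as proposed above.

```python
def can_open(url, open_dataset=None):
    """Return True if the dataset at `url` can be opened lazily."""
    if open_dataset is None:
        # Imported lazily so the helper can be tested with a stand-in opener.
        import xarray as xr

        open_dataset = xr.open_dataset
    try:
        # chunks={} requests dask-backed arrays, so no data values are loaded here.
        ds = open_dataset(url, chunks={})
    except Exception as err:
        print(f"FAILED {url}: {err}")
        return False
    ds.close()
    return True
```

In a CI job, this could be called once per URL and parameter combination, with the job failing if any call returns False.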

Specify why data access works

The access model is not very clear.

  • nc files via OPeNDAP work only without caching

  • nc files from other sources (http(s), ftp) work only with, and because of, caching

  • http can also work with #bytes

To do: explain this better in the README.
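The rules listed in this issue could be captured in a small helper, sketched here under assumptions: the function name `intake_urlpath` is hypothetical, and detecting OPeNDAP by the `/dodsC/` path segment is a heuristic based on the usual THREDDS URL layout, not something the repository defines.

```python
def intake_urlpath(url):
    """Return the urlpath to use for a remote nc file, following the rules above."""
    # Heuristic (assumption): THREDDS serves OPeNDAP under .../dodsC/...
    if "/dodsC/" in url:
        # OPeNDAP is accessed directly, without caching.
        return url
    if url.startswith(("http://", "https://", "ftp://")):
        # Plain file downloads are routed through fsspec's simplecache.
        return "simplecache::" + url
    raise ValueError(f"unsupported URL scheme: {url}")


print(intake_urlpath("https://example.org/thredds/dodsC/data.nc"))
print(intake_urlpath("https://example.org/thredds/fileServer/data.nc"))
```

The example.org URLs are placeholders for illustration only.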

obs4MIP catalog

Downloading obs4MIPs datasets is annoying. Here, we could create a catalog of common observational data products. Example:

import intake_xarray

# Cache downloaded files under their original names in ./my_cache
storage_options = dict(simplecache={'same_names': True, 'cache_storage': 'my_cache'})

# atmos xco2 crdp: CO2 monthly satellite from 2003 to 2014
# httpServer after search on https://esgf-data.dkrz.de/search/obs4mips-dkrz/
url = 'https://esgf-data1.ceda.ac.uk/thredds/fileServer/esg_esacci/ghg/data/obs4mips/crdp_3/CO2/v100/xco2_ghgcci_l3_v100_200301_201412.nc'
ds_xco2 = intake_xarray.NetCDFSource('simplecache::' + url, storage_options=storage_options).to_dask()

# GPCP 1.3 GB
url = 'https://dpesgf03.nccs.nasa.gov/thredds/fileServer/obs4MIPs/observations/NASA-GSFC/Obs-GPCP/GPCP/1DD_v1.2/atmos/pr_GPCP-1DD_L3_v1.2_19961001-20110630.nc'
ds_gpcp = intake_xarray.NetCDFSource('simplecache::' + url, storage_options=storage_options).to_dask()

How to:

  1. Search for the data product: https://esgf-data.dkrz.de/search/obs4mips-dkrz/
  2. Select HTTPServer
  3. Copy the link
  4. Create a catalog entry
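The last step could be sketched as follows. The layout (`sources`, `driver: netcdf`, `args`) follows the usual intake YAML catalog structure, but the entry name `xco2`, the helper `netcdf_entry`, and the exact fields used by this repository's catalogs are assumptions.

```python
import json


def netcdf_entry(name, url, description, cache_storage="my_cache"):
    """Build one catalog source entry for a cached remote netCDF file."""
    return {
        name: {
            "description": description,
            "driver": "netcdf",
            "args": {
                "urlpath": "simplecache::" + url,
                "storage_options": {
                    "simplecache": {
                        "same_names": True,
                        "cache_storage": cache_storage,
                    }
                },
            },
        }
    }


url = (
    "https://esgf-data1.ceda.ac.uk/thredds/fileServer/esg_esacci/ghg/data/"
    "obs4mips/crdp_3/CO2/v100/xco2_ghgcci_l3_v100_200301_201412.nc"
)
catalog = {
    "sources": netcdf_entry("xco2", url, "CO2 monthly satellite from 2003 to 2014")
}
# Printed as JSON for inspection; a real catalog file would be written as YAML,
# for example with PyYAML's yaml.safe_dump(catalog).
print(json.dumps(catalog, indent=2))
```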

Advantages compared to downloading yourself:

  • collection of links
  • obs4MIPs data are CMORized, i.e. units and variable names are standardized
  • automatic caching enabled

Open questions:

  • How to organize the catalog? realm:name:freq as nested catalogs?

Wishlist for new datasets
