btschwertfeger / python-cmethods

A collection of bias correction techniques written in Python - for climate sciences.

Home Page: https://python-cmethods.readthedocs.io/en/stable

License: GNU General Public License v3.0

Python 97.37% Makefile 2.63%
bias-adjustment bias-correction climate-data climate-science delta-method linear-scaling variance-scaling quantile-delta-mapping quantile-mapping delta-change-method

python-cmethods's Introduction

Hi 👋, I'm Benjamin

… a passionate developer from Germany who likes working with APIs, data visualization, and trading stocks and cryptocurrencies using automated trading strategies.

Topics related to project management, SD/QA, DevOps, CI/CD, Kubernetes, and distributed systems interest me a lot, and since I'm working at CONTACT Software GmbH I have the chance to learn, use, build and apply new techniques and frameworks with my great team every day.

Python Bash Docker Kubernetes Helm Grafana Rancher Git CI/CD Bitcoin Mac Linux Windows

Private Activities

Core Projects

| Project | Description |
| ------- | ----------- |
| python-kraken-sdk | The python-kraken-sdk is a collection of REST and websocket clients to access the Kraken cryptocurrency exchange API using Python. |
| python-cmethods | A collection of bias correction techniques written in Python - for climatic research. |
| wavetrend-service | Contains the closed-source command-line tool "wavetrend", which uses the Kraken and Polygon.io APIs to apply a custom trading strategy. This tool runs as a service in a private Kubernetes cluster and sends the detected long and short opportunities to the WavetrendSignalBroadcast Telegram channel. |
| genai | A versatile command-line tool that integrates a local Large Language Model (LLM) for generating prompts, which are then fed into a Stable Diffusion model executed locally. The resulting designs and illustrations are automatically uploaded to Redbubble and Spreadshirt for effortless sharing and showcasing. |

Other Projects

| Project | Description |
| ------- | ----------- |
| BiasAdjustCXX | BiasAdjustCXX is a command-line tool written in C++ that enables the fast and efficient computation of bias-corrected climate time series, providing several scaling- and distribution-based techniques. |
| KrakenEx.jl | The KrakenEx.jl package is a collection of REST and websocket clients to access the Kraken cryptocurrency exchange API using Julia. |
| kraken-rebalance-bot | This package is a simple trading bot that can be run from the command line. It serves as a supplement to the python-kraken-sdk. |

Outreach

If you are interested in climatic processes or are looking for illustrative examples and learning content, several of my websites are listed below. They cover various topics and allow the dynamic visualization of different climatic processes and models: Daisy World Model, Orbital Theory of Ice Ages, Energy Balance Models, Foucault's Pendulum, Brownian Motion, Random Systems, Growth Model


PyPI: https://pypi.org/user/btschwertfeger
Dockerhub: https://hub.docker.com/u/btschwertfeger

python-cmethods's People

Contributors

btschwertfeger, dependabot[bot]


python-cmethods's Issues

A very small error

core.py, lines 148-149:

    if kwargs.get("group", None) is None:
        return apply_ufunc(method, obs, simh, simp, **kwargs).to_dataset()

The DataArray object must be renamed before it is converted to a Dataset, so line 149 should pass a name (for example, 'tas') and be rewritten as return apply_ufunc(method, obs, simh, simp, **kwargs).rename('tas').to_dataset().
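
A quick way to see why the rename is needed (a minimal sketch; the variable name "tas" is only an example, as above):

import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="time")

# da.to_dataset() would raise:
# ValueError: unable to convert unnamed DataArray to a Dataset without providing an explicit name

ds = da.rename("tas").to_dataset()  # works: the variable is stored as "tas"
print(ds)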

Create a changelog

Is your feature request related to a problem? Please describe.
There should be a changelog file to present an overview of the historical changes.

The behavior for data sets with different temporal resolution is not uniform

When applying the adjust function to data sets with different time spans:

When talking about length in the following, the size of the time dimension is meant.

  1. Length of obs and simh are equal, simp is longer or shorter: That works, no problem.
  2. Length of obs and simp are equal, simh is shorter or longer: That works too.
  3. Length of simh and simp are equal, obs is shorter or longer: That fails in https://github.com/btschwertfeger/python-cmethods/blob/v2.2.2/cmethods/core.py#L94 since the dimensions are not correctly named if the "input_core_dims" parameter is used as in https://github.com/btschwertfeger/python-cmethods/blob/v2.2.2/doc/getting_started.rst?plain=1#L159 (note that the example there matches the second point of this list and must be modified).

To fix this, the time dimension of obs must be renamed within the function to the time dimension name of simp.
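
A hedged sketch of that fix (the helper name is hypothetical; the actual implementation in core.py may differ):

import xarray as xr

def _align_obs_time_dim(obs: xr.DataArray, simp: xr.DataArray, time_dim: str = "time") -> xr.DataArray:
    """Rename the time dimension of obs to the time dimension name used by simp.

    Assumes simp's time dimension may carry a different name (e.g. it was
    renamed before calling xarray.apply_ufunc).
    """
    simp_time = next(d for d in simp.dims if str(d).startswith(time_dim))
    if simp_time != time_dim:
        obs = obs.rename({time_dim: simp_time})
    return obs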

correct simulated daily precipitation using monthly observed precipitation

Thank you very much for the great tool. I have a simulated daily precipitation dataset and observed precipitation at a monthly time scale. Is it possible to correct the daily data using the monthly observations? If possible, can you please give an example of how to do it using cmethods or BiasAdjustCXX?
Many thanks
best,
zhongwang

QDM not working with longer simp length

Describe the bug

Whenever I use the quantile delta method with the same lengths of data, the method works really well. Nevertheless, when the data to be corrected is longer than the obs and simh, the tool does not correct the simp data.

To Reproduce
Input a simp with longer length than obs and simh
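
A minimal reproduction sketch (this assumes the 1.x CMethods.quantile_delta_mapping(obs, simh, simp, n_quantiles, kind) signature; the data below is synthetic and only illustrative):

import numpy as np
import pandas as pd
import xarray as xr
from cmethods import CMethods as cm

t_hist = pd.date_range("1971-01-01", periods=365, freq="D")
t_fut = pd.date_range("2071-01-01", periods=730, freq="D")  # simp is twice as long

obs = xr.DataArray(np.random.gamma(2.0, size=365), dims="time", coords={"time": t_hist})
simh = xr.DataArray(np.random.gamma(2.0, size=365), dims="time", coords={"time": t_hist})
simp = xr.DataArray(np.random.gamma(2.0, size=730), dims="time", coords={"time": t_fut})

result = cm.quantile_delta_mapping(obs=obs, simh=simh, simp=simp, n_quantiles=100, kind="+")
# expected: a corrected series of length 730; observed: simp comes back (almost) unchanged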

KeyError when Quantile* Mapping and group != None

Describe the bug
When using the adjust_3d method with distribution-based bias correction techniques like quantile mapping or quantile delta mapping and a group like time.month, an error occurs since the grouping internally creates lists instead of xarray Datasets.

Move from setup.py to pyproject.toml

Is your feature request related to a problem? Please describe.
Building packages using setup.py is deprecated, so the move to pyproject.toml should be a task for the future.

Drop Python 3.8 Support

Python 3.8 reaches end of support in a few months (https://endoflife.date/python) and recently it has become harder to test the cmethods package against this version, since first the hdf5 headers were missing on macOS and now on Windows too. On macOS it was quite easy to just install them during CI, but on Windows the complexity would grow to an unmaintainable state - just for a few months until EOL.

To avoid that, let's just drop the support now and extend the readme/installation guide to note that the hdf5 headers must be installed.

Split Quantile Mapping into Quantile Mapping and Detrended Quantile Mapping

Is your feature request related to a problem? Please describe.
The current implementation of the quantile mapping includes the detrended quantile mapping by setting the parameter "detrended" to "True".

Describe the solution you'd like
Since quantile mapping and detrended quantile mapping are quite different and the current implementation leads to the detrended quantile mapping approach being overlooked, it would be better to split them into separate methods.
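
A hedged sketch of how the split could look from a user's perspective (the signatures are illustrative and based on the 1.x API; the data is synthetic):

import numpy as np
import pandas as pd
import xarray as xr
from cmethods import CMethods as cm

t = pd.date_range("2000-01-01", periods=365, freq="D")
obs = xr.DataArray(np.random.normal(10, 2, 365), dims="time", coords={"time": t})
simh = xr.DataArray(np.random.normal(12, 2, 365), dims="time", coords={"time": t})
simp = xr.DataArray(np.random.normal(14, 2, 365), dims="time", coords={"time": t})

# current: detrending is toggled via a flag on quantile_mapping
dqm_old = cm.quantile_mapping(obs=obs, simh=simh, simp=simp, n_quantiles=100, kind="+", detrended=True)

# proposed: two explicit, separate methods
qm = cm.quantile_mapping(obs=obs, simh=simh, simp=simp, n_quantiles=100, kind="+")
dqm_new = cm.detrended_quantile_mapping(obs=obs, simh=simh, simp=simp, n_quantiles=100, kind="+")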

The tool fails when input time series include np.nan for distribution-based methods

Describe the bug
The tool is not able to process time series with only nan values. (quantile_mapping, quantile_delta_mapping, detrended_quantile_mapping)

To Reproduce
Import CMethods and run an adjustment using datasets containing only nan values.

The error is like:

Traceback (most recent call last):
  File "biasadjust.py", line 152, in <module>
    main()
  File "/Users/bts/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/bts/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/bts/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/bts/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "biasadjust.py", line 130, in main
    result = cm.adjust_3d(
  File "/Users/udo/bts/miniconda3/lib/python3.8/site-packages/cmethods/__init__.py", line 247, in adjust_3d
    result[lat, lon] = func(
  File "/Users/bts/opt/miniconda3/lib/python3.8/site-packages/cmethods/__init__.py", line 1167, in quantile_delta_mapping
    xbins = np.arange(global_min, global_max + wide, wide)
ValueError: arange: cannot compute length

Expected behavior
If one of the time series of the control period (obs, simh) consists only of nan values, no adjustment can be made and the functions should return the raw simp/scenario time series without further adjustment.
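
A hedged sketch of such a guard (the helper name is hypothetical and not part of the package):

import numpy as np
import xarray as xr

def all_nan_in_control_period(obs: xr.DataArray, simh: xr.DataArray) -> bool:
    """Return True when no adjustment can be fitted for the control period.

    The distribution-based methods need valid obs and simh values; if either
    series is entirely nan, the caller should return the raw simp series
    unchanged instead of failing inside np.arange.
    """
    return bool(np.isnan(obs.values).all() or np.isnan(simh.values).all())

# usage sketch at the top of e.g. quantile_delta_mapping:
# if all_nan_in_control_period(obs, simh):
#     return simp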

Create a documentation

Is your feature request related to a problem? Please describe.
There should be simple documentation, for example built using Sphinx. This would be much better than just having a readme and large docstrings in the module.

Add pre-commit

Is your feature request related to a problem? Please describe.
Just add pre-commit to standardize and check the code a bit.

Describe the solution you'd like
Use https://pre-commit.com/ in the CI.

Multiplicative Quantile Delta Mapping is not applying scaling where the delta is infinite

Describe the bug
The multiplicative quantile delta mapping procedure produces nan values in some cases if the basis of delta is negative.

The delta can be described as:

$$\Delta(i) = \frac{X_{sim,p}(i)}{F^{-1}_{sim,h}\left\{ F_{sim,p}\left[ X_{sim,p}(i) \right] \right\}}$$

where
$F^{-1}_{sim,h}$ represents the inverse cumulative distribution function of the modeled data of the control period,
$F_{sim,p}$ represents the cumulative distribution function of the modeled data that is to be adjusted, and
$X_{sim,p}(i)$ is the value of a climate variable $X$ at time step $i$.

see: v1.0.0/cmethods/__init__.py#L1055-L1059

So if the basis is zero, the mentioned part returns a zero as the scaling factor instead of the maximum scaling factor (10 by default).

EDIT:

  • This also occurs in every place where the division is applied.
  • And if the numerator is zero, the value should remain zero - not nan, as happens in some cases.

To Reproduce
A real-world example would be: $X_{sim,p}(i)$ is the value of precipitation at time step $i$ and is 0.0001 mm. The value inserted into the cumulative distribution function may then return a precipitation value that equals zero - for example when the precipitation in the control data was much higher than in the data that is to be adjusted.

Expected behaviour
The change should not be zero (this means no change) - instead the maximum scaling factor should be applied.
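
A hedged sketch of the expected clamping behaviour (a hypothetical helper, not the implementation in cmethods; the maximum scaling factor of 10 is taken from the description above):

import numpy as np

def guarded_multiplicative_delta(numerator, denominator, max_scaling_factor: float = 10.0) -> np.ndarray:
    """Element-wise ratio with the edge cases described above handled.

    - denominator == 0 and numerator != 0: apply the maximum scaling factor
    - numerator == 0: the delta stays 0 instead of becoming nan
    - otherwise: the magnitude is capped at max_scaling_factor
    """
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        delta = numerator / denominator
    delta = np.where(denominator == 0, max_scaling_factor, delta)
    delta = np.where(numerator == 0, 0.0, delta)
    return np.clip(delta, -max_scaling_factor, max_scaling_factor)

print(guarded_multiplicative_delta([0.0001, 0.0, 5.0], [0.0, 0.0, 2.0]))  # [10.   0.   2.5]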

Additional context
python-cmethods v1.0.0

Optimization for `adjust_3d`

Hello! Thank you for a fantastic offering with this package. This has been very helpful to get some quantile mapping up and running quickly. As a thank you, I wanted to offer some optimization I did on gridded bias correction that you might find useful. I specialize in vectorizing/optimizing gridded data, but do not have the capacity right now to open a full PR.

I used cm.adjust_3d on a ~111x111 lat lon grid with 40,000 time steps. The progress bar estimated around 2.5 hours for this but I didn't run it in full. With the below implementation, it ran in 1 minute.

You need a dask cluster running and a dask dataset of course to reap those benefits, but the implementation will speed up in-memory datasets too.


import numpy as np
import pandas as pd
import xarray as xr
from cmethods import CMethods as cm


def quantile_map_3d(
    obs: xr.DataArray,
    simh: xr.DataArray,
    simp: xr.DataArray,
    n_quantiles: int,
    kind: str,
):
    """Quantile mapping vectorized for 3D operations."""

    def qmap(
        obs: xr.DataArray,
        simh: xr.DataArray,
        simp: xr.DataArray,
        n_quantiles: int,
        kind: str,
    ) -> np.ndarray:
        """Helper for apply ufunc to vectorize/parallelize the bias correction step."""
        return cm.quantile_mapping(
            obs=obs, simh=simh, simp=simp, n_quantiles=n_quantiles, kind=kind
        )

    result = xr.apply_ufunc(
        qmap,
        obs,
        simh,
        # Need to spoof a fake time axis since 'time' coord on full dataset is different
        # than 'time' coord on training dataset.
        simp.rename({"time": "t2"}),
        dask="parallelized",
        vectorize=True,
        # This will vectorize over the time dimension, so will submit each grid cell
        # independently
        input_core_dims=[["time"], ["time"], ["t2"]],
        # Need to denote that the final output dataset will be labeled with the
        # spoofed time coordinate
        output_core_dims=[["t2"]],
        kwargs={"n_quantiles": n_quantiles, "kind": kind},
    )

    # Rename to proper coordinate name.
    result = result.rename({"t2": "time"})

    # ufunc will put the core dimension to the end (time), so want to preserve original
    # order where time is commonly first.
    result = result.transpose(*obs.dims)
    return result

The nice thing about this is that it can handle 1D datasets without any issue. The limitation is they always have to be xarray objects. But it works with dask or in-memory datasets and any arbitrary dimensions as long as a labeled time dimension exists.

The other great thing is that you could just implement the apply_ufunc wrapper around every single bias correction method without the need for a separate adjust_3d function. A user can pass in 1D or 2D+ data without any change in code.

Example:

obs = xr.DataArray(
    [[1, 2, 3, 4], [2, 3, 4, 5]],
    dims=["x", "time"],
    coords={"x": [0, 1], "time": pd.date_range("2023-10-25", freq="D", periods=4)},
).transpose("time", "x")
simh = xr.DataArray(
    [[2, 1, 5, 4], [3, 9, 1, 4]],
    dims=["x", "time"],
    coords={"x": [0, 1], "time": pd.date_range("2023-10-25", freq="D", periods=4)},
).transpose("time", "x")
simp = xr.DataArray(
    [[7, 9, 10, 14], [12, 13, 14, 15]],
    dims=["x", "time"],
    coords={"x": [0, 1], "time": pd.date_range("2040-10-25", freq="D", periods=4)},
).transpose("time", "x")

# 2D dataset
>>> quantile_map_3d(obs, simh, simp, 250, "*")
<xarray.DataArray (time: 4, x: 2)>
array([[5., 9.],
       [5., 9.],
       [5., 9.],
       [5., 9.]])
Coordinates:
  * x        (x) int64 0 1
  * time     (time) datetime64[ns] 2040-10-25 2040-10-26 2040-10-27 2040-10-28

# 1D dataset
>>> quantile_map_3d(obs.isel(x=0), simh.isel(x=0), simp.isel(x=0), 250, "*")
<xarray.DataArray (time: 4)>
array([5., 5., 5., 5.])
Coordinates:
    x        int64 0
  * time     (time) datetime64[ns] 2040-10-25 2040-10-26 2040-10-27 2040-10-28

adjust_3d forces group to be "time.month" if group is set to the default (None)

Describe the bug

The adjust_3d method always calls the bias correction methods with the group forced to time.month if no group was defined or it is set to the default None.

This is not the wanted behaviour, because it makes it impossible to apply bias corrections to 3-dimensional data sets without a grouping.

Solution
Just remove the forced grouping in the adjust_3d method.

Add a command-line interface

Providing a command-line interface could replace the examples/biasadjust.py script, so that one can, for example, install the python-cmethods package and call "cmethods adjust input.nc -o output.nc" (simplified) to apply the tool directly to the data without needing to write any scripts.
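
A hedged sketch of what such an entry point could look like (built with click; the command and option names here are hypothetical and not a final interface):

import click
import xarray as xr

@click.group()
def cli() -> None:
    """cmethods command-line interface (sketch)."""

@cli.command(name="adjust")
@click.argument("input_file", type=click.Path(exists=True))
@click.option("-o", "--output", required=True, type=click.Path(), help="Output netCDF file")
@click.option("--method", default="linear_scaling", show_default=True, help="Bias correction method to apply")
def adjust(input_file: str, output: str, method: str) -> None:
    """Apply the selected bias correction method to INPUT_FILE."""
    ds = xr.open_dataset(input_file)
    # ... select obs/simh/simp from the inputs and call the chosen cmethods
    # function here; omitted in this sketch ...
    ds.to_netcdf(output)

if __name__ == "__main__":
    cli()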

Create a release workflow for dev and production releases

Is your feature request related to a problem? Please describe.
There should be one or more workflows that are triggered when a PR is merged into master, uploading the package to test.pypi.org. Additionally, a workflow triggered on a new release should upload the package to the production PyPI to improve the development process.
