kratzert / rrmpg

Rainfall-Runoff modelling playground

Home Page: http://rrmpg.readthedocs.io/en/latest/index.html

License: MIT License

Python 100.00%

python hydrology hydrological-modelling numba

Introduction

This repository is a work in progress.

Official documentation can be found here: http://rrmpg.readthedocs.io

Read the Idea section for further information about the background and aim of this project.

Idea

One of the fundamental tasks in hydrology is rainfall-runoff modelling: modelling the response of a catchment to meteorological input data and forecasting the river discharge. There are different approaches to the problem, namely conceptual models, physically based models and data-driven models.

Although this is taught at university, hands-on experience is often missing or limited to very simple modelling approaches. One of the main reasons I see is that most hydrological models (at least the complex ones) are implemented in Fortran, but very few hydrology students know Fortran when they first come into contact with RR-models. So all they can usually do is apply a model to their data and tune parameters manually, but not explore the model and see the effect of code changes.

This might be different if well-performing implementations of hydrological models existed in a simpler and more readable language, such as Python. What has always hindered this step is the speed of Python and the nature of RR-models: they mostly have to be implemented as loops over all timesteps. And, big surprise, pure Python and for-loops are not the best combination in terms of performance.

This could be changed, e.g., by using Cython for the hydrological model, but that might again hinder understanding of the code, since Cython adds non-Pythonic complexity that beginners may find hard to understand and therefore hard to play and experiment with.

Another option could be PyPy. The problem I see with PyPy is that the user would be forced to install a different Python interpreter, while most people I know are quite comfortable using e.g. Anaconda.

Numba is another way to speed up array-oriented and math-heavy Python code, without changing the language or interpreter and with only a few code adaptations. Using Numba, the code stays easily readable and therefore easier for novices to understand. I won't spend much time now on explaining how Numba works, but I'll definitely add further information in the future. First performance comparisons between Fortran implementations and Numba-optimized Python code have shown that the speed is roughly the same (Fortran is about 1-2 times faster, using the GNU Fortran compiler).
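To give a rough idea of what "a few code adaptations" can look like, here is a minimal sketch of a timestep loop compiled with Numba. The function name `linear_reservoir`, the toy storage equations and the parameter values are illustrative assumptions and not part of the rrmpg API. The `try`/`except` only keeps the sketch runnable when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


@njit
def linear_reservoir(rain, a, b, c):
    """Toy storage model; a, b and c are illustrative parameters."""
    outflow = np.zeros(rain.size)
    state = 0.0
    for i in range(rain.size):
        # outflow uses the storage from the previous timestep
        outflow[i] = (1 - a - b) * rain[i] + c * state
        # update the storage for the next timestep
        state = (1 - c) * state + a * rain[i]
    return outflow


rain = np.array([10.0, 0.0, 0.0, 5.0])
qsim = linear_reservoir(rain, 0.5, 0.1, 0.2)
```

Apart from the decorator, the function body is ordinary Python with NumPy, which is the readability argument made above.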

Summary: The idea of this code repository is to provide fast (roughly the speed of Fortran) implementations of hydrological models in Python to make it easier to play and experiment with rainfall-runoff models.

You want to contribute?

At the moment I'm looking for a selection of hydrological models to implement in Python. If you want to see any (your?) model in this project, feel free to contact me. There is also a How to contribute section in the official documentation, where you can read more about the various ways you can contribute to this repository.

Contributors

I'll add a better-looking section to the official documentation later. For now, I list everybody who has contributed to this repository here:

Ondřej Čertík with pull request #3: Optimized Fortran code and compilation procedure for a fair speed comparison.
Daniel Klotz with pull requests #4, #5 and #9: All spell checking.
Andrew MacDonald for providing HBV-Edu simulation data from the original MATLAB implementation (see #10).
Martijn Visser with pull request #13 to update the unittest for pandas 1.0.
Martin Gauch with pull request #14 to fix a bug in the HBV model when running multiple parameter sets at once.

Contact

Raise an issue here in this repository or contact me by mail f.kratzert(at)gmail.com

rrmpg's Issues

Calibration

Hi again, and thanks for fixing the area-related bug.
I have run the model for some catchments from the CAMELS dataset in order to have an idea of the capability of the model.
In all cases results are not good. I report Nash Sutcliffe Efficiencies to give you an idea:

Although this is OK for educational purposes, it would also be nice to be able to show a case with a good fit.
I guessed the problem might be the calibration method, which may find a parameter set that corresponds to a local minimum. However, this should not be the case, since scipy.optimize.differential_evolution "finds the global minimum of a multivariate function".

Implementing a more advanced calibration method (e.g., the Shuffled Complex Evolution algorithm) or integrating available Python libraries (e.g., spotpy) might solve the problem (again "might"; I did not test them beforehand). But it might also not be necessary.
I am currently trying to fix the parameter bounds first, to check if it helps.
Thanks again, Stefano

Possible bug in Fortran

Great to see this project using Numba and very pleased to see Numba performing so well :)

I think in this https://github.com/kratzert/RRMPG/blob/master/examples/speed_comparision.ipynb
Cell [4], the %%fortran magics declaration has a possible bug.

I'm not sure the Python loop in Cell [3]:

    for i in range(rain.size):
        state_out = (1 - c) * state_in + a * rain[i]
        outflow[i] = (1 - a - b) * rain[i] + c * state_in
        state_in = state_out

is quite the same as the Fortran implementation (difference annotated):

        state_in = 0
        state_out = 0
        do t = 1,col_dim
            state_out = (1 - c) * state_in + a * inflow(t) !! this assignment is not used and will be dead-code eliminated
            outflow(t) = (1 - a - b) * inflow(t) + c * state_in !! due to state_in always being 0, `c * state_in` is eliminated, (1-a-b) will be hoisted as a loop invariant constant
            state_out = state_in !! again, unused, so DCE.
        end do

which I think ends up looking to the compiler like:

        do t = 1,col_dim
            outflow(t) = (constant) * inflow(t)
        end do

and if one looks at the assembler (gfortran -O3 -S):

.L3:
        movsd   (%r8), %xmm0
        cmpl    $1, %edx
        movl    $2, %edi
        mulsd   %xmm1, %xmm0
        addsd   %xmm2, %xmm0
        movsd   %xmm0, (%r9)
        jbe     .L5
        movsd   8(%r8), %xmm0
        cmpl    $2, %edx
        movb    $3, %dil
        mulsd   %xmm1, %xmm0
        addsd   %xmm2, %xmm0
        movsd   %xmm0, 8(%r9)
        jbe     .L5
        movsd   16(%r8), %xmm0
        cmpl    $3, %edx
        movb    $4, %dil
        mulsd   %xmm1, %xmm0
        addsd   %xmm2, %xmm0
        movsd   %xmm0, 16(%r9)
        jbe     .L5
        movsd   24(%r8), %xmm0
        movb    $5, %dil
        mulsd   %xmm1, %xmm0
        addsd   %xmm2, %xmm0
        movsd   %xmm0, 24(%r9)
.L5:
        cmpl    %edx, %eax
        je      .L1

the repeated block:

        movsd   (%r8), %xmm0
        cmpl    $1, %edx
        movl    $2, %edi
        mulsd   %xmm1, %xmm0
        addsd   %xmm2, %xmm0
        movsd   %xmm0, (%r9)
        jbe     .L5

is essentially outflow(t) = (1 - a - b ) * inflow(t), loop unwound with %r8+offset as inflow and %r9+offset as outflow.

If you also think this is a bug, would be interesting to see how Numba performs against equivalent code.
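The suspected difference can be checked without a compiler. The following is a minimal sketch that transcribes both loops into plain Python. The function names and the input values are hypothetical, and it says nothing about timing. It only illustrates that the loop as written in the Fortran cell reduces to a constant times the inflow.

```python
def reservoir_python(rain, a, b, c):
    """Loop from the notebook's Python cell: the state is carried forward."""
    state_in = 0.0
    outflow = []
    for r in rain:
        state_out = (1 - c) * state_in + a * r
        outflow.append((1 - a - b) * r + c * state_in)
        state_in = state_out
    return outflow


def reservoir_fortran_as_written(rain, a, b, c):
    """Transcription of the Fortran loop: state_in is never updated."""
    state_in = 0.0
    outflow = []
    for r in rain:
        state_out = (1 - c) * state_in + a * r
        outflow.append((1 - a - b) * r + c * state_in)
        state_out = state_in  # assignment reversed, so state_in stays 0
    return outflow


rain = [10.0, 0.0, 0.0, 5.0]
a, b, c = 0.5, 0.1, 0.2
with_state = reservoir_python(rain, a, b, c)
without_state = reservoir_fortran_as_written(rain, a, b, c)
scaled_rain = [(1 - a - b) * r for r in rain]
```

Here `without_state` equals `scaled_rain`, while `with_state` differs as soon as the storage is non-zero. Swapping the last assignment in the Fortran loop to `state_in = state_out` would make the two loops equivalent.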

Numba warning

Hello, Numba is raising a deprecation warning on the extrapolate_temperature function in rrmpg/models/cemaneige_utils.py due to a future change in functionality "NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.". Can this be addressed?
Thanks, James
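The warning goes away when the compilation mode is stated explicitly, either with `@jit(nopython=True)` or with the shorthand `@njit`. The sketch below shows the pattern on a stand-in function. `shift_temperature` and its arguments are hypothetical and not the actual `extrapolate_temperature` code. Whether the real function compiles in nopython mode is not checked here.

```python
import numpy as np

try:
    from numba import jit
except ImportError:  # keeps the sketch runnable without Numba
    def jit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@jit(nopython=True)  # explicit mode, so the changed default does not matter
def shift_temperature(temp, height_diff, lapse_rate):
    """Hypothetical helper: adjust temperatures by a linear lapse rate."""
    out = np.empty_like(temp)
    for i in range(temp.size):
        out[i] = temp[i] + lapse_rate * height_diff
    return out


temps = shift_temperature(np.array([10.0, 12.0]), 100.0, -0.0065)
```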

Add validation run for HBV model to test suite

I need someone with MATLAB installed to run the original code for me with fixed parameters, to generate a simulation timeseries to compare against. The input data that ships with the MATLAB code can be used.

get_random_params() should be adapted to return n parameter sets on request

It might be handy for further changes to adapt the random parameter generation functions to take an additional argument that lets the user specify how many parameter sets should be generated at once.

This might require changes in the set_params() function, since get_random_params() would then return a dictionary of numpy arrays instead of a dictionary of numbers.
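A minimal sketch of what such a function could look like, assuming uniform sampling within per-parameter bounds. The name `sample_random_params`, the `num` and `seed` arguments and the bounds dictionary are hypothetical and not the existing rrmpg implementation.

```python
import numpy as np

# Hypothetical bounds; real models define their own parameter names and ranges.
DEFAULT_BOUNDS = {"a": (0.0, 1.0), "b": (0.0, 10.0)}


def sample_random_params(num=1, bounds=DEFAULT_BOUNDS, seed=None):
    """Draw `num` uniform samples per parameter, returned as 1-D arrays."""
    rng = np.random.default_rng(seed)
    return {
        name: rng.uniform(low, high, size=num)
        for name, (low, high) in bounds.items()
    }


params = sample_random_params(num=5, seed=42)
```

Each dictionary value is an array of length `num`, so code that currently expects a single number per parameter would need to handle arrays.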

Parallelize simulation function of models to process multiple parameter sets in parallel

The parallelization feature of the numba library offers great potential for further speed improvements whenever multiple parameter sets of a model need to be evaluated. This can be done in parallel, since each model run with one parameter set is independent of the runs with other parameter sets. Initial quick tests show huge speed improvements, which come into play e.g. for the Monte Carlo implementation or any further optimization scheme.

Here I have published an article explaining some of the basics of numba's parallelization features. The two important things are the parallel=True flag for the decorator and the prange function, which is used to iterate over the parameter sets.

This might require adapting the _loss() function of each model, since these functions expect a 1-D array as output of the simulation function, which will now return a 2-D array.
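A minimal sketch of the pattern, assuming a toy storage model with two parameters per set. `simulate_many` and its equations are illustrative and not one of the rrmpg models. The fallback only keeps the sketch runnable, serially, when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # serial fallback so the sketch runs without Numba
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True)
def simulate_many(rain, a, c):
    """Run a toy storage model once per parameter set (one column each)."""
    num_steps = rain.size
    num_sets = a.size
    qsim = np.zeros((num_steps, num_sets))
    for n in prange(num_sets):  # parameter sets are independent
        state = 0.0
        for t in range(num_steps):  # timesteps must stay sequential
            qsim[t, n] = (1 - a[n]) * rain[t] + c[n] * state
            state = (1 - c[n]) * state + a[n] * rain[t]
    return qsim


rain = np.array([10.0, 0.0, 5.0])
qsim = simulate_many(rain, np.array([0.5, 0.2]), np.array([0.2, 0.5]))
```

Only the outer loop over parameter sets uses `prange`. The inner time loop depends on the previous state and therefore cannot be parallelized. The result has shape (timesteps, parameter sets), which is the 2-D output a loss function would have to accept.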

HBVEdu model: missing attribute area?

Hi,
I am trying to set up an HBVEdu model by using RRMPG.
I follow step by step the procedure illustrated here (for another model). However, when I get to the step of fitting, I get the following error:

    result = model.fit(cal['QObs(mm/d)'], cal['tmean(C)'], cal['prcp(mm/day)'],
                       cal.index.month.values, long_mean['PET'], long_mean['tmean(C)'])

    AttributeError Traceback (most recent call last)
    in
    1 result = model.fit(cal['QObs(mm/d)'], cal['tmean(C)'], cal['prcp(mm/day)'],
    ----> 2 cal.index.month.values, long_mean['PET'], long_mean['tmean(C)'])

    C:\Anaconda3\envs\rrmpg\lib\site-packages\rrmpg-0.1.1-py3.6.egg\rrmpg\models\hbvedu.py in fit(self, qobs, temp, prec, month, PE_m, T_m, snow_init, soil_init, s1_init, s2_init)
    284 # pack input arguments for scipy optimizer
    285 args = (qobs, temp, prec, month, PE_m, T_m, snow_init, soil_init,
    --> 286 s1_init, s2_init, self._dtype, self.area)
    287 bnds = tuple([self._default_bounds[p] for p in self._param_list])
    288

    AttributeError: 'HBVEdu' object has no attribute 'area'

So, attribute area is missing. My problem is that I cannot understand how I should pass "area" to the model. I tried several things, such as:

area = 100 or
model = HBVEdu(area=100)
-> no effect

model.set_params(area, 100) or
model.set_params({'area':100})
-> not working, area is not a parameter (it is not in the list of parameters)

result = model.fit(cal['QObs(mm/d)'], cal['tmean(C)'], cal['prcp(mm/day)'],
cal.index.month.values, long_mean['PET'], long_mean['tmean(C)'],
snow_init=0.0, soil_init=0.0, s1_init=0.0, s2_init=0.0, area=100)
-> also not working

From the error, I see that the code is searching for self.area, so I believe this should be provided as an attribute of model. But I cannot understand how.
I am not proficient in python, and that is maybe the issue.
I thank you in advance if you can help.

Where can I get the mean potential evapotranspiration for the other gauges in CAMELS?

The CAMELS dataset does not contain the mean potential evapotranspiration (PET) in 'XXX_lump_cida_forcing_leap.txt', but CemaneigeGR4J needs PET as input. In the Model API Example, I found that the example 'PET' data come from '01031500_05_model_output.txt', so I wonder where I can get the PET data for other gauges.

Adapt all metric functions to calculate metrics for multiple simulation at once

With the last commit 27e420d all simulation functions were parallelized. The output of the simulation function, qsim, can now optionally be a 2D array, where the second dimension (first for Pythonistas) holds the results for the different parameter sets. It would now be good if we could pass this array directly to all the metric functions in rrmpg.utils.metrics, which at the moment only work when both inputs are arrays of the same size. The main problem is the validate_array_input() function, which flattens the array. This is favorable in many cases, but not for 2D simulation arrays holding multiple simulations.

I'm still unsure how to resolve this. Options are:

1. Handle the input validation checks for the simulation arrays separately in each metric function.
2. Extend/adapt the validate_array_input() function to have an option for not flattening the array.
3. Write a second array validation function for 2D arrays without flattening. This is maybe the least favorable option, since it would basically duplicate main parts of the code.

At the moment I lean towards the 2nd option, with an optional input argument which defaults to not flattening.
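Whichever option is chosen, the metric itself can be computed for all simulations at once with NumPy broadcasting. A minimal sketch for the Nash-Sutcliffe efficiency, assuming qsim has one column per parameter set; the function name `nse_multi` is hypothetical and not part of rrmpg.utils.metrics.

```python
import numpy as np


def nse_multi(obs, sim):
    """Nash-Sutcliffe efficiency for one or many simulations.

    obs has shape (T,); sim has shape (T,) or (T, N) with one column
    per parameter set. Returns an array of length N (1 for 1-D input).
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    if sim.ndim == 1:
        sim = sim[:, np.newaxis]  # treat a single run as one column
    if sim.shape[0] != obs.shape[0]:
        raise ValueError("obs and sim must have the same number of timesteps")
    # squared errors summed over time, separately for each column
    numerator = np.sum((sim - obs[:, np.newaxis]) ** 2, axis=0)
    denominator = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - numerator / denominator


obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = np.column_stack([obs, np.full(4, obs.mean())])
scores = nse_multi(obs, sim)
```

In this example the first column reproduces the observations and the second column is the observed mean, so `scores` contains one efficiency value per parameter set.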

Citation and/or acknowledgement

I've used your implementation of GR4J as a template for a Jax-ified version that I've been using for some applications of MCMC for change detection. Here's the repo, for reference.

Do you have a paper or citable reference I can point to? For now I plan to mention the RRMPG repository in my manuscript.
