mmhs013 / pymannkendall

A python package for non parametric Mann Kendall family of trend tests.

License: MIT License

Python 96.66% TeX 3.34%

mann-kendall-tests mk-test trend-analysis python modified-mann-kendall-test seasonal-mann-kendall

pymannkendall's Introduction

pyMannKendall

What is the Mann-Kendall Test?

The Mann-Kendall Trend Test (sometimes called the MK test) is used to analyze time series data for consistently increasing or decreasing trends (monotonic trends). It is a non-parametric test, which means it works for all distributions (i.e. the data do not have to meet the assumption of normality), but the data should have no serial correlation. Serial correlation can distort the significance level (p-value) and lead to misinterpretation. To overcome this problem, researchers have proposed several modified Mann-Kendall tests (Hamed and Rao Modified MK Test, Yue and Wang Modified MK Test, Modified MK test using Pre-Whitening method, etc.). The Seasonal Mann-Kendall test was also developed to remove the effect of seasonality.
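As a minimal sketch of what the test measures (this is not the package's implementation), the Mann-Kendall score S counts, over all pairs of observations, how many later values are larger than earlier ones minus how many are smaller. Kendall's Tau scales S by the number of pairs. The function name `mk_score` is hypothetical, and the sketch assumes a series without missing values.

```python
from itertools import combinations


def mk_score(x):
    """Return (s, tau) for a sequence without missing values."""
    n = len(x)
    # combinations() yields (earlier, later) pairs in time order
    s = sum((later > earlier) - (later < earlier)
            for earlier, later in combinations(x, 2))
    # Tau is S divided by the number of pairs, n * (n - 1) / 2
    tau = s / (0.5 * n * (n - 1))
    return s, tau


print(mk_score([1, 2, 3, 4, 5]))  # strictly increasing series
print(mk_score([3, 1, 2, 5, 4]))  # mostly increasing series
```

A strictly monotonic series gives Tau of 1 or -1; values near 0 indicate no consistent direction.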

The Mann-Kendall Test is a powerful trend test, so several other modified Mann-Kendall tests, such as the Multivariate MK Test, Regional MK Test, Correlated MK Test and Partial MK Test, were developed for special conditions. pyMannKendall is a pure Python implementation of non-parametric Mann-Kendall trend analysis, which brings together almost all types of Mann-Kendall tests. Currently, this package has 11 Mann-Kendall tests and 2 Sen's slope estimator functions. Brief descriptions of the functions are below:

Original Mann-Kendall test (original_test): The original Mann-Kendall test is a nonparametric test that does not consider serial correlation or seasonal effects.
Hamed and Rao Modified MK Test (hamed_rao_modification_test): This modified MK test was proposed by Hamed and Rao (1998) to address serial autocorrelation. They suggested a variance correction approach to improve trend analysis. The user can restrict the test to the first n significant lags by passing a lag number to this function. By default, it considers all significant lags.
Yue and Wang Modified MK Test (yue_wang_modification_test): This is also a variance correction method that accounts for serial autocorrelation, proposed by Yue, S., & Wang, C. Y. (2004). The user can also set the desired number of significant lags for the calculation.
Modified MK test using Pre-Whitening method (pre_whitening_modification_test): This test was suggested by Yue and Wang (2002); it pre-whitens the time series before the trend test is applied.
Modified MK test using Trend free Pre-Whitening method (trend_free_pre_whitening_modification_test): This test was also proposed by Yue and Wang (2002); it removes the trend component and then pre-whitens the time series before the trend test is applied.
Multivariate MK Test (multivariate_test): This is an MK test for multiple parameters proposed by Hirsch (1982). He used this method for the seasonal MK test, where he considered every month as a parameter.
Seasonal MK Test (seasonal_test): For seasonal time series data, Hirsch, R.M., Slack, J.R. and Smith, R.A. (1982) proposed this test to calculate the seasonal trend.
Regional MK Test (regional_test): Based on the seasonal MK test proposed by Hirsch (1982), Helsel, D.R. and Frans, L.M. (2006) suggested the regional MK test to calculate the overall trend at a regional scale.
Correlated Multivariate MK Test (correlated_multivariate_test): This multivariate MK test was proposed by Hipel (1994) for cases where the parameters are correlated.
Correlated Seasonal MK Test (correlated_seasonal_test): This method, proposed by Hipel (1994), is used when the time series is significantly correlated with the preceding one or more months/seasons.
Partial MK Test (partial_test): In real events, many factors affect the main studied response parameter, which can bias the trend results. To overcome this problem, Libiseller (2002) proposed the partial MK test. It requires two parameters as input: one is the response parameter and the other is an independent parameter.
Theil-Sen's Slope Estimator (sens_slope): This method was proposed by Theil (1950) and Sen (1968) to estimate the magnitude of the monotonic trend. The intercept is calculated using the Conover, W.J. (1980) method.
Seasonal Theil-Sen's Slope Estimator (seasonal_sens_slope): This method was proposed by Hipel (1994) to estimate the magnitude of the monotonic trend when the data have seasonal effects. The intercept is calculated using the Conover, W.J. (1980) method.
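To make the pre-whitening entry in the list above more concrete, here is a minimal sketch of the idea: estimate the lag-1 autocorrelation r1 of the series and subtract r1 times the previous value from each value, then run the trend test on the shorter, whitened series. The helper names `lag1_autocorrelation` and `pre_whiten` are hypothetical, and the sketch assumes a non-constant series without missing values; the package's own function may differ in detail.

```python
def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a non-constant sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    numerator = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


def pre_whiten(x):
    """Return the series x[t] - r1 * x[t-1]; it has one value fewer than x."""
    r1 = lag1_autocorrelation(x)
    return [x[t] - r1 * x[t - 1] for t in range(1, len(x))]


print(pre_whiten([1, 2, 3, 4, 5]))
```

The whitened series is what would then be passed to the trend test in place of the raw data.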

Function details:

All Mann-Kendall test functions take nearly the same input parameters:

x: a data vector (list, numpy array or pandas series)
alpha: significance level (0.05 is the default)
lag: number of first significant lags (only available in hamed_rao_modification_test and yue_wang_modification_test)
period: length of the seasonal cycle. For monthly data it is 12; for weekly data it is 52 (only available in the seasonal tests)

All Mann-Kendall tests return a named tuple containing:

trend: tells the trend (increasing, decreasing or no trend)
h: True (if a trend is present) or False (if a trend is absent)
p: p-value of the significance test
z: normalized test statistic
Tau: Kendall Tau
s: Mann-Kendall's score
var_s: variance of S
slope: Theil-Sen estimator/slope
intercept: intercept of the Kendall-Theil robust line; for the seasonal tests, a full period cycle is taken as the unit time step
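The following sketch shows one way the fields above relate to each other, assuming the usual continuity-corrected normal approximation: z is computed from s and var_s, p is the two-sided p-value under the null hypothesis of no monotonic trend, and h is True when p falls below alpha. The function name `mk_decision` is hypothetical and this is not the package's code; the inputs used here are the s and var_s values from the sample output in the Usage section.

```python
import math


def mk_decision(s, var_s, alpha=0.05):
    """Derive (trend, h, p, z) from the score s and its variance var_s."""
    # Continuity correction: shrink |s| by one before standardizing
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value of a standard normal: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2))
    h = p < alpha
    if h and z > 0:
        trend = 'increasing'
    elif h and z < 0:
        trend = 'decreasing'
    else:
        trend = 'no trend'
    return trend, h, p, z


print(mk_decision(142.0, 5205500.0))
```

A small p (below alpha) rejects the null hypothesis of no trend; the sign of z then gives the direction.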

The Sen's slope function requires a data vector. The seasonal Sen's slope function also has an optional input period, which defaults to 12. Both Sen's slope functions return only the slope value.
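As an illustration of the seasonal variant, the sketch below compares values only within the same season (for example, January with January), divides each difference by the number of cycles between the two values, and takes the median of all such slopes. The resulting slope is therefore per full cycle. The name `seasonal_slope_sketch` is hypothetical and the sketch assumes no missing values; it is not the package's implementation.

```python
from statistics import median


def seasonal_slope_sketch(x, period=12):
    """Median of within-season pairwise slopes, in units per cycle."""
    slopes = []
    for season in range(period):
        values = x[season::period]  # one value per cycle for this season
        for i in range(len(values)):
            for j in range(i + 1, len(values)):
                slopes.append((values[j] - values[i]) / (j - i))
    return median(slopes)


# Toy data: three seasons per cycle, every season rises by 1 per cycle
data = [offset + year for year in range(4) for offset in (0, 5, 2)]
print(seasonal_slope_sketch(data, period=3))
```

Because values from different seasons are never compared, the large within-cycle swings in the toy data do not affect the estimate.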

Dependencies

For the installation of pyMannKendall, the following packages are required:

numpy
scipy

Installation

You can install pyMannKendall using pip. For Linux users:

sudo pip install pymannkendall

or, for Windows users:

pip install pymannkendall

or, you can use conda

conda install -c conda-forge pymannkendall

or you can clone the repo and install it:

git clone https://github.com/mmhs013/pymannkendall
cd pymannkendall
python setup.py install

Tests

pyMannKendall is automatically tested with the pytest package on each commit, and the tests can also be run manually:

pytest -v

Usage

A quick example of pyMannKendall usage is given below. Several more examples are provided here.

import numpy as np
import pymannkendall as mk

# Data generation for analysis
data = np.random.rand(360,1)

result = mk.original_test(data)
print(result)

The output looks like this:

Mann_Kendall_Test(trend='no trend', h=False, p=0.9507221701045581, z=0.06179991635055463, Tau=0.0021974620860414733, s=142.0, var_s=5205500.0, slope=1.0353584906597959e-05, intercept=0.5232692553379981)

Because the output is a named tuple, you can access a specific result by name:

print(result.slope)

or, you can directly unpack your results like this:

trend, h, p, z, Tau, s, var_s, slope, intercept = mk.original_test(data)

Citation

If you publish results for which you used pyMannKendall, please give credit by citing Hussain et al., (2019):

Hussain et al., (2019). pyMannKendall: a python package for non parametric Mann Kendall family of trend tests. Journal of Open Source Software, 4(39), 1556, https://doi.org/10.21105/joss.01556

@article{Hussain2019pyMannKendall,
	journal = {Journal of Open Source Software},
	doi = {10.21105/joss.01556},
	issn = {2475-9066},
	number = {39},
	publisher = {The Open Journal},
	title = {pyMannKendall: a python package for non parametric Mann Kendall family of trend tests.},
	url = {http://dx.doi.org/10.21105/joss.01556},
	volume = {4},
	author = {Hussain, Md. and Mahmud, Ishtiak},
	pages = {1556},
	date = {2019-07-25},
	year = {2019},
	month = {7},
	day = {25},
}

Contributions

pyMannKendall is a community project and welcomes contributions. Additional information can be found in the contribution guidelines.

Code of Conduct

pyMannKendall wishes to maintain a positive community. Additional details can be found in the Code of Conduct.

References

Bari, S. H., Rahman, M. T. U., Hoque, M. A., & Hussain, M. M. (2016). Analysis of seasonal and annual rainfall trends in the northern region of Bangladesh. Atmospheric Research, 176, 148-158. doi:10.1016/j.atmosres.2016.02.008
Conover, W.J., (1980). Some methods based on ranks (Chapter 5), Practical nonparametric statistics (2nd Ed.), John Wiley and Sons.
Cox, D. R., & Stuart, A. (1955). Some quick sign tests for trend in location and dispersion. Biometrika, 42(1/2), 80-95. doi:10.2307/2333424
Hamed, K. H., & Rao, A. R. (1998). A modified Mann-Kendall trend test for autocorrelated data. Journal of hydrology, 204(1-4), 182-196. doi:10.1016/S0022-1694(97)00125-X
Helsel, D. R., & Frans, L. M. (2006). Regional Kendall test for trend. Environmental science & technology, 40(13), 4066-4073. doi:10.1021/es051650b
Hipel, K. W., & McLeod, A. I. (1994). Time series modelling of water resources and environmental systems (Vol. 45). Elsevier.
Hirsch, R. M., Slack, J. R., & Smith, R. A. (1982). Techniques of trend analysis for monthly water quality data. Water resources research, 18(1), 107-121. doi:10.1029/WR018i001p00107
Jacquelin Dietz, E., (1987). A comparison of robust estimators in simple linear regression: A comparison of robust estimators. Communications in Statistics-Simulation and Computation, 16(4), pp.1209-1227. doi: 10.1080/03610918708812645
Kendall, M. (1975). Rank correlation measures. Charles Griffin, London, 202, 15.
Libiseller, C., & Grimvall, A. (2002). Performance of partial Mann-Kendall tests for trend detection in the presence of covariates. Environmetrics: The official journal of the International Environmetrics Society, 13(1), 71-84. doi:10.1002/env.507
Mann, H. B. (1945). Nonparametric tests against trend. Econometrica: Journal of the Econometric Society, 245-259. doi:10.2307/1907187
Sen, P. K. (1968). Estimates of the regression coefficient based on Kendall's tau. Journal of the American statistical association, 63(324), 1379-1389. doi:10.1080/01621459.1968.10480934
Theil, H. (1950). A rank-invariant method of linear and polynomial regression analysis (parts 1-3). In Ned. Akad. Wetensch. Proc. Ser. A (Vol. 53, pp. 1397-1412).
Yue, S., & Wang, C. (2004). The Mann-Kendall test modified by effective sample size to detect trend in serially correlated hydrological series. Water resources management, 18(3), 201-218. doi:10.1023/B:WARM.0000043140.61082.60
Yue, S., & Wang, C. Y. (2002). Applicability of prewhitening to eliminate the influence of serial correlation on the Mann-Kendall test. Water resources research, 38(6), 4-1. doi:10.1029/2001WR000861
Yue, S., Pilon, P., Phinney, B., & Cavadias, G. (2002). The influence of autocorrelation on the ability to detect trend in hydrological series. Hydrological processes, 16(9), 1807-1829. doi:10.1002/hyp.1095

pymannkendall's Issues

[Joss review] Example usage

Please provide a real world example that shows the usage of your tool. Great would be a plot. Also relates to #2

References should have DOI's

It looks like none of the references have their DOI's listed - should be an easy fix. Let me know if you can't find any and I can track them down.

Add new functions

Add new functions in the next version:

Over-whitening mann-kendall test based on Şen, Z., 2017. Hydrological trend analysis with innovative and over-whitening procedures. Hydrological Sciences Journal, 62(2), pp.294-305. (#12 & #13).
Modified Over-whitening mann-kendall test based on ZHANG Hongbo, 2018. Modified over-whitening process and its application in Mann-Kendall trend tests. Journal of Hydroelectric Engineering, 2018, Vol. 37, No. 6: 34-46 (#12 & #13).
Block Bootstrapped Mann-Kendall Trend Test based on Khaliq, M.N., Ouarda, T.B.M.J., Gachon, P., Sushama, L. and St-Hilaire, A., 2009. Identification of hydrological trends in the presence of serial and cross correlations: A review of selected methods and their application to annual flow regimes of Canadian rivers. Journal of Hydrology, 368(1-4), pp.117-130.

np.nanmedian issue in newer versions of numpy...

Hi,

I tried to use this module and ran into a problem: in newer versions of numpy (>1.9.0) there is no np.nanmedian() function, so I could not get it working until I changed the np.nanmedian() calls to np.median(). Before doing that, just after the __preprocessing(x) call, I removed the NaN values from x to make np.median() work:
x, c = __preprocessing(x)
x = x[np.isfinite(x)]
After a few tests it seems to be fine...

Best
M.

[JOSS review] A statement of need

Please state for what the software can be used more clearly in the project description. Maybe even include a plot of an example time-series.

An issue with MMK's correction of var(s)

In the section 'account for autocorrelation', I do not understand how sni is calculated. Does sni involve only the significant correlations? Is that true?

Is there any workaround for the 'ZeroDivisionError: float division by zero' when using original test ?

Hi there,
I was trying to find the trend of GPCC precipitation datasets and encountered the error 'ZeroDivisionError: float division by zero'.
Is there a way to avoid these errors?
thanks!
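The issue does not say what triggered the error. One plausible cause, assumed here rather than confirmed, is a series with fewer than two usable values (for example a grid cell that is all NaN), since the number of pairs n * (n - 1) / 2 is then zero. A hedged workaround is to check each series before calling the test; the helper name `usable_for_mk` is hypothetical.

```python
import math


def usable_for_mk(series, min_count=2):
    """True if the series has at least min_count finite values."""
    finite = [v for v in series if v is not None and math.isfinite(v)]
    return len(finite) >= min_count


cells = {
    'land': [1.2, 0.8, float('nan'), 1.5],
    'ocean': [float('nan')] * 4,  # no data at all
}
for name, series in cells.items():
    if usable_for_mk(series):
        print(name, 'ok to test')  # e.g. call mk.original_test(series) here
    else:
        print(name, 'skipped: not enough finite values')
```

Series that fail the check can be recorded as missing instead of being passed to the test.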

Deprecated use of np.float in __preprocessing function.

Hi! I recently ran into an issue trying to use the original_test function on some time series data. Here is the error message:

in __preprocessing(x)
18 def __preprocessing(x):
---> 19 x = np.asarray(x).astype(np.float)
20 dim = x.ndim
22 if dim == 1:
...

AttributeError: module 'numpy' has no attribute 'float'.
np.float was a deprecated alias for the builtin float. To avoid this error in existing code, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

I'm not sure if other people are having the same issue? I could change my numpy version, but this seems like a non-ideal fix to the issue. My current numpy version is 1.24.3. Thank you!

Can't use pyMannKendall when the sequence has high autocorrelation?

Can't use pyMannKendall when the sequence has high autocorrelation?
The modified over-whitening (MOW) Mann-Kendall (M-K) test seems to solve this problem, but I am not able to write the code myself!

How to indicate the region in the Regional MK test?

I've read different explanations of the Regional Mann Kendall test, and from what I understand it's necessary to declare the location of each time series, at least in order to lump the S statistics together, just as the Seasonal test takes an indication of the frequency of the season. I've seen that in the package the Regional MK test only has the data input. How does it work? I've read the README and the examples, and I can't figure it out. Thanks!
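As a sketch of the "lumping" described in the question, the regional statistic can be formed by computing S and its variance for each location separately and summing them. The code below assumes one series per location, no ties, no missing values and independent locations; the function names are hypothetical, and whether the package expects one column per location is an assumption, not something the README states.

```python
import math
from itertools import combinations


def site_score_and_variance(x):
    """Mann-Kendall S and its no-ties variance for one location."""
    n = len(x)
    s = sum((b > a) - (b < a) for a, b in combinations(x, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    return s, var_s


def regional_mk_sketch(sites):
    """sites: list of series, one per location. Returns (s, var_s, z)."""
    parts = [site_score_and_variance(x) for x in sites]
    s = sum(p[0] for p in parts)
    var_s = sum(p[1] for p in parts)
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z


sites = [[1, 2, 3, 4], [2, 3, 5, 7], [4, 3, 2, 1]]
print(regional_mk_sketch(sites))
```

In this toy example two locations rise and one falls, so the regional score is smaller than that of either rising location on its own.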

Based on (modified over-whitening,MOW) Mann-Kendall (M-K) trend test method for trend test.

Over-whitening Mann-Kendall and modified over-whitening Mann-Kendall?

How to do it?

numpy > 1.23

anyone going to convert this to work with numpy 1.24? i can if no one else wants to.

[JOSS review] Install instructions

python setup.py install will not work without root access on most systems. This could confuse inexperienced users. Maybe add

sudo or #
or add prefix or local install option

can author explain the meaning of the returned results？

p-value：what is the null hypothesis?

Elaboration on output tuple parameters

Can the tuple parameters (viz. z, tau, s, var_s, slope, intercept) be elaborated in more detail?

It's not quite clear what exactly these are representing w.r.t. the algorithm outputs such that it could help one to justify mathematically the results.

Thanks.

pyMannKendall not compatible with numpy 1.24.0

The latest release of pyMannKendall is not compatible with numpy 1.24.0 due to the removal of the deprecated np.float alias (see the Numpy v1.24.0 changelog).
There is already a fix in the master branch commit, so a new release of pyMannKendall would fix this issue.

Good job, but scipy.stats.norm.ppf is quite slow; I recommend using the method at the URL below instead.

https://people.sc.fsu.edu/~jburkardt/py_src/asa241/asa241.py

[JOSS review] Automated tests

Please describe how a user can run the tests. Also add a more detailed description what the tests do.
Having only had a quick look at your tests, it seems like you are just comparing values based on some example series. Please consider writing more robust unit tests. If you need more guidance on this, feel free to ask.

[JOSS review] References

Please consider adding a DOI in your paper bib to all reference where one is available.

Hi

Hi
I hope you are doing well,
How did you calculate tau for the seasonal MK?

Variance in correlated seasonal MK test

Hi,

I've been using your library to perform several MK tests and may have found a missing term in the function correlated_multivariate_test(). According to Hipel & McLeod (1994) for a correlated seasonal MK test the variance is computed as follows (Eq. 23.3.12):

Where the 2nd summation term of the equation is zero when no correlation is present in the time series. In the function correlated_multivariate_test() this 2nd summation is represented by the variable Gamma and computed as follows (Eq. 23.3.21):

But after that, the variance (var_s) for the correlated seasonal MK test is obtained with np.sum(Gamma) and the 1st summation term (which is included in the seasonal MK test without correlation) is not accounted for. Thus, shouldn't the variance in correlated_multivariate_test() include both terms (assuming no ties nor missing values):

Thank you in advance!

Aleks

Version needs to be updated/match in all locations

It looks like the version here doesn't match what is listed for review or in the setup.py file. It should be a simple update, though you may want to take a look at versioneer, which does all the hard work for you!

[JOSS review] Functionality documentation

Please compile a documentation using Sphinx or a similar tool. Alternatively, extend the project description with a more detailed description of the method signatures. You already have nice docstrings. A more detailed overview without looking into the code would be nice.

Add new features

Add new features in the next version :

Add intercept in sen's slope results as well as other test results.
Add more missing value analysis methods.
Implement numba package to reduce runtime.

np.pad issue ?

Hello there! Thank you very much for bringing Mann-Kendall to the Python ecosystem!

While trying to debug one of my datasets, I saw a strange behaviour for seasonal data when the number of data points is not a multiple of the period.

import numpy as np
import pymannkendall as mk

# Generate toy dataset with 1 increase by period of 12 months
data = np.array([
                         5,  4, 3, 2, 1, 0,
    2, 3, 4, 5,  6,  7,  6,  5, 4, 3, 2, 1,
    3, 4, 5, 6,  7,  8,  7,  6, 5, 4, 3, 2,
    4, 5, 6, 7,  8,  9,  8,  7, 6, 5, 4, 3,
    5, 6, 7, 8,  9, 10,  9,  8, 7, 6, 5, 4,
    6, 7, 8, 9, 10, 11, 10,  9, 8, 7, 6, 5,
    7,
    ])

mk.seasonal_test(data, period=12)

It outputs the correct result:

Seasonal_Mann_Kendall_Test(trend='increasing', h=True, p=2.6336710590157963e-12, z=6.996007264651662, Tau=0.7222222222222222, s=130.0, var_s=340.0, slope=1.0, intercept=2.5416666666666665)

But when I put some print statements for debugging in your code (line 204) as follows:

def seasonal_sens_slope(x_old, period=12):
    # ... blabla cut
    
    if x.ndim == 1:
        if np.mod(n,period) != 0:
            print(x)
            x = np.pad(x,(0,period - np.mod(n,period)), 'constant', constant_values=(np.nan,))
            print(x)

    # ... blabla cut

it outputs the data before and after the padding, and instead of the np.nan padding value I get -9223372036854775808, which is strange:

[ 5  4  3  2  1  0  2  3  4  5  6  7  6  5  4  3  2  1  3  4  5  6  7  8
  7  6  5  4  3  2  4  5  6  7  8  9  8  7  6  5  4  3  5  6  7  8  9 10
  9  8  7  6  5  4  6  7  8  9 10 11 10  9  8  7  6  5  7]  # <- first print, input data
[                   5                    4                    3
                    2                    1                    0
                    2                    3                    4
                    5                    6                    7
                    6                    5                    4
                    3                    2                    1
                    3                    4                    5
                    6                    7                    8
                    7                    6                    5
                    4                    3                    2
                    4                    5                    6
                    7                    8                    9
                    8                    7                    6
                    5                    4                    3
                    5                    6                    7
                    8                    9                   10
                    9                    8                    7
                    6                    5                    4
                    6                    7                    8
                    9                   10                   11
                   10                    9                    8
                    7                    6                    5
                    7 -9223372036854775808 -9223372036854775808
 -9223372036854775808 -9223372036854775808 -9223372036854775808] # <- second print, after padding.

I'm not sure whether it comes down to my Python version, numpy (1.19.5) or pymannkendall (1.4.1), but be aware that this behavior exists!

It does not lead to a computational error in my case for the Sen's slope, because the median absorbs these values, but I don't know whether it affects other statistics.
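The printed arrays above have an integer dtype, and an integer array cannot hold NaN, so the padding value ends up as an integer instead. A minimal sketch of one way to avoid this is to convert the data to float before padding. The helper name `pad_to_full_cycles` is hypothetical and this is not the package's code; only `np.asarray`, `np.pad` and `reshape` from numpy are used.

```python
import numpy as np


def pad_to_full_cycles(x, period=12):
    """Pad a 1-D series with NaN up to a whole number of cycles."""
    x = np.asarray(x, dtype=float)  # float dtype can represent NaN
    remainder = x.size % period
    if remainder:
        x = np.pad(x, (0, period - remainder),
                   mode='constant', constant_values=np.nan)
    return x.reshape(-1, period)  # one row per cycle, one column per season


cycles = pad_to_full_cycles(list(range(67)), period=12)
print(cycles.shape)
print(cycles[-1])
```

Passing float data to the test functions in the first place (for example `np.array([...], dtype=float)`) has the same effect from the caller's side.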

there may be a problem with sen's slope?

Hi! Thank you for your M-K package!
You define the Sen's estimator like this: d[idx : idx + len(j)] = (x[j] - x[i]) / (j - i). But when I checked Sen's paper (1968), I think the denominator should be "Time", that is: (x[j] - x[i])/(t[j] - t[i]).

If someone uses annual data to calculate sen's slope, then your definition is right
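The difference between the two definitions can be sketched as follows. With an explicit time vector t, the slope comes out in units of x per unit of t; with no time vector, the index is used and the slope is per time step. The function name `sens_slope_with_time` and its `t` argument are hypothetical and not part of the package's documented interface; the sketch assumes distinct time values and no missing data.

```python
from statistics import median


def sens_slope_with_time(x, t=None):
    """Median of pairwise slopes (x[j] - x[i]) / (t[j] - t[i])."""
    if t is None:
        t = range(len(x))  # fall back to index spacing: slope per step
    n = len(x)
    slopes = [(x[j] - x[i]) / (t[j] - t[i])
              for i in range(n) for j in range(i + 1, n)]
    return median(slopes)


x = [2, 4, 6, 8]
print(sens_slope_with_time(x))                       # per time step
print(sens_slope_with_time(x, t=[0, 0.5, 1.0, 1.5])) # per unit of t
```

For evenly spaced data the two results differ only by the constant step length, so an index-based slope can be rescaled afterwards.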

NaN output in 'Hamed and Rao Modified MK Test'

I'm running a modified MK test on a couple of short time series:

import pymannkendall as mk

series1 = [0.41625778, 0.40488883, 0.43044564, 0.44369687, 0.46613348, 0.44420775, 0.45091315, 0.48484614, 0.40252088, 0.43944978, 0.51613973, 0.49302274]
series2 = [0.35257984, 0.38692909, 0.39669828, 0.36296244,0.42035612,0.39374964, 0.41100085, 0.43182076, 0.40815853, 0.45394297, 0.41584767, 0.47399517]

print(mk.hamed_rao_modification_test(series1))
print(mk.hamed_rao_modification_test(series2))

And I'm running into a weird result with the second time series, where var_s has a negative value and z and p values become NaN as a result:

Modified_Mann_Kendall_Test_Hamed_Rao_Approach(trend='increasing', h=True, p=0.03352416523500468, z=2.125748992103599, Tau=0.48484848484848486, s=32.0, var_s=212.66666666666666, slope=0.0070362813636363625, intercept=0.40525276250000003)
/home/joao/.local/lib/python3.8/site-packages/pymannkendall/pymannkendall.py:99: RuntimeWarning: invalid value encountered in sqrt
  z = (s - 1)/np.sqrt(var_s)
Modified_Mann_Kendall_Test_Hamed_Rao_Approach(trend='no trend', h=False, p=nan, z=nan, Tau=0.6666666666666666, s=44.0, var_s=-8.737179487179525, slope=0.008305347500000004, intercept=0.36390027875)
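A negative var_s is possible with the Hamed and Rao (1998) variance correction when the significant autocorrelations are strongly negative. The sketch below evaluates the correction factor 1 + 2 / (n(n-1)(n-2)) * sum over significant lags i of (n-i)(n-i-1)(n-i-2) * rho_i, which multiplies the uncorrected variance. The function name `hamed_rao_correction` and the autocorrelation values are hypothetical; they are not taken from the series in this issue.

```python
def hamed_rao_correction(n, significant_acf):
    """Variance correction factor for a series of length n.

    significant_acf maps lag -> autocorrelation of the ranks,
    containing only the lags judged significant.
    """
    total = sum((n - i) * (n - i - 1) * (n - i - 2) * rho
                for i, rho in significant_acf.items())
    return 1 + 2 / (n * (n - 1) * (n - 2)) * total


print(hamed_rao_correction(12, {}))         # no significant lags
print(hamed_rao_correction(12, {1: 0.5}))   # positive autocorrelation
print(hamed_rao_correction(12, {1: -0.9}))  # strong negative autocorrelation
```

When the factor is zero or negative, the corrected variance cannot be used in a square root, so a caller can at least detect that case and report that the correction is not applicable for that series.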