mmhs013 / pymannkendall

A python package for non parametric Mann Kendall family of trend tests.

License: MIT License

Python 96.66% TeX 3.34%
mann-kendall-tests mk-test trend-analysis python modified-mann-kendall-test seasonal-mann-kendall

pymannkendall's Introduction

pyMannKendall

What is the Mann-Kendall Test?

The Mann-Kendall Trend Test (sometimes called the MK test) is used to analyze time series data for consistently increasing or decreasing trends (monotonic trends). It is a non-parametric test, which means it works for all distributions (i.e. the data does not have to meet the assumption of normality), but the data should have no serial correlation. Serial correlation in the data can distort the significance level (p-value), which could lead to misinterpretation. To overcome this problem, researchers have proposed several modified Mann-Kendall tests (Hamed and Rao Modified MK Test, Yue and Wang Modified MK Test, Modified MK test using Pre-Whitening method, etc.). The Seasonal Mann-Kendall test was also developed to remove the effect of seasonality.
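To make the idea concrete, here is a minimal sketch of the classic Mann-Kendall computation using only the standard library. It assumes a series with no ties and no missing values, and the function name `mann_kendall_sketch` is made up for this illustration; it is not the package's implementation.

```python
import math
from itertools import combinations

def mann_kendall_sketch(x):
    """Return (s, var_s, z, p) for a 1-D sequence without ties or missing values."""
    n = len(x)
    # S is the number of increasing pairs minus the number of decreasing pairs.
    s = sum((xj > xi) - (xj < xi) for xi, xj in combinations(x, 2))
    # Variance of S under the null hypothesis (valid only when there are no ties).
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal CDF.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return s, var_s, z, p

s, var_s, z, p = mann_kendall_sketch([1.2, 1.9, 2.4, 3.1, 4.8])
print(s, round(z, 3), round(p, 4))
```

A strictly increasing series gives the largest possible S for its length, so the p-value shrinks as the series gets longer.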

The Mann-Kendall Test is a powerful trend test, so several other modified Mann-Kendall tests, such as the Multivariate MK Test, Regional MK Test, Correlated MK Test and Partial MK Test, were developed for spatial conditions. pyMannKendall is a pure Python implementation of non-parametric Mann-Kendall trend analysis, which brings together almost all types of Mann-Kendall tests. Currently, this package has 11 Mann-Kendall tests and 2 Sen's slope estimator functions. Brief descriptions of the functions are given below:

  1. Original Mann-Kendall test (original_test): The original Mann-Kendall test is a non-parametric test that does not consider serial correlation or seasonal effects.

  2. Hamed and Rao Modified MK Test (hamed_rao_modification_test): This modified MK test was proposed by Hamed and Rao (1998) to address serial autocorrelation issues. They suggested a variance correction approach to improve trend analysis. The user can restrict the test to the first n significant lags by passing the lag number to this function. By default, it considers all significant lags.

  3. Yue and Wang Modified MK Test (yue_wang_modification_test): This is also a variance correction method that accounts for serial autocorrelation, proposed by Yue, S., & Wang, C. Y. (2004). The user can also set the desired number of significant lags for the calculation.

  4. Modified MK test using Pre-Whitening method (pre_whitening_modification_test): This test was suggested by Yue and Wang (2002); it pre-whitens the time series before applying the trend test.

  5. Modified MK test using Trend free Pre-Whitening method (trend_free_pre_whitening_modification_test): This test was also proposed by Yue and Wang (2002); it removes the trend component and then pre-whitens the time series before applying the trend test.

  6. Multivariate MK Test (multivariate_test): This is an MK test for multiple parameters proposed by Hirsch (1982). He used this method for the seasonal MK test, where he treated every month as a parameter.

  7. Seasonal MK Test (seasonal_test): For seasonal time series data, Hirsch, R.M., Slack, J.R. and Smith, R.A. (1982) proposed this test to calculate the seasonal trend.

  8. Regional MK Test (regional_test): Based on the seasonal MK test proposed by Hirsch (1982), Helsel, D.R. and Frans, L.M. (2006) suggested the regional MK test to calculate the overall trend at a regional scale.

  9. Correlated Multivariate MK Test (correlated_multivariate_test): This multivariate MK test was proposed by Hipel (1994) for cases where the parameters are correlated.

  10. Correlated Seasonal MK Test (correlated_seasonal_test): This method, proposed by Hipel (1994), is used when the time series is significantly correlated with the preceding one or more months/seasons.

  11. Partial MK Test (partial_test): In real events, many factors affect the main studied response parameter, which can bias the trend results. To overcome this problem, Libiseller (2002) proposed the partial MK test. It requires two parameters as input: one is the response parameter and the other is an independent parameter.

  12. Theil-Sen's Slope Estimator (sens_slope): This method was proposed by Theil (1950) and Sen (1968) to estimate the magnitude of the monotonic trend. The intercept is calculated using the Conover, W.J. (1980) method.

  13. Seasonal Theil-Sen's Slope Estimator (seasonal_sens_slope): This method was proposed by Hipel (1994) to estimate the magnitude of the monotonic trend when the data has seasonal effects. The intercept is calculated using the Conover, W.J. (1980) method.
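As an illustration of the seasonal idea in item 7, the sketch below splits a series into one sub-series per season, computes the Mann-Kendall score for each, and adds the scores and variances together. It assumes no ties, no missing values and independent seasons; `mk_score` and `seasonal_mk_sketch` are hypothetical names, not functions from the package.

```python
from itertools import combinations

def mk_score(values):
    """Mann-Kendall S for one season: increasing pairs minus decreasing pairs."""
    return sum((b > a) - (b < a) for a, b in combinations(values, 2))

def seasonal_mk_sketch(x, period):
    """Sum S and its no-ties variance over the `period` seasonal sub-series."""
    s_total, var_total = 0, 0.0
    for k in range(period):
        season = x[k::period]  # every period-th value, starting at season k
        n = len(season)
        s_total += mk_score(season)
        var_total += n * (n - 1) * (2 * n + 5) / 18
    return s_total, var_total

# Two "seasons" per cycle: a low season and a high season, both rising.
series = [1, 10, 2, 20, 3, 30]
s_total, var_total = seasonal_mk_sketch(series, period=2)
print(s_total, var_total)
```

Comparing values only within the same season keeps the regular jump between the low and high season from being counted as evidence for or against a trend.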

Function details:

All Mann-Kendall test functions have very similar input parameters:

  • x: the data as a vector (list, numpy array or pandas series)
  • alpha: significance level (0.05 is the default)
  • lag: number of first significant lags (only available in hamed_rao_modification_test and yue_wang_modification_test)
  • period: seasonal cycle. For monthly data it is 12; for weekly data it is 52 (only available in seasonal tests)

All Mann-Kendall tests return a named tuple which contains:

  • trend: the trend (increasing, decreasing or no trend)
  • h: True (if a trend is present) or False (if a trend is absent)
  • p: p-value of the significance test
  • z: normalized test statistic
  • Tau: Kendall Tau
  • s: Mann-Kendall score
  • var_s: variance of S
  • slope: Theil-Sen estimator/slope
  • intercept: intercept of the Kendall-Theil robust line; for seasonal tests, a full period cycle is considered as the unit time step
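The relationship between z, alpha, p, h and trend can be sketched as a two-sided normal test. This is an assumption about how the fields fit together, written with the standard library only; `classify_trend` is a hypothetical helper and not part of the package.

```python
from statistics import NormalDist

def classify_trend(z, alpha=0.05):
    """Map a z statistic to (trend, h, p) with a two-sided normal test."""
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Reject the "no trend" hypothesis when |z| exceeds the critical value.
    h = abs(z) > NormalDist().inv_cdf(1 - alpha / 2)
    if h and z > 0:
        trend = "increasing"
    elif h and z < 0:
        trend = "decreasing"
    else:
        trend = "no trend"
    return trend, h, p

print(classify_trend(2.125748992103599))
print(classify_trend(0.06179991635055463))
```

Lowering alpha raises the critical value, so the same z can switch from a reported trend to "no trend".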

The Sen's slope function requires a data vector. The seasonal Sen's slope function also has an optional input period, which defaults to 12. Both Sen's slope functions return only the slope value.
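For orientation, a Theil-Sen style estimate can be sketched as the median of all pairwise slopes. The sketch assumes evenly spaced data without missing values, and the intercept line reflects one reading of the Conover (1980) approach (median of the data minus slope times the median time index); `sens_slope_sketch` is a hypothetical name, not the package function.

```python
from itertools import combinations
from statistics import median

def sens_slope_sketch(x):
    """Median pairwise slope and a Conover-style intercept for evenly spaced data."""
    pairs = combinations(range(len(x)), 2)  # all index pairs with i < j
    slope = median((x[j] - x[i]) / (j - i) for i, j in pairs)
    # Intercept: median of the data minus slope times the median time index.
    intercept = median(x) - median(range(len(x))) * slope
    return slope, intercept

slope, intercept = sens_slope_sketch([1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```

Because the estimate is a median, a single outlier changes only a few of the pairwise slopes and has little effect on the result.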

Dependencies

For the installation of pyMannKendall, the following packages are required:

  • numpy
  • scipy

Installation

You can install pyMannKendall using pip. For Linux users:

sudo pip install pymannkendall

or, for Windows users:

pip install pymannkendall

or, you can use conda:

conda install -c conda-forge pymannkendall

or you can clone the repo and install it:

git clone https://github.com/mmhs013/pymannkendall
cd pymannkendall
python setup.py install

Tests

pyMannKendall is automatically tested with the pytest package on each commit, but the tests can also be run manually:

pytest -v

Usage

A quick example of pyMannKendall usage is given below. Several more examples are provided in the repository.

import numpy as np
import pymannkendall as mk

# Data generation for analysis
data = np.random.rand(360,1)

result = mk.original_test(data)
print(result)

The output looks like this:

Mann_Kendall_Test(trend='no trend', h=False, p=0.9507221701045581, z=0.06179991635055463, Tau=0.0021974620860414733, s=142.0, var_s=5205500.0, slope=1.0353584906597959e-05, intercept=0.5232692553379981)

Because the output is a named tuple, you can access a specific result by name:

print(result.slope)

or, you can directly unpack your results like this:

trend, h, p, z, Tau, s, var_s, slope, intercept = mk.original_test(data)
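Since the result is described as a named tuple, the standard namedtuple tools should apply to it. The sketch below uses a hypothetical stand-in tuple with the same field names and rounded placeholder values, so it runs without the package installed; `_asdict()` is a standard namedtuple method.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field names as the result shown above.
Result = namedtuple(
    "Result",
    ["trend", "h", "p", "z", "Tau", "s", "var_s", "slope", "intercept"],
)
result = Result("no trend", False, 0.95, 0.06, 0.002, 142.0, 5205500.0, 1.0e-05, 0.52)

as_dict = result._asdict()  # mapping of field name to value
summary = {key: as_dict[key] for key in ("trend", "p", "slope")}
print(summary)
```

Converting to a dictionary is convenient when collecting results from many series into a table.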

Citation

If you publish results for which you used pyMannKendall, please give credit by citing Hussain et al., (2019):

Hussain et al., (2019). pyMannKendall: a python package for non parametric Mann Kendall family of trend tests. Journal of Open Source Software, 4(39), 1556, https://doi.org/10.21105/joss.01556

@article{Hussain2019pyMannKendall,
	journal = {Journal of Open Source Software},
	doi = {10.21105/joss.01556},
	issn = {2475-9066},
	number = {39},
	publisher = {The Open Journal},
	title = {pyMannKendall: a python package for non parametric Mann Kendall family of trend tests.},
	url = {http://dx.doi.org/10.21105/joss.01556},
	volume = {4},
	author = {Hussain, Md. and Mahmud, Ishtiak},
	pages = {1556},
	date = {2019-07-25},
	year = {2019},
	month = {7},
	day = {25},
}

Contributions

pyMannKendall is a community project and welcomes contributions. Additional information can be found in the contribution guidelines.

Code of Conduct

pyMannKendall wishes to maintain a positive community. Additional details can be found in the Code of Conduct.

References

  1. Bari, S. H., Rahman, M. T. U., Hoque, M. A., & Hussain, M. M. (2016). Analysis of seasonal and annual rainfall trends in the northern region of Bangladesh. Atmospheric Research, 176, 148-158. doi:10.1016/j.atmosres.2016.02.008

  2. Conover, W.J., (1980). Some methods based on ranks (Chapter 5), Practical nonparametric statistics (2nd Ed.), John Wiley and Sons.

  3. Cox, D. R., & Stuart, A. (1955). Some quick sign tests for trend in location and dispersion. Biometrika, 42(1/2), 80-95. doi:10.2307/2333424

  4. Hamed, K. H., & Rao, A. R. (1998). A modified Mann-Kendall trend test for autocorrelated data. Journal of hydrology, 204(1-4), 182-196. doi:10.1016/S0022-1694(97)00125-X

  5. Helsel, D. R., & Frans, L. M. (2006). Regional Kendall test for trend. Environmental science & technology, 40(13), 4066-4073. doi:10.1021/es051650b

  6. Hipel, K. W., & McLeod, A. I. (1994). Time series modelling of water resources and environmental systems (Vol. 45). Elsevier.

  7. Hirsch, R. M., Slack, J. R., & Smith, R. A. (1982). Techniques of trend analysis for monthly water quality data. Water resources research, 18(1), 107-121. doi:10.1029/WR018i001p00107

  8. Jacquelin Dietz, E., (1987). A comparison of robust estimators in simple linear regression: A comparison of robust estimators. Communications in Statistics-Simulation and Computation, 16(4), pp.1209-1227. doi: 10.1080/03610918708812645

  9. Kendall, M. (1975). Rank correlation measures. Charles Griffin, London, 202, 15.

  10. Libiseller, C., & Grimvall, A. (2002). Performance of partial Mann-Kendall tests for trend detection in the presence of covariates. Environmetrics: The official journal of the International Environmetrics Society, 13(1), 71-84. doi:10.1002/env.507

  11. Mann, H. B. (1945). Nonparametric tests against trend. Econometrica: Journal of the Econometric Society, 245-259. doi:10.2307/1907187

  12. Sen, P. K. (1968). Estimates of the regression coefficient based on Kendall's tau. Journal of the American statistical association, 63(324), 1379-1389. doi:10.1080/01621459.1968.10480934

  13. Theil, H. (1950). A rank-invariant method of linear and polynominal regression analysis (parts 1-3). In Ned. Akad. Wetensch. Proc. Ser. A (Vol. 53, pp. 1397-1412).

  14. Yue, S., & Wang, C. (2004). The Mann-Kendall test modified by effective sample size to detect trend in serially correlated hydrological series. Water resources management, 18(3), 201-218. doi:10.1023/B:WARM.0000043140.61082.60

  15. Yue, S., & Wang, C. Y. (2002). Applicability of prewhitening to eliminate the influence of serial correlation on the Mann-Kendall test. Water resources research, 38(6), 4-1. doi:10.1029/2001WR000861

  16. Yue, S., Pilon, P., Phinney, B., & Cavadias, G. (2002). The influence of autocorrelation on the ability to detect trend in hydrological series. Hydrological processes, 16(9), 1807-1829. doi:10.1002/hyp.1095

pymannkendall's People

Contributors

ishtiak006, kyleniemeyer, mmhs013, nonstopaggropop

pymannkendall's Issues

References should have DOIs

It looks like none of the references have their DOIs listed - should be an easy fix. Let me know if you can't find any and I can track them down.

Add new functions

np.nanmedian issue in newer versions of numpy...

Hi,

I have tried to use this module and ran into a problem: in newer versions of numpy (>1.9.0) there is no np.nanmedian() function, so I could not get it working until I changed the np.nanmedian() calls to np.median(). Before doing that, just after the __preprocessing(x) call, I removed the NaN values from x to make np.median() work:
x, c = __preprocessing(x)
x = x[np.isfinite(x)]
After a few tests it seems to be fine...

Best
M.

[JOSS review] A statement of need

Please state more clearly in the project description what the software can be used for. Maybe even include a plot of an example time series.

Deprecated use of np.float in __preprocessing function.

Hi! I recently ran into an issue trying to use the original_test function on some time series data. Here is the error message:

in __preprocessing(x)
18 def __preprocessing(x):
---> 19 x = np.asarray(x).astype(np.float)
20 dim = x.ndim
22 if dim == 1:
...

AttributeError: module 'numpy' has no attribute 'float'.
np.float was a deprecated alias for the builtin float. To avoid this error in existing code, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

I'm not sure if other people are having the same issue? I could change my numpy version, but this seems like a non-ideal fix to the issue. My current numpy version is 1.24.3. Thank you!
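Following the hint in the error message, the conversion can be written with the builtin float instead of the removed np.float alias. This is a minimal sketch of that one-line change; `to_float_array` is a hypothetical helper name and not the package's `__preprocessing` function.

```python
import numpy as np

def to_float_array(x):
    """Hypothetical helper: convert input to a float64 array without np.float."""
    # The builtin float is interpreted by NumPy as np.float64.
    return np.asarray(x).astype(float)

arr = to_float_array([1, 2, 3])
print(arr.dtype)
```

The same spelling works on older NumPy releases as well, so it does not require pinning a version.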

How to indicate the region in the Regional MK test?

I've read different explanations of the Regional Mann-Kendall test, and from what I understand it's necessary to declare at least the location of each time series so as to lump the S statistics together, just as the Seasonal test takes an indication of the frequency of the season. I've seen that in the package the Regional MK test only has the data input. How does it work? I've read the README and the examples and I can't figure it out. Thanks!

numpy > 1.23

anyone going to convert this to work with numpy 1.24? i can if no one else wants to.

[JOSS review] Install instructions

python setup.py install will not work without root access on most systems. This could confuse inexperienced users. Maybe add

  • sudo or #
  • or add prefix or local install option

Elaboration on output tuple parameters

Can the tuple parameters (viz. z, tau, s, var_s, slope, intercept) be elaborated in more detail?

It's not quite clear what exactly these are representing w.r.t. the algorithm outputs such that it could help one to justify mathematically the results.

Thanks.

[JOSS review] Automated tests

Please describe how a user can run the tests. Also add a more detailed description of what the tests do.
From a quick look at your tests, it seems like you are just comparing values based on some example series. Please consider writing more robust unit tests. If you need more guidance on this, feel free to ask.

Hi

Hi
I hope you are doing well,
How did you calculate tau for the seasonal MK?

Variance in correlated seasonal MK test

Hi,

I've been using your library to perform several MK tests and may have found a missing term in the function correlated_multivariate_test(). According to Hipel & McLeod (1994) for a correlated seasonal MK test the variance is computed as follows (Eq. 23.3.12):

image

Where the 2nd summation term of the equation is zero when no correlation is present in the time series. In the function correlated_multivariate_test() this 2nd summation is represented by the variable Gamma and computed as follows (Eq. 23.3.21):

image

But after that, the variance (var_s) for the correlated seasonal MK test is obtained with np.sum(Gamma) and the 1st summation term (which is included in the seasonal MK test without correlation) is not accounted for. Thus, shouldn't the variance in correlated_multivariate_test() include both terms (assuming no ties nor missing values):

image

Thank you in advance!

Aleks

[JOSS review] Functionality documentation

Please compile documentation using Sphinx or a similar tool. Alternatively, extend the project description with a more detailed description of the method signatures. You already have nice docstrings. A more detailed overview that does not require looking into the code would be nice.

Add new features

Add new features in the next version :

  • Add intercept in sen's slope results as well as other test results.
  • Add more missing value analysis methods.
  • Implement numba package to reduce runtime.

np.pad issue ?

Hello there! Thank you very much for bringing Mann-Kendall to the Python ecosystem!

While trying to debug one of my datasets, I saw a strange behaviour for seasonal data where the number of data points is not a multiple of the period.

import numpy as np
import pymannkendall as mk

# Generate toy dataset with 1 increase by period of 12 months
data = np.array([
                         5,  4, 3, 2, 1, 0,
    2, 3, 4, 5,  6,  7,  6,  5, 4, 3, 2, 1,
    3, 4, 5, 6,  7,  8,  7,  6, 5, 4, 3, 2,
    4, 5, 6, 7,  8,  9,  8,  7, 6, 5, 4, 3,
    5, 6, 7, 8,  9, 10,  9,  8, 7, 6, 5, 4,
    6, 7, 8, 9, 10, 11, 10,  9, 8, 7, 6, 5,
    7,
    ])

mk.seasonal_test(data, period=12)

It outputs the correct data:

Seasonal_Mann_Kendall_Test(trend='increasing', h=True, p=2.6336710590157963e-12, z=6.996007264651662, Tau=0.7222222222222222, s=130.0, var_s=340.0, slope=1.0, intercept=2.5416666666666665)

But when I put some print statements for debugging into your code (line 204) as follows:

def seasonal_sens_slope(x_old, period=12):
    # ... blabla cut
    
    if x.ndim == 1:
        if np.mod(n,period) != 0:
            print(x)
            x = np.pad(x,(0,period - np.mod(n,period)), 'constant', constant_values=(np.nan,))
            print(x)

    # ... blabla cut

it is outputting the data before and after the padding, and instead of np.nan padding value, I have -9223372036854775808, which is strange:

[ 5  4  3  2  1  0  2  3  4  5  6  7  6  5  4  3  2  1  3  4  5  6  7  8
  7  6  5  4  3  2  4  5  6  7  8  9  8  7  6  5  4  3  5  6  7  8  9 10
  9  8  7  6  5  4  6  7  8  9 10 11 10  9  8  7  6  5  7]  # <- first print, input data
[                   5                    4                    3
                    2                    1                    0
                    2                    3                    4
                    5                    6                    7
                    6                    5                    4
                    3                    2                    1
                    3                    4                    5
                    6                    7                    8
                    7                    6                    5
                    4                    3                    2
                    4                    5                    6
                    7                    8                    9
                    8                    7                    6
                    5                    4                    3
                    5                    6                    7
                    8                    9                   10
                    9                    8                    7
                    6                    5                    4
                    6                    7                    8
                    9                   10                   11
                   10                    9                    8
                    7                    6                    5
                    7 -9223372036854775808 -9223372036854775808
 -9223372036854775808 -9223372036854775808 -9223372036854775808] # <- second print, after padding.

I'm not sure if it comes down to my Python version, numpy (1.19.5) or pymannkendall (1.4.1), but be aware that this behavior exists!

It does not lead to a computational error in my case for the Sen's slope, because the median absorbs the issue, but I don't know if it affects other statistics in the output.
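One plausible explanation, stated here as an assumption rather than a confirmed diagnosis, is that the input array has an integer dtype, which cannot represent NaN. The sketch below shows a float cast before padding; the variable names are illustrative and the snippet is not the package's code.

```python
import numpy as np

period = 12
x = np.arange(67)  # integer dtype, length not a multiple of 12
n = x.size
pad_width = (period - n % period) % period

# Cast to float first: integer arrays have no representation for NaN.
padded = np.pad(x.astype(float), (0, pad_width), "constant", constant_values=np.nan)
print(padded[-pad_width:])
```

With a float array the padded tail holds real NaN values, which NaN-aware functions such as np.nanmedian can then skip.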

there may be a problem with sen's slope?

Hi! Thank you for your M-K package!
You define the Sen's estimator like this: d[idx : idx + len(j)] = (x[j] - x[i]) / (j - i). But when I checked Sen's paper (1968), I think the denominator should be "Time", that is: (x[j] - x[i])/(t[j] - t[i]).

If someone uses annual data to calculate Sen's slope, then your definition is right.
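The difference between the two denominators can be sketched with an explicit time vector. This assumes distinct observation times; `sens_slope_with_time` is a hypothetical function written for this illustration and is not part of the package.

```python
from itertools import combinations
from statistics import median

def sens_slope_with_time(x, t):
    """Median of pairwise slopes using explicit observation times t."""
    return median(
        (x[j] - x[i]) / (t[j] - t[i])
        for i, j in combinations(range(len(x)), 2)
    )

x = [1.0, 3.0, 5.0]
per_step = sens_slope_with_time(x, t=[0, 1, 2])        # index used as time
per_unit = sens_slope_with_time(x, t=[0.0, 2.0, 4.0])  # samples two time units apart
print(per_step, per_unit)
```

When observations are evenly spaced, the index-based slope equals the time-based slope multiplied by the sampling interval, so the two agree only for a unit time step.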

NaN output in 'Hamed and Rao Modified MK Test'

I'm running a modified MK test on a couple of short time series:

import pymannkendall as mk

series1 = [0.41625778, 0.40488883, 0.43044564, 0.44369687, 0.46613348, 0.44420775, 0.45091315, 0.48484614, 0.40252088, 0.43944978, 0.51613973, 0.49302274]
series2 = [0.35257984, 0.38692909, 0.39669828, 0.36296244,0.42035612,0.39374964, 0.41100085, 0.43182076, 0.40815853, 0.45394297, 0.41584767, 0.47399517]

print(mk.hamed_rao_modification_test(series1))
print(mk.hamed_rao_modification_test(series2))

And I'm running into a weird result with the second time series, where var_s has a negative value and z and p values become NaN as a result:

Modified_Mann_Kendall_Test_Hamed_Rao_Approach(trend='increasing', h=True, p=0.03352416523500468, z=2.125748992103599, Tau=0.48484848484848486, s=32.0, var_s=212.66666666666666, slope=0.0070362813636363625, intercept=0.40525276250000003)
/home/joao/.local/lib/python3.8/site-packages/pymannkendall/pymannkendall.py:99: RuntimeWarning: invalid value encountered in sqrt
  z = (s - 1)/np.sqrt(var_s)
Modified_Mann_Kendall_Test_Hamed_Rao_Approach(trend='no trend', h=False, p=nan, z=nan, Tau=0.6666666666666666, s=44.0, var_s=-8.737179487179525, slope=0.008305347500000004, intercept=0.36390027875)
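A negative var_s is possible in principle with a variance correction of the Hamed and Rao (1998) type, because the correction factor is one plus a weighted sum of rank autocorrelations and that sum can be strongly negative. The sketch below evaluates that factor for made-up autocorrelation values; `hamed_rao_factor` and the inputs are hypothetical, and this is not the package's code.

```python
def hamed_rao_factor(n, rho):
    """Variance correction factor n/n_s* from significant rank autocorrelations.

    rho[k - 1] is the autocorrelation of the ranks at lag k (assumed input).
    """
    total = sum(
        (n - k) * (n - k - 1) * (n - k - 2) * r
        for k, r in enumerate(rho, start=1)
    )
    return 1 + 2 / (n * (n - 1) * (n - 2)) * total

n = 12
var_s = n * (n - 1) * (2 * n + 5) / 18      # uncorrected variance, no ties
mild = hamed_rao_factor(n, [-0.6])          # factor stays positive
strong = hamed_rao_factor(n, [-0.6, -0.5])  # factor drops below zero
print(var_s * mild, var_s * strong)
```

For n = 12 the uncorrected variance matches the var_s reported for the first series, which suggests that no correction was applied there, while a negative factor would make the square root undefined as in the second result.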
