facebookresearch / kats


Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.

License: MIT License

Python 58.63% Jupyter Notebook 41.10% Makefile 0.01% Batchfile 0.12% CSS 0.01% HTML 0.13%

kats's Introduction

Description

Kats is a toolkit to analyze time series data: a lightweight, easy-to-use, and generalizable framework to perform time series analysis. Time series analysis is an essential component of Data Science and Engineering work in industry, from understanding the key statistics and characteristics, detecting regressions and anomalies, to forecasting future trends. Kats aims to provide a one-stop shop for time series analysis, including detection, forecasting, feature extraction/embedding, multivariate analysis, etc.

Kats is released by Facebook's Infrastructure Data Science team. It is available for download on PyPI.

Installation in Python

Kats is on PyPI, so you can use pip to install it.

pip install --upgrade pip
pip install kats

If you need only a small subset of Kats, you can install a minimal version of Kats with

MINIMAL_KATS=1 pip install kats

which omits many dependencies (everything in test_requirements.txt). However, this will disable many functionalities and cause import kats to log warnings. See setup.py for full details and options.

Examples

Here are a few sample snippets from a subset of Kats offerings:

Forecasting

Using the Prophet model to forecast the air_passengers data set.

import pandas as pd

from kats.consts import TimeSeriesData
from kats.models.prophet import ProphetModel, ProphetParams

# take `air_passengers` data as an example
air_passengers_df = pd.read_csv(
    "../kats/data/air_passengers.csv",
    header=0,
    names=["time", "passengers"],
)

# convert to TimeSeriesData object
air_passengers_ts = TimeSeriesData(air_passengers_df)

# create a model param instance
params = ProphetParams(seasonality_mode='multiplicative') # additive mode gives worse results

# create a prophet model instance
m = ProphetModel(air_passengers_ts, params)

# fit model simply by calling m.fit()
m.fit()

# make predictions for the next 30 months
fcst = m.predict(steps=30, freq="MS")

Detection

Using the CUSUM detection algorithm on a simulated data set.

# import packages
import numpy as np
import pandas as pd

from kats.consts import TimeSeriesData
from kats.detectors.cusum_detection import CUSUMDetector

# simulate time series with increase
np.random.seed(10)
df_increase = pd.DataFrame(
    {
        'time': pd.date_range('2019-01-01', '2019-03-01'),
        'increase':np.concatenate([np.random.normal(1,0.2,30), np.random.normal(2,0.2,30)]),
    }
)

# convert to TimeSeriesData object
timeseries = TimeSeriesData(df_increase)

# run detector and find change points
change_points = CUSUMDetector(timeseries).detector()
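
For intuition about what a CUSUM-style detector looks for, here is a minimal NumPy sketch. It is not the Kats implementation: it only takes the cumulative sum of deviations from the overall mean and reports where that sum is lowest, which is where a single upward mean shift most plausibly happened. The function name `locate_mean_increase` is hypothetical.

```python
import numpy as np


def locate_mean_increase(values: np.ndarray) -> int:
    """Return the index where the cumulative deviation from the mean is lowest.

    Simplified illustration only; CUSUMDetector in Kats is more involved
    and is not reproduced here.
    """
    deviations = values - values.mean()
    cumulative = np.cumsum(deviations)
    # Before an upward shift the deviations are mostly negative, so the
    # cumulative sum keeps falling; it starts rising once the mean has shifted.
    return int(np.argmin(cumulative))


# Same simulated series as in the README snippet above
np.random.seed(10)
series = np.concatenate(
    [np.random.normal(1, 0.2, 30), np.random.normal(2, 0.2, 30)]
)
change_index = locate_mean_increase(series)
print(change_index)  # an index near the boundary between the two segments
```

The returned index is a position in the array, so it would still need to be mapped back to the `time` column to get a timestamp.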

TSFeatures

We can extract meaningful features from the given time series data.

# Initiate feature extraction class
import pandas as pd
from kats.consts import TimeSeriesData
from kats.tsfeatures.tsfeatures import TsFeatures

# take `air_passengers` data as an example
air_passengers_df = pd.read_csv(
    "../kats/data/air_passengers.csv",
    header=0,
    names=["time", "passengers"],
)

# convert to TimeSeriesData object
air_passengers_ts = TimeSeriesData(air_passengers_df)

# calculate the TsFeatures
features = TsFeatures().transform(air_passengers_ts)

Citing Kats

If you use Kats in your work or research, please use the following BibTeX entry.

@software{Jiang_KATS_2022,
author = {Jiang, Xiaodong and Srivastava, Sudeep and Chatterjee, Sourav and Yu, Yang and Handler, Jeffrey and Zhang, Peiyi and Bopardikar, Rohan and Li, Dawei and Lin, Yanjun and Thakore, Uttam and Brundage, Michael and Holt, Ginger and Komurlu, Caner and Nagalla, Rakshita and Wang, Zhichao and Sun, Hechao and Gao, Peng and Cheung, Wei and Gao, Jun and Wang, Qi and Guerard, Marius and Kazemi, Morteza and Chen, Yulin and Zhou, Chong and Lee, Sean and Laptev, Nikolay and Levendovszky, Tihamér and Taylor, Jake and Qian, Huijun and Zhang, Jian and Shoydokova, Aida and Singh, Trisha and Zhu, Chengjun and Baz, Zeynep and Bergmeir, Christoph and Yu, Di and Koylan, Ahmet and Jiang, Kun and Temiyasathit, Ploy and Yurtbay, Emre},
license = {MIT License},
month = {3},
title = {{Kats}},
url = {https://github.com/facebookresearch/Kats},
version = {0.2.0},
year = {2022}
}

Changelog

Version 0.2.0

  • Forecasting
    • Added global model, a neural network forecasting model
    • Added global model tutorial
    • Consolidated backtesting APIs and some minor bug fixes
  • Detection
    • Added model optimizer for anomaly/changepoint detection
    • Added evaluators for anomaly/changepoint detection
    • Improved simulators, to build synthetic data and inject anomalies
    • Added new detectors: ProphetTrendDetector, Dynamic Time Warping based detectors
    • Support for meta-learning, to recommend anomaly detection algorithms and parameters for your dataset
    • Standardized API for some of our legacy detectors: OutlierDetector, MKDetector
    • Support for Seasonality Removal in StatSigDetector
  • TsFeatures
    • Added time-based features
  • Others
    • Bug fixes, code coverage improvement, etc.

Version 0.1.0

  • Initial release

Contributors

Kats is currently maintained by the community, with the main contributions and leadership coming from Nickolai Kniazev and Peter Shaffery.

Kats is a project with several skillful researchers and engineers contributing to it. Kats was started and built by Xiaodong Jiang with major contributions coming from many talented individuals in various forms and means. A non-exhaustive but growing list needs to mention: Sudeep Srivastava, Sourav Chatterjee, Jeff Handler, Rohan Bopardikar, Dawei Li, Yanjun Lin, Yang Yu, Michael Brundage, Caner Komurlu, Rakshita Nagalla, Zhichao Wang, Hechao Sun, Peng Gao, Wei Cheung, Jun Gao, Qi Wang, Morteza Kazemi, Tihamér Levendovszky, Jian Zhang, Ahmet Koylan, Kun Jiang, Aida Shoydokova, Ploy Temiyasathit, Sean Lee, Nikolay Pavlovich Laptev, Peiyi Zhang, Emre Yurtbay, Daniel Dequech, Rui Yan, William Luo, Marius Guerard, Pietari Pulkkinen, Uttam Thakore, Trisha Singh, Huijun Qian, Chengjun Zhu, Di Yu, Zeynep Erkin Baz, and Christoph Bergmeir.

License

Kats is licensed under the MIT license.

kats's People

Contributors

aadhar96, ahmetburhanfb, axemen, bhavinashah-meta, bigfootjon, chonghaogoh, ckomurlufb, clumdee, davidlouie, facebook-github-bot, hqian2022, iamxiaodong, igorsugak, irumata, itamaro, jakee417, mariusguerard, michaelbrundage, mjt91, mokazemi9, mpolson64, ourownstory, peiyizhang, proof-by-accident, r-barnes, rohanfb, timgasser, uthakore, wzcfb, yangbk560

kats's Issues

Outlier Detection failed

ERROR:root:!! Traceback (most recent call last):
!!   File "C:\Users\sachi\anaconda3\lib\site-packages\kats\detectors\outlier.py", line 114, in detector
    outlier = self.__clean_ts__(ts)
!!   File "C:\Users\sachi\anaconda3\lib\site-packages\kats\detectors\outlier.py", line 88, in __clean_ts__
    result = seasonal_decompose(original, model=self.decomp)
!!   File "C:\Users\sachi\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 199, in wrapper
    return func(*args, **kwargs)
!!   File "C:\Users\sachi\anaconda3\lib\site-packages\statsmodels\tsa\seasonal.py", line 147, in seasonal_decompose
    raise ValueError('x must have 2 complete cycles requires {0} '
!! ValueError: x must have 2 complete cycles requires 104 observations. x only has 65 observation(s)

ERROR:root:Outlier Detection Failed
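
The ValueError in this traceback is raised by statsmodels' seasonal_decompose, which rejects series shorter than two full seasonal periods. A minimal pre-check could look like the sketch below. It assumes the period is known in advance (52 is inferred here from the 104 in the message), and the helper name is hypothetical.

```python
def has_two_complete_cycles(n_observations: int, period: int) -> bool:
    """True when a series is long enough for a two-cycle seasonal decomposition."""
    return n_observations >= 2 * period


# Numbers taken from the traceback above: 104 required implies a period of 52.
required = 2 * 52
print(required)                          # 104
print(has_two_complete_cycles(65, 52))   # False
print(has_two_complete_cycles(104, 52))  # True
```

With only 65 observations, the options are to collect more data or to decompose with a shorter period, if one is appropriate for the series.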

Make torch and some other dependencies optional

Hi, kudos to your work as this package will be quite useful. However, it was quite a bad surprise for me to add this package to my project dependencies and see huge and non-environment-friendly packages such as pytorch, gpytorch, llvmlite and even plotly being installed into my environment. I only plan to use some changepoint detection methods and maybe visualizations, and I highly doubt that (though, of course correct me if I am wrong) the whole torch stack is necessary for such a task. Similarly, I imagine plotly can easily be made an optional dependency as well, since matplotlib is still the de facto plotting library of python and it would not be in the interest of majority of the users to install a data analysis package and get a big python-js bundle with it.
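
One common way to make heavy packages optional is to check for them at the point of use instead of importing them at package import time. The sketch below uses only the standard library. The helper names are hypothetical, and this is not a description of how Kats handles its dependencies.

```python
import importlib
import importlib.util


def is_installed(module_name: str) -> bool:
    """Check whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require(module_name: str, feature: str):
    """Import an optional dependency, or explain which feature needs it."""
    if not is_installed(module_name):
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'"
        )
    return importlib.import_module(module_name)


# A feature that needs a heavy package would call require() only when used.
print(is_installed("json"))  # True, json ships with Python
```

With this pattern, changepoint detection could be imported without torch being present, and only the neural-network models would raise an ImportError.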

How does Kats differ from Sktime?

Hi FB team,

Thank you for open-sourcing your great package.
I would like to know what the difference is between Kats and Sktime:

Sktime - A unified framework for machine learning with time series

Which features does Kats offer that are not included in Sktime?

Thank you,
Eli

Kats Tutorial 101_basics - Prophet predict method

Hi, first, thanks for your awesome time-series analysis framework for everyone!
I am following your Kats tutorial (101_basics), but I wondered about the meaning of the argument freq="MS" in the predict method. What does freq mean, and how can I set freq values other than "MS"?

Below is the screenshot of your tutorial. Thanks!
[Screenshot 2021-07-24 7:20:20 PM]
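
The value "MS" appears to be a pandas offset alias, in which case it means "month start" and other aliases such as "D" (calendar day) would be set the same way. The sketch below only shows what the aliases mean in pandas itself. That predict passes freq on to pandas when building future timestamps is an assumption here, and the start date is chosen to follow the last month of the air_passengers data.

```python
import pandas as pd

# "MS" is the pandas offset alias for month start; "D" is calendar day.
monthly = pd.date_range(start="1961-01-01", periods=3, freq="MS")
daily = pd.date_range(start="1961-01-01", periods=3, freq="D")

print([d.strftime("%Y-%m-%d") for d in monthly])
# ['1961-01-01', '1961-02-01', '1961-03-01']
print([d.strftime("%Y-%m-%d") for d in daily])
# ['1961-01-01', '1961-01-02', '1961-01-03']
```

The full list of aliases is in the pandas documentation on time series offsets. The alias should match the spacing of the training data.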

Installation breaks on Windows10 (conda environment)

Hi everyone,

When I try to install Kats on my machine (Windows 10, build see attached screenshot), the installation breaks while building the wheel for fbprophet.

I am trying this in a fresh conda environment (conda version 4.10.1) with Python 3.8. My only guess is that this has to do with the ability to install pystan on Windows, which is not possible without access to WSL2 (which I do not have because of corporate restrictions).

I will attach the error message below.

Attachments

Windows version

image

Error message

ERROR: Command errored out with exit status 1:
   command: 'C:\Users\matheiss\Miniconda3\envs\tiresias-kats\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\matheiss\\AppData\\Local\\Temp\\1\\pip-install-5dy9pwq7\\fbprophet_6fb4ff087aaf40478b71089d7e634e82\\setup.py'"'"'; __file__='"'"'C:\\Users\\matheiss\\AppData\\Local\\Temp\\1\\pip-install-5dy9pwq7\\fbprophet_6fb4ff087aaf40478b71089d7e634e82\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\matheiss\AppData\Local\Temp\1\pip-wheel-xzjkrfhe'
       cwd: C:\Users\matheiss\AppData\Local\Temp\1\pip-install-5dy9pwq7\fbprophet_6fb4ff087aaf40478b71089d7e634e82\
  Complete output (42 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib
  creating build\lib\fbprophet
  creating build\lib\fbprophet\stan_model
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "C:\Users\matheiss\AppData\Local\Temp\1\pip-install-5dy9pwq7\fbprophet_6fb4ff087aaf40478b71089d7e634e82\setup.py", line 122, in <module>
      setup(
    File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\site-packages\setuptools\__init__.py", line 163, in setup
      return distutils.core.setup(**attrs)
    File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\core.py", line 148, in setup
      dist.run_commands()
    File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\dist.py", line 966, in run_commands
      self.run_command(cmd)
    File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\dist.py", line 985, in run_command
      cmd_obj.run()
    File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\site-packages\wheel\bdist_wheel.py", line 299, in run
      self.run_command('build')
    File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\dist.py", line 985, in run_command
      cmd_obj.run()
    File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\command\build.py", line 135, in run
      self.run_command(cmd_name)
    File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\dist.py", line 985, in run_command
      cmd_obj.run()
    File "C:\Users\matheiss\AppData\Local\Temp\1\pip-install-5dy9pwq7\fbprophet_6fb4ff087aaf40478b71089d7e634e82\setup.py", line 48, in run
      build_models(target_dir)
    File "C:\Users\matheiss\AppData\Local\Temp\1\pip-install-5dy9pwq7\fbprophet_6fb4ff087aaf40478b71089d7e634e82\setup.py", line 38, in build_models
      StanBackendEnum.get_backend_class(backend).build_model(target_dir, MODEL_DIR)
    File "C:\Users\matheiss\AppData\Local\Temp\1\pip-install-5dy9pwq7\fbprophet_6fb4ff087aaf40478b71089d7e634e82\fbprophet\models.py", line 209, in build_model
      import pystan
    File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\site-packages\pystan\__init__.py", line 9, in <module>
      from pystan.api import stanc, stan
    File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\site-packages\pystan\api.py", line 13, in <module>
      import pystan._api  # stanc wrapper
  ImportError: DLL load failed while importing _api: The specified module could not be found.
  ----------------------------------------
  ERROR: Failed building wheel for fbprophet
  Running setup.py clean for fbprophet
Failed to build fbprophet
Installing collected packages: fbprophet, ax-platform, attrs, kats
    Running setup.py install for fbprophet ... error
    ERROR: Command errored out with exit status 1:
     command: 'C:\Users\matheiss\Miniconda3\envs\tiresias-kats\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\matheiss\\AppData\\Local\\Temp\\1\\pip-install-5dy9pwq7\\fbprophet_6fb4ff087aaf40478b71089d7e634e82\\setup.py'"'"'; __file__='"'"'C:\\Users\\matheiss\\AppData\\Local\\Temp\\1\\pip-install-5dy9pwq7\\fbprophet_6fb4ff087aaf40478b71089d7e634e82\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\matheiss\AppData\Local\Temp\1\pip-record-nj09haiu\install-record.txt' --single-version-externally-managed --compile --install-headers 'C:\Users\matheiss\Miniconda3\envs\tiresias-kats\Include\fbprophet'
         cwd: C:\Users\matheiss\AppData\Local\Temp\1\pip-install-5dy9pwq7\fbprophet_6fb4ff087aaf40478b71089d7e634e82\
    Complete output (44 lines):
    running install
    running build
    running build_py
    creating build
    creating build\lib
    creating build\lib\fbprophet
    creating build\lib\fbprophet\stan_model
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\matheiss\AppData\Local\Temp\1\pip-install-5dy9pwq7\fbprophet_6fb4ff087aaf40478b71089d7e634e82\setup.py", line 122, in <module>
        setup(
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\site-packages\setuptools\__init__.py", line 163, in setup
        return distutils.core.setup(**attrs)
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\core.py", line 148, in setup
        dist.run_commands()
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\dist.py", line 985, in run_command
        cmd_obj.run()
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\site-packages\setuptools\command\install.py", line 61, in run
        return orig.install.run(self)
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\command\install.py", line 545, in run
        self.run_command('build')
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\dist.py", line 985, in run_command
        cmd_obj.run()
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\command\build.py", line 135, in run
        self.run_command(cmd_name)
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\distutils\dist.py", line 985, in run_command
        cmd_obj.run()
      File "C:\Users\matheiss\AppData\Local\Temp\1\pip-install-5dy9pwq7\fbprophet_6fb4ff087aaf40478b71089d7e634e82\setup.py", line 48, in run
        build_models(target_dir)
      File "C:\Users\matheiss\AppData\Local\Temp\1\pip-install-5dy9pwq7\fbprophet_6fb4ff087aaf40478b71089d7e634e82\setup.py", line 38, in build_models
        StanBackendEnum.get_backend_class(backend).build_model(target_dir, MODEL_DIR)
      File "C:\Users\matheiss\AppData\Local\Temp\1\pip-install-5dy9pwq7\fbprophet_6fb4ff087aaf40478b71089d7e634e82\fbprophet\models.py", line 209, in build_model
        import pystan
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\site-packages\pystan\__init__.py", line 9, in <module>
        from pystan.api import stanc, stan
      File "C:\Users\matheiss\Miniconda3\envs\tiresias-kats\lib\site-packages\pystan\api.py", line 13, in <module>
        import pystan._api  # stanc wrapper
    ImportError: DLL load failed while importing _api: The specified module could not be found.
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'C:\Users\matheiss\Miniconda3\envs\tiresias-kats\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\matheiss\\AppData\\Local\\Temp\\1\\pip-install-5dy9pwq7\\fbprophet_6fb4ff087aaf40478b71089d7e634e82\\setup.py'"'"'; __file__='"'"'C:\\Users\\matheiss\\AppData\\Local\\Temp\\1\\pip-install-5dy9pwq7\\fbprophet_6fb4ff087aaf40478b71089d7e634e82\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\matheiss\AppData\Local\Temp\1\pip-record-nj09haiu\install-record.txt' --single-version-externally-managed --compile --install-headers 'C:\Users\matheiss\Miniconda3\envs\tiresias-kats\Include\fbprophet' Check the logs for full command output.

SyntaxError: future feature annotations is not defined in import kats

Hi,
I am using a Mac and Python 3.6.
I installed Kats using pip, and when trying:
import kats

I get the error:

Traceback (most recent call last):
  File "/venv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-f96c9197f95c>", line 1, in <module>
    import kats
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "venv/lib/python3.6/site-packages/kats/__init__.py", line 1, in <module>
    from . import consts   # noqa
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "venv/lib/python3.6/site-packages/kats/consts.py", line 20
    from __future__ import annotations
                                     ^
SyntaxError: future feature annotations is not defined

Any idea?
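
The statement `from __future__ import annotations` (PEP 563) first became available in Python 3.7, so a Python 3.6 interpreter rejects it with exactly this SyntaxError. A small check like the following sketch can confirm whether an interpreter is new enough. The function name is hypothetical.

```python
import sys

# `from __future__ import annotations` (PEP 563) needs Python 3.7 or newer.
MINIMUM_VERSION = (3, 7)


def supports_future_annotations(version_info=sys.version_info) -> bool:
    """Check whether an interpreter version accepts the annotations future import."""
    return tuple(version_info[:2]) >= MINIMUM_VERSION


print(supports_future_annotations((3, 6)))  # False
print(supports_future_annotations((3, 7)))  # True
```

Called without arguments, the function checks the running interpreter. On Python 3.6 the practical fix is to use a newer Python version for the environment.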

Issue when trying to instantiate BackTesterSimple object using TimeSeriesData object with repeating minute bars on separate days

Hi there,

When running some of the code from the backtesting section of the tutorials as seen below,

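from kats.models.arima import ARIMAModel, ARIMAParams
from kats.utils.backtesters import BackTesterSimple

# air_passengers_ts is the TimeSeriesData object built earlier in the tutorial
params = ARIMAParams(p=2, d=1, q=1)
ALL_ERRORS = ['mape', 'smape', 'mae', 'mase', 'mse', 'rmse']

backtester_arima = BackTesterSimple(
    error_methods=ALL_ERRORS,
    data=air_passengers_ts,
    params=params,
    train_percentage=75,
    test_percentage=25,
    model_class=ARIMAModel)

backtester_arima.run_backtest()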

there is an issue if you try to replace the air_passengers_ts data with a TimeSeriesData object which contains minute bars that repeat on separate days. For instance, the 'time' column of some new TimeSeriesData object X is of the format 2012-01-02 09:30:00 for row 1 (or rather, x_1) and of the format 2012-01-02 09:31:00 for row 2 (or rather, x_2).

When trying to instantiate a BackTesterSimple object using this data, it succeeds if there is only 1 day's worth of minute bar data. However, if we also have a second day's worth of data with the format 2012-01-03 09:30:00 (x_392) and 2012-01-03 09:31:00 (x_393) and so on, up until the end of the same sequence of minute bars as the previous day, then the code fails with the error:

ValueError: No frequency information was provided with date index and no frequency could be inferred.
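
One plausible cause is the overnight gap: a single session of minute bars is evenly spaced, but two sessions are not, so no single frequency describes the index. The pandas-only sketch below reproduces that effect with `pd.infer_freq`. The session times 09:30 to 16:00 are an assumption, chosen to match the 391 rows per day implied by x_392 being the first row of the second day.

```python
import pandas as pd

# One trading session of minute bars: 09:30 through 16:00 inclusive
day_one = pd.date_range("2012-01-02 09:30", "2012-01-02 16:00", freq="min")
day_two = pd.date_range("2012-01-03 09:30", "2012-01-03 16:00", freq="min")

# A single session is evenly spaced, so a frequency can be inferred.
single_day_freq = pd.infer_freq(day_one)

# Two sessions leave an overnight gap, so the spacing is no longer uniform.
two_days = day_one.append(day_two)
two_day_freq = pd.infer_freq(two_days)

print(len(day_one))     # 391
print(single_day_freq)  # a one-minute alias ("min" or "T", depending on the pandas version)
print(two_day_freq)     # None
```

If the downstream model needs a regular index, the gaps would have to be handled first, for example by modelling each session separately. Which approach suits the data is a modelling decision.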

Use running index as time_col

I have time series data where the time_col is the index of the dataframe:

index  value
0      31
1      22
2      15
3      77

When I try to convert it to TimeSeriesData, the index is automatically transformed to epoch time (e.g. 1970-01-01 00:00:00.000000095).
Is there a way to keep the time_col as the plain index when using TimeSeriesData?
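
The example timestamp suggests that the integer index is being read as nanoseconds since 1970-01-01. A possible workaround, sketched below with pandas only, is to map the running index onto evenly spaced synthetic timestamps before building the TimeSeriesData object. The start date and the one-day spacing are arbitrary assumptions.

```python
import pandas as pd

df = pd.DataFrame({"value": [31, 22, 15, 77]})  # running index 0..3

# Assumption: consecutive rows are one step apart; one day per step is arbitrary.
start = pd.Timestamp("2000-01-01")
df["time"] = pd.date_range(start=start, periods=len(df), freq="D")

print(df["time"].dt.strftime("%Y-%m-%d").tolist())
# ['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04']

# The original running index can be recovered from the synthetic timestamps.
recovered_index = (df["time"] - start).dt.days.tolist()
print(recovered_index)  # [0, 1, 2, 3]
```

The frame `df[["time", "value"]]` then has a regular datetime column, and results can be mapped back to row numbers with the same subtraction.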

Very inflexible plotting - allow instead for an axis as argument

The way some of the plotting is done right now is very limiting.
For instance kats/detectors/cusum_detection.py ends with a plt.show().

It would be better to allow the caller to manage the figure itself and pass an axis (as in most typical graphing utilities). This way one could build composite figures or decide where to plot (to an html file for instance).

Add strategies for positive predictions (e.g. ProphetPos support)

In order to force predictions to be positive (this is a common requirement for many business-related time series), the prophet package can be used with the ProphetPos class (see "Approach 5" here: facebook/prophet#1668 (comment)).

It would be nice to add a "ProphetPosModel" to Kats, so we can force predictions to be positive when we use prophet in Kats.

Maybe a positive constraint could also be added to the other models supported by Kats (SARIMA etc.).
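
Until something like this exists, a much cruder stopgap is to clip the forecast at zero after prediction. This is plain post-processing, not the ProphetPos approach referenced above, and it does not change how the model is fitted. The sketch uses a stand-in DataFrame, and the column names `fcst`, `fcst_lower` and `fcst_upper` are assumptions about the forecast frame.

```python
import pandas as pd

# Stand-in for a forecast frame; the column names are assumptions.
fcst = pd.DataFrame(
    {
        "time": pd.date_range("2021-01-01", periods=3, freq="D"),
        "fcst": [1.5, -0.2, 0.4],
        "fcst_lower": [0.5, -1.2, -0.6],
        "fcst_upper": [2.5, 0.8, 1.4],
    }
)

# Replace negative values with zero in the point forecast and both bounds.
value_columns = ["fcst", "fcst_lower", "fcst_upper"]
fcst[value_columns] = fcst[value_columns].clip(lower=0)

print(fcst["fcst"].tolist())        # [1.5, 0.0, 0.4]
print(fcst["fcst_lower"].tolist())  # [0.5, 0.0, 0.0]
```

Clipping distorts the uncertainty intervals near zero, which is one reason a model-level constraint would be preferable.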

Documentation

Hi,

I saw the examples about seasonality_strength in the tutorials.
I went over the source code and it was a bit hard to search until I found these references:
https://stats.stackexchange.com/questions/485012/measuring-strength-of-trend-and-seasonalities-for-time-series-presenting-multi-s
https://otexts.com/fpp2/seasonal-strength.html

I would like some explanation added on how seasonality/trend strength is calculated.
If I can get some resources/guidance on where such document needs to be written I would love to create a PR.

Will it suffice to add some explanation in the example notebooks or should it be documented elsewhere as well?
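
For reference, the second link defines the strength of a component from a decomposition into trend, seasonal and remainder parts as max(0, 1 - Var(R) / Var(component + R)). The NumPy sketch below implements that definition. Whether Kats computes seasonality_strength in exactly this way is not confirmed here, and the function name is hypothetical.

```python
import numpy as np


def component_strength(component: np.ndarray, remainder: np.ndarray) -> float:
    """Strength of a trend or seasonal component, between 0 and 1.

    Follows the definition max(0, 1 - Var(R) / Var(component + R)).
    """
    denominator = np.var(component + remainder)
    if denominator == 0:
        return 0.0  # constant series: no measurable component
    return float(max(0.0, 1.0 - np.var(remainder) / denominator))


t = np.arange(48)
seasonal = np.sin(2 * np.pi * t / 12)
zeros = np.zeros_like(seasonal)

# A purely seasonal series with no remainder has full strength.
print(component_strength(seasonal, zeros))  # 1.0
# With no seasonal part, all the variation is remainder.
print(component_strength(zeros, seasonal))  # 0.0
```

Passing the trend component instead of the seasonal one gives the trend strength under the same definition.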

Install Kats with CMDSTANPY instead of pystan

Hello, as with prophet, it would be more than nice to have the ability to avoid installing the pystan software.

e.g. for fbprophet we can proceed as follows:
CMDSTAN=/tmp/cmdstan-2.22.1 STAN_BACKEND=CMDSTANPY pip install prophet

Would it be possible to install Kats the same way, and to document this in the README?

ImportError

Hello!

When I was trying to import some Kats functionality in my Jupyter Notebook, it came back with an error message as below:

ImportError Traceback (most recent call last)
in
----> 1 from kats.detectors.outlier import OutlierDetector

~\Anaconda3\lib\site-packages\kats\__init__.py in
1 from . import consts # noqa
----> 2 from . import utils # noqa
3 from . import detectors # noqa
4 from . import models # noqa
5 from . import tsfeatures # noqa

ImportError: cannot import name 'utils' from partially initialized module 'kats' (most likely due to a circular import) (C:\Users\49683\Anaconda3\lib\site-packages\kats\__init__.py)

Does anyone know how to resolve this issue? Thanks in advance!

Minor style problem in homepage (gh-page)

Hi there!

I would like to contribute and fix a minor style problem on the homepage.

The index has a main section tag with a "features_3azU" class. This section has 2 rows:

  • the first one is a generic description of the package;
  • the second one contains all 4 features, each in a col col--3 class.

For the styling, I think it would be better to put the description paragraph inside a col col--12 class.
This way the section will have 16px of left and right padding and will be better aligned with the features in the following row.

Kats downgrades fbprophet to 0.7.0

Why does Kats have a fixed dependency on fbprophet 0.7?

It downgraded our version from 0.7.1 to 0.7.0 and therefore reverted our backend to pystan instead of cmdstanpy.

Adding Regressors to ProphetModel

Hi all,

to my knowledge, and from my research within the Kats modules, it is not possible to add additional regressors to a ProphetModel as of now, correct?

I am referring to this functionality: https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#additional-regressors

Will there be a possibility to add such functionality to the Kats implementation of Prophet?
I am thinking of something similar to custom_seasonalities, like here:

custom_seasonalities: Optional[List[Dict]] = None,

Adding a new optional parameter to the constructor, e.g. additional_regressors: Optional[pd.DataFrame], with the same time column and the regressor columns.

Would appreciate some feedback on this idea.

Model updates

Hi all,

thanks for a great package. Sorry to ask a question here (not necessarily an issue yet), but I can't find the users group.

I have a quick question, is it possible to update a Prophet model as new data arrives or is retraining from scratch the only alternative?

Say you have a df like this

from kats.models.prophet import ProphetModel, ProphetParams
from kats.consts import TimeSeriesData
import pandas as pd
import numpy as np

df = pd.DataFrame({"time": pd.date_range('2020-01-01', '2020-01-21'), "value": np.random.randn(21)})
ts = TimeSeriesData(df)

model = ProphetModel(ts, ProphetParams())
model.fit()

and later on we add another data point to df, as in

df = df.append({"time": pd.to_datetime('2020-01-22'), "value": 0}, ignore_index=True)

I'm assuming that it would be possible and efficient to just update the model

Thanks for reading
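
Separately from the question of incremental updates, which is not answered here, note that `DataFrame.append` was removed in pandas 2.0. On current pandas versions the row-append step above can be written with `pd.concat`, as in this sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"time": pd.date_range("2020-01-01", "2020-01-21"), "value": np.random.randn(21)}
)

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
new_row = pd.DataFrame({"time": [pd.to_datetime("2020-01-22")], "value": [0.0]})
df = pd.concat([df, new_row], ignore_index=True)

print(len(df))              # 22
print(df["time"].iloc[-1])  # 2020-01-22 00:00:00
```

The extended frame can then be wrapped in a new TimeSeriesData object in the same way as the original one.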

Error while importing CUSUMDetectorTest in tutorial tkats_202_detection.ipynb

I get the error below while importing CUSUMDetectorTest in Tutorial "Daily Seasonality with Interest Windows" (tkats_202_detection.ipynb)

I solved the issue by copying the CSV file into the subfolder kats/kats/data of my running notebook folder.

It looks like a bug in test_detectors.py?

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-11-6570ee1979fa> in <module>
----> 1 from kats.tests.test_detectors import CUSUMDetectorTest
      2 
      3 # Here's where we load the seasonal data
      4 np.random.seed(1)
      5 periodicity=24

~\AppData\Local\Continuum\anaconda3\lib\site-packages\kats\tests\test_detectors.py in <module>
    105     multi_data_path = "kats/kats/data/multivariate_anomaly_simulated_data.csv"
    106 
--> 107 data = pd.read_csv(data_path)
    108 data.columns = ["time", "y"]
    109 ts_data = TimeSeriesData(data)
.
.
.
FileNotFoundError: [Errno 2] No such file or directory: 'kats/kats/data/air_passengers.csv'
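
The path in the traceback is relative, so it only resolves when the notebook runs from a directory that contains kats/kats/data. A common way to avoid this is to build the path from the module's own location. The sketch below assumes the layout `<package>/tests/<module>.py` and `<package>/data/<file>`, and the helper name is hypothetical.

```python
from pathlib import Path


def packaged_data_path(module_file: str, filename: str) -> Path:
    """Build a data-file path relative to the package, not the working directory.

    Assumes the layout <package>/tests/<module>.py and <package>/data/<filename>.
    """
    package_root = Path(module_file).parent.parent
    return package_root / "data" / filename


# Inside test_detectors.py this would be called with __file__.
example = packaged_data_path(
    "/site-packages/kats/tests/test_detectors.py", "air_passengers.csv"
)
print(example.as_posix())  # /site-packages/kats/data/air_passengers.csv
```

This only helps if the CSV files are actually shipped inside the installed package.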

CUSUMDetector and RobustStatDetector error out when using TimeSeriesData created with series that don't have a name.

The CUSUMDetector and the RobustStatDetector both reference the name of the TimeSeriesData value column and have errors when the series name is not set.

The series name can end up unset when TimeSeriesData is initialized without a DataFrame:

import pandas as pd
import numpy as np
from kats.consts import TimeSeriesData

n = 10
time = pd.Series(pd.date_range(start='2018-01-01', periods=n, freq='D'))
value = pd.Series(np.arange(n))
ts = TimeSeriesData(time=time, value=value)

print("ts.value.name", ts.value.name) # prints None
print("ts.time.name", ts.time.name) # prints None

This causes both detectors to fail when plotting, and causes the RobustStatDetector to fail when detecting change points.

When calling the .plot() method on either detector, the error is:

Traceback (most recent call last):
  File "/home/jrand/workspace/kats/.venv/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: None

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "error.py", line 19, in <module>
    detector.plot(change_points)
  File "/home/jrand/workspace/kats/kats/detectors/cusum_detection.py", line 565, in plot
    plt.plot(data_df[time_col_name], data_df[val_col_name])
  File "/home/jrand/workspace/kats/.venv/lib/python3.8/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/jrand/workspace/kats/.venv/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: None

Calling RobustStatDetector.detector() raises:

Traceback (most recent call last):
  File "error.py", line 22, in <module>
    change_points = detector.detector()
  File "/home/jrand/workspace/kats/kats/detectors/robust_stat_detection.py", line 58, in detector
    data_df = data_df.set_index(time_col_name)
  File "/home/jrand/workspace/kats/.venv/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/jrand/workspace/kats/.venv/lib/python3.8/site-packages/pandas/core/frame.py", line 5446, in set_index
    raise KeyError(f"None of {missing} are in the columns")
KeyError: 'None of [None] are in the columns'

Both of these issues can be fixed fairly easily.

One option is to check whether the series are named and set a default if they are not. This can be done by adding the following code to each detector's __init__ function.

if self.data.value.name is None:
    self.data.value.name = DEFAULT_VALUE_NAME

if self.data.time.name is None:
    self.data.time.name = DEFAULT_TIME_NAME

Alternatively, the detectors can be refactored to not use the names at all. As both are univariate-only detectors, the columns can be referenced simply as self.data.time and self.data.value, with no need for the column names.
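The default-name option can be sketched outside Kats with plain pandas. The constants and the helper `with_default_names` below are hypothetical stand-ins for illustration (Kats defines its own constants), so this shows the idea rather than the actual patch:

```python
import numpy as np
import pandas as pd

# Assumed defaults for illustration only; Kats defines its own constants.
DEFAULT_TIME_NAME = "time"
DEFAULT_VALUE_NAME = "value"


def with_default_names(time: pd.Series, value: pd.Series):
    """Return the two series, renaming any that are unnamed."""
    if time.name is None:
        time = time.rename(DEFAULT_TIME_NAME)
    if value.name is None:
        value = value.rename(DEFAULT_VALUE_NAME)
    return time, value


n = 10
time = pd.Series(pd.date_range(start="2018-01-01", periods=n, freq="D"))
value = pd.Series(np.arange(n))

named_time, named_value = with_default_names(time, value)
# With names in place, column lookups by name no longer hit KeyError: None
data_df = pd.concat([named_time, named_value], axis=1)
print(list(data_df.columns))
```

Using `rename` instead of assigning to `.name` leaves the caller's series untouched, which may or may not be what the detectors want.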

Timeseries sampled every second (and sub-second)?

My data (Earth's electromagnetic field components) is typically sampled at a rate of 1-100 Hz, with a daily periodicity (following the solar cycle). For the "long period" data (1 Hz, i.e. one sample per second), the script detectors/outlier.py infers that the period is "S" (i.e., a second) and passes "S" to the tsatools module, which can handle only hourly ("H") to annual ("A") periods. Hence, the execution crashes. On the other hand, if I change line 88 from:
result = seasonal_decompose(original, model=self.decomp)
to
result = seasonal_decompose(original, model=self.decomp, freq=86400)
it runs fine.
Would it be possible to make a provision for the user to supply their own freq value?
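For reference, the value 86400 above is the number of 1 Hz samples in one day. A minimal sketch of how such a value could be derived from the timestamps, assuming a roughly regular sampling step; `samples_per_cycle` is a hypothetical helper, not part of Kats or statsmodels:

```python
import pandas as pd


def samples_per_cycle(index: pd.DatetimeIndex, cycle: str = "1D") -> int:
    """Number of samples in one seasonal cycle, from the median sampling step."""
    step = index.to_series().diff().dropna().median()
    return int(round(pd.Timedelta(cycle) / step))


# One sample per second, and one sample every five minutes
one_hz = pd.date_range("2021-01-01", periods=10, freq=pd.Timedelta(seconds=1))
five_min = pd.date_range("2018-01-01", periods=10, freq=pd.Timedelta(minutes=5))

print(samples_per_cycle(one_hz), samples_per_cycle(five_min))
```

The result could then be supplied as the explicit period argument to seasonal_decompose (named `freq` in the call shown above; I believe newer statsmodels releases name it `period`).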

Multivariate Anomaly Detector (Error when running tutorial)

I get the error below when I run the tutorial kats_202_detection.ipynb: https://github.com/facebookresearch/Kats/blob/master/tutorials/kats_202_detection.ipynb
Any clue?

KeyError: Timestamp('2019-12-23 23:59:58.142906')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-265-b73347c84449> in <module>
      3 d = MultivariateAnomalyDetector(multi_anomaly_ts, params, training_days=60)
      4 display(params)
----> 5 anomaly_score_df = d.detector()
      6 
      7 d.plot()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\kats\detectors\outlier.py in detector(self)
    300         while fcstTime < self.df.index.max():
    301             # forecast for fcstTime+ 1
--> 302             pred_df = self._generate_forecast(fcstTime)
    303             # calculate anomaly scores
    304             anomaly_scores_t = self._calc_anomaly_scores(pred_df)

Add STUMPY Matrix Profile Analysis

Thank you for putting together this really awesome time series analysis package. Given that the aim of Kats is to provide a one-stop shop for time series analysis, I was wondering if there are any plans/interest to incorporate time series analyses with matrix profiles using the STUMPY package. Given that STUMPY only depends on NumPy, SciPy, and Numba (all of which are Kats dependencies) and it has 100% code coverage, it may be a great complement to the Kats ecosystem.

Full disclosure, I am the creator and core maintainer of the STUMPY package. Thank you for your consideration!

MKDetector works only for daily data - Support for monthly data would be great

I have tried running MKDetector on monthly values, but the algorithm is apparently limited to daily data, although this limitation is not mentioned (CUSUM, for instance, works on monthly data).

It is a pity not to be able to process monthly data. I would like this functionality to be added, in particular for processing hydroclimatic data in the context of climate change.

Thank you for sharing this great tool

Outlier detection failed: issue with reading timestamps?

I have a pandas dataframe dat with two columns, the first one having timestamps, named 'time', and the second one containing the data values, named 'value'. The timestamps have a frequency of 5 min i.e. the values there are of the type "2018-01-01 00:00:00", "2018-01-01 00:05:00", "2018-01-01 00:10:00", etc.

I called TimeSeriesData() on this dataset and then applied outlier detection.
The code I ran:
import pandas as pd
from kats.consts import TimeSeriesData
from kats.detectors.outlier import OutlierDetector

dat['time'] = pd.to_datetime(dat['time'])  # convert the timestamp column to pandas datetime
dat_ts = TimeSeriesData(dat)
ts_outlierDetection = OutlierDetector(dat_ts, 'additive')  # call OutlierDetector
ts_outlierDetection.detector()

I get the following error on running the last line.

ERROR:root:!! Traceback (most recent call last):
!! File ".../kats/detectors/outlier.py", line 114, in detector
outlier = self.__clean_ts__(ts)
!! File ".../kats/detectors/outlier.py", line 88, in __clean_ts__
result = seasonal_decompose(original, model=self.decomp)
!! File ".../pandas/util/_decorators.py", line 207, in wrapper
return func(*args, **kwargs)
!! File ".../statsmodels/tsa/seasonal.py", line 140, in seasonal_decompose
pfreq = freq_to_period(pfreq)
!! File ".../statsmodels/tsa/tsatools.py", line 829, in freq_to_period
raise ValueError("freq {} not understood. Please report if you "
!! ValueError: freq T not understood. Please report if you think this is in error.
ERROR:root:Outlier Detection Failed

I looked into line 829 of tsatools.py, and it seems to support only a few frequency codes, which would explain why the 5 min frequency of my time series is not recognized. Should I be setting something like freq='5min' somewhere in the code?

I also ran the outlier detection code on daily data, i.e. freq = 'D', and there seem to be no issues (this is essentially the example in the tutorial).

Refining docs around hyperparameter (grid) search

First of all, great package --- kudos to the team that created it!

I was trying to adapt the grid search code in tutorial 201, which introduces forecasting, for my own needs. In doing so, I ran into two problems --- the resolution of which may benefit from more comprehensive docs.

As an example, here is how I defined the search space for tuning the hyperparameters of VARModel.

parameters_grid_search = [
    {
        "name": "maxlags",
        "type": "choice",
        "values": list(range(1, 14)),
        "value_type": "int",
        "is_ordered": True
    },
]

First, it is unclear what the valid options for type are, and what different effects they have. Second, it is not obvious that values has to be a list and not, e.g., a tuple --- I only found out when I ran into an AssertionError.

The first one could really use input from people familiar with kats' tuning backend. The second perhaps can be remedied by adding a note in the tutorial notebook and/or API docs; would anyone mind me making a pull request?
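Until the docs cover this, the list requirement can be handled on the caller's side. A minimal sketch, assuming only what is reported above (that "values" must be a list); `normalize_search_space` is a hypothetical helper, not a Kats function:

```python
from typing import Any, Dict, List


def normalize_search_space(params: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of the search space with every "values" entry as a list."""
    normalized = []
    for param in params:
        param = dict(param)  # shallow copy so the caller's dict is untouched
        if "values" in param and not isinstance(param["values"], list):
            param["values"] = list(param["values"])
        normalized.append(param)
    return normalized


parameters_grid_search = normalize_search_space([
    {
        "name": "maxlags",
        "type": "choice",
        "values": tuple(range(1, 14)),  # a tuple, as in the failing case reported above
        "value_type": "int",
        "is_ordered": True,
    },
])
print(type(parameters_grid_search[0]["values"]).__name__)
```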

CI fails missing requirement

I saw the CI failing because of a library called deprecated which is missing from the requirements.txt

Made a PR that adds the requirement.

Should test-only requirements be split out of requirements.txt into a separate test_requirements.txt, or should all testing and development dependencies be listed in requirements.txt?

unable to build wheel for fbprophet while trying to install Kats in Win 10

  1. My environment is Win 10, with conda 4.9.2 and Python 3.7.6
  2. While trying to install Kats (pip install kats), I am getting the error: Building wheel for fbprophet (setup.py) ... error
  3. More lines from the error message are in attached file
    Kats_error.txt
  4. I had an older working fbprophet and hit the same errors when trying to upgrade it (pip install --upgrade fbprophet).
  5. Does anybody know how to fix this?

Feature request: __getitem__ returning univariate TimeSeriesData

Hello,
First of all, thanks for this amazing work! It's the lib I have been looking for for so long.
Manipulating multivariate time series can still be quite frustrating.
Do you plan to add a __getitem__ method to the TimeSeriesData object to enable pandas-like column access, as in df["column_name"]?
It would be especially convenient for getting a univariate TimeSeriesData object ready to be used in Kats models.

Meanwhile, any workaround more convenient than the following would be welcome:

uni_ts = TimeSeriesData(multi_ts.value["col_name"])

Best regards,
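One pandas-side alternative, sketched under the assumption that the multivariate data is also available as a wide DataFrame with a "time" column: select the time column plus one value column before building the TimeSeriesData object. `univariate_frame` is a hypothetical helper, not part of Kats:

```python
import pandas as pd


def univariate_frame(multi_df: pd.DataFrame, col: str, time_col: str = "time") -> pd.DataFrame:
    """Select the time column plus one value column from a wide frame."""
    return multi_df[[time_col, col]].copy()


multi_df = pd.DataFrame({
    "time": pd.date_range("2021-01-01", periods=3, freq="D"),
    "a": [1.0, 2.0, 3.0],
    "b": [10.0, 20.0, 30.0],
})

uni_df = univariate_frame(multi_df, "b")
print(list(uni_df.columns))
```

The resulting two-column frame could then be passed to TimeSeriesData in the same way a DataFrame is passed elsewhere on this page.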

Can Kats multivariate forecasting predict only one target (Y) variable?

Hi, thanks for your awesome project!
I am following your tutorial to learn multivariate forecasting with Kats, and I have a question about the VAR model in 201_forecasting.ipynb. As I understand it, multivariate forecasting usually means predicting a single target value (the Y variable) from multiple features (X variables). The tutorial's VAR example, however, takes two X features and predicts two Y target values. How can I get the kind of multivariate forecast I want? Or can the VAR model not be used to predict a single Y target value from two or more X features?

[Screenshot, 2021-07-25, 1:24:16 PM]

BOCPD errors when choose_priors=True

I'm experimenting with BOCPD for regime and trend detection. In my case, the call to the detector function results in the following error when called with choose_priors=True. I'm running the detector on a univariate timeseries of length 280.

changepoints = detector.detector(model=BOCPDModelType.TREND_CHANGE_MODEL, choose_priors=True, lag=7, threshold=0.8)

results in

ValueError: matmul: Input operand 1 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)

Changing choose_priors=False results in no errors.

I'd appreciate your help. Maybe there is something wrong with calling the detector function.

Publish bdist_wheel alongside sdist?

Would you consider publishing a wheel alongside the source distribution? I understand Kats is Python source-only, but it would help to publish a *-py3-none-any.whl file using bdist_wheel.

Error when running tutorial example kats_101_basics.ipynb

When running kats_101_basics.ipynb, the following error occurs:

from dateutil import parser
from datetime import datetime

# Convert time from air_passengers data to unix time
air_passengers_ts_unixtime = air_passengers_df.time.apply(
    lambda x: datetime.timestamp(parser.parse(x))
)

air_passengers_ts_unixtime
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-28-38e65b10d87c> in <module>
      4 # Convert time from air_passengers data to unix time
      5 air_passengers_ts_unixtime = air_passengers_df.time.apply(
----> 6     lambda x: datetime.timestamp(parser.parse(x))
      7 )
      8 

~\Anaconda3\envs\fb\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   4198             else:
   4199                 values = self.astype(object)._values
-> 4200                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4201 
   4202         if len(mapped) and isinstance(mapped[0], Series):

pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-28-38e65b10d87c> in <lambda>(x)
      4 # Convert time from air_passengers data to unix time
      5 air_passengers_ts_unixtime = air_passengers_df.time.apply(
----> 6     lambda x: datetime.timestamp(parser.parse(x))
      7 )
      8 

OSError: [Errno 22] Invalid argument
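One possible cause, which is an assumption and not confirmed in the report: the paths in the traceback point to Windows, where datetime.timestamp() on naive datetimes before 1970 can raise OSError [Errno 22], and the air_passengers data is commonly dated before 1970. A sketch of a conversion that avoids datetime.timestamp() by using pandas arithmetic instead; `to_unix_seconds` is a hypothetical helper:

```python
import pandas as pd


def to_unix_seconds(times: pd.Series) -> pd.Series:
    """Convert date strings to whole seconds since 1970-01-01, treating them as UTC."""
    parsed = pd.to_datetime(times)
    return (parsed - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)


sample = pd.Series(["1949-01-01", "1970-01-02"])
print(to_unix_seconds(sample).tolist())
```

Note that this treats the timestamps as UTC, whereas datetime.timestamp() interprets naive datetimes in local time, so the two can differ by the local UTC offset.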



LinAlgError: Incompatible dimensions

from kats.models.theta import ThetaModel, ThetaParams
params = ThetaParams(m=7) # Weekly seasonality 
m = ThetaModel(data_ts, params) # data_ts -> daily data points
m.fit()

gives this error:

Optimization failed to converge. Check mle_retvals.

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
/var/folders/25/t45ty5ps70l5y9gww_k11zy40000gq/T/ipykernel_44596/687703454.py in <module>
----> 1 m.fit()

~/kats/lib/python3.9/site-packages/kats/models/theta.py in fit(self, **kwargs)
    119         # creating x and intercept variables to fit a straight line
    120         regr = np.vstack([np.arange(self.n), np.ones(self.n)]).T
--> 121         slope, _ = np.linalg.lstsq(regr, deseas_data.value.values)[0]
    122         # pyre-fixme[16]: `ThetaModel` has no attribute `drift`.
    123         self.drift = slope / 2

<__array_function__ internals> in lstsq(*args, **kwargs)

~/kats/lib/python3.9/site-packages/numpy/linalg/linalg.py in lstsq(a, b, rcond)
   2273     m2, n_rhs = b.shape[-2:]
   2274     if m != m2:
-> 2275         raise LinAlgError('Incompatible dimensions')
   2276 
   2277     t, result_t = _commonType(a, b)

LinAlgError: Incompatible dimensions

Ran the same data under Prophet model and it fits and forecasts w/o errors. Any pointers as to the source of the problem?
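The last frame of the traceback shows numpy rejecting a design matrix and a target with different row counts (the `m != m2` check). The sketch below reproduces only that condition with plain numpy; it does not diagnose why the deseasonalized series in ThetaModel ends up with a length other than self.n:

```python
import numpy as np

n = 5
# Design matrix with n rows, built the same way as in the traceback
regr = np.vstack([np.arange(n), np.ones(n)]).T

# Matching length: lstsq returns the slope and intercept of the fitted line
slope, intercept = np.linalg.lstsq(regr, 2.0 * np.arange(n) + 1.0, rcond=None)[0]

# Mismatched length (n - 1 targets): lstsq raises LinAlgError
try:
    np.linalg.lstsq(regr, np.arange(n - 1, dtype=float), rcond=None)
    mismatch_error = None
except np.linalg.LinAlgError as exc:
    mismatch_error = str(exc)

print(round(float(slope), 6), round(float(intercept), 6), mismatch_error)
```

Comparing self.n with the length of the deseasonalized values just before the lstsq call would be a quick way to confirm whether this is what happens in the failing case.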

tsfeatures.TsFeatures.transform() produces nan values

TsFeatures.transform() seems to produce some NaN values for the Holt-Winters features and the changepoint detection features when all feature groups are used.

An example of the error can be found at https://colab.research.google.com/drive/13t3Gy9vazkeb8J32Zv-dijuww03GDbE7?usp=sharing, using the air passengers dataset from https://github.com/facebookresearch/Kats/blob/master/kats/data/air_passengers.csv, and stock price data from the yfinance library

Hyperparameter tuning is not working

Hi,

I am following the steps as requested. I can build models like ARIMA, Holt-Winters, and Prophet, but I cannot do hyperparameter tuning. I receive the error: No frequency information was provided with date index and no frequency could be inferred. I then set freq='MS', but I still receive the same error. Any clue?

Thanks,
faye
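The report does not include the data, so the cause is unknown. One thing worth checking is whether the timestamps actually form a regular month-start grid. The sketch below uses made-up data with a missing month, purely as an assumed example of an index whose frequency cannot be inferred, and shows how asfreq puts it onto a regular grid:

```python
import pandas as pd

# Hypothetical monthly data with March missing
df = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-04-01", "2020-05-01"]),
    "value": [1.0, 2.0, 4.0, 5.0],
})

indexed = df.set_index("time")
inferred_before = pd.infer_freq(indexed.index)  # None when the index has a gap

# Reindex onto a regular month-start grid; the missing month becomes NaN
regular = indexed.asfreq("MS")
inferred_after = regular.index.freqstr

print(inferred_before, inferred_after, len(regular))
```

The NaN rows introduced by asfreq would still need to be filled or dropped before fitting a model.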

Other Software

I know it is early days, but the package seems to borrow from other solutions already out there, so some level of attribution would be good.
