nixtla / statsforecast


3.6K 36.0 241.0 181.97 MB

Lightning ⚡️ fast forecasting with statistical and econometric models.

Home Page: https://nixtlaverse.nixtla.io/statsforecast

License: Apache License 2.0

Python 99.78% Makefile 0.17% Dockerfile 0.04% Shell 0.01%

time-series statistics forecasting arima econometrics machine-learning python exponential-smoothing ets baselines

statsforecast's Introduction

Nixtla

Forecast using TimeGPT

Nixtla offers a collection of classes and methods to interact with the API of TimeGPT.

🕰️ TimeGPT: Revolutionizing Time-Series Analysis

Developed by Nixtla, TimeGPT is a generative pre-trained transformer model for time-series prediction tasks. 🚀 Trained on a large and diverse collection of time series, including financial, weather, energy, and sales data, TimeGPT brings time-series analysis right to your terminal! 👩‍💻👨‍💻

In seconds, TimeGPT can identify complex patterns and predict future data points.

⚙️ Fine-Tuning: For Precision Prediction

In addition to its core capabilities, TimeGPT supports fine-tuning, enhancing its specialization for specific prediction tasks. 🎯 This feature is like training a machine learning model on a targeted data subset to improve its task-specific performance, making TimeGPT an even more versatile tool for your predictive needs.

🔄 `Nixtla`: Your Gateway to TimeGPT

With Nixtla, you can easily interact with TimeGPT through simple API calls, making the power of TimeGPT readily accessible in your projects.

💻 Installation

Get Nixtla up and running with a simple pip command:

pip install "nixtla>=0.4.0"

🎈 Quick Start

Get started with TimeGPT now:

import pandas as pd
from nixtla import NixtlaClient

# Example data in long format: unique_id, ds (timestamp) and y (target)
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')

nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key='my_api_key_provided_by_nixtla'
)
# Forecast 24 steps ahead with 80% and 90% prediction intervals
fcst_df = nixtla_client.forecast(df, h=24, level=[80, 90])

statsforecast's People

Contributors

Stargazers

Watchers

Forkers

valeman elephann overfittingstudyroom dipsingh sugatoray mindis chetanmehra mergenthaler kdgutier milan-chicago codemonster808 manuelmusngi renesugar ioanszilagyi praveen686 germ1215 dliofindia techthiyanes desolatetraveller hiimanshusherawat willcline octaviodeliberato anasofiavazquez aiwalter donjon86 webclinic017 alberto1971 rohitpandey13 python-repository-hub thomasbourgeois akashmavle5 uvnikgupta guerda anhmike fdoperezi goodwanghan rishirelan ryanrussell nanaakwasiabayieboateng nova-ch marlops romakoks mhdella ennosigaeon ansari1375 bjeffrey92 quantfns lx12633036 psliving innovationexploited restevesd luthfianto markhng525 alextooter spread0x andreped nitin-mane tkforks eduardoarroyogil nanderoo aldomendez jattenberg drozturk desaivaibhav95 ssahgal chris-22ozor eliasca93 shariq101 sshyran spkdroid geraldotoledo patryknextdoor jafarijason junpenglao alice202108 objectin borishouenou ulen2000 nicolassoarespinto jeroenpeterbos firobeid sunone5 creative-research-project-v1-1 osmandolu mhmdsab jimmy-inl tdl77 jordan-sparq jcoffi jfogelberg axeltorbenson bladesaber hughes-research stjordanis longshen931 javiervicho pao94 clairvoyant tuxracer krishnamenon22

statsforecast's Issues

ValueError: math domain error (difference between python and R)

When I tried to run auto_arima, I got a math domain error inside armafn (called from arima) because s2 was negative:

return 0.5 * (math.log(s2) + res[1] / res[2])

R returns NaN (with a warning) when log is applied to a negative value instead of raising an error, so it does not have this problem.
A possible solution would be:

if s2 < 0:
    return math.nan
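A minimal sketch of a logarithm with R-like semantics that could replace `math.log` in that line. `r_log` is a hypothetical helper, not part of statsforecast, and returning `-inf` for `log(0)` (as R does) is an assumption about what the optimizer should see:

```python
import math


def r_log(x: float) -> float:
    """Natural logarithm that mimics R's log() instead of raising ValueError.

    Hypothetical helper, not part of statsforecast.
    """
    if x < 0:
        return math.nan  # R: log(-1) is NaN (with a warning)
    if x == 0:
        return -math.inf  # R: log(0) is -Inf
    return math.log(x)  # positive values, +inf and NaN behave as usual


# Inside armafn the last line would then read:
# return 0.5 * (r_log(s2) + res[1] / res[2])
```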

model update feature with additional new data point

Hi,
is there a feature to update a fitted model with additional data points, the way pmdarima and other ARIMA libraries offer?

I tried searching but could not find it.

[FEAT] Use Dask and Spark clusters

Is your feature request related to a problem? Please describe.
Scale StatsForecast using Dask and Spark clusters.

Describe the solution you'd like
Include fugue as a backend just as we did with Ray.

Division by zero (python and R difference)

When I tried to run auto_arima I got a division by zero error inside armafn (inside arima):

s2 = res[0] / res[2]

R returns Inf when a nonzero value is divided by zero (and NaN for 0/0) instead of raising an error, so it does not have this problem.
A possible solution would be:

if res[2] == 0.:
    return math.inf
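A sketch of IEEE 754 style division, matching what R returns, that could replace `/` when computing `s2`. `r_div` is a hypothetical helper, not part of statsforecast:

```python
import math


def r_div(num: float, den: float) -> float:
    """Float division returning Inf/NaN like R instead of raising ZeroDivisionError.

    Hypothetical helper, not part of statsforecast.
    """
    if den == 0.0:
        if num == 0.0 or math.isnan(num):
            return math.nan  # R: 0/0 is NaN
        # R (IEEE 754): 1/0 is Inf, -1/0 is -Inf, 1/-0 is -Inf
        return math.copysign(math.inf, num) * math.copysign(1.0, den)
    return num / den


# Inside armafn:
# s2 = r_div(res[0], res[2])
```

Note that `res[1] / res[2]` in the return statement divides by the same quantity, so it would need the same treatment.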

Creation of forecast dates takes too long

Describe the bug
Creating the forecast dates takes too long.
Computation is wasted when all time series share the same last date (the expected case in most applications), because the dates are still generated separately for every series.

To Reproduce

Possible solution

Desktop (please complete the following information):

OS: Mac
Version 0.5.3
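The shortcut described in this issue can be sketched with pandas. `make_future_dates` is a hypothetical helper, not statsforecast's code, and it assumes the last dates arrive as a non-empty `pd.DatetimeIndex`: when all last dates are equal, the horizon is generated once and tiled; otherwise each series steps forward from its own last date.

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset


def make_future_dates(last_dates: pd.DatetimeIndex, freq: str, h: int) -> pd.DatetimeIndex:
    """Return h future timestamps per series, series after series.

    Hypothetical helper; assumes last_dates is non-empty.
    """
    offset = to_offset(freq)
    if (last_dates == last_dates[0]).all():
        # Fast path: build the horizon once and repeat it for every series
        horizon = pd.date_range(last_dates[0] + offset, periods=h, freq=offset)
        return pd.DatetimeIndex(np.tile(horizon.values, len(last_dates)))
    # General path: step forward from each series' own last date
    return pd.DatetimeIndex(
        [date + step * offset for date in last_dates for step in range(1, h + 1)]
    )
```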

Add update feature

Thanks a lot! I look forward to them.

Another valuable feature to add is .update(). Then we would not need to re-train the models on every new hourly observation until the models become outdated.

Originally posted by @tuttoaposto in #71 (reply in thread)

Inconsistent approximation default for Auto ARIMA

statsforecast/statsforecast/models.py

Line 238 in b4b59c9

approximation: bool = False) -> np.ndarray:

approximation defaults to False in auto_arima(), whereas auto_arima_f() already sets it the same way as R's auto.arima():

statsforecast/statsforecast/arima.py

Line 1438 in b4b59c9

approximation = len(x) > 150 or period > 12
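One way to make both functions agree, sketched under the assumption that `auto_arima()` could accept `approximation=None` to mean "use R's rule" (the quoted signature does not do this). `resolve_approximation` is hypothetical:

```python
from typing import Optional


def resolve_approximation(n_obs: int, period: int, approximation: Optional[bool] = None) -> bool:
    """Hypothetical helper: None means "decide the way R's auto.arima() does"."""
    if approximation is None:
        # Same rule as the auto_arima_f() default quoted above
        return n_obs > 150 or period > 12
    return approximation  # an explicit True/False from the user wins
```

For example, a monthly series with 144 observations (such as AirPassengers) would resolve to `False`, and any series with more than 150 observations to `True`.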

Problem with Croston model

Error with `n_jobs > 0`

When I tried to run the Colab example with n_jobs > 0, I got the following error.

The problem is related to the _data_parallel_forecast method of StatsForecast.

Stability of the API

Hi, great work on statsforecast! This package looks very nice.
I'm considering integrating a couple of the models in Darts (https://github.com/unit8co/darts). I'm wondering about your future plans: do you intend to maintain this package in the long term? How likely are API changes in future releases?

Also, as a side note: I took a quick look at the Croston method, and it seems to accept h and future_xreg without using them, which I'm not sure is intended.

In general, I think slightly more documentation on your different models could help users :)

In sample predictions, confidence interval and extract fitted params for Auto ARIMA

Hi,

Can you point me to an example where an auto ARIMA model is fitted and the following are generated?

In-sample predictions
Confidence or prediction intervals (for both in-sample and out-of-sample)
Fitted parameters (p, d, q, etc.)

Thanks!

[FEAT] Add n_windows for cross validation tasks

Is your feature request related to a problem? Please describe.
Currently, the cross_validation method receives test_size, which is unintuitive. I would like to have n_windows.

Describe the solution you'd like
Preserve test_size while adding n_windows and step_size. Both parameters should be mutually exclusive.

Describe alternatives you've considered
Since n_windows = test_size - horizon + 1 when step_size=1 (in general, n_windows = (test_size - horizon) / step_size + 1), adding the parameter should be easy.
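A minimal sketch of the conversion in both directions, assuming the last window ends at the end of each series and consecutive windows start `step_size` periods apart. The helper names are hypothetical:

```python
def n_windows_from_test_size(test_size: int, h: int, step_size: int = 1) -> int:
    """Number of cross-validation windows implied by test_size (hypothetical helper)."""
    span = test_size - h
    if span < 0 or span % step_size != 0:
        raise ValueError("test_size - h must be a non-negative multiple of step_size")
    return span // step_size + 1


def required_test_size(n_windows: int, h: int, step_size: int = 1) -> int:
    """Inverse mapping: h periods for the last window plus one step per earlier window."""
    return h + step_size * (n_windows - 1)
```

With h=12, a test_size of 24 gives 13 windows when step_size=1, but only 2 windows when step_size=12.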

exogenous variables on auto_arima

Hi there,

Is it possible to add an exogenous variable to auto_arima? I can only see four parameters: y, h, season_length, and approximation.
If so, could you please show me how?

Thanks

Yassine

No xreg argument on forecast method

Hi, thanks for the package! I cannot reproduce the example because of a peculiar error: forecast() got an unexpected keyword argument 'xreg'. I installed the package from pip for Python 3.8; the package version is 0.3.0. I am on an M1 Mac, but I am not currently using the arm64 version.

I also checked the module with the inspect package. The forecast method really does not have an "xreg" argument. Here are the code and the response.

import inspect
from statsforecast import StatsForecast
inspect.getfullargspec(StatsForecast.forecast)

The response is FullArgSpec(args=['self', 'h'], varargs=None, varkw=None, defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={})

P.S. I just installed from GitHub, and xreg seems to be there, so it will probably work.
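To check a signature without relying on `getfullargspec`, `inspect.signature` can be used. The sketch below (a hypothetical helper) also treats a `**kwargs` catch-all as accepting the keyword:

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """Return True if func can be called with name=... as a keyword argument."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None:
        # Positional-only parameters cannot be passed by keyword
        return param.kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all also accepts arbitrary keywords
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
```

For example, `accepts_kwarg(StatsForecast.forecast, 'xreg')` would be expected to return `False` for the signature printed above.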

Compute residuals

I'm currently forecasting a set of daily time series, and I was wondering whether there is a way to get the predictions on the training data, which are needed to compute the residuals (the difference between the actual and predicted values in the training set). The StatsForecast class does not offer this. I'm mainly interested in obtaining them with the auto_arima approach, but it could be extended to the remaining approaches as well.

Is it possible to add a method or attribute to get them?

Thank you

Include out-of-the-box calendar variables

Some people in our community Slack have asked for calendar variables like the ones available in Prophet.
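A minimal sketch of what such variables could look like, assuming pandas: day-of-week and month columns plus sine/cosine Fourier terms, which is how Prophet models seasonality. The helper name, its parameters, and the idea of feeding the columns to a model as exogenous regressors are assumptions, not an existing statsforecast feature:

```python
import numpy as np
import pandas as pd


def calendar_features(ds: pd.Series, period: float = 365.25, n_terms: int = 2) -> pd.DataFrame:
    """Hypothetical helper: simple calendar columns plus Fourier seasonality terms."""
    ds = pd.to_datetime(ds)
    out = pd.DataFrame(
        {"dayofweek": ds.dt.dayofweek, "month": ds.dt.month}, index=ds.index
    )
    # Time in days since the Unix epoch, so the terms do not depend on where the data starts
    t = (ds - pd.Timestamp("1970-01-01")).dt.total_seconds() / 86400.0
    for k in range(1, n_terms + 1):
        out[f"sin_{k}"] = np.sin(2 * np.pi * k * t / period)
        out[f"cos_{k}"] = np.cos(2 * np.pi * k * t / period)
    return out
```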

Typos in arima.ipynb

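Several typos in the notebook:

'Would nn autorregresive ' (should be 'autoregressive')
'testing purporses,' (should be 'testing purposes,')
'will let auto_arima to handle' (should be 'will let auto_arima handle')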

Fitting AutoARIMA on constant time series causes TypeError

Version: v0.5.5

Description
When fitting an instance of AutoARIMA with default parameters on a constant time series, the Arima function is called with the keyword argument "fixed", but this argument is not part of the function's signature.

To Reproduce

from statsforecast.arima import AutoARIMA
import numpy as np

AutoARIMA().fit(np.array([1]*36))

Expected behaviour
AutoARIMA should call the Arima function only with available arguments (see function signature below):

def Arima(
    x,
    order=(0, 0, 0),
    seasonal={'order': (0, 0, 0), 'period': 1},
    xreg=None,
    include_mean=True,
    include_drift=False,
    include_constant=None,
    blambda=None,
    biasadj=False,
    method='CSS',
    model=None,
):
    ...  # body omitted

Screenshots

[FEAT] Add `input_size` to `forecast` method

Is your feature request related to a problem? Please describe.
The cross_validation method has the argument input_size; forecast should also have it.
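Until then, a similar effect could be approximated by trimming the input before calling forecast. This sketch assumes `unique_id` and `ds` are regular columns (call `reset_index()` first if `unique_id` is the index) and that `input_size` means "keep the most recent observations of each series", which is an assumption about the intended semantics. `keep_last_n` is hypothetical:

```python
import pandas as pd


def keep_last_n(
    df: pd.DataFrame, input_size: int, id_col: str = "unique_id", time_col: str = "ds"
) -> pd.DataFrame:
    """Hypothetical helper: keep only the last input_size observations of each series."""
    ordered = df.sort_values([id_col, time_col])
    return ordered.groupby(id_col, sort=False).tail(input_size)
```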

ValueError: math domain error

I get ValueError: math domain error, caused by

tmp['bic'] = tmp['aic'] + npar*(math.log(nstar) - 2)

in statsforecast/arima.py, line 1225.

I guess nstar is not > 0.
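For context, the standard relationships behind that line, with k = `npar` estimated parameters, n = `nstar` effective observations and \hat{L} the maximized likelihood, show why `nstar` must be positive:

$$\mathrm{AIC} = -2\log\hat{L} + 2k, \qquad \mathrm{BIC} = -2\log\hat{L} + k\log n = \mathrm{AIC} + k(\log n - 2)$$

$$\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}$$

The BIC term needs n > 0 for log n to be defined, and the AICc term additionally needs n > k + 1.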

Permissive License

Currently the library has a restrictive copyleft license. This prevents it from being added to other libraries. See discussion here. Would it be possible to change the license to a more permissive one like MIT or BSD?

GPL Copyleft

BSD or MIT might be better alternatives: https://en.wikipedia.org/wiki/BSD_licenses

[FEAT] Add prediction intervals for cross validation

Is your feature request related to a problem? Please describe.
The cross_validation method should include a level parameter to compute prediction intervals.

StatsForecast's `compute_forecasts` unintuitive tuples (model, *args)

I have been trying to explore the use of an auto_arima with a filtered hyperparameter space.

StatsForecast's calls to the compute_forecasts method are fairly unintuitive: relying on the order of unnamed hyperparameters in a tuple to define different models is just weird.

For better control and visibility of the hyperparameters, would it be better to define each model as a partial function with its hyperparameters fixed?
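The difference between the two styles can be sketched with `functools.partial`. `toy_seasonal_naive` is a hypothetical stand-in with a `(y, h, *hyperparameters)` signature, not the library's `seasonal_naive`:

```python
from functools import partial

import numpy as np


def toy_seasonal_naive(y: np.ndarray, h: int, season_length: int) -> np.ndarray:
    """Toy model with a (y, h, *hyperparameters) signature; not the library's function."""
    last_season = y[-season_length:]
    n_repeats = -(-h // season_length)  # ceiling division
    return np.tile(last_season, n_repeats)[:h]


# Tuple style: hyperparameters are positional and unnamed
tuple_spec = (toy_seasonal_naive, 12)
func, *args = tuple_spec
tuple_forecast = func(np.arange(24.0), 5, *args)

# Partial style: hyperparameters are named and inspectable
partial_spec = partial(toy_seasonal_naive, season_length=12)
partial_forecast = partial_spec(np.arange(24.0), 5)
print(partial_spec.keywords)  # {'season_length': 12}
```

Both specs produce the same forecast; the partial additionally records which hyperparameter was fixed and under what name.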

Zero division error `getQ0`

ARIMA model parameters

While AutoARIMA from pmdarima is slow, it shows the model hyperparameters in a convenient manner. How can I get the same for statsforecast AutoARIMA?

A consistent interface with pmdarima (as far as possible) would be appreciated, so that it can be a drop-in replacement.

Questions about auto_arima()

Discussed in #71

Originally posted by tuttoaposto, March 8, 2022

Is it possible to get the best-fit model from the auto_arima_f() step in auto_arima()? It would be nice to get the same level of detail as model.arima_res_.params and model.summary() in pmdarima.
Is it possible to enable setting max_p, max_q, etc. in auto_arima()?

Thank you!

[question] Division by Zero error

When running on many groups of time series, some groups give me a 'division by zero' error, and the script stops.
Is there a way to skip this error and complete the forecasts for the remaining groups?
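One generic workaround is to catch numerical errors per series and fall back to a simpler model for those series only. This is a sketch under the assumption that models are plain callables `(y, h) -> forecast`. `forecast_with_fallback` is hypothetical, and which exception types to catch is a judgment call:

```python
import numpy as np


def forecast_with_fallback(series: dict, h: int, model, fallback):
    """Forecast every series, using fallback where model raises a numerical error.

    Hypothetical helper: `model` and `fallback` are callables (y, h) -> forecast.
    Returns the forecasts and a dict with the error message of each failed series.
    """
    forecasts, failures = {}, {}
    for uid, y in series.items():
        try:
            forecasts[uid] = model(y, h)
        except (ZeroDivisionError, ValueError, np.linalg.LinAlgError) as exc:
            failures[uid] = repr(exc)
            forecasts[uid] = fallback(y, h)
    return forecasts, failures
```

Returning the failures separately keeps the run going while still showing which groups needed the fallback.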

Nondaily time series breaks

Using non-daily time series throws the following error:

A minimal reproducible example:

import numpy as np
import pandas as pd

from statsforecast import StatsForecast
from statsforecast.models import random_walk_with_drift

rng = np.random.RandomState(0)
serie1 = np.arange(1, 8)[np.arange(100) % 7] + rng.randint(-1, 2, size=100)
serie2 = np.arange(100) + rng.rand(100)
series = pd.DataFrame(
    {
        'ds': pd.date_range('2000-01-01', periods=serie1.size + serie2.size, freq='M'),
        'y': np.hstack([serie1, serie2]),
    },
    index=pd.Index([0] * serie1.size + [1] * serie2.size, name='unique_id')
)

fcst = StatsForecast(series, models=[random_walk_with_drift], freq='M')
forecasts = fcst.forecast(5)

I think the problem might be solved using pd.DatetimeIndex on self.last_dates. I'll open a PR soon.

n_jobs = -1 breaks

Describe the bug
In scikit-learn, we can pass n_jobs=-1. Here, however, it breaks.

To Reproduce
Run

import numpy as np
import pandas as pd

from statsforecast import StatsForecast
from statsforecast.models import seasonal_naive, auto_arima
from statsforecast.utils import AirPassengers

horizon = 12
ap_train = AirPassengers[:-horizon]
ap_test = AirPassengers[-horizon:]

series_train = pd.DataFrame(
    {
        'ds': pd.date_range(start='1949-01-01', periods=ap_train.size, freq='M'),
        'y': ap_train
    },
    index=pd.Index([0] * ap_train.size, name='unique_id')
)

fcst = StatsForecast(
    series_train,
    models=[(auto_arima, 12), (seasonal_naive, 12)],
    freq='M',
    n_jobs=-1
)
forecasts = fcst.forecast(12, level=(80, 95))

Expected behavior
n_jobs should accept -1 (use all available cores), as in scikit-learn.


Desktop (please complete the following information):

OS: mac
Browser: huh
Version:

>>> statsforecast.__version__
'0.5.3'

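The scikit-learn/joblib convention could be supported by translating negative values before creating the process pool, since `ProcessPoolExecutor` itself rejects a non-positive `max_workers`. A minimal sketch (the helper name is hypothetical):

```python
import os


def resolve_n_jobs(n_jobs: int) -> int:
    """Map scikit-learn/joblib-style n_jobs to a positive worker count.

    -1 means all CPUs, -2 all but one, and so on (hypothetical helper).
    """
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    n_cpus = os.cpu_count() or 1
    if n_jobs < 0:
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs
```

`ProcessPoolExecutor(resolve_n_jobs(self.n_jobs))` would then always receive a positive worker count.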

Singular matrix error.

The following problem appears,

Since we already have the inverse of the Hessian, I think we could use,

sol = np.matmul(res.hess_inv, A) / n_used

Instead of,

hess = np.linalg.inv(res.hess_inv)
sol = np.linalg.solve(hess * n_used, A)

After that change, the problem disappears.
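The proposed change relies on the identity `inv(n_used * hess) @ A == hess_inv @ A / n_used`, so `res.hess_inv` never has to be inverted (the step that fails when it is singular). Below is a small numerical check with NumPy. It uses a random symmetric positive definite matrix as a stand-in for `res.hess_inv`, which is an assumption; in the issue that matrix comes from the optimizer result.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
hess_inv = B @ B.T + 4 * np.eye(4)  # symmetric positive definite stand-in for res.hess_inv
A = rng.normal(size=(4, 2))
n_used = 100

# Original approach: invert hess_inv, then solve (n_used * hess) x = A
hess = np.linalg.inv(hess_inv)
sol_solve = np.linalg.solve(hess * n_used, A)

# Proposed approach: reuse the inverse that is already available
sol_matmul = np.matmul(hess_inv, A) / n_used

print(np.allclose(sol_solve, sol_matmul))  # True
```

This avoids the explicit inversion, but it does not fix an ill-conditioned `hess_inv` itself.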

[question] Model summary table for ARIMA model

Hi! I was wondering whether you have implemented (or are planning to implement) a model summary table for the ARIMA model that contains the coefficients, their p-values, etc.

Like https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.summary.html

Many thanks!
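A minimal sketch of such a table, assuming the fitted model exposes its coefficient estimates and their covariance matrix. R's arima stores these as `coef` and `var.coef`; whether statsforecast exposes them, and under which names, is an assumption. `coef_table` is hypothetical, and it uses the normal approximation for the test statistic:

```python
import math

import numpy as np
import pandas as pd


def coef_table(names, coef, var_coef) -> pd.DataFrame:
    """Hypothetical helper: statsmodels-style coefficient table from estimates and covariance."""
    coef = np.asarray(coef, dtype=float)
    se = np.sqrt(np.diag(np.asarray(var_coef, dtype=float)))
    z = coef / se
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = [math.erfc(abs(zi) / math.sqrt(2)) for zi in z]
    return pd.DataFrame({"coef": coef, "std err": se, "z": z, "P>|z|": p}, index=list(names))
```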

Wrong Badge license

Describe the bug

The license badge in the README shows GPLv3, but it should be MIT.

`ic` key error.

For some series, a KeyError for the 'ic' key of the fit dictionary occurs.
I think the problem arises because of the following lines:

if not math.isnan(fit['aic']):
    fit['bic'] = fit['aic'] + npar * (math.log(nstar) - 2)
    fit['aicc'] = fit['aic'] + 2 * npar * (npar + 1) / (nstar - npar - 1)
    fit['ic'] = fit[ic]
else:
    fit['aic'] = fit['bic'] = fit['aicc'] = math.inf

I haven't checked the R code, but adding fit['ic'] in the else statement worked for me.
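A self-contained sketch of the suggested fix: `fit['ic']` is set on both branches. The helper name is hypothetical. Treating `nstar <= 0` or a non-positive AICc denominator as an unusable model is an added assumption, and it also avoids the `math.log(nstar)` domain error:

```python
import math


def add_information_criteria(fit: dict, npar: int, nstar: int, ic: str = "aicc") -> dict:
    """Fill bic/aicc and always set fit['ic'] (hypothetical helper).

    Assumption: a NaN AIC or nstar <= 0 marks the model as unusable (all criteria = inf).
    """
    if math.isnan(fit["aic"]) or nstar <= 0:
        fit["aic"] = fit["bic"] = fit["aicc"] = math.inf
    else:
        fit["bic"] = fit["aic"] + npar * (math.log(nstar) - 2)
        denom = nstar - npar - 1
        # Assumption: a non-positive AICc denominator also makes AICc unusable
        fit["aicc"] = fit["aic"] + 2 * npar * (npar + 1) / denom if denom > 0 else math.inf
    fit["ic"] = fit[ic]  # set on both branches, so later lookups cannot raise KeyError
    return fit
```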

tqdm like expected time

I would like to have an estimated time to completion for the jobs.
tqdm can monitor the elapsed time and show an estimated time of arrival (ETA).
The parallelized version is extremely fast, but it would still be handy to monitor its progress.

Various bugs in documentation

Describe the bug
There are several bugs in the documentation:

Broken links
Links not visible
Background only partially applied

Below are a few screenshots of the issue:

"Broken link"

"Background only partially applied"

[FEAT] Add environment variables for `njit`'s `cache=True` and `nogil=True`

Is your feature request related to a problem? Please describe.
To speed up numba functions, cache=True can be used to store the compiled code on disk, so functions are not recompiled in every new Python process (with multiprocessing, numba otherwise compiles each function once per worker process). nogil=True can be used to release Python's GIL while a compiled function runs, which lets it run in parallel threads.

Describe the solution you'd like
I've been thinking about a solution for a while and I think the best thing to do is to include environment variables for each argument, as suggested here.

Describe alternatives you've considered
Maybe default both arguments to True, but it is too restrictive.
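A minimal sketch of the environment-variable approach. The variable names `STATSFORECAST_NUMBA_CACHE` and `STATSFORECAST_NUMBA_NOGIL` are hypothetical, not existing statsforecast settings, and both default to off:

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag such as "1", "true", "yes" or "on" from the environment."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical variable names, not an existing statsforecast setting
NJIT_KWARGS = {
    "cache": env_flag("STATSFORECAST_NUMBA_CACHE"),
    "nogil": env_flag("STATSFORECAST_NUMBA_NOGIL"),
}
```

Each compiled function could then be declared with `@numba.njit(**NJIT_KWARGS)`, since `njit` accepts `cache` and `nogil` as keyword arguments.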

error when arima_like's gain is 0

I got the following error:

I saw that it was related to the following lines within arima_like:

for j in range(d):
    gain += delta[j] * M[r + j]
if gain < 1e4:
    nu += 1
    ssq += resid * resid / gain
    sumlog += math.log(gain)
if use_resid:
    rsResid[l] = resid / math.sqrt(gain)
for i in range(rd):
    a[i] = anew[i] + M[i] * resid / gain
for i in range(rd):
    for j in range(rd):
        P[i + j * rd] = Pnew[i + j * rd] - M[i] * M[j] / gain

Using

gain = M[0]
if gain == 0.:
    gain += 1e-18

Solved the issue.

[BUG] `ds` object error

Describe the bug
When ds has an object dtype, the forecast method raises an error, but only at the end of the pipeline, once the forecasts have been computed.

Expected behavior
Check the ds type at the beginning.
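A minimal sketch of such an up-front check, assuming ds must be either datetime64 or integer (both appear in the examples in this issue list). `validate_ds` is a hypothetical helper:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_integer_dtype


def validate_ds(df: pd.DataFrame, time_col: str = "ds") -> pd.DataFrame:
    """Hypothetical up-front check: fail before any model is fitted."""
    dtype = df[time_col].dtype
    if not (is_datetime64_any_dtype(dtype) or is_integer_dtype(dtype)):
        raise TypeError(
            f"'{time_col}' must be datetime64 or integer, got {dtype}; "
            f"try pd.to_datetime(df['{time_col}'])"
        )
    return df
```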

Uninstalling statsforecast does not completely uninstall module?

Describe the bug
Uninstalling statsforecast library does not completely uninstall the library. This causes issues in sktime checks and the following passes without raising an error after the library has been uninstalled.

 _check_soft_dependencies("statsforecast", severity="error", object=self)

To Reproduce

!pip install statsforecast

import statsforecast
statsforecast.__version__

!pip uninstall -y statsforecast

# This passes without any problem, but should have failed since the library has been uninstalled.
# This is essentially what sktime `_check_soft_dependencies` does
import statsforecast

# This fails, but it should have failed above itself
from statsforecast.arima import AutoARIMA as _AutoARIMA

Expected behavior
After uninstalling, the following should raise an exception (ModuleNotFoundError or ImportError).

Screenshots
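A likely explanation, offered as a hypothesis: Python caches imported modules in `sys.modules`, so after the first `import statsforecast` the running kernel returns the cached module without looking at the disk again. Only submodules that were never imported, such as `statsforecast.arima`, are searched for, and those fail. Restarting the kernel clears the cache. A check that looks at `sys.path` instead of the cache could use `importlib.machinery.PathFinder`, as in this sketch (the helper name is hypothetical):

```python
import sys
import types
from importlib.machinery import PathFinder


def importable_from_disk(name: str) -> bool:
    """True if a top-level module can be found on sys.path, ignoring sys.modules."""
    return PathFinder.find_spec(name) is not None


# Simulate a module that is cached in this process but not installed on disk
sys.modules["ghost_module_demo"] = types.ModuleType("ghost_module_demo")
import ghost_module_demo  # succeeds: served from the sys.modules cache

print(importable_from_disk("ghost_module_demo"))  # False
print(importable_from_disk("json"))  # True (standard library)
```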

Tags appear to be missing?

It makes it easier to understand exactly what code you're running if you can reference the tag in GitHub.

Maybe someone forgot to git push --follow-tags?

Add ARIMA class

Is your feature request related to a problem? Please describe.
The library already has the AutoARIMA class but it would be helpful to have the ARIMA class.

Describe the solution you'd like
An ARIMA class.

[docs] HTML in index notebook messes up online docs

This is how the docs render locally for me with the latest changes to the README. I believe this is due to the embedded HTML; maybe we can achieve that formatting without using HTML.

[FEAT] Add `mstl` model for multiple seasonalities

Is your feature request related to a problem? Please describe.
Enhance the available models, including mstl (to handle multiple seasonalities).

Add confidence/prediction intervals

As requested here, it would be a great feature including confidence/prediction intervals.

add fit/predict methods for auto_arima

Why is first training slower, but subsequent trainings faster

I am curious why the first training of AutoARIMA using statsforecast is slower than subsequent trainings (even though the models are instantiated again before the subsequent trainings, i.e. they are new objects). Is some information being reused from the first training?

https://gist.github.com/ngupta23/59cc0ce155048f72b80a0431c57b7d17

Initial Training

Subsequent Training (independent object)

strange behavior pandas lower than 1.3.5

Statsforecast Arima residuals and predictions from training data

When running the nixtla statsforecast ARIMA model, I have been looking to see whether it is possible to output the residuals from the training dataset.

Error when n_series * n_models < n_jobs

When the number of series is less than n_jobs, the following problem arises:

I think we could change the following lines,

statsforecast/statsforecast/core.py

Lines 128 to 129 in eed6cfa

gas = self.ga.split(self.n_jobs)
with ProcessPoolExecutor(self.n_jobs) as executor:

To:

n_jobs = min(self.n_jobs, len(self.ga) * len(self.models))
gas = self.ga.split(n_jobs) 
with ProcessPoolExecutor(n_jobs) as executor:
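To see why empty partitions appear, here is a sketch that uses `np.array_split` as a stand-in for `GroupedArray.split` (an assumption; the library's split may differ) and caps by the number of series only, for simplicity. Capping the worker count, as proposed, removes the empty chunks:

```python
import numpy as np

n_series, n_jobs = 2, 4
# Without a cap, array_split hands out empty chunks to the extra workers
chunks = np.array_split(np.arange(n_series), n_jobs)
print([len(c) for c in chunks])  # [1, 1, 0, 0]

# With the proposed cap, every worker receives at least one series
capped = np.array_split(np.arange(n_series), min(n_jobs, n_series))
print([len(c) for c in capped])  # [1, 1]
```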

Wrong usage of exogenous variables using parallel processing

n_jobs>1 uses the wrong exogenous variables. In the following example, the time series indexed by 0 has [0,...,143] as exogenous variable and the time series indexed by 1 has [144,...,287] as exogenous variable.

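import numpy as np
import pandas as pd
from statsforecast.utils import AirPassengers as ap  # 144 monthly values

ap_df_2 = pd.DataFrame(
    {'ds': np.hstack([np.arange(ap.size), np.arange(ap.size)]),
     'y': np.hstack([ap, ap])},
    index=pd.Index([0] * ap.size + [1] * ap.size, name='unique_id')
)
# Exogenous variable: 0..143 for series 0 and 144..287 for series 1
ap_df_2['x'] = np.arange(2 * ap.size)
ap_df_2 = ap_df_2.reset_index()
ap_df_2_test = ap_df_2.groupby('unique_id').tail(7)
ap_df_2_train = ap_df_2.drop(ap_df_2_test.index)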

In the following image, I print x using n_jobs=1 and the data is correct.

But when I print x using n_jobs>1, the issue appears: both series get the same exogenous data.

I think the problem is related to the following lines,

statsforecast/statsforecast/core.py

Lines 60 to 62 in 6770173

for i, grp in enumerate(self):
    if xreg is not None:
        xr = xreg[i*h : (i+1)*h]

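The index i does not take the gas partition into account: it restarts at 0 inside every partition, so each worker slices xreg as if it were handling the first series.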
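The fix is to slice `xreg` by each series' global position rather than by its position within the partition. A minimal sketch, assuming `xreg` stacks `h` rows per series in series order and that partitions are contiguous ranges of series (here produced with `np.array_split`, a stand-in for the library's split). The helper name is hypothetical:

```python
import numpy as np


def xreg_blocks_per_worker(xreg: np.ndarray, n_series: int, h: int, n_jobs: int):
    """Hypothetical sketch: slice each series' future exogenous block by its global position.

    xreg holds h rows per series, stacked series after series.
    """
    partitions = np.array_split(np.arange(n_series), n_jobs)
    blocks = []
    for part in partitions:
        # Buggy pattern: enumerate(part) restarts at 0 in every worker,
        # so every worker would read xreg[0:h], xreg[h:2h], ...
        blocks.append([xreg[i * h:(i + 1) * h] for i in part])
    return blocks
```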

