
greykite's Introduction

Greykite: A flexible, intuitive and fast forecasting and anomaly detection library

Why Greykite?

The Greykite library provides flexible, intuitive and fast forecasts through its flagship algorithm, Silverkite.

The Silverkite algorithm works well on most time series and is especially adept at handling those with changepoints in trend or seasonality, event/holiday effects, and temporal dependencies. Its forecasts are interpretable and therefore useful for trusted decision-making and insights.

The Greykite library provides a framework that makes it easy to develop a good forecast model, with exploratory data analysis, outlier/anomaly preprocessing, feature extraction and engineering, grid search, evaluation, benchmarking, and plotting. Other open source algorithms can be supported through Greykite’s interface to take advantage of this framework, as listed below.

Greykite AD (Anomaly Detection) is an extension of the Greykite Forecasting library. It provides users with an interpretable, fast, robust and easy to use interface to monitor their metrics with minimal effort.

Greykite AD improves upon the out-of-the-box confidence intervals generated by Silverkite by automatically tuning the confidence intervals and other filters (e.g. based on APE, the absolute percentage error) using expected alert rate information and/or anomaly labels, if available. It allows users to define a robust objective function, constraints, and a parameter space to optimize the confidence intervals. For example, a user can target a minimum recall of 80% while maximizing precision. Additionally, users can specify a minimum error level to filter out anomalies that are not business-relevant. The motivation for including criteria other than statistical significance is to bake material/business impact into the detection.
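The tuning described above can be sketched as a small grid search. This is a minimal illustration, assuming labeled anomalies, a symmetric interval of half-width `h` around each forecast that is scaled by a single multiplier `m`, and an APE filter; `tune_interval_width` is a hypothetical helper, not the `GreykiteDetector()` API.

```python
def tune_interval_width(actuals, forecasts, half_widths, labels,
                        multipliers, min_recall=0.8, min_ape=0.0):
    """Hypothetical helper: pick the interval multiplier that maximizes
    precision subject to recall >= min_recall.

    A point is flagged when it falls outside forecast +/- m * half_width
    AND its absolute percentage error (APE) is at least min_ape.
    Returns (precision, recall, multiplier), or None if no multiplier
    reaches the recall target.
    """
    best = None
    for m in multipliers:
        flags = []
        for y, f, h in zip(actuals, forecasts, half_widths):
            outside = abs(y - f) > m * h
            # APE is undefined when the actual is zero; treat it as infinite
            ape = abs(y - f) / abs(y) if y != 0 else float("inf")
            flags.append(outside and ape >= min_ape)
        tp = sum(1 for fl, lab in zip(flags, labels) if fl and lab)
        fp = sum(1 for fl, lab in zip(flags, labels) if fl and not lab)
        fn = sum(1 for fl, lab in zip(flags, labels) if not fl and lab)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        # Ties keep the earlier multiplier in the candidate list
        if recall >= min_recall and (best is None or precision > best[0]):
            best = (precision, recall, m)
    return best
```

Widening the interval raises precision until recall drops below the target; the `min_ape` filter additionally suppresses deviations that are small in percentage terms even when they fall outside the interval.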

For a demo, please see our quickstart.

Distinguishing Features

  • Flexible design
    • Provides time series regressors to capture trend, seasonality, holidays, changepoints, and autoregression, and lets you add your own.
    • Fits the forecast using a machine learning model of your choice.
  • Intuitive interface
    • Provides powerful plotting tools to explore seasonality, interactions, changepoints, etc.
    • Provides model templates (default parameters) that work well based on data characteristics and forecast requirements (e.g. daily long-term forecast).
    • Produces interpretable output, with model summary to examine individual regressors, and component plots to visually inspect the combined effect of related regressors.
  • Fast training and scoring
    • Facilitates interactive prototyping, grid search, and benchmarking. Grid search is useful for model selection and semi-automatic forecasting of multiple metrics.
  • Extensible framework
    • Exposes multiple forecast algorithms in the same interface, making it easy to try algorithms from different libraries and compare results.
    • The same pipeline provides preprocessing, cross-validation, backtest, forecast, and evaluation with any algorithm.
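As a concrete picture of the regressors in the first bullet, the sketch below builds a linear trend, a piecewise-linear changepoint term, and a holiday indicator for one date. It is an assumption-laden illustration: the column names, measuring time in days, and the single changepoint are chosen for readability and are not Greykite's exact feature definitions.

```python
from datetime import date


def make_regressors(d, origin, changepoint, holidays):
    """Hypothetical feature builder: one dict of regressors for date d.

    Time is measured in whole days since `origin` (an assumed scaling).
    """
    t = (d - origin).days
    t_cp = (changepoint - origin).days
    return {
        "ct1": t,                            # linear trend
        "changepoint0": max(0, t - t_cp),    # extra slope that starts at the changepoint
        "is_holiday": int(d in holidays),    # 1 on listed holidays, else 0
    }
```

A linear model fit on such columns lets the trend slope change after the changepoint; adding your own regressor amounts to adding another key to the returned dict.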

Algorithms currently supported within Greykite’s modeling framework:

  • Silverkite (Greykite’s flagship forecasting algorithm)
  • Greykite Anomaly Detection (Greykite's flagship anomaly detection algorithm)
  • Facebook Prophet
  • Auto Arima

Notable Components

Greykite offers components that can be used within other forecasting libraries or even outside the forecasting context.

  • ModelSummary() - R-like summaries of scikit-learn and statsmodels regression models.
  • ChangepointDetector() - changepoint detection based on adaptive lasso, with visualization.
  • SimpleSilverkiteForecast() - Silverkite algorithm with forecast_simple and predict methods.
  • SilverkiteForecast() - low-level interface to Silverkite algorithm with forecast and predict methods.
  • ReconcileAdditiveForecasts() - adjust a set of forecasts to satisfy inter-forecast additivity constraints.
  • GreykiteDetector() - simple interface for optimizing anomaly detection performance based on Greykite forecasts.
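The additivity constraint that `ReconcileAdditiveForecasts()` targets can be illustrated with the simplest case, `total = sum(parts)`. The sketch below assumes an unweighted least-squares adjustment, that is, an orthogonal projection of the raw forecasts onto the constraint; `reconcile_total` is hypothetical, and the library's own methods and options may differ.

```python
def reconcile_total(total, parts):
    """Hypothetical helper: adjust (total, *parts) so that total == sum(parts).

    Projects x = (total, p1, ..., pk) orthogonally onto the plane c . x = 0
    with c = (1, -1, ..., -1): the smallest sum-of-squares change that
    satisfies the constraint.
    """
    c = [1.0] + [-1.0] * len(parts)
    x = [float(total)] + [float(p) for p in parts]
    violation = sum(ci * xi for ci, xi in zip(c, x))   # total - sum(parts)
    scale = violation / sum(ci * ci for ci in c)       # c . c = 1 + k
    adjusted = [xi - scale * ci for ci, xi in zip(c, x)]
    return adjusted[0], adjusted[1:]
```

Every coordinate moves by the same amount (the total in the opposite direction from the parts), so the violation is split evenly between the total and its components.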

Usage Examples

You can obtain forecasts with only a few lines of code:

For a demo, please see our quickstart.

Setup and Installation

Greykite is available on PyPI and can be installed with pip:

pip install greykite

For more installation tips, see installation.

Documentation

Please find our full documentation here.

Learn More

Citation

Please cite Greykite in your publications if it helps your research:

@misc{reza2021greykite-github,
  author = {Reza Hosseini and
            Albert Chen and
            Kaixu Yang and
            Sayan Patra and
            Yi Su and
            Rachit Arora},
  title  = {Greykite: a flexible, intuitive and fast forecasting library},
  url    = {https://github.com/linkedin/greykite},
  year   = {2021}
}
@inproceedings{reza2022greykite-kdd,
  author = {Hosseini, Reza and Chen, Albert and Yang, Kaixu and Patra, Sayan and Su, Yi and Al Orjany, Saad Eddin and Tang, Sishi and Ahammad, Parvez},
  title = {Greykite: Deploying Flexible Forecasting at Scale at LinkedIn},
  year = {2022},
  isbn = {9781450393850},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3534678.3539165},
  doi = {10.1145/3534678.3539165},
  booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages = {3007–3017},
  numpages = {11},
  keywords = {forecasting, scalability, interpretable machine learning, time series},
  location = {Washington DC, USA},
  series = {KDD '22}
}

License

Copyright (c) LinkedIn Corporation. All rights reserved. Licensed under the BSD 2-Clause License.

greykite's People

Contributors

al-bert, dromare, kaixuyang, kathygcy, martinmenchon, njusu, reza1317, sayanpatra


greykite's Issues

TerminatedWorkerError while running benchmarking

Hi,

When I try to run benchmarking for Silverkite and Prophet (both with my data and with the example data in the notebook provided with Greykite), I get the following error:
TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

When I set n_jobs=1, the kernel crashes, whereas for any other value of n_jobs, the above error shows up.
OS: Windows 10
RAM: 16GB
Processor: i7

Any suggestions/pointers?

Regards

Greykite and inconsistent timeseries?

If my data is not sequential (has plenty of gaps), can Greykite handle it? I suspect that is the problem I am having with my data but I am not totally sure.

EDIT: I see in "Examine Input Data" that Greykite has an internal imputation method, so let me clarify. There is no specific time frequency: I am not sampling data at a regular time interval. Data arrives whenever it arrives, with no clear demarcation of the time interval. Would Greykite be able to handle seemingly randomly spaced, inconsistent input dates?
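Greykite's `MetadataParam` takes a regular frequency (`freq`), so one common workaround for irregularly spaced observations is to aggregate them onto a fixed grid before forecasting. A minimal stdlib sketch, assuming daily sums are meaningful for this metric; `to_daily_series` is hypothetical, and whether to sum, average, or leave empty days missing is a modeling choice.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def to_daily_series(events):
    """Hypothetical helper: aggregate irregular (timestamp, value) pairs
    into one value per calendar day; days without data get None."""
    totals = defaultdict(float)
    for ts, value in events:
        totals[ts.date()] += value  # sum within each day (assumed aggregation)
    if not totals:
        return []
    day, last = min(totals), max(totals)
    series = []
    while day <= last:
        # dict.get does not create entries, so empty days stay None
        series.append((day, totals.get(day)))
        day += timedelta(days=1)
    return series
```

The resulting regular daily series can then be passed on with `freq="D"`, and the `None` days are the gaps that imputation would have to fill.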

`get_default_origin_for_time_vars` throws a `KeyError` if the leading time series values are null

get_default_origin_for_time_vars throws a KeyError if the leading time series value is np.nan.

date = pd.to_datetime(df[time_col][0])

For example:

import numpy as np
import pandas as pd
from greykite.sklearn.estimator.simple_silverkite_estimator import SimpleSilverkiteEstimator

params =  {'forecast_horizon': 1, 
 'uncertainty_dict': {'uncertainty_method': 'simple_conditional_residuals', 
                      'params': {'conditional_cols': ['dow'], 
                                 'quantile_estimation_method': 'normal_fit', 
                                 'sample_size_thresh': 5, 
                                 'small_sample_size_method': 'std_quantiles', 
                                 'small_sample_size_quantile': 0.99}}, 
 'fit_algorithm_dict': {'fit_algorithm': 'elastic_net', 
                        'fit_algorithm_params': {
                            'l1_ratio': [0.1,0.5,0.7,0.9,0.95,0.99,1]
                        }
                        }, 
 'holiday_lookup_countries': ['US'], 
 'changepoints_dict': {'method': 'auto', 
                       'regularization_strength': None, 
                       'actual_changepoint_min_distance': '7D', 
                       'potential_changepoint_distance': '3D', 
                       'yearly_seasonality_order': 6, 
                       'no_changepoint_proportion_from_end': 0.18}, 
 'yearly_seasonality': 'auto', 
 'quarterly_seasonality': False, 
 'monthly_seasonality': False, 
 'weekly_seasonality': 'auto', 
 'daily_seasonality': 'auto', 
 'coverage': 0.95
 }
df = pd.DataFrame({'ds':pd.date_range('2021-08-18', periods=20), 'y': range(1,21)})
m = SimpleSilverkiteEstimator(**params)
m.fit(df, time_col="ds", value_col="y")

will fit and predict as expected, however,

df.loc[0, 'y'] = np.nan
m.fit(df, time_col="ds", value_col="y")

will throw KeyError: 0

This happens because of the [0] indexing: when the null values are dropped, the row with index label 0 is dropped too, which results in the KeyError. This issue can be addressed by using df[time_col].iloc[0] instead, which accesses the first row by position.

Let me know if this is intentional - happy to open a PR otherwise!

Vicky

Trying to put greykite to Docker

Hi team, can you suggest a basic Docker image to install Greykite into?

I am trying to install everything into continuumio/miniconda3, but
when I run conda install -c msys2 libpython m2w64-toolchain, it fails.

Load Model from a GCP Cloud Function

I'm trying to deploy my greykite model on GCP via a cloud function. The existing read and write functions only work for local directories and not cloud blob storage options. I've adjusted the write function to write to cloud storage but the load function is proving to be a bit challenging.

Greykite suitable for pure linear increasing series?

Hello

I'm working on some house price time series using Greykite, but for some reason the forecast I get is just a median price between the upper and lower bounds (ARIMA). Is this a known issue with Greykite when the series is purely linearly increasing?

Thank you
Aktham Momani
[image: greykite_forecast]

Components breakdown for forecast

Hi,

I'm looking to break down my forecast into its components. I saw your plot_components function; however, it only uses the training data to create the components.

In my example, my training data never has values on Saturdays. In that case,

from greykite.sklearn.estimator.simple_silverkite_estimator import SimpleSilverkiteEstimator

params = {'forecast_horizon': 1,
 'fit_algorithm_dict': {'fit_algorithm': 'elastic_net',
  'fit_algorithm_params': {'l1_ratio': [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1]}},
 'changepoints_dict': {'method': 'auto',
  'regularization_strength': None,
  'actual_changepoint_min_distance': '7D',
  'potential_changepoint_distance': '3D',
  'yearly_seasonality_order': 6},
}
model = SimpleSilverkiteEstimator(**params)
model.fit(data)
predictions = model.predict(data)
model.plot_components()

returns a weekly component plot that just connects Friday and Sunday with a straight line. (Note that hovering on 5, Saturday, does not show any data.)
[screenshots of the weekly component plot]

When predicting, I want to include Saturdays in my forecast and predict their value based on the fitted coefficients for 'sin1_tow_weekly', 'cos1_tow_weekly', 'sin2_tow_weekly', 'cos2_tow_weekly', 'sin3_tow_weekly', 'cos3_tow_weekly'.

Is there an easy way to retrieve the components for predictions? Alternatively, how would I access the design matrix x_mat for the forecast instead of the fitted data?

Thank you in advance!
Vicky
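Because the weekly terms named in the question are Fourier terms in the time of week, their fitted coefficients can be evaluated at any time of week, including Saturdays absent from training. A sketch, assuming `tow` runs from 0 (Monday) to 7 so that Saturday is 5, and that each term is sin or cos of `2*pi*k*tow/7`; the name parsing and the coefficient values are illustrative, and Greykite's exact conventions may differ.

```python
import math


def weekly_component(tow, coefs):
    """Sum of weekly Fourier terms at time of week `tow` (days, 0 <= tow < 7).

    `coefs` maps names like "sin1_tow_weekly" to fitted coefficients
    (assumed naming and scaling, for illustration only).
    """
    total = 0.0
    for name, beta in coefs.items():
        kind, rest = name[:3], name[3:]       # "sin"/"cos", then e.g. "1_tow_weekly"
        k = int(rest.split("_")[0])           # Fourier order
        angle = 2 * math.pi * k * tow / 7
        total += beta * (math.sin(angle) if kind == "sin" else math.cos(angle))
    return total
```

Evaluating at `tow = 5` gives the Saturday value implied by the weekly coefficients; adding it to the other fitted components would give the Saturday prediction that the training-only plot does not show.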

Getting Various Warnings while running time series prediction

  • I'm trying to fit GreyKite Model to my time series data.

  • I have attached the csv file for reference.

  • Even though the model works, it raises a bunch of warnings that I'd like to avoid.

  • Since some of my target values are zero, it tells me that MAPE is undefined.

  • Also, since I'm only forecasting one step into the future, it gives me an UndefinedMetricWarning: "R^2 score is not well-defined with less than two samples."

  • I have attached a few images displaying the warnings.

  • Any help to get rid of these warnings would be appreciated!

  • This is the code I'm using to fit the data:

from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
import pandas as pd


class GreyKiteModel(AnomalyModel):  # AnomalyModel is the user's own base class

    def __init__(self, *args, model_kwargs=None, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        # Avoid a mutable default argument; use an empty dict when none is given
        self.model_kwargs = model_kwargs if model_kwargs is not None else {}

    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        """Takes in a pd.DataFrame with 2 columns, dt and y, and returns a
        pd.DataFrame with 4 columns: dt, y, yhat_lower, and yhat_upper.

        :param df: Input DataFrame with dt, y columns
        :type df: pd.DataFrame
        :return: Output DataFrame with dt, y, yhat_lower, yhat_upper columns
        :rtype: pd.DataFrame
        """
        df = df.rename(columns={"dt": "ds"})
        metadata = MetadataParam(time_col="ds",  # name of the time column
                                 value_col="y",  # name of the value column
                                 freq="D")       # "H" for hourly, "D" for daily, "W" for weekly, etc.
        forecaster = Forecaster()  # creates forecasts and stores the result
        result = forecaster.run_forecast_config(  # result is also stored as forecaster.forecast_result
            df=df,
            config=ForecastConfig(model_template=ModelTemplateEnum.SILVERKITE.name,
                                  forecast_horizon=1,  # forecasts 1 step ahead
                                  coverage=0.95,
                                  metadata_param=metadata))
        forecast_df = result.forecast.df.drop(columns=["actual"])
        forecast_df = forecast_df.rename(columns={"ds": "dt",
                                                  "forecast": "y",
                                                  "forecast_lower": "yhat_lower",
                                                  "forecast_upper": "yhat_upper"})
        return forecast_df

[attachment: df.csv]

[screenshots of the warnings, 2021-08-21]
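Both warnings reflect quantities that are genuinely undefined rather than bugs: MAPE divides by each actual value, and R² divides by the spread of the actuals, which needs at least two points. The sketch below defines both metrics so they return `None` in those cases, plus a helper that silences warnings for a single call with the standard `warnings.catch_warnings()` context manager. The function names are hypothetical and this is not Greykite's evaluation code.

```python
import warnings


def mape(actual, predicted):
    """Mean absolute percentage error in percent, or None if any actual is 0."""
    if not actual or any(a == 0 for a in actual):
        return None  # division by zero: MAPE is undefined
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination, or None when it is not well-defined."""
    if len(actual) < 2:
        return None  # a single point has no variance to explain
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        return None  # constant actuals: also undefined
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot


def run_quietly(fn, *args, **kwargs):
    """Call fn with all warnings suppressed for this call only."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return fn(*args, **kwargs)
```

Silencing is only appropriate when the affected metrics are not needed; with a one-step horizon, R² on the test set cannot be computed regardless.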

Grid search range of values

Hello,

Is there a range of values that you would recommend to search over for each of these hyperparameters?
Thanks!

seasonality = {
    "yearly_seasonality": [10, 20],
    "quarterly_seasonality": False,
    "monthly_seasonality": False,
    "weekly_seasonality": False,
    "daily_seasonality": False
}

changepoints = {
    "changepoints_dict": None
}

# Specifies custom parameters
custom = {
    "fit_algorithm_dict": {"fit_algorithm": "linear"}
}

# Hyperparameter override can be a list of dictionaries.
# Each dictionary will be one set of hyperparameters.
override = [
    {},
    {
        "estimator__changepoints_dict": {"method": "auto"},
        "estimator__fit_algorithm_dict": {"fit_algorithm": "ridge"}
    }
]
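To see how quickly a grid like this grows, the sketch below enumerates candidates the way a scikit-learn style grid search would: each list-valued entry is one axis, and each dictionary in `overrides` is applied on top of the base grid. This is an assumption about how the pieces combine, made for illustration; `expand_grid` is hypothetical, and single values are wrapped in one-element lists so every entry is an axis.

```python
from itertools import product


def expand_grid(base, overrides):
    """Hypothetical helper: list every candidate parameter set.

    `base` maps parameter names to lists of values (one grid axis each).
    Each dict in `overrides` replaces some axes; the resulting grids are unioned.
    """
    candidates = []
    for override in overrides:
        grid = {**base, **override}
        names = sorted(grid)
        for values in product(*(grid[n] for n in names)):
            candidates.append(dict(zip(names, values)))
    return candidates


base = {
    "estimator__yearly_seasonality": [10, 20],
    "estimator__changepoints_dict": [None],
    "estimator__fit_algorithm_dict": [{"fit_algorithm": "linear"}],
}
overrides = [
    {},
    {
        "estimator__changepoints_dict": [{"method": "auto"}],
        "estimator__fit_algorithm_dict": [{"fit_algorithm": "ridge"}],
    },
]
candidates = expand_grid(base, overrides)  # 2 seasonality orders x 2 override sets
```

Each extra value on an axis multiplies the number of fits, which is one practical reason to start with a few widely spaced values, such as the two yearly seasonality orders above.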

Inclusion of custom events, into the model itself with anomaly/holiday score for interpolation.

Hey, I am working on a use case that requires a lot of event-based forecasting.

Events may range from a technical update to the release of a new campaign, but the dates are generally known to the business beforehand. These events do not follow any calendar rules, but the data has seen such events in the past. Putting those holidays/events into the model itself, with custom future dates and strengths (since all campaigns or events of the same genre have the same type of impact), would make prediction much smoother: no extra steps would be needed after prediction to tune out the holidays, because everything would be handled in the predict function.

I had already raised an issue in the Prophet community, which contains a detailed discussion along with the required code snippets. Please let me know if you are planning to incorporate such changes. This would be really handy for event-driven forecasting.
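A common way to encode known, non-calendar events is one indicator regressor per event type, optionally spread over a window of days around each occurrence: past occurrences let the model learn the effect, and listing future dates carries it into the forecast. A stdlib sketch; the event names, the symmetric window, and the `event_` column prefix are hypothetical and do not describe Greykite's events API.

```python
from datetime import date, timedelta


def event_indicators(days, events, window=1):
    """Hypothetical helper: one 0/1 column per event type.

    `events` maps an event type (e.g. "campaign") to its known dates, past
    and future. A day is marked if it lies within `window` days of any
    occurrence of that event type.
    """
    rows = []
    for d in days:
        row = {"date": d}
        for name, dates in events.items():
            row[f"event_{name}"] = int(any(abs((d - e).days) <= window for e in dates))
        rows.append(row)
    return rows


days = [date(2021, 6, 1) + timedelta(days=i) for i in range(5)]
rows = event_indicators(days, {"campaign": [date(2021, 6, 3)]}, window=1)
```

Because every event of the same genre shares one column, the model estimates a single effect per genre, which matches the assumption in the issue that events of the same type have the same impact.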

Can't save model

After fitting the model, I would like to persist it for later use in my app. I tried to save the model (result.model), the forecaster, and forecaster.forecast_result, but none of them could be persisted using pickle or joblib.

That's the error I get. Any advice?

---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-77-0716155adc48> in <module>
----> 1 joblib.dump(result.model, model_path)

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in dump(value, filename, compress, protocol, cache_size)
    478     elif is_filename:
    479         with open(filename, 'wb') as f:
--> 480             NumpyPickler(f, protocol=protocol).dump(value)
    481     else:
    482         NumpyPickler(filename, protocol=protocol).dump(value)

/work/y435/crypto-forecast/lib/python3.7/pickle.py in dump(self, obj)
    435         if self.proto >= 4:
    436             self.framer.start_framing()
--> 437         self.save(obj)
    438         self.write(STOP)
    439         self.framer.end_framing()

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    547 
    548         # Save the reduce() output and finally memoize the object
--> 549         self.save_reduce(obj=obj, *rv)
    550 
    551     def persistent_id(self, obj):

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
    660 
    661         if state is not None:
--> 662             save(state)
    663             write(BUILD)
    664 

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_dict(self, obj)
    857 
    858         self.memoize(obj)
--> 859         self._batch_setitems(obj.items())
    860 
    861     dispatch[dict] = save_dict

/work/y435/crypto-forecast/lib/python3.7/pickle.py in _batch_setitems(self, items)
    883                 for k, v in tmp:
    884                     save(k)
--> 885                     save(v)
    886                 write(SETITEMS)
    887             elif n:

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_list(self, obj)
    817 
    818         self.memoize(obj)
--> 819         self._batch_appends(obj)
    820 
    821     dispatch[list] = save_list

/work/y435/crypto-forecast/lib/python3.7/pickle.py in _batch_appends(self, items)
    841                 write(MARK)
    842                 for x in tmp:
--> 843                     save(x)
    844                 write(APPENDS)
    845             elif n:

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_tuple(self, obj)
    772         if n <= 3 and self.proto >= 2:
    773             for element in obj:
--> 774                 save(element)
    775             # Subtle.  Same as in the big comment below.
    776             if id(obj) in memo:

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    547 
    548         # Save the reduce() output and finally memoize the object
--> 549         self.save_reduce(obj=obj, *rv)
    550 
    551     def persistent_id(self, obj):

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
    660 
    661         if state is not None:
--> 662             save(state)
    663             write(BUILD)
    664 

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_dict(self, obj)
    857 
    858         self.memoize(obj)
--> 859         self._batch_setitems(obj.items())
    860 
    861     dispatch[dict] = save_dict

/work/y435/crypto-forecast/lib/python3.7/pickle.py in _batch_setitems(self, items)
    883                 for k, v in tmp:
    884                     save(k)
--> 885                     save(v)
    886                 write(SETITEMS)
    887             elif n:

/work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
    280             return
    281 
--> 282         return Pickler.save(self, obj)
    283 
    284 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/work/y435/crypto-forecast/lib/python3.7/pickle.py in save_global(self, obj, name)
    958             raise PicklingError(
    959                 "Can't pickle %r: it's not found as %s.%s" %
--> 960                 (obj, module_name, name)) from None
    961         else:
    962             if obj2 is not obj:

PicklingError: Can't pickle <function add_finite_filter_to_scorer.<locals>.score_func_finite at 0x7f490e750d40>: it's not found as greykite.common.evaluation.add_finite_filter_to_scorer.<locals>.score_func_finite
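The last line of the traceback names the cause: `pickle` stores functions by importable name, so a function defined inside another function (the `<locals>` in `add_finite_filter_to_scorer.<locals>.score_func_finite`) cannot be pickled. The sketch below reproduces that failure with the standard library and shows the usual fix of building the callable from module-level functions with `functools.partial`. All names are illustrative; this does not patch Greykite itself.

```python
import functools
import math
import pickle


def finite_score(score_func, y_true, y_pred):
    """Apply score_func only to pairs whose prediction is finite.

    Defined at module level, so pickle can find it by its qualified name.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if math.isfinite(p)]
    return score_func([t for t, _ in pairs], [p for _, p in pairs])


def make_local_scorer(score_func):
    # Mirrors the failing pattern: a scorer defined inside another function.
    def score_func_finite(y_true, y_pred):
        return finite_score(score_func, y_true, y_pred)
    return score_func_finite


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


try:
    pickle.dumps(make_local_scorer(mae))
except (pickle.PicklingError, AttributeError) as err:
    # Depending on the Python version this is PicklingError or AttributeError
    print("cannot pickle the closure:", err)

# Picklable replacement: a partial over module-level functions.
scorer = functools.partial(finite_score, mae)
restored = pickle.loads(pickle.dumps(scorer))
```

Third-party serializers such as `cloudpickle` and `dill` can also serialize closures by value, which may be an alternative when the closure lives inside a library.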

pickling greykite model

Hi again,

Thanks for the package!

I was having trouble pickling the results.model object, I guess there isn't any particular way to do that yet?

I tried using the dump_obj utility I found but got the following error:
[screenshot of the error]

Best,
George

Getting this error "ValueError: zero-size array to reduction operation minimum which has no identity" using lasso

Hi everyone, I sometimes get this error when executing summary = result.model[-1].summary() with fit_algorithm: lasso and cv_max_splits: 2

Error:
ValueError: zero-size array to reduction operation minimum which has no identity during lasso

For the same data, sometimes it works well and sometimes it fails. I think it's doing a min.reduce of an empty array.
Does anyone know how to solve it?

Regards!

Edit:
The code inside model_summary_utils
confidence_intervals = []
for i in range(beta_estimates.shape[0]):
    if (beta_estimates[i] == 0).all():
        confidence_intervals.append([0., 0.])
    else:
        check_index = (p_value_rankings[i] > gamma_min) & (beta_estimates[i] != 0)
        if len(check_index) == 0:
            confidence_intervals.append([0., 0.])
        else:
            lb = np.min(ordinary_confidence_interval_lb[i, check_index])
            ub = np.max(ordinary_confidence_interval_ub[i, check_index])
            confidence_intervals.append([lb, ub])

The problem is that ordinary_confidence_interval_lb[i, check_index] is empty
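The guard `len(check_index) == 0` never fires, because `check_index` is a boolean mask whose length is the number of columns no matter how many entries are `True`; when none are, `np.min` receives an empty selection. A stdlib sketch of the intended guard, with lists standing in for the NumPy arrays; `guarded_interval` is hypothetical, and with NumPy the equivalent check would be `not check_index.any()`.

```python
def guarded_interval(lb_row, ub_row, mask):
    """Return [min lb, max ub] over masked entries, or [0.0, 0.0] if none.

    `mask` is a list of booleans; len(mask) is its length, not the
    number of selected entries, so count the True values instead.
    """
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:  # what `len(check_index) == 0` was meant to test
        return [0.0, 0.0]
    return [min(lb_row[i] for i in selected), max(ub_row[i] for i in selected)]
```

This also explains why the error appears only sometimes: it depends on whether any coefficient in a row passes both the p-value ranking threshold and the nonzero check, which can vary between runs.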

Cannot set l1_ratio as a list when using Elastic Net

Hello there,

I get an error when running Greykite with the Elastic Net algorithm and the l1_ratio parameter set up as a list of floats [.1, .5, .7, .9, .95, .99, 1] rather than as a single float number:

[screenshots of the error: Capture2, Capture3]

The Scikit Learn link https://scikit-learn.org/0.24/modules/generated/sklearn.linear_model.ElasticNetCV.html#sklearn.linear_model.ElasticNetCV says the following:
...This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and less close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1].

I would like to know if there is a workaround other than setting up a grid search and CV validation outside the ElasticNetCV() framework, for example:

cv_max_splits = 5

# Grid search is possible
custom = dict(
    fit_algorithm_dict=[
        dict(
            fit_algorithm="elastic_net",
            fit_algorithm_params={
                "l1_ratio": 0.7
            }
        ),
        dict(
            fit_algorithm="elastic_net",
            fit_algorithm_params={
                "l1_ratio": 0.9
            }
        ),
    ]
)
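The list of single-`l1_ratio` candidates in the workaround above can be generated instead of written by hand, using the ratios from the scikit-learn documentation quoted earlier. A small sketch that only builds the `custom` dictionary; that Greykite's grid search then treats each entry as one candidate is assumed from the issue, not verified here.

```python
# Candidate l1_ratio values suggested in the scikit-learn ElasticNetCV docs.
L1_RATIOS = [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1]

custom = dict(
    fit_algorithm_dict=[
        dict(fit_algorithm="elastic_net", fit_algorithm_params={"l1_ratio": r})
        for r in L1_RATIOS  # one grid-search candidate per ratio
    ]
)
```

Each ratio becomes a separate fit evaluated by the outer cross-validation (controlled by `cv_max_splits`), so seven ratios cost seven times as many fits as one.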

Thank you for the good work !

Best regards,
Dario

Clarification on cv_selection_metric

Hi all,

Thanks for the great work. Just had a quick question about the cv_selection_metric.
When you choose it does it actually use the selected metric as a loss function or
is MSE still used and metrics later computed on the CV splits?

Best,
George

Figure not showing in plotly.io.show()

I am trying to use the simple_forecast notebook from the quickstart, but when running plotly.io.show(fig), no figure is displayed. No error is thrown.
[screenshot, 2021-05-25]

Spark Support

Just a query: is there a plan to support Spark DataFrames in the future?

Setting of "cv_max_splits" when using "fit_algorithm": "lasso"

Hi all,

When setting fit_algorithm_params={"cv": 5} to use 5-fold CV with sklearn LassoCV() on the training set, how should the global parameter "cv_max_splits" be set? Should it be 0, None (equivalent to 3), or 5?

Best regards,
Dario

Need help benchmarking?

Hi devs,

I'm pleased to see another open-source package that looks to be very well documented. Congrats on the hard work.

I noticed that your README asks for suggested time series and benchmarks. I sporadically add things to some time series benchmarking at
https://microprediction.github.io/timeseries-elo-ratings/html_leaderboards/univariate-k_003.html
and hopefully, I'll get around to that (this amounts to calling greykite using the "skater" syntax explained in the timemachines package README).

On the other hand, if you just want some time series to use to benchmark yourself, you can just grab whatever you want from https://www.microprediction.org/browse_streams.html and see https://www.microprediction.org/features.html for the API (or you can use the microprediction client too)

Anyway, let me know if I can help.

Peter

TimeSeries features

Hi all,

Great library and work! I was curious if there is a recommended way to get the time series features as a dataframe without running the model? I am looking to compare with other models.

Thanks,
George

error : "NaTType does not support strftime"

I am trying to use Greykite on the Kaggle monthly-beer-production-in-austr dataset:
df = pd.read_csv('data/monthly-beer-production-in-austr.csv')
df.Month = pd.to_datetime(df['Month'])
df.head()

Month       Monthly beer production
1956-01-01  93.2
1956-02-01  96.0
1956-03-01  95.2
1956-04-01  77.1
1956-05-01  70.9

Then, following the simple forecast steps:

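from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum

metadata = MetadataParam(
    time_col="Month",
    value_col="Monthly beer production",
    freq="M"
)

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=test_period,  # test_period is defined elsewhere in the user's script
        metadata_param=metadata
    )
)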

i got the following error:

Traceback (most recent call last):
  File "/home/pycharm/pycharm-community-2020.1.2/plugins/python-ce/helpers/pydev/pydevd.py", line 1438, in _exec
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/home/pycharm/pycharm-community-2020.1.2/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/projects/time_sseries_compering/temp.py", line 38, in <module>
    metadata_param=metadata
  File "/home/anaconda3/envs/icl/lib/python3.7/site-packages/greykite/framework/templates/forecaster.py", line 288, in run_forecast_config
    config=config)
  File "/home/anaconda3/envs/icl/lib/python3.7/site-packages/greykite/framework/templates/forecaster.py", line 257, in apply_forecast_config
    self.pipeline_params = self.template.apply_template_for_pipeline_params(df=df, config=self.config)
  File "/home/anaconda3/envs/icl/lib/python3.7/site-packages/greykite/framework/templates/base_template.py", line 250, in process_wrapper
    pipeline_params = func(self, df, config)
  File "/home/anaconda3/envs/icl/lib/python3.7/site-packages/greykite/framework/templates/base_template.py", line 287, in apply_template_for_pipeline_params
    self.time_properties = self.get_forecast_time_properties()
  File "/home/or/anaconda3/envs/icl/lib/python3.7/site-packages/greykite/framework/templates/base_template.py", line 208, in get_forecast_time_properties
    forecast_horizon=self.config.forecast_horizon)
  File "/home/or/anaconda3/envs/icl/lib/python3.7/site-packages/greykite/common/time_properties_forecast.py", line 159, in get_forecast_time_properties
    start_year = int(train_start.strftime("%Y"))
  File "pandas/_libs/tslibs/nattype.pyx", line 69, in pandas._libs.tslibs.nattype._make_error_func.f
ValueError: NaTType does not support strftime

Changing freq to something else (like "W") works fine.
What am I doing wrong?
Thank you
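The dates in this dataset fall on the first day of each month, while the pandas frequency alias `"M"` means month end and `"MS"` means month start. A mismatch between `freq` and the timestamps is a plausible cause of the `NaT`, although the issue does not confirm it. A stdlib sketch that checks which alias fits a list of monthly dates; `monthly_alias` is hypothetical.

```python
import calendar
from datetime import date


def monthly_alias(dates):
    """Return "MS" if every date is a month start, "M" if every date is a
    month end, else None (hypothetical helper; aliases follow pandas)."""
    if all(d.day == 1 for d in dates):
        return "MS"
    # calendar.monthrange returns (weekday of first day, number of days)
    if all(d.day == calendar.monthrange(d.year, d.month)[1] for d in dates):
        return "M"
    return None
```

For the beer production data this returns `"MS"`, which suggests trying `freq="MS"` in `MetadataParam`.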

Predictions taking too long

Hi Greykite Team!

I am trying to use Greykite to predict at scale, and I am not sure whether I am doing something wrong, but even with the example code the predictions take a long time to calculate: sometimes 20, 30, or 40 seconds, and other times several minutes. Any help will be greatly appreciated. Below is sample code I am running that takes about 17 seconds.

from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
import numpy as np
import pandas as pd
np.random.seed(1)

rows, cols = 365, 1
data = np.random.rand(rows, cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='MS')
data_frame = pd.DataFrame(data, columns=['y'], index=tidx)
data_frame = data_frame.reset_index()
data_frame.columns = ['ts', 'y']

metadata = MetadataParam(
    time_col="ts",  # time column in df
    value_col="y"   # value in df
)
forecaster = Forecaster()  # creates forecasts and stores the result
forecaster.run_forecast_config(
    df=data_frame,
    config=ForecastConfig(
        # uses the SILVERKITE model template parameters
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=365,  # forecasts 365 steps ahead
        coverage=0.95,  # 95% prediction intervals
        metadata_param=metadata
    )
)

forecaster.forecast_result
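The example uses `freq='MS'` (month start), so the 365 rows cover about 30 years of monthly data. A 365-step horizon therefore reaches roughly 30 years past the end of the data. The pandas-only sketch below makes that span concrete. Whether it accounts for the runtime is an assumption. If daily data was intended, `freq='D'` would be the matching choice.

```python
import pandas as pd

# Same index construction as in the report: 365 month-start ("MS") timestamps.
tidx = pd.date_range("2019-01-01", periods=365, freq="MS")
print(tidx[0], tidx[-1])

# A 365-step horizon at the same frequency: the next 365 month starts.
future = pd.date_range(tidx[-1], periods=365 + 1, freq="MS")[1:]
print(future[0], future[-1])
```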

Prophet Logistic Growth

Thanks for this library. This really makes my workflow a lot easier !!

I am trying to fit a 'logistic' growth model using Prophet, and I'm passing the dataframe containing 'cap' and 'floor' columns to the metadata param. But while fitting the model, Prophet throws an error saying that, for a logistic model, it expects 'cap' and 'floor' columns. How do I specify which columns in the dataframe should be used for 'cap' and 'floor'?

When I looked at the code for the ProphetEstimator fit function, only the 'time' column and the 'y' column get passed on to Prophet; no additional columns are passed on.

def fit(self, X, y=None, time_col=TIME_COL, value_col=VALUE_COL, **fit_params):
    super().fit(X, y=y, time_col=time_col, value_col=value_col, **fit_params)

    if self.add_regressor_dict is None:
        fit_columns = [time_col, value_col]
    else:
        reg_cols = list(self.add_regressor_dict.keys())
        fit_columns = [time_col, value_col] + reg_cols

Is this a bug, or is there a way to pass the 'cap' and 'floor' columns that I'm missing?
I couldn't find an example on how to do this in the documentation.
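Prophet's logistic growth needs a `cap` column in the training frame, and `floor` is optional. The sketch below shows a column selection that keeps those columns next to the time, value and regressor columns. It is a hypothetical helper (`select_prophet_fit_columns`), not Greykite's ProphetEstimator code. Whether Greykite offers another supported route for cap/floor is not established here.

```python
import pandas as pd


def select_prophet_fit_columns(df, time_col, value_col, regressor_cols=None,
                               growth="linear"):
    """Hypothetical helper: choose the columns to hand to Prophet.

    For logistic growth, 'cap' is required and 'floor' is kept if present.
    """
    fit_columns = [time_col, value_col] + list(regressor_cols or [])
    if growth == "logistic":
        if "cap" not in df.columns:
            raise ValueError("logistic growth requires a 'cap' column")
        fit_columns.append("cap")
        if "floor" in df.columns:
            fit_columns.append("floor")
    return df[fit_columns]


df = pd.DataFrame({"ts": pd.date_range("2021-01-01", periods=3, freq="D"),
                   "y": [1.0, 2.0, 3.0], "cap": 10.0, "floor": 0.0,
                   "unused": [0, 0, 0]})
print(list(select_prophet_fit_columns(df, "ts", "y", growth="logistic").columns))
```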

Thanks !!

"cv_selection_metric" & "cv_report_metrics"

Hello all,

I am running Greykite without cross-validation (cv_max_splits = 0) because I am using the LassoCV() algorithm, which itself uses 5-fold CV. The ForecastConfig() is as follows; in particular, evaluation_metric is all set to None because cv_max_splits = 0:

[Screenshot: ForecastConfig() settings]

However, the console output suggests that at least 3 metrics are evaluated. My response contains zeros, so I do not want MAPE and MedAPE to be reported, and I do not want "Correlation" to be reported either. In fact, since the loss function in LassoCV() is MSE (L2-norm), I am not interested in anything other than MSE. If the loss function in LassoCV() could be changed to MAE (L1-norm), I would be interested in MAE instead of MSE:

[Screenshot: console output listing the evaluated metrics]

Do you have any suggestions, please?
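The point about zeros can be checked directly. The sketch below computes MSE, MAE and MAPE with NumPy. The helper name `point_metrics` is hypothetical and this is not how Greykite computes its metrics. Whenever the actual value is 0, MAPE would divide by zero, so the sketch reports NaN instead. For context, scikit-learn's LassoCV selects its alpha by mean squared error during cross-validation.

```python
import numpy as np


def point_metrics(y_true, y_pred):
    """Compute MSE, MAE and MAPE; MAPE is undefined where y_true == 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    if np.all(y_true != 0):
        mape = float(np.mean(np.abs(err / y_true)) * 100)
    else:
        # Dividing by a zero actual would give inf/nan, so report NaN.
        mape = float("nan")
    return {"MSE": mse, "MAE": mae, "MAPE": mape}


print(point_metrics([0.0, 2.0, 4.0], [1.0, 2.0, 2.0]))
```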

Best regards,
Dario

Why pin runtime dependencies so tightly?

Hi,

Looking at the setup.py file, it looks like the following are all required runtime dependencies, all of which need to be pinned very precisely:

requirements = [
    "Cython==0.29.23",
    "cvxpy==1.1.12",
    "fbprophet==0.5",
    "holidays==0.9.10",  # 0.10.2,
    "ipykernel==4.8.2",
    "ipython==7.1.1",
    "ipywidgets==7.2.1",
    "jupyter==1.0.0",
    "jupyter-client==6.1.5",
    "jupyter-console==6.",  # used version 6 to avoid conflict with ipython version
    "jupyter-core==4.7.1",
    "matplotlib==3.4.1",
    "nbformat==5.1.3",
    "notebook==5.4.1",
    "numpy==1.20.2",
    "osqp==0.6.1",
    "overrides==2.8.0",
    "pandas==1.1.3",
    "patsy==0.5.1",
    "Pillow==8.0.1",
    "plotly==3.10.0",
    "pystan==2.18.0.0",
    "pyzmq==22.0.3",
    "scipy==1.5.4",
    "seaborn==0.9.0",
    "six==1.15.0",
    "scikit-learn==0.24.1",
    "Sphinx==3.2.1",
    "sphinx-gallery==0.6.1",
    "sphinx-rtd-theme==0.4.2",
    "statsmodels==0.12.2",
    "testfixtures==6.14.2",
    "tornado==5.1.1",
    "tqdm==4.52.0"
]

My question is - why pin them so tightly, and are all of them really necessary? E.g. do I really need sphinx-gallery? Such tight pins make it very difficult to integrate into any existing project. Why not just require a lower bound for many/most of these?
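As a sketch of the lower-bound alternative, the hypothetical helper `relax_pins` below rewrites exact pins such as `pandas==1.1.3` into lower bounds such as `pandas>=1.1.3` and leaves other entries unchanged. This illustrates the suggestion only. It is not how Greykite's setup.py is built, and lower bounds give up the reproducibility that exact pins provide.

```python
import re


def relax_pins(requirements):
    """Turn exact pins like 'pkg==1.2.3' into lower bounds 'pkg>=1.2.3'."""
    relaxed = []
    for req in requirements:
        match = re.fullmatch(r"\s*([A-Za-z0-9_.\-\[\]]+)\s*==\s*(\S+)\s*", req)
        relaxed.append(f"{match.group(1)}>={match.group(2)}" if match else req)
    return relaxed


print(relax_pins(["pandas==1.1.3", "numpy==1.20.2", "tqdm"]))
# ['pandas>=1.1.3', 'numpy>=1.20.2', 'tqdm']
```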

Forecast with multiple/grouped/hierarchical time series

Say one had a dataset of demand of several products (1200 products) for over 5 years (weekly) and needed to forecast the demand of each product. Is there a way to train product specific models and forecast at the product level?
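Product-level models can be produced with a per-group loop: split the long-format data by product, fit one model per history, and stack the forecasts. The sketch below is one minimal way to do that in pandas. `forecast_each_product` and `naive_last_value` are hypothetical names. The last-value model is only a placeholder standing in for whatever per-series model is actually used, such as a Greykite forecast per product.

```python
import pandas as pd


def forecast_each_product(df, horizon, fit_predict, group_col="product",
                          time_col="ts"):
    """Fit one model per product and stack the forecasts.

    fit_predict(history_df, horizon) must return a DataFrame of forecasts.
    """
    forecasts = []
    for product, history in df.groupby(group_col):
        history = history.sort_values(time_col)
        fc = fit_predict(history, horizon)
        fc[group_col] = product
        forecasts.append(fc)
    return pd.concat(forecasts, ignore_index=True)


def naive_last_value(history, horizon, time_col="ts", value_col="y"):
    """Placeholder model for weekly data: repeat the last observed value."""
    last_ts = history[time_col].iloc[-1]
    future_ts = [last_ts + pd.Timedelta(weeks=k) for k in range(1, horizon + 1)]
    return pd.DataFrame({time_col: future_ts,
                         "forecast": history[value_col].iloc[-1]})


weekly = pd.DataFrame({
    "product": ["A"] * 3 + ["B"] * 3,
    "ts": list(pd.date_range("2021-01-03", periods=3, freq="7D")) * 2,
    "y": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})
out = forecast_each_product(weekly, horizon=2, fit_predict=naive_last_value)
print(out)
```

With 1200 products the loop is embarrassingly parallel, so each iteration could be dispatched to a separate worker.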

Additionally, is there a way to do hierarchical forecasting? For example, if we have a hierarchical listing of different products, can we forecast both at the base level (i.e. each individual product's time series) and at aggregate levels defined by the product hierarchy?

This would require us to reconcile the forecasts at the different levels (using Top Down, Bottom Up, Optimal Reconciliation, etc...). I see that there is a reconcile_forecasts.py and a hierarchical_relationship.py but no documentation on these implementations.
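Of the reconciliation methods listed, bottom-up is the simplest. It forecasts only the leaf series and obtains every aggregate by summing them with a summing matrix S. The sketch below uses NumPy with a hypothetical three-leaf hierarchy. It does not use or describe the API of Greykite's reconcile_forecasts.py.

```python
import numpy as np

# Hypothetical hierarchy: total -> {A, B}; A -> {a1, a2}; B -> {b1}.
# Each row of S marks which bottom-level series sum into that node.
nodes = ["total", "A", "B", "a1", "a2", "b1"]
S = np.array([
    [1, 1, 1],  # total = a1 + a2 + b1
    [1, 1, 0],  # A = a1 + a2
    [0, 0, 1],  # B = b1
    [1, 0, 0],  # a1
    [0, 1, 0],  # a2
    [0, 0, 1],  # b1
])


def bottom_up(bottom_forecasts, summing_matrix):
    """Bottom-up reconciliation: aggregate leaf forecasts with S."""
    return summing_matrix @ np.asarray(bottom_forecasts, dtype=float)


# One row per bottom series (a1, a2, b1), one column per forecast step.
leaf_fc = np.array([[10.0, 11.0],
                    [5.0, 6.0],
                    [2.0, 2.5]])
reconciled = bottom_up(leaf_fc, S)
print(dict(zip(nodes, reconciled.tolist())))
```

Top-down and optimal reconciliation use the same S but derive the leaf values differently.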

Thanks in advance.

install issue on Py 3.9 ubuntu

First, the workaround:

pip install --upgrade numpy
pip install osqp==0.6.1
pip install greykite 

However, when installing on Python 3.9 with

 pip install greykite 

I get a dependency resolution error, though not on Python 3.8 or 3.7.

Then, when trying:

  pip install osqp==0.6.1
  pip install greykite 

This run fails with the following trace, so maybe this is really an issue with osqp.

Collecting osqp==0.6.1
  Downloading osqp-0.6.1.tar.gz (211 kB)
    ERROR: Command errored out with exit status 1:
     command: /opt/hostedtoolcache/Python/3.9.7/x64/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-zyakqj8s/osqp_44d9cc4bcb2443fd871eaf19a373922d/setup.py'"'"'; __file__='"'"'/tmp/pip-install-zyakqj8s/osqp_44d9cc4bcb2443fd871eaf19a373922d/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-k6piad63
         cwd: /tmp/pip-install-zyakqj8s/osqp_44d9cc4bcb2443fd871eaf19a373922d/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-zyakqj8s/osqp_44d9cc4bcb2443fd871eaf19a373922d/setup.py", line 11, in <module>
        import numpy
    ModuleNotFoundError: No module named 'numpy'
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/ba/17/49790ce2ce7a6b95cd250642ebc68bd723ddefdd052ee8dcc1e0dcf4ffca/osqp-0.6.1.tar.gz#sha256=47b17996526d6ecdf35cfaead6e3e05d34bc2ad48bcb743153cefe555ecc0e8c (from https://pypi.org/simple/osqp/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement osqp==0.6.1 (from versions: 0.1.3, 0.2.0, 0.2.1, 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.6.0, 0.6.1, 0.6.2, 0.6.2.post0)

Future regressor data

Hi,

I can't find in the documentation how to pass future regressor data to the forecaster. I can specify the column names for the regressors in ModelComponentsParam, but if I run run_forecast_config, I get an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-267-d126d5cc7a8a> in <module>
      6         coverage=0.95,         # 95% prediction intervals
      7         metadata_param=metadata,
----> 8         model_components_param=model_components
      9     )
     10 )

~\Anaconda3\envs\gkite\lib\site-packages\greykite\framework\templates\forecaster.py in run_forecast_config(self, df, config)
    287             df=df,
    288             config=config)
--> 289         self.forecast_result = forecast_pipeline(**pipeline_parameters)
    290         return self.forecast_result
    291 

~\Anaconda3\envs\gkite\lib\site-packages\greykite\framework\pipeline\pipeline.py in pipeline_wrapper(df, time_col, value_col, date_format, tz, freq, train_end_date, anomaly_info, pipeline, regressor_cols, estimator, hyperparameter_grid, hyperparameter_budget, n_jobs, verbose, forecast_horizon, coverage, test_horizon, periods_between_train_test, agg_periods, agg_func, score_func, score_func_greater_is_better, cv_report_metrics, null_model_params, relative_error_tolerance, cv_horizon, cv_min_train_periods, cv_expanding_window, cv_use_most_recent_splits, cv_periods_between_splits, cv_periods_between_train_test, cv_max_splits)
    240             cv_periods_between_splits=cv_periods_between_splits,
    241             cv_periods_between_train_test=cv_periods_between_train_test,
--> 242             cv_max_splits=cv_max_splits
    243         )
    244     return pipeline_wrapper

~\Anaconda3\envs\gkite\lib\site-packages\greykite\framework\pipeline\pipeline.py in forecast_pipeline(df, time_col, value_col, date_format, tz, freq, train_end_date, anomaly_info, pipeline, regressor_cols, estimator, hyperparameter_grid, hyperparameter_budget, n_jobs, verbose, forecast_horizon, coverage, test_horizon, periods_between_train_test, agg_periods, agg_func, score_func, score_func_greater_is_better, cv_report_metrics, null_model_params, relative_error_tolerance, cv_horizon, cv_min_train_periods, cv_expanding_window, cv_use_most_recent_splits, cv_periods_between_splits, cv_periods_between_train_test, cv_max_splits)
    740         xlabel=time_col,
    741         ylabel=value_col,
--> 742         relative_error_tolerance=relative_error_tolerance)
    743 
    744     result = ForecastResult(

~\Anaconda3\envs\gkite\lib\site-packages\greykite\framework\pipeline\utils.py in get_forecast(df, trained_model, train_end_date, test_start_date, forecast_horizon, xlabel, ylabel, relative_error_tolerance)
    758         Forecasts represented as a ``UnivariateForecast`` object.
    759     """
--> 760     predicted_df = trained_model.predict(df)
    761     # This is more robust than using trained_model.named_steps["estimator"] e.g.
    762     # if the user calls forecast_pipeline with a custom pipeline, where the last

~\Anaconda3\envs\gkite\lib\site-packages\sklearn\utils\metaestimators.py in <lambda>(*args, **kwargs)
    118 
    119         # lambda, but not partial, allows help() to work with update_wrapper
--> 120         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
    121         # update the docstring of the returned function
    122         update_wrapper(out, self.fn)

~\Anaconda3\envs\gkite\lib\site-packages\sklearn\pipeline.py in predict(self, X, **predict_params)
    417         for _, name, transform in self._iter(with_final=False):
    418             Xt = transform.transform(Xt)
--> 419         return self.steps[-1][-1].predict(Xt, **predict_params)
    420 
    421     @if_delegate_has_method(delegate='_final_estimator')

~\Anaconda3\envs\gkite\lib\site-packages\greykite\sklearn\estimator\base_silverkite_estimator.py in predict(self, X, y)
    346             trained_model=self.model_dict,
    347             past_df=None,
--> 348             new_external_regressor_df=None)["fut_df"]  # regressors are included in X
    349 
    350         self.forecast = pred_df

~\Anaconda3\envs\gkite\lib\site-packages\greykite\algo\forecast\silverkite\forecast_silverkite.py in predict(self, fut_df, trained_model, freq, past_df, new_external_regressor_df, sim_num, include_err, force_no_sim, na_fill_func)
   1464 
   1465         if fut_df.shape[0] <= 0:
-> 1466             raise ValueError("``fut_df`` must be a dataframe of non-zero size.")
   1467 
   1468         if time_col not in fut_df.columns:

ValueError: ``fut_df`` must be a dataframe of non-zero size.
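As far as I understand Greykite's documentation (please verify against the installed version), regressor values for the forecast period are supplied inside df itself. The forecast-period rows are appended with the regressor columns filled in and the value column left as NaN. The pandas sketch below only builds such a frame. The helper `append_future_regressors` and the `promo` column are hypothetical.

```python
import numpy as np
import pandas as pd


def append_future_regressors(history, future_regressors, time_col="ts",
                             value_col="y"):
    """Stack history with future rows that carry regressor values but no y."""
    future = future_regressors.copy()
    future[value_col] = np.nan  # target is unknown in the forecast period
    combined = pd.concat([history, future], ignore_index=True, sort=False)
    return combined.sort_values(time_col).reset_index(drop=True)


history = pd.DataFrame({
    "ts": pd.date_range("2021-01-01", periods=4, freq="D"),
    "y": [1.0, 2.0, 3.0, 4.0],
    "promo": [0, 1, 0, 1],
})
future_regressors = pd.DataFrame({
    "ts": pd.date_range("2021-01-05", periods=2, freq="D"),
    "promo": [1, 0],
})
df = append_future_regressors(history, future_regressors)
print(df)
```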

Training the model on all data

Hello,

First of all, thanks for this library!

I want to train the model on all of my data, then create a future dataframe and let the model forecast those timesteps. This is to simulate a real-world situation where you actually want to predict the future, in which you don't have any data to validate on.

The last timestamp in my dataset is 2020-02-20 09:00:00. So I set the train_end_date to this timestamp in MetadataParam like this:

import datetime

from greykite.framework.templates.autogen.forecast_config import MetadataParam

metadata = MetadataParam(
    time_col="ts",
    value_col="y",
    freq="H",
    train_end_date=datetime.datetime(2020, 2, 20, 9)
)

Then, in the ForecastConfig passed to forecaster.run_forecast_config, I tried commenting out forecast_horizon, which needs to be >= 1.

forecaster = Forecaster()  # Creates forecasts and stores the result
result = forecaster.run_forecast_config(
    df=df_model,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,  # model template
        # forecast_horizon=1,
        coverage=0.95,  # 95% prediction intervals
        metadata_param=metadata,
        model_components_param=model_components,
        evaluation_period_param=evaluation_period
    )
)

Running this, I get the error message:
ValueError: fut_df must be a dataframe of non-zero size.

So the closest I have come to achieve what I want is to set train_end_date=datetime.datetime(2020, 2, 20, 8), an hour before the last timestamp in the dataset, and use forecast_horizon=1. However, I still want the model to train on this last hour, since I intend to run a short-term forecast.

So the question I have is: how do I train the model on all of my data, without forecasting on any of it, before I give the model a future dataframe?
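The future dataframe for this hourly series starts one hour after the last observed timestamp, 2020-02-20 09:00:00. The sketch below builds it in pandas with a hypothetical 24-step horizon. It only constructs the frame. How Greykite should consume it needs to be checked against the Greykite documentation. One possibility is to append the rows to df with y left as NaN, so training uses all observed data.

```python
import pandas as pd

last_ts = pd.Timestamp("2020-02-20 09:00:00")  # last timestamp in the dataset
horizon = 24  # hypothetical number of hourly steps to forecast

# Future timestamps start one hour after the last observation.
future_ts = pd.date_range(start=last_ts + pd.Timedelta(hours=1),
                          periods=horizon, freq=pd.offsets.Hour())
future_df = pd.DataFrame({"ts": future_ts, "y": float("nan")})
print(future_df.head(2))
```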

Bug in Elastic Net model summary

Hello,

This is to report a bug and propose a Pull Request to fix it in the next version.
The background to the problem can be found in Issue 52. At that time, it was thought that the problem originated during the call result = forecaster.run_forecast_config, and the Greykite developers could not reproduce it.

A second look at the problem shows that it originates afterwards, when trying to extract the model information with the command result.model[-1].summary(). That is when the runtime error reported in Issue 52 appears, thrown in line 384 of model_summary_utils.py:
[Screenshot of the error in model_summary_utils.py]
TypeError: unsupported operand type(s) for -: 'int' and 'list'

Taking as an example the following configuration:

import numpy as np

from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam

model_components = ModelComponentsParam(
    custom=dict(
        fit_algorithm_dict=dict(
            fit_algorithm="elastic_net",
            fit_algorithm_params=dict(
                alphas=np.logspace(-5, +3, num=9, endpoint=True),
                l1_ratio=[.1, .5, .7, .9, .95, .99, 1]
            )
        )
    )
)

this results in the following ml_model attributes after the call ml_model = fit_model_via_design_matrix(...) in line 372 of ml_models.py:
[Screenshot of the ml_model attributes]
i.e. ml_model.l1_ratio instead of ml_model.l1_ratio_ is assigned in line 291 of model_summary_utils.py:
[Screenshot of line 291 of model_summary_utils.py]

Assigning ml_model.l1_ratio_ in line 291 solves the problem and produces the model summary without errors and with the proper results:
[Screenshot of the resulting model summary]
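The failure mode can be reproduced without Greykite. With a CV estimator, `l1_ratio` holds the candidate list that was passed in, and the selected value lives in `l1_ratio_` after fitting; scikit-learn's ElasticNetCV follows this convention. Using the list in `1 - l1_ratio` gives exactly the reported TypeError. The sketch below uses mock objects and a hypothetical helper, `effective_l1_ratio`, that prefers the fitted attribute. It illustrates the proposed fix and is not the actual patch.

```python
from types import SimpleNamespace


def effective_l1_ratio(model):
    """Return the l1 ratio actually used by a fitted elastic-net model.

    Prefers the fitted ``l1_ratio_`` over the constructor's ``l1_ratio``,
    which may be a list of candidates for a CV estimator.
    """
    value = getattr(model, "l1_ratio_", None)
    if value is None:
        value = model.l1_ratio
    if isinstance(value, (list, tuple)):
        raise TypeError("l1_ratio is a candidate list; use the fitted l1_ratio_")
    return float(value)


# Mock objects standing in for fitted models (not real scikit-learn objects).
fitted_cv = SimpleNamespace(l1_ratio=[.1, .5, .7, .9, .95, .99, 1], l1_ratio_=0.5)
plain = SimpleNamespace(l1_ratio=0.7)

ratio = effective_l1_ratio(fitted_cv)
print(1 - ratio)  # float arithmetic works; 1 - fitted_cv.l1_ratio would not
```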

Silverkite for multivariate

How can Silverkite be applied to multivariate time series analysis? How can I forecast multivariate time series using common regressors?

ERROR: Failed building wheel for scs

When trying to install with pip install greykite, I get:

" ERROR: Failed building wheel for scs
Failed to build scs
ERROR: Could not build wheels for scs which use PEP 517 and cannot be installed directly"

Seasonality changepoint detection does not seem to work with cross-validation for Silverkite

Hi,

First of all, thank you for open-sourcing this library. It's really complete and well thought out (as is the Silverkite algorithm itself).

However, I think I have spotted a potential bug:

It seems that the seasonality_changepoints_dict option in ModelComponentsParam breaks some functionality within pandas when running Silverkite with cross-validation.

Here's a complete example (using Greykite 0.2.0):

import pandas as pd
import numpy as np

# Load airline passengers dataset (with monthly data):
air_passengers = pd.read_csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv")
air_passengers["Month"] = pd.to_datetime(air_passengers["Month"])
air_passengers = air_passengers.set_index("Month").asfreq("MS").reset_index()

# Prepare Greykite configs:
from greykite.framework.templates.autogen.forecast_config import (ComputationParam, 
                                                                  EvaluationMetricParam, 
                                                                  EvaluationPeriodParam,
                                                                  ForecastConfig, 
                                                                  MetadataParam, 
                                                                  ModelComponentsParam)

# Metadata:
metadata_params = MetadataParam(date_format=None,  # infer
                                freq="MS",
                                time_col="Month",
                                train_end_date=None,
                                value_col="Passengers")

# Eval metric:
evaluation_metric_params = EvaluationMetricParam(agg_func=np.sum,   # Sum all forecasts...
                                                 agg_periods=12,    # ...Over 12 months
                                                 cv_report_metrics=["MeanSquaredError", "MeanAbsoluteError", "MeanAbsolutePercentError"],
                                                 cv_selection_metric="MeanAbsolutePercentError",
                                                 null_model_params=None,
                                                 relative_error_tolerance=None)

# Eval procedure (CV & backtest):
evaluation_period_params = EvaluationPeriodParam(cv_expanding_window=False,
                                                 cv_horizon=0,   # No CV for now. CHANGE THIS
                                                 cv_max_splits=5,
                                                 cv_min_train_periods=24,
                                                 cv_periods_between_splits=6,
                                                 cv_periods_between_train_test=0,
                                                 cv_use_most_recent_splits=False,
                                                 periods_between_train_test=0,
                                                 test_horizon=12)

# Config for seasonality changepoints
seasonality_components_df = pd.DataFrame({"name": ["conti_year"],
                                          "period": [1.0],
                                          "order": [5],
                                          "seas_names": ["yearly"]})

# Model components (quite long):
model_components_params = ModelComponentsParam(autoregression={"autoreg_dict": "auto"},
                                               
                                               changepoints={"changepoints_dict":  [{"method":"auto",
                                                                                     "potential_changepoint_n": 50,
                                                                                     "no_changepoint_proportion_from_end": 0.2,
                                                                                     "regularization_strength": 0.01}],
                                                             
                                                             # Seasonality changepoints
                                                             "seasonality_changepoints_dict": [{"regularization_strength": 0.6,
                                                                                                "no_changepoint_proportion_from_end": 0.8,
                                                                                                "seasonality_components_df": seasonality_components_df,
                                                                                                "potential_changepoint_n": 50,
                                                                                                "resample_freq":"MS"},
                                                                                               ]
                                                            },
                                               
                                               custom={"fit_algorithm_dict": [{"fit_algorithm": "linear"},
                                                                              ],
                                                       "feature_sets_enabled": "auto",
                                                       "min_admissible_value": 0.0},
                                               
                                               events={"holiday_lookup_countries": None,
                                                       "holidays_to_model_separately": None,
                                                       },
                                               
                                               growth={"growth_term":["linear"]},
                                               
                                               hyperparameter_override={"input__response__outlier__z_cutoff": [100.0],
                                                                        "input__response__null__impute_algorithm": ["ts_interpolate"]},
                                               
                                               regressors=None,
                                               
                                               lagged_regressors=None,
                                               
                                               seasonality={"yearly_seasonality": [5],
                                                            "quarterly_seasonality": ["auto"],
                                                            "monthly_seasonality": False,
                                                            "weekly_seasonality": False,
                                                            "daily_seasonality": False},
                                               
                                               uncertainty=None)

# Computation
computation_params = ComputationParam(n_jobs=1,
                                      verbose=3)


# Define forecaster:
from greykite.framework.templates.forecaster import Forecaster

# defines forecast configuration
config=ForecastConfig(model_template="SILVERKITE",
                      forecast_horizon=12,
                      coverage=0.8,
                      metadata_param=metadata_params,
                      evaluation_metric_param=evaluation_metric_params,
                      evaluation_period_param=evaluation_period_params,
                      model_components_param=model_components_params,
                      computation_param=computation_params,
                     )

# Run:
# creates forecast
forecaster = Forecaster()
result = forecaster.run_forecast_config(df=air_passengers, 
                                        config=config 
                                        )

If we run the piece of code above, everything works as expected. However, if we activate cross-validation (increasing cv_horizon to 5 for instance), Greykite crashes. This happens unless we remove seasonality changepoints (through removing seasonality_changepoints_dict).

The crash traceback looks as follows:

5 fits failed out of a total of 5.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
5 fits failed with the following error:
Traceback (most recent call last):
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\model_selection\_validation.py", line 681, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\pipeline.py", line 394, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\sklearn\estimator\simple_silverkite_estimator.py", line 239, in fit
    self.model_dict = self.silverkite.forecast_simple(
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\forecast\silverkite\forecast_simple_silverkite.py", line 708, in forecast_simple
    trained_model = super().forecast(**parameters)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\forecast\silverkite\forecast_silverkite.py", line 719, in forecast
    seasonality_changepoint_result = get_seasonality_changepoints(
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\changepoint\adalasso\changepoint_detector.py", line 1177, in get_seasonality_changepoints
    result = cd.find_seasonality_changepoints(**seasonality_changepoint_detection_args)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\common\python_utils.py", line 787, in fn_ignore
    return fn(*args, **kwargs)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\changepoint\adalasso\changepoint_detector.py", line 736, in find_seasonality_changepoints
    seasonality_df = build_seasonality_feature_df_with_changes(
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\changepoint\adalasso\changepoints_utils.py", line 237, in build_seasonality_feature_df_with_changes
    fs_truncated_df.loc[(features_df["datetime"] < date).values, cols] = 0
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexing.py", line 719, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexing.py", line 646, in _get_setitem_indexer
    self._ensure_listlike_indexer(key)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexing.py", line 709, in _ensure_listlike_indexer
    self.obj._mgr = self.obj._mgr.reindex_axis(
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\internals\base.py", line 89, in reindex_axis
    return self.reindex_indexer(
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\internals\managers.py", line 670, in reindex_indexer
    self.axes[axis]._validate_can_reindex(indexer)
  File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexes\base.py", line 3785, in _validate_can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis


C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning:

One or more of the test scores are non-finite: [nan]

C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning:

One or more of the train scores are non-finite: [nan]
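pandas raises "cannot reindex from a duplicate axis" when an operation must reindex along an axis whose labels are not unique. The traceback shows that happening inside `fs_truncated_df.loc[...] = 0`. Which axis is duplicated inside Greykite is not established here. The sketch below is a debugging aid for checking frames in a CV split for duplicate index or column labels. The helper `duplicated_labels` and the column names are hypothetical.

```python
import pandas as pd


def duplicated_labels(df):
    """Report duplicate index and column labels in a DataFrame."""
    return {
        "index": df.index[df.index.duplicated()].unique().tolist(),
        "columns": df.columns[df.columns.duplicated()].unique().tolist(),
    }


# Hypothetical feature frame in which two seasonality terms share a name.
features = pd.DataFrame([[1.0, 2.0, 3.0]], columns=["seas_1", "seas_2", "seas_1"])
print(duplicated_labels(features))
```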

It would be great to be able to cross-validate with seasonality changepoints activated, as they allow learning multiplicative seasonalities, for instance, in a similar fashion to Prophet or Orbit.

Thank you!

What are Silverkite and reconcile? (Would like to help with example notebooks)

Hi, I have some questions:
Q1: Is the Silverkite method supposed to take any regressor (like XGBoost) and convert it into an autoregressor (based on lags)?
Q2: Is the reconcile method meant to explore hierarchical forecasting as described by Hyndman?

It would be great if I could help create example notebooks around reconcile and Silverkite.
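Q1 describes the generic technique of turning any regressor into an autoregressive model: build a design matrix of lagged target values and fit the regressor on it. The sketch below shows that construction in pandas. The helper `make_lag_features` is hypothetical. It illustrates the general idea and does not describe how Silverkite implements its own autoregression.

```python
import pandas as pd


def make_lag_features(series, lags):
    """Build a design matrix of lagged copies of a series.

    Row t holds y[t - k] for each k in lags; rows with missing lags are
    dropped so any regressor (linear, tree-based, ...) can be fit on it.
    """
    frame = pd.DataFrame({f"lag_{k}": series.shift(k) for k in lags})
    frame["y"] = series
    return frame.dropna()


y = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
X = make_lag_features(y, lags=[1, 2])
print(X)
```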

Sample distribution

I wanted to ask whether there is a way to access the distribution of samples (i.e. the assumed model distribution). Reading through https://linkedin.github.io/greykite/docs/0.1.0/html/pages/model_components/0900_uncertainty.html I gathered that Silverkite does some sampling to estimate the uncertainty intervals. Is there a way to access the samples directly, or even better a density over the possible values?

In Prophet, one can use predictive_samples to obtain the generated samples.
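Whether Greykite exposes its raw simulated samples is not established here. Once samples for a timestamp are available from any source, though, quantiles and an approximate density follow directly. The NumPy sketch below uses a hypothetical helper, `summarize_samples`, and a stand-in sample array.

```python
import numpy as np


def summarize_samples(samples, quantiles=(0.025, 0.5, 0.975), bins=20):
    """Summarize simulated forecast values at one timestamp.

    Returns the requested quantiles plus a histogram density estimate.
    """
    samples = np.asarray(samples, dtype=float)
    qs = np.quantile(samples, quantiles)
    density, edges = np.histogram(samples, bins=bins, density=True)
    return dict(zip(quantiles, qs)), density, edges


samples = np.arange(101, dtype=float)  # stand-in for simulated values
q, density, edges = summarize_samples(samples)
print(q)
```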

Version without Facebook Prophet

Hello,

Thank you for open-sourcing this library and for your effort. It's great.

I would like to add Greykite as an available model for one library I am developing.
However, given that I already use Facebook Prophet, adding Greykite creates a series of conflicts (I use the latest Prophet version, etc).
I am interested in the Silverkite model; so, my suggestion is to add the possibility to install Greykite without the Prophet part. I don't know if it's possible, or if it is not in your plans.
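A common Python packaging pattern for this kind of request is to import a heavy dependency only if it is installed, and to fail only when the feature that needs it is used. setuptools' `extras_require` is the usual companion for making such a dependency an opt-in extra. The sketch below shows the import side with a hypothetical helper, `optional_import`. Whether Greykite adopts anything like it is up to the maintainers. `importlib.util.find_spec` is used here for top-level module names only.

```python
import importlib
import importlib.util


def optional_import(module_name):
    """Import a top-level module if it is installed, else return None."""
    if importlib.util.find_spec(module_name) is None:
        return None
    return importlib.import_module(module_name)


json_mod = optional_import("json")  # stdlib, always present
missing = optional_import("surely_not_an_installed_module_xyz")
print(json_mod is not None, missing is None)
```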

Thank you again.
Alessandro

How is this different from fbprophet?

Hello all,

I couldn't easily find this online: how is this different from fbprophet? It seems that fbprophet and all of its dependencies are requirements, and Greykite provides plots that are nearly identical to Prophet's, along with trend lines and error bounds. I was hoping to learn more about the algorithm and how it differs from other algorithms that detect trends, trend breaks, seasonality adjustments, and more. Basically, I am looking for the research paper behind Greykite and the arguments for why those changes make it better than others.

What I am most interested in is how the forecasts improved with this algorithm, and a theoretical underpinning for this conclusion. I have no doubt that an algorithm built and designed by LinkedIn would work quite well on LinkedIn data. I would love to read about additional examples and why Greykite outperforms these models.

Thanks!

uncertain futures

hi,
first, thank you for this amazing work!!!
Let's say I want to make predictions over an uncertain future (e.g. my future might depend on two hypotheses: "France will win the Euro" and "France will not win the Euro" 😥).
In that case, I have two dataframes with different regressors.
I can make independent predictions over each of these dataframes.
But I am interested in aggregating these two predictions (in real life, many more than two) into one prediction, handling the median value, uncertainty intervals, etc.
Is there a way to do that?
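One generic way to combine scenarios is to treat the result as a probability-weighted mixture. Each scenario contributes its simulated values, weighted by a user-supplied scenario probability, and the combined median and interval bounds are the weighted quantiles of the pooled values. The NumPy sketch below assumes each scenario's forecast is available as samples. The helper name and the probabilities are hypothetical, and this is not a Greykite feature.

```python
import numpy as np


def mixture_quantile(scenario_samples, weights, q):
    """Quantile of a probability-weighted mixture of scenario forecasts.

    scenario_samples: list of 1-D arrays of simulated values, one per scenario.
    weights: scenario probabilities (normalized to sum to 1 here).
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    values = np.concatenate([np.asarray(s, dtype=float) for s in scenario_samples])
    # Each sample gets its scenario's probability split evenly among its samples.
    sample_w = np.concatenate([np.full(len(s), w / len(s))
                               for s, w in zip(scenario_samples, weights)])
    order = np.argsort(values, kind="stable")
    cum_w = np.cumsum(sample_w[order])
    idx = min(int(np.searchsorted(cum_w, q, side="left")), len(values) - 1)
    return values[order][idx]


win = np.array([1.0, 1.0, 1.0])   # simulated values under scenario 1
lose = np.array([5.0, 5.0, 5.0])  # simulated values under scenario 2
median = mixture_quantile([win, lose], weights=[0.8, 0.2], q=0.5)
upper = mixture_quantile([win, lose], weights=[0.8, 0.2], q=0.9)
print(median, upper)
```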
Thank you!
M.

Installing Greykite breaks Prophet

Installing Greykite in the same env as Prophet causes problems due to different versions of the Holidays package.
Greykite requires holidays==0.9.10, whereas Prophet requires holidays>=0.10.2. Is there any reason Greykite can't use the more recent holidays package?
