microsoft / forecasting

Time Series Forecasting Best Practices & Examples

Home Page: https://microsoft.github.io/forecasting/

License: MIT License

Python 75.16% Shell 0.42% R 4.00% Batchfile 0.52% Jupyter Notebook 19.90%
forecasting time-series best-practices machine-learning deep-learning azure-ml automl demand-forecasting retail python

forecasting's Introduction

Forecasting Best Practices

Time series forecasting is one of the most important topics in data science. Almost every business needs to predict the future in order to make better decisions and allocate resources more effectively.

This repository provides examples and best practice guidelines for building forecasting solutions. The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in forecasting algorithms to build solutions and operationalize them. Rather than creating implementations from scratch, we draw from existing state-of-the-art libraries and build additional utilities around processing and featurizing the data, optimizing and evaluating models, and scaling up to the cloud.

The examples and best practices are provided as Python Jupyter notebooks and R markdown files, together with a library of utility functions. We hope that these examples and utilities can reduce the “time to market” by orders of magnitude by simplifying the path from defining the business problem to developing a solution. In addition, the example notebooks serve as guidelines and showcase best practices and usage of the tools in both Python and R.

Cleanup notice (2020-06-23)

We've carried out a cleanup of large obsolete files to reduce the size of this repo. If you had cloned or forked it previously, please delete and clone/fork it again to avoid any potential merge conflicts.

Content

The following is a summary of the models and methods for developing forecasting solutions covered in this repository. The examples are organized according to use cases. Currently, we focus on a retail sales forecasting use case, as it is widely used in assortment planning, inventory optimization, and price optimization. To enable high-throughput forecasting scenarios, we have included examples for forecasting multiple time series with distributed training techniques such as Ray in Python, the parallel package in R, and multi-threading in LightGBM. Note that HTML links are provided next to the R examples for the best viewing experience when reading this document on our github.io page.

Model | Language | Description
Auto ARIMA | Python | Auto Regressive Integrated Moving Average (ARIMA) model that is automatically selected
Linear Regression | Python | Linear regression model trained on lagged features of the target variable and external features
LightGBM | Python | Gradient boosting decision tree implemented with LightGBM package for high accuracy and fast speed
DilatedCNN | Python | Dilated Convolutional Neural Network that captures long-range temporal flow with dilated causal connections
Mean Forecast (.html) | R | Simple forecasting method based on historical mean
ARIMA (.html) | R | ARIMA model without or with external features
ETS (.html) | R | Exponential Smoothing algorithm with additive errors
Prophet (.html) | R | Automated forecasting procedure based on an additive model with non-linear trends
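
The Linear Regression entry above mentions training on lagged features of the target variable. As a rough illustration of what "lagged features" means, the sketch below builds (features, target) pairs from a single series in plain Python. The function name make_lagged_rows and the sample values are hypothetical and are not taken from the repository's fclib utilities.

```python
def make_lagged_rows(series, lags):
    """Build (features, target) pairs where each feature is a past value.

    For lags=[1, 2], the features for time t are [series[t-1], series[t-2]].
    Rows that would need values before the start of the series are skipped.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in lags]
        rows.append((features, series[t]))
    return rows


# Illustrative weekly sales values (made up for this sketch)
weekly_sales = [10, 12, 13, 15, 18]
rows = make_lagged_rows(weekly_sales, lags=[1, 2])
```

Each row can then be fed to any regression model; external features such as price would simply be appended to the feature list.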

The repository also comes with AzureML-themed notebooks and best practices recipes to accelerate the development of scalable, production-grade forecasting solutions on Azure. In particular, we have the following examples for forecasting with Azure AutoML as well as tuning and deploying a forecasting model on Azure.

Method | Language | Description
Azure AutoML | Python | AzureML service that automates model development process and identifies the best machine learning pipeline
HyperDrive | Python | AzureML service for tuning hyperparameters of machine learning models in parallel on cloud
AzureML Web Service | Python | AzureML service for deploying a model as a web service on Azure Container Instances

Getting Started in Python

To quickly get started with the repository on your local machine, use the following commands.

  1. Install Anaconda with Python >= 3.6. Miniconda is a quick way to get started.

  2. Clone the repository

    git clone https://github.com/microsoft/forecasting
    cd forecasting/
    
  3. Run setup scripts to create conda environment. Please execute one of the following commands from the root of Forecasting repo based on your operating system.

    • Linux
    ./tools/environment_setup.sh
    
    • Windows
    tools\environment_setup.bat
    

    Note that for Windows you need to run the batch script from Anaconda Prompt. The script creates a conda environment forecasting_env and installs the forecasting utility library fclib.

  4. Start the Jupyter notebook server

    jupyter notebook
    
  5. Run the LightGBM single-round notebook under the 00_quick_start folder. Make sure that the selected Jupyter kernel is forecasting_env.

If you have any issues with the above setup, or want to find more detailed instructions on how to set up your environment and run examples provided in the repository, on local or a remote machine, please navigate to the Setup Guide.

Getting Started in R

We assume you already have R installed on your machine. If not, simply follow the instructions on CRAN to download and install R.

The recommended editor is RStudio, which supports interactive editing and previewing of R notebooks. However, you can use any editor or IDE that supports RMarkdown. In particular, Visual Studio Code with the R extension can be used to edit and render the notebook files. The rendered .nb.html files can be viewed in any modern web browser.

The examples use the Tidyverts family of packages, which is a modern framework for time series analysis that builds on the widely-used Tidyverse family. The Tidyverts framework is still under active development, so it's recommended that you update your packages regularly to get the latest bug fixes and features.

Target Audience

Our target audience for this repository includes data scientists and machine learning engineers with varying levels of knowledge in forecasting as our content is source-only and targets custom machine learning modelling. The utilities and examples provided are intended to be solution accelerators for real-world forecasting problems.

Contributing

We hope that the open source community will contribute to the content and bring in the latest state-of-the-art (SOTA) algorithms. This project welcomes contributions and suggestions. Before contributing, please see our Contributing Guide.

Reference

The following is a list of related repositories that you may find helpful.

Deep Learning for Time Series Forecasting A collection of examples for using deep neural networks for time series forecasting with Keras.
Microsoft AI Github Find other Best Practice projects, and Azure AI designed patterns in our central repository.

forecasting's People

Contributors

angusrtaylor, chenhuims, dciborow, dependabot[bot], eisber, hlums, hongooi73, ilanr9, ilanreiter, jash271, pechyony, revodavid, sambaiz, vapaunic, yiychen, zhoufang928

forecasting's Issues

[ASK] update .gitignore file

Description

Update the .gitignore file to cover more files that should not be committed, e.g., the AML config file, output from AML experiments, and model files.

Other Comments

[ASK] Use fclib directly in aml examples

[BUG] azure_automl_forecast uses wrong workspace creation

Description

The notebook should use the following call, which creates the workspace if it does not exist and returns the existing one if it does.

from azureml.core import Workspace

ws = Workspace.create(subscription_id=subscription_id, resource_group=resource_group, name=workspace_name,
                      create_resource_group=True, exist_ok=True, location=workspace_region)

How do we replicate the bug?

Expected behavior (i.e. solution)

Other Comments

train_validate_vm.sh fails in fnn submission

I followed instructions in README file of fnn submission and got two different failures in master and staging branches.

In master branch I get an error:
[1] "cv_round_1"
[1] 1
Error in { :
task 1 failed - "Variable 'subset_columns_train' is not found in calling scope. Looking in calling scope because you used the .. prefix."
Calls: %dopar% ->
Execution halted

In staging branch I get an error:
[1] "cv_round_1"
[1] 1
Error in [.data.table(validation_data, , c("recent_load_ratio_10", "recent_load_ratio_11", :
column(s) not found: recent_load_ratio_10, recent_load_ratio_11, recent_load_ratio_12, recent_load_ratio_13, recent_load_ratio_14, recent_load_ratio_15, recent_load_ratio_16
Calls: rowMeans -> is.data.frame -> [ -> [.data.table
Execution halted

[BUG] May add git clone in SETUP.md

Description

In SETUP.md, I think it would be clearer to add the following before all other commands:

git clone https://github.com/microsoft/forecasting.git
cd forecasting/

How do we replicate the bug?

Expected behavior (i.e. solution)

Other Comments

rewrite download_data.r of OrangeJuice dataset in Python

Since Python is a main language of this repo, this dataset should be accessible to people who are not familiar with R. Since the actual dataset is part of an R package, a download_data.py for this dataset can call download_data.r.

[BUG] yield in split_train_test() function

Description

The split_train_test() function doesn't write out csv files when we call it with write_csv=True. I found this is because we use a yield statement at the end of the function. Calling the function only returns a generator; the code inside the function is not executed until the generator is consumed. Right now, we need to iterate through the generator to force the function body to run, by doing something like

for train_df, test_df, aux_df in split_train_test(DATA_DIR, forecasting_setting, write_csv=True):
    pass  # iterating is what triggers the csv writes

@vapaunic Do you think it is better to replace yield with returning lists of data frames train_df_list, test_df_list, aux_df_list when NUM_ROUNDS>1 and returning three data frames train_df, test_df, aux_df when NUM_ROUNDS=1?
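
The behaviour described in this issue can be reproduced without the repository code. The sketch below is a stand-in, not the actual split_train_test(): the hypothetical written list plays the role of the csv files, and the two functions contrast a generator (side effects deferred until iteration) with the proposed list-returning variant (side effects happen on the call).

```python
def split_lazy(num_rounds, written):
    """Generator version: nothing runs until the caller iterates."""
    for round_no in range(1, num_rounds + 1):
        written.append(f"train_round_{round_no}.csv")  # stand-in for a csv write
        yield round_no


def split_eager(num_rounds, written):
    """List-returning version: all side effects happen during the call."""
    rounds = []
    for round_no in range(1, num_rounds + 1):
        written.append(f"train_round_{round_no}.csv")  # stand-in for a csv write
        rounds.append(round_no)
    return rounds


lazy_written = []
gen = split_lazy(3, lazy_written)      # only creates the generator object
written_before_iteration = list(lazy_written)
list(gen)                              # consuming the generator runs the body

eager_written = []
split_eager(3, eager_written)          # writes happen immediately
```

Before iteration written_before_iteration is empty, while eager_written is filled as soon as split_eager returns, which is the difference the issue is asking about.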

How do we replicate the bug?

Expected behavior (i.e. solution)

Other Comments

[BUG] Prediction HORIZON specified in forecast settings, but not used

Description

We specify a value PRED_HORIZON in forecast_settings.py to be used as the forecasting horizon. However, we don't use this value when forecasting or when creating the train/test data splits. Instead, the variables TRAIN_START_WEEK, TRAIN_END_WEEK, TEST_START_WEEK and TEST_END_WEEK are used as a proxy for the prediction horizon.
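
One possible way to make the horizon the single source of truth is to derive the week boundaries from it. The sketch below is only an illustration under simplifying assumptions: make_rounds and its parameters are hypothetical names, test windows are non-overlapping, and there is no gap between the end of training and the start of testing, which may not match the repository's actual settings.

```python
def make_rounds(last_week, pred_horizon, num_rounds):
    """Derive per-round week boundaries from the prediction horizon.

    The final round's test window ends at last_week; earlier rounds are
    shifted back by one horizon each. Training ends the week before testing.
    """
    rounds = []
    for r in range(num_rounds):
        test_end = last_week - (num_rounds - 1 - r) * pred_horizon
        test_start = test_end - pred_horizon + 1
        rounds.append(
            {"train_end": test_start - 1, "test_start": test_start, "test_end": test_end}
        )
    return rounds


# Illustrative values: data ends at week 160, 2-week horizon, 3 rounds
rounds = make_rounds(last_week=160, pred_horizon=2, num_rounds=3)
```

With this construction every test window has exactly pred_horizon weeks and none can extend past last_week.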

How do we replicate the bug?

Expected behavior (i.e. solution)

Other Comments

[BUG] examples: oj_retail

Description

The examples directory has a subdirectory called oj_retail. The name probably could be optimized to better reflect the use case we are trying to highlight. Are there keywords we want to cover in here? retail? grocery? perishable goods? I would not think OJ really signifies anything here.

How do we replicate the bug?

Expected behavior (i.e. solution)

Other Comments

[FEATURE] Put jupyter startup instructions in setup

Description

I would recommend that the jupyter startup instructions, currently found in the first cell of examples/README.md, either be moved or copied into docs/SETUP.md. This seems to make sense, so that you can have jupyter running before you are instructed to start running the example notebooks. I guess this will require a new section under the "automated" and "manual" steps, one that instructs the user to start jupyter.

Expected behavior with the suggested feature

Setup instructions tell users to start jupyter before jumping into the example notebooks.

Other Comments

FYI, I'm running from a DSVM which has a running Jupyterlab on port 8888, so port 8889 was used for jupyter.

[FEATURE] Put "Get Started" instructions in README

Description

Having the get started instructions in README makes it easier for users to quickly set up the environment and do the experimentation with the repo.

Expected behavior with the suggested feature

Other Comments

[BUG] lightgbm could not be found in Jupyter

Description

I could import lightgbm in forecast env.
I could also import it while using python 3 kernel in jupyter.
However, I could not import it using forecast kernel in jupyter.
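
A common cause of this symptom is that the Jupyter kernel launches a different Python interpreter than the conda environment. That is an assumption here, since the issue does not confirm the cause. The small diagnostic below, with the hypothetical helper name describe_environment, can be run both in the environment's Python and in a notebook cell to compare which interpreter is in use and whether the package is visible to it.

```python
import importlib.util
import sys


def describe_environment(module_name="lightgbm"):
    """Report the running interpreter and whether module_name is importable."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,                  # interpreter behind this kernel
        "module": module_name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None)  # file the module would load from
    }


info = describe_environment("lightgbm")
```

If the "python" paths differ between the terminal and the notebook, re-registering the kernel from inside the activated environment (for example with python -m ipykernel install --user --name forecast) is one thing to try.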

How do we replicate the bug?

Expected behavior (i.e. solution)

Other Comments

[BUG] two out of three notebooks empty on master.

Description

Only the lightgbm notebook is a valid notebook. When opening the other two notebooks in the browser, Jupyter reports that they are not JSON; in fact, they are 0 bytes in size.

How do we replicate the bug?

Expected behavior (i.e. solution)

Other Comments

[FEATURE] Evaluation metrics to support any iterables

Description

Current metrics only support pd.Series.
It would be nice for them to work with any kind of iterable, such as np.array, list, etc.

Expected behavior with the suggested feature

E.g.,

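import numpy as np

def MAPE(predictions, actuals):
    # Accept any iterable (list, tuple, pd.Series, np.ndarray)
    predictions = np.asarray(predictions, dtype=float)
    actuals = np.asarray(actuals, dtype=float)

    # Assumed form: mean absolute error relative to the actuals, returned
    # as a fraction (multiply by 100 for a percentage); actuals must be non-zero
    return np.mean(np.absolute(predictions - actuals) / np.absolute(actuals))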

Implement single-model approach in R

Currently, the tidyverts framework only supports one model per subject, for datasets that consist of multiple subjects with one time series per subject. There is an open issue to support fitting a single model across all subjects, similar to what is being done here on the Python side.

[ASK] Add info to top-level README

Description

In which platform does it happen?

How do we replicate the issue?

Expected behavior (i.e. solution)

Other Comments

NA handling in orange juice dataset

Should add some discussion on this especially in R context; some modelling functions can handle them natively, others require imputation and can be fragile when NAs are present

[BUG] Pylint Score 6.54/10

Description

After running pylint, the repo has a score of 6.54.

I keep my repo score at 10, but this repo should be at least greater than 8 before release.

[BUG] Clean-up evaluation module in fclib

Description

Clean up evaluation module in fclib. There are left-over files there from the tsperf days (evaluate, train_util). Move these to contrib directory.

How do we replicate the bug?

Expected behavior (i.e. solution)

Other Comments

[BUG] `download_ojdata` does not work inside a Jupyter Notebook

Description

When running the 00_quick_start/auto_arima_forecasting.ipynb notebook in the cell where the data is downloaded and split, it failed to download the data.

For example, if we run the function to download the data, it says it starts to download the data but the actual download operation is not triggered (see screen shot below).

[Screenshot of the notebook cell output]

How do we replicate the bug?

Follow the environment set up instructions and run the notebook.

Expected behavior (i.e. solution)

The data should be successfully downloaded.

Other Comments

The problem may be something to do with the script path construction where os.path.abspath(__file__) is used - it might be somewhat incompatible with Jupyter notebook. One discussion that may be useful to resolve the issue is here.
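
If the hypothesis above is right, one defensive pattern is to fall back to the current working directory whenever __file__ is unavailable, as is the case for code typed directly into a notebook cell. The helper below is a hypothetical sketch, not the fix adopted in the repository; script_dir is an invented name.

```python
import os


def script_dir(module_globals):
    """Return the directory of the calling script, or the cwd in a notebook.

    Pass globals() from the caller. Notebook cells have no __file__ entry,
    so the lookup returns None and the current working directory is used.
    """
    file_path = module_globals.get("__file__")
    if file_path is None:
        return os.getcwd()
    return os.path.dirname(os.path.abspath(file_path))


# Simulate a notebook cell (no __file__) with an empty namespace
notebook_dir = script_dir({})
```

A script or module would call script_dir(globals()) and get its own directory, while the same call in a notebook cell would return the directory the Jupyter server was started from.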

[BUG] Forecast settings train/test weeks run past the end of the dataset

Description

With the current forecast settings, the test data runs past the end of the dataset.
TEST_START_WEEK goes up to 161 and TEST_END_WEEK up to 162, but the data only goes up to week 160.
TRAIN_END_WEEK goes up to 159 so there will be nothing left
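
A simple guard against this kind of mismatch is to validate the configured weeks against the last week present in the data before any split is made. The sketch below assumes the settings are available as per-round lists; check_week_ranges is a hypothetical helper, and the first-round values in the example are made up, while 159, 161, 162 and 160 come from the description above.

```python
def check_week_ranges(train_end_weeks, test_start_weeks, test_end_weeks, last_data_week):
    """Return a list of human-readable problems, empty if the settings fit."""
    problems = []
    rounds = zip(train_end_weeks, test_start_weeks, test_end_weeks)
    for round_no, (train_end, test_start, test_end) in enumerate(rounds, start=1):
        if train_end >= test_start:
            problems.append(f"round {round_no}: training ends at week {train_end}, "
                            f"not before test start week {test_start}")
        if test_start > last_data_week:
            problems.append(f"round {round_no}: test start week {test_start} "
                            f"is past the last data week {last_data_week}")
        if test_end > last_data_week:
            problems.append(f"round {round_no}: test end week {test_end} "
                            f"is past the last data week {last_data_week}")
    return problems


# Second round uses the values reported in this issue
problems = check_week_ranges([157, 159], [159, 161], [160, 162], last_data_week=160)
```

Here the first round passes and the second round yields two problems, one for the test start week and one for the test end week.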

How do we replicate the bug?

Expected behavior (i.e. solution)

Other Comments

[BUG] forecasting or forecast

Description

The repo's name is forecasting.
There is a directory called forecasting_lib.
conda env is called forecast
In addition, the Jupyter kernel is called forecast.

I don't know if most people would prefer to have a single name (e.g., forecast). Would be good to interview users and make a call.

How do we replicate the bug?

Expected behavior (i.e. solution)

Other Comments

[ASK] Group the essential command line prompts together to facilitate onboarding

Based on rounds of user testing, one typically needs only the following three commands to set up the environment:

git clone https://github.com/microsoft/forecasting.git
cd forecasting/
./tools/environment_setup.sh

Please consider grouping these commands in a single section for ease of reference and interpretation. I had two interviewees who didn't locate "./tools/environment_setup.sh" during the first pass of the setup guide.

[FEATURE] Link to a sample notebook from SETUP.md

Instead of vaguely mentioning the existence of an examples/ folder in SETUP.md, we may consider explicitly linking to a sample notebook such as examples/00_quick_start/auto_arima_forecasting.ipynb

This small amount of hand-holding can go a long way toward helping a new user develop their first forecasting use case and boosting user satisfaction.

[BUG] Unable to execute "./tools/environment_setup.sh" in Windows Command Prompt

This is regarding the required command ./tools/environment_setup.sh as part of the environment setup process.

.sh shell scripts are the Linux/Unix counterpart of batch files, so Windows users would have to use either an Ubuntu terminal or WSL (Windows Subsystem for Linux). The easiest fix would be to provide either a PowerShell .ps1 script or a Windows batch .bat script.

https://www.thewindowsclub.com/how-to-run-sh-or-shell-script-file-in-windows-10
https://simply-python.com/2014/03/20/easy-invoke-pip-install-using-batch-commands/

[ASK] Include model training time in examples

Description

Add information about estimated running time in "Model training" cell and mention that user could reduce the number of iterations to speed up the model training.
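
One lightweight way to obtain the running time to report is to wrap the training call in a timer. The sketch below is generic: timed is a hypothetical helper, and the sum call only stands in for a real training function.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in for a model training call
result, elapsed = timed(sum, [1, 2, 3])
print(f"Training took {elapsed:.2f} seconds")
```

The measured value depends on the machine, so any estimate quoted in a notebook should say which hardware it was measured on.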

How do we replicate the bug?

Expected behavior (i.e. solution)

Other Comments
