
irisgun / kaggle-m5-forecasting


This project forked from kunalarora/kaggle-m5-forecasting


Time-Series forecasting using Stats models, LightGBM & LSTM

Jupyter Notebook 100.00%


kaggle-m5-forecasting

The Kaggle competition page is at https://www.kaggle.com/c/m5-forecasting-accuracy

Some general information regarding the competition:

The Makridakis competitions (or M-competitions), organised by forecasting expert Spyros Makridakis, aim to provide a better understanding and advancement of forecasting methodology by comparing the performance of different methods in solving a well-defined, real-world problem. The first M-competition was held in 1982. The fourth competition (M4) ran in 2018 and featured “100,000 time series and 61 forecasting methods” (source in link). According to forecasting researcher and practitioner Rob Hyndman, the M-competitions “have had an enormous influence on the field of forecasting. They focused attention on what models produced good forecasts, rather than on the mathematical properties of those models”. This empirical approach is very similar to Kaggle’s trademark way of having the best machine learning algorithms engage in intense competition on diverse datasets. M5 is the first M-competition to be held on Kaggle.

Goal

Teams are challenged to predict sales data provided by the retail giant Walmart 28 days into the future. The competition runs in 2 tracks: in addition to forecasting the values themselves in the Forecasting competition, we are simultaneously tasked with estimating the uncertainty of our predictions in the Uncertainty Distribution competition. Both competitions have the same 28-day forecast horizon.

Dataset

The competition dataset can be downloaded directly from the competition link, or it is available as the m5-forecasting-accuracy.zip file in this GitHub repo.

The data comprises 3049 individual products from 3 categories and 7 departments, sold in 10 stores in 3 states. The hierarchical aggregation captures the combinations of these factors. For instance, we can create 1 time series for all sales, 3 time series for all sales per state, and so on. The most granular level is the sales of each of the 3049 products in each of the 10 stores, which gives 30490 time series.
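The aggregation idea can be sketched on a handful of toy rows. This is a minimal illustration, not code from the notebooks: the rows, the ID strings, and the `aggregate` helper are all hypothetical, and only the counts 3049 and 10 come from the description above.

```python
from collections import defaultdict

# Toy rows (made up for illustration): (state, store, category, item, units sold on one day).
rows = [
    ("CA", "CA_1", "FOODS", "FOODS_1_001", 3),
    ("CA", "CA_1", "HOBBIES", "HOBBIES_1_001", 1),
    ("CA", "CA_2", "FOODS", "FOODS_1_001", 2),
    ("TX", "TX_1", "FOODS", "FOODS_1_001", 4),
]


def aggregate(rows, key_fn):
    """Sum units over all rows that share the same aggregation key."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[-1]
    return dict(totals)


total = aggregate(rows, lambda r: "all")                 # 1 series for all sales
by_state = aggregate(rows, lambda r: r[0])               # 1 series per state
by_store = aggregate(rows, lambda r: r[1])               # 1 series per store
by_item_store = aggregate(rows, lambda r: (r[3], r[1]))  # most granular level

# With the real dimensions, the most granular level has one series per product and store.
n_bottom_series = 3049 * 10
```

Each higher level is just a sum over the level below it, which is why forecasts at different levels can be checked against each other for consistency.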

The training data comes in the shape of 3 separate files:

sales_train.csv: this is our main training data. It has 1 column for each of the 1941 days from 2011-01-29 to 2016-05-22, not including the validation period of 28 days until 2016-06-19. It also includes the IDs for item, department, category, store, and state. The number of rows is 30490, one for each combination of the 3049 items and 10 stores.
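Because the file has one column per day, a common first step is to reshape each row into one record per day. The sketch below does this in plain Python; the day-column naming (`d_1`, `d_2`, ...), the ID values, and the helper names are assumptions made for illustration. Only the start date comes from the description above.

```python
from datetime import date, timedelta

# First day of the training data, as stated in the file description.
START = date(2011, 1, 29)


def day_to_date(day_index):
    """Map a 1-based day index to its calendar date."""
    return START + timedelta(days=day_index - 1)


def to_long(row, id_cols=("item_id", "store_id")):
    """Turn one wide row (one column per day) into one record per day."""
    ids = {col: row[col] for col in id_cols}
    records = []
    for col, units in row.items():
        if col in id_cols:
            continue
        day_index = int(col.split("_")[1])  # assumes columns named d_<n>
        records.append({**ids, "date": day_to_date(day_index), "units": units})
    return records


# Hypothetical wide row with only three day columns.
wide_row = {"item_id": "ITEM_A", "store_id": "STORE_1", "d_1": 0, "d_2": 2, "d_3": 1}
long_records = to_long(wide_row)
```

With this mapping, day 1941 falls on 2016-05-22 and day 1969 (28 days later) on 2016-06-19, which matches the dates given above.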

sell_prices.csv: the store and item IDs together with the sales price of the item as a weekly average.

calendar.csv: dates together with related features like day of the week, month, year, and 3 binary flags indicating whether the stores in each state allowed purchases with SNAP food stamps on this date (1) or not (0).
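As a small illustration of how the per-state SNAP flags might be used, the sketch below picks out the dates on which a given state allowed SNAP purchases. The rows, the flag values, and the column names (`snap_CA`, `snap_TX`, `snap_WI`) are assumptions made for the example and are not taken from the file itself.

```python
# Hypothetical calendar rows; the flag values are made up for illustration.
calendar = [
    {"date": "2016-01-01", "snap_CA": 1, "snap_TX": 1, "snap_WI": 0},
    {"date": "2016-01-02", "snap_CA": 1, "snap_TX": 0, "snap_WI": 1},
    {"date": "2016-01-03", "snap_CA": 0, "snap_TX": 1, "snap_WI": 1},
]


def snap_dates(calendar, state):
    """Return the dates on which the given state's SNAP flag equals 1."""
    flag = f"snap_{state}"  # assumed naming: one flag column per state
    return [row["date"] for row in calendar if row[flag] == 1]


ca_snap_dates = snap_dates(calendar, "CA")
wi_snap_dates = snap_dates(calendar, "WI")
```

Since the flag differs by state, it has to be matched to each series through the store's state before it can be used as a feature.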

Information regarding the Python notebooks

M5-EDA_BasicStatsModels: This notebook performs a comprehensive Exploratory Data Analysis (EDA) of the given time-series dataset. It also contains implementations of various statistical time-series models: Naive, Moving Average, Exponential Smoothing (simple exponential, Holt), SARIMAX & Prophet.
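To make the simplest of these baselines concrete, here is a minimal plain-Python sketch of naive, moving-average, and simple exponential smoothing forecasts. It is not the notebook's code: the function names and the toy history are hypothetical, and each method is shown in its flat-forecast form, where the same value is repeated over the whole horizon.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def moving_average_forecast(history, window, horizon):
    """Repeat the mean of the last `window` observations."""
    average = sum(history[-window:]) / window
    return [average] * horizon


def simple_exp_smoothing_forecast(history, alpha, horizon):
    """Repeat the final smoothed level; higher alpha weights recent values more."""
    level = history[0]  # initialise the level with the first observation
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


history = [2, 4, 6, 8]  # toy daily sales
horizon = 28            # forecast horizon used in the competition

naive = naive_forecast(history, horizon)
moving_avg = moving_average_forecast(history, window=2, horizon=horizon)
smoothed = simple_exp_smoothing_forecast(history, alpha=0.5, horizon=horizon)
```

On this toy history the three methods predict 8, 7.0, and 6.25 respectively for each of the 28 days, which shows how strongly each one leans on the most recent observation.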

M5-BasicLSTM: This notebook contains an RNN-LSTM implementation for forecasting time-series data.
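Recurrent models are usually trained on fixed-length input windows paired with the values that follow them. The sketch below shows one common way to build such pairs in plain Python; the `make_windows` helper and its parameters are hypothetical, and the notebook may prepare its data differently.

```python
def make_windows(series, lookback, horizon=1):
    """Split a series into (input window, target window) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - lookback - horizon + 1):
        end = start + lookback
        inputs.append(series[start:end])          # what the model sees
        targets.append(series[end:end + horizon])  # what it should predict
    return inputs, targets


series = [1, 2, 3, 4, 5, 6]  # toy daily sales
inputs, targets = make_windows(series, lookback=3, horizon=2)
```

For the competition setting, `horizon` would be 28, and `lookback` is a tuning choice.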

M5-LightGBM: This notebook contains an implementation that uses the gradient boosting library LightGBM to forecast time-series data.
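Tree-based models have no built-in notion of time, so the series is typically turned into a table of lagged and rolling features first. The following is a minimal sketch of that step under assumed feature names; it does not call LightGBM and is not taken from the notebook.

```python
def lag_features(series, lags, window):
    """Build one feature row per time step from past values only."""
    first_usable = max(max(lags), window)  # earliest step with full history
    rows = []
    for t in range(first_usable, len(series)):
        row = {f"lag_{lag}": series[t - lag] for lag in lags}
        past = series[t - window:t]  # excludes series[t] to avoid leakage
        row[f"rolling_mean_{window}"] = sum(past) / window
        row["target"] = series[t]
        rows.append(row)
    return rows


series = [1, 2, 3, 4, 5, 6]  # toy daily sales
rows = lag_features(series, lags=(1, 2), window=3)
```

Every feature is computed from values strictly before the target day, so the same function can be applied at prediction time without peeking into the future.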

M5-StatsTimeSeriesBasics: This notebook covers the basics of identifying trends and seasonality in time series. It contains various methods of removing seasonality and trends before fitting statistical models like ARIMA.
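Differencing is one standard way to remove trend and seasonality; whether the notebook uses exactly this approach is an assumption. The sketch below builds a toy series with a linear trend and a weekly pattern, then applies a seasonal difference (lag 7) followed by a first difference (lag 1).

```python
def difference(series, lag=1):
    """Subtract from each value the value observed `lag` steps earlier."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Toy series: linear trend plus a repeating weekly pattern (made up for illustration).
weekly_pattern = [0, 1, 0, 2, 0, 3, 5]
series = [t + weekly_pattern[t % 7] for t in range(21)]

deseasonalised = difference(series, lag=7)      # removes the weekly pattern
stationary = difference(deseasonalised, lag=1)  # removes the remaining trend
```

After the seasonal difference only the constant weekly increase of the trend is left, and the first difference reduces that to zeros. Each differencing step shortens the series by its lag.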

Contributors

kunalarora
