Predicting the sales of various items for different stores

Sales Prediction

By Paula Pipkin

The business is trying to predict sales:

Based on features of the items being sold and some relevant features of the stores selling those items, we will try to predict future sales.

Data:

The data set comes from the Analytics Vidhya website. The original data is available at: https://datahack.analyticsvidhya.com/contest/practice-problem-big-mart-sales-iii/

Methods

  • After some preliminary cleaning, two copies of the data were made so that EDA could be done without risking data leakage.
  • EDA was used to show correlations between some relevant features.
  • Three models were applied: Linear Regression, Decision Tree, and Random Forest.
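The modeling step listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn, uses synthetic stand-in data rather than the Big Mart data, and the hyperparameters (`max_depth=5`, `n_estimators=100`) are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the Big Mart data): 500 rows, 3 numeric features.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 100 * X[:, 0] + 20 * X[:, 1] ** 2 + rng.normal(scale=10, size=500)

# Split before fitting, so nothing learned from the test rows leaks into training.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Hyperparameters below are illustrative assumptions, not the author's settings.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model on the training split and score it on the held-out test split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    results[name] = {"rmse": rmse, "r2": float(r2_score(y_test, pred))}
    print(f"{name}: RMSE={rmse:.2f}, R2={results[name]['r2']:.4f}")
```

Scoring the same models on the training split as well makes it easy to spot a large train/test gap.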

Results

Heat Map for Numerical Features

[Figure: heat map]

The feature with the strongest correlation with the target is the maximum retail price (MRP).
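The correlations behind a heat map like this can be computed along the following lines. This is a hedged sketch on synthetic data: the column names (`Item_MRP`, `Item_Weight`, `Item_Visibility`, `Item_Outlet_Sales`) are assumptions about how the data set names its columns, and the values are generated purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are assumed, values are made up.
rng = np.random.default_rng(0)
n = 200
mrp = rng.uniform(30, 270, size=n)
df = pd.DataFrame({
    "Item_MRP": mrp,
    "Item_Weight": rng.uniform(5, 20, size=n),
    "Item_Visibility": rng.uniform(0, 0.3, size=n),
    # Sales depend on MRP plus noise, so MRP should correlate most strongly.
    "Item_Outlet_Sales": 15 * mrp + rng.normal(scale=800, size=n),
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Correlation of each feature with the target, excluding the target itself.
target_corr = corr["Item_Outlet_Sales"].drop("Item_Outlet_Sales")
best_feature = target_corr.abs().idxmax()

print(target_corr.round(2))
print("Strongest correlation with target:", best_feature)
```

Passing `corr` to a plotting function such as `seaborn.heatmap(corr, annot=True)` would render the matrix as a heat map.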

Sales of Two Different Items and Their Respective Prices

[Figure: item sales by price]

Even though MRP has the strongest correlation with sales, it does not explain why sales of the items above do not vary with their different prices.

Metrics on the test data:

Model                RMSE                      R2
Linear Regression    35849625523401.445312     -465823207696583688192.0
Decision Tree        1464.4947                 0.5886
Random Forest        1101.6168                 0.5988
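For reference, the two metrics reported above can be computed from scratch as shown below. This is a small sketch with made-up numbers, not the project's data. The second set of predictions shows how R2 becomes negative when a model does worse than always predicting the mean of the observed values.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up values for illustration only.
y_true = [100.0, 200.0, 300.0, 400.0]
good = [110.0, 190.0, 310.0, 390.0]     # close to the true values
bad = [1000.0, -500.0, 2000.0, -900.0]  # far worse than predicting the mean

print("good:", rmse(y_true, good), r2(y_true, good))
print("bad: ", rmse(y_true, bad), r2(y_true, bad))
```

Because R2 compares the model's squared error with that of a constant mean prediction, it has no lower bound, while RMSE is expressed in the same units as the target.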

Recommendations, Limitations & Next Steps:

  • A negative R2 score means the model fits the test data worse than simply predicting the mean, and the extremely large RMSE confirms this. Based on these metrics, I would not recommend a linear regression model for this data.

  • The Decision Tree and the Random Forest yielded better metrics, but they are far from perfect, especially given the RMSE, which differs greatly between the training and test data. R2 is around 60%, so the models can explain about 60% of the variance in sales; more relevant features should be added to the data if we want to predict sales with confidence. Of the three models, I would recommend the Decision Tree over Linear Regression. Because 'Item Identifier' was dropped when fitting the Random Forest, I will follow up with the Decision Tree, since that model considers more features that I find relevant.

For further information

For any additional questions, please contact [email protected]
