
Advanced Regression Assignment

Project by Géraldine Bengsch (Third Upgrad Assignment)

This project uses regularisation to build a regression model that predicts the actual value of prospective properties, so that the company can decide whether or not to invest in them. The model is meant to answer two questions:

  • Which variables are significant in predicting the price of a house
  • How well those variables describe the price of a house

The project contains:

  • Data Analysis notebook
  • A folder containing the images used (the visualisations are my own; the photo is from Unsplash)
  • The data set train.csv
  • The data dictionary datainfo.txt
  • Answers to the Subjective Questions in pdf format SubjectiveQuestions.pdf

General Information

Photo by Erol Ahmed (https://unsplash.com/@erol?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on Unsplash (https://unsplash.com/s/photos/house-before-after?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

  • General information: This project predicts house sale prices with regularised regression (Ridge and Lasso) to support the investment decisions of a housing company.

  • Background: A US-based housing company named Surprise Housing has decided to enter the Australian market. The company uses data analytics to purchase houses at prices below their actual values and flip them at a higher price. For this purpose, the company has collected a data set from the sale of houses in Australia.

  • Business problem: The aim is to model house prices using the available independent variables. Management will use this model to understand exactly how prices vary with these variables, adjust the firm's strategy accordingly, and concentrate on areas that will yield high returns. The model will also help management understand the pricing dynamics of a new market.

Assignment steps performed in the notebook

Data visualisations

  • perform exploratory data analysis (EDA) to understand the variables
  • check correlation between the variables
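As a rough illustration of the correlation check, the sketch below ranks numeric columns by the absolute value of their Pearson correlation with the target, keeping the sign so the direction of each relationship stays visible. It runs on a small synthetic frame rather than train.csv; the target name `SalePrice`, the columns other than `GrLivArea`, and the helper `top_correlations` are assumptions for illustration, not taken from the notebook.

```python
import numpy as np
import pandas as pd


def top_correlations(df: pd.DataFrame, target: str, n: int = 10) -> pd.Series:
    """Hypothetical helper: the n numeric columns most correlated with `target`."""
    numeric = df.select_dtypes(include="number")
    # Pearson correlation of every numeric column with the target, minus the target itself.
    corr = numeric.corr()[target].drop(target)
    # Order by absolute strength but keep the sign so direction is visible.
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order].head(n)


# Synthetic stand-in for train.csv (assumed column names, made-up values).
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 400, n_rows),
    "YearBuilt": rng.integers(1900, 2010, n_rows),
    "Noise": rng.normal(0, 1, n_rows),
})
df["SalePrice"] = (100 * df["GrLivArea"]
                   + 500 * (df["YearBuilt"] - 1900)
                   + rng.normal(0, 20_000, n_rows))

print(top_correlations(df, "SalePrice", n=3))
```

On the real data, passing `df.select_dtypes(include="number").corr()` to `seaborn.heatmap` gives a visual version of the same check.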

Data preparation

  • clean the data structure
  • drop unnecessary variables
  • create dummy variables for all categorical features
  • split the data into train and test sets
  • perform scaling
  • divide data into dependent and independent variables
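The preparation steps above could look roughly like this with pandas and scikit-learn. This is a sketch, not the notebook's code: the helper `prepare`, the `Id` column, the 70/30 split, `random_state=100` and the choice of `MinMaxScaler` are all assumptions. The scaler is fitted on the training split only, so no information from the test set leaks into it.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def prepare(df, target, drop_cols=(), test_size=0.3, random_state=100):
    """Hypothetical helper: drop, encode, split, scale, then separate X and y."""
    # Drop identifiers and other columns judged unnecessary.
    df = df.drop(columns=list(drop_cols))
    # One-hot encode categorical columns; drop_first avoids perfectly collinear dummies.
    df = pd.get_dummies(df, drop_first=True, dtype=float)
    # Split before scaling so the scaler never sees the test rows.
    train, test = train_test_split(df, test_size=test_size, random_state=random_state)
    train, test = train.copy(), test.copy()
    # Scale features to [0, 1] using statistics from the training split only.
    feature_cols = [c for c in df.columns if c != target]
    scaler = MinMaxScaler()
    train[feature_cols] = scaler.fit_transform(train[feature_cols])
    test[feature_cols] = scaler.transform(test[feature_cols])
    # Separate the dependent variable from the independent ones.
    y_train, y_test = train.pop(target), test.pop(target)
    return train, test, y_train, y_test, scaler


# Synthetic example frame (assumed column names, made-up values).
rng = np.random.default_rng(7)
raw = pd.DataFrame({
    "Id": np.arange(100),
    "GrLivArea": rng.normal(1500, 400, 100),
    "Neighborhood": rng.choice(["A", "B", "C"], 100),
    "SalePrice": rng.normal(180_000, 50_000, 100),
})
X_train, X_test, y_train, y_test, scaler = prepare(raw, target="SalePrice", drop_cols=["Id"])
print(X_train.shape, X_test.shape)
```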

Data modelling and evaluation

  • create a linear regression model without regularisation
  • create models using Ridge and Lasso regularisation
  • create additional models using a mixed approach (RFE and VIF/p-value) and apply Ridge regularisation
  • report the final model
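A minimal sketch of the modelling comparison: plain least squares next to Ridge and Lasso, with the regularisation strength `alpha` chosen by 5-fold cross-validation from a small grid. The alpha grid, the helper `fit_models` and the synthetic data from `make_regression` are assumptions; the RFE and VIF/p-value steps are not shown.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumed search grid for the regularisation strength.
PARAM_GRID = {"alpha": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]}


def fit_models(X_train, y_train, X_test, y_test, cv=5):
    """Hypothetical helper: fit OLS, Ridge and Lasso (alpha tuned by k-fold CV), report R^2."""
    candidates = {
        "linear": LinearRegression(),
        "ridge": GridSearchCV(Ridge(), PARAM_GRID, cv=cv),
        "lasso": GridSearchCV(Lasso(max_iter=10_000), PARAM_GRID, cv=cv),
    }
    results = {}
    for name, model in candidates.items():
        # GridSearchCV refits the best alpha on the full training split by default.
        model.fit(X_train, y_train)
        results[name] = {
            "train_r2": r2_score(y_train, model.predict(X_train)),
            "test_r2": r2_score(y_test, model.predict(X_test)),
            # GridSearchCV exposes the winning alpha; plain OLS has none.
            "alpha": getattr(model, "best_params_", {}).get("alpha"),
        }
    return results


# Synthetic data: 30 features, only 8 of which actually drive the target.
X, y = make_regression(n_samples=300, n_features=30, n_informative=8,
                       noise=15.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

results = fit_models(X_tr, y_tr, X_te, y_te)
for name, r in results.items():
    print(f"{name:6s} train R2={r['train_r2']:.3f} "
          f"test R2={r['test_r2']:.3f} alpha={r['alpha']}")
```

Looking at train and test R² side by side is a quick overfitting check: a large gap for the unregularised model that narrows for Ridge or Lasso suggests the penalty is helping.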

Conclusions

Please see the notebook for more detailed insights.

  1. GrLivArea is by far the most important predictor
  2. The top variables are intuitive.
  3. Lasso is chosen as the final model because it produces a simpler model that keeps only the top features: its penalty can shrink the coefficients of less useful features to exactly zero.
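To illustrate why Lasso yields a simpler model, the sketch below fits a Lasso with a fixed `alpha` on synthetic, standardised data and lists only the coefficients that survive, largest magnitude first. The helper `rank_lasso_features`, `alpha=0.1` and the feature names other than `GrLivArea` are assumptions; on the real data, `alpha` would come from cross-validation and the features from the prepared train set.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso


def rank_lasso_features(X: pd.DataFrame, y, alpha: float = 0.1) -> pd.Series:
    """Hypothetical helper: fit a Lasso and return its non-zero coefficients, largest |coef| first."""
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    coefs = pd.Series(model.coef_, index=X.columns)
    # Coordinate descent sets dropped features to exactly 0.0.
    nonzero = coefs[coefs != 0]
    return nonzero.reindex(nonzero.abs().sort_values(ascending=False).index)


# Synthetic standardised stand-in data (not the real housing data set).
rng = np.random.default_rng(1)
n = 500
cols = ["GrLivArea", "OverallQual", "Noise1", "Noise2", "Noise3", "Noise4", "Noise5"]
X = pd.DataFrame(rng.standard_normal((n, len(cols))), columns=cols)
y = 3.0 * X["GrLivArea"] + 1.0 * X["OverallQual"] + rng.normal(0, 0.5, n)

ranked = rank_lasso_features(X, y, alpha=0.1)
print(ranked)
print(f"kept {len(ranked)} of {X.shape[1]} features")
```

Because the features share a common scale, comparing absolute coefficient sizes is a reasonable, if rough, proxy for importance; without scaling, the ranking would largely reflect each variable's units.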

Technologies Used

Python, NumPy, Pandas, Matplotlib, Seaborn, scikit-learn (sklearn), statsmodels, SciPy

Contact

Created by @GeriNZ - feel free to contact me!
Student at UpGrad: Master of Science in ML and AI
© 2022
