techieashish / house-prices-advanced-regression-techniques

This project forked from joannabroniarek/house-prices-advanced-regression-techniques

This project is an entry to the following Kaggle competition: https://www.kaggle.com/c/house-prices-advanced-regression-techniques

House-Prices-Advanced-Regression-Techniques

Joanna Broniarek

The goal of this repository is to provide an analysis for the Kaggle competition: https://www.kaggle.com/c/house-prices-advanced-regression-techniques

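My best score on the Kaggle Leaderboard: 0.11329 (root-mean-squared error between the logarithms of the predicted and observed sale prices; lower is better)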
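Kaggle scores this competition with the root-mean-squared error between the logarithm of the predicted and the logarithm of the observed sale price. A minimal sketch of that metric follows; the function name `kaggle_log_rmse` is hypothetical and not taken from the notebook.

```python
import numpy as np


def kaggle_log_rmse(y_true, y_pred):
    """RMSE between log(observed) and log(predicted) sale prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))


# Predictions that are uniformly 10% too high score log(1.1), about 0.0953.
print(kaggle_log_rmse([200000, 150000], [220000, 165000]))
```

Because the metric works on log prices, a model trained on a natural-log SalePrice can be evaluated with plain RMSE on the transformed target.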

Description

EDA and Data tidying:

  1. Removing columns that contain (almost) the same value in every row: ["Street", "Utilities"]
  2. Removing outliers: rows with GrLivArea greater than 4500.
  3. Correcting implausible values, such as years later than 2017.
  4. Handling missing numerical values:
  • LotFrontage: median LotFrontage of the house's Neighborhood
  • Constant 0 for: ['BsmtFinSF1', 'BsmtFinSF2', 'BsmtFullBath', 'BsmtHalfBath', 'MasVnrArea']
  • Remaining numerical columns (apart from 'MSSubClass' and 'OverallCond', which are treated as categorical): median
  5. Converting numerical features that are actually categorical into categorical ones: ['MSSubClass', 'OverallCond']
  6. Handling missing categorical values (with a specific strategy for each feature).
  7. Transformation of skewed features:
  • SalePrice: log transformation
  • Other features with skewness > 0.5: Box-Cox transformation
  • Ordinal categorical features (those with a meaningful order) mapped to numerical values
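The tidying steps above can be sketched with pandas and SciPy. This is a minimal illustration under stated assumptions, not the notebook's code: `tidy` is a hypothetical helper, Box-Cox is applied with `scipy.special.boxcox1p` and an assumed fixed λ = 0.15, garage years after 2017 are assumed to be replaced by `YearBuilt`, and the per-feature handling of missing categorical values and the ordinal mappings are left out because the README does not spell them out.

```python
import numpy as np
import pandas as pd
from scipy.special import boxcox1p
from scipy.stats import skew

ZERO_FILL_COLS = ["BsmtFinSF1", "BsmtFinSF2", "BsmtFullBath", "BsmtHalfBath", "MasVnrArea"]
AS_CATEGORICAL = ["MSSubClass", "OverallCond"]


def tidy(df, boxcox_lambda=0.15):
    """Sketch of the tidying steps; boxcox_lambda=0.15 is an ASSUMED value."""
    df = df.drop(columns=["Street", "Utilities"], errors="ignore")

    # Outlier removal: only appropriate for the training set.
    df = df[df["GrLivArea"] <= 4500].copy()

    # Implausible garage years (> 2017): ASSUMED fix, copy YearBuilt instead.
    if {"GarageYrBlt", "YearBuilt"} <= set(df.columns):
        bad = df["GarageYrBlt"] > 2017
        df.loc[bad, "GarageYrBlt"] = df.loc[bad, "YearBuilt"]

    # LotFrontage: median of houses in the same Neighborhood.
    df["LotFrontage"] = df.groupby("Neighborhood")["LotFrontage"].transform(
        lambda s: s.fillna(s.median())
    )

    # Here a missing value means "none", so fill with 0.
    df[ZERO_FILL_COLS] = df[ZERO_FILL_COLS].fillna(0)

    # Numeric codes that are really categories.
    for col in AS_CATEGORICAL:
        df[col] = df[col].astype(str)

    # Every other numeric column: column median (this also covers
    # neighborhoods whose LotFrontage was entirely missing).
    num_cols = list(df.select_dtypes(include=[np.number]).columns)
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())

    # Target: log transformation.
    if "SalePrice" in df.columns:
        df["SalePrice"] = np.log(df["SalePrice"])

    # Other numeric features with skewness > 0.5: Box-Cox (boxcox1p accepts zeros).
    feats = [c for c in num_cols if c != "SalePrice"]
    skewness = df[feats].apply(lambda s: skew(s))
    for col in skewness[skewness > 0.5].index:
        df[col] = boxcox1p(df[col], boxcox_lambda)
    return df
```

In practice the training and test frames would be tidied together (except for outlier removal), so that medians and skewness are computed consistently.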

Feature Engineering:

  1. Feature Isgarage: 1 if GarageArea > 0, otherwise 0
  2. Feature Isfireplace: 1 if Fireplaces > 0, otherwise 0
  3. Feature Ispool: 1 if PoolArea > 0, otherwise 0
  4. Feature Issecondfloor: 1 if 2ndFlrSF > 0, otherwise 0
  5. Feature IsOpenPorch: 1 if OpenPorchSF > 0, otherwise 0
  6. Feature IsWoodDeck: 1 if WoodDeckSF > 0, otherwise 0
  7. Feature TotalSqrtFeet: sum of GrLivArea and TotalBsmtSF
  8. Feature TotalBaths: BsmtFullBath + FullBath + BsmtHalfBath/2 + HalfBath/2
  9. Feature Neighborhood: encoded as 0, 1 or 2, depending on whether neighborhood statistics show it to be relatively rich, relatively poor, or in between
  10. One-Hot Encoding for categorical data
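The engineered features listed above can be sketched in pandas as follows. The helper `engineer` and the `FLAG_SOURCES` dictionary are illustrative names, not the notebook's; the Neighborhood 0/1/2 encoding is not reproduced because the README does not state which statistics or thresholds were used.

```python
import pandas as pd

# Binary "has this feature" flags: new column -> source column.
FLAG_SOURCES = {
    "Isgarage": "GarageArea",
    "Isfireplace": "Fireplaces",
    "Ispool": "PoolArea",
    "Issecondfloor": "2ndFlrSF",
    "IsOpenPorch": "OpenPorchSF",
    "IsWoodDeck": "WoodDeckSF",
}


def engineer(df):
    """Add the engineered features, then one-hot encode categorical columns."""
    df = df.copy()
    for new_col, src in FLAG_SOURCES.items():
        df[new_col] = (df[src] > 0).astype(int)  # 1 if area/count > 0, else 0
    df["TotalSqrtFeet"] = df["GrLivArea"] + df["TotalBsmtSF"]
    df["TotalBaths"] = (
        df["BsmtFullBath"] + df["FullBath"]
        + 0.5 * df["BsmtHalfBath"] + 0.5 * df["HalfBath"]
    )
    # pd.get_dummies expands every object/string/category column into indicator columns.
    return pd.get_dummies(df)
```

One-hot encoding is done last so that the string-typed columns (including MSSubClass and OverallCond after tidying) all become indicator columns in one pass.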

Modeling:

Scaling - RobustScaler

  1. Linear Regression
  2. LASSO model selection
  3. GradientBoostingRegressor
  4. XGBRegressor
  5. ElasticNet
  6. LGBMRegressor
  7. BaggingRegressor
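The candidate models can be set up as scikit-learn estimators, with `RobustScaler` in front of the linear ones, and compared by cross-validated RMSE. This sketch uses only scikit-learn; every hyperparameter is a placeholder assumption rather than the notebook's tuned value, and `build_models` / `cv_rmse` are hypothetical helpers. `XGBRegressor` (xgboost) and `LGBMRegressor` (lightgbm) follow the same fit/predict interface and could be added to the dictionary the same way.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


def build_models(random_state=42):
    """Candidate models; all hyperparameters are placeholder ASSUMPTIONS."""
    return {
        # Penalized linear models need comparably scaled features; RobustScaler
        # centers on the median and scales by the IQR, so outliers distort it less.
        "linear": make_pipeline(RobustScaler(), LinearRegression()),
        "lasso": make_pipeline(RobustScaler(), Lasso(alpha=0.0005, max_iter=50000)),
        "elasticnet": make_pipeline(
            RobustScaler(), ElasticNet(alpha=0.0005, l1_ratio=0.9, max_iter=50000)
        ),
        # Tree ensembles are insensitive to feature scaling.
        "gbr": GradientBoostingRegressor(
            n_estimators=300, learning_rate=0.05, random_state=random_state
        ),
        "bagging": BaggingRegressor(n_estimators=50, random_state=random_state),
    }


def cv_rmse(model, X, y, n_splits=5, random_state=42):
    """Per-fold RMSE from shuffled K-fold cross-validation."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    neg_mse = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=kf)
    return np.sqrt(-neg_mse)
```

For example, `{name: cv_rmse(m, X, y).mean() for name, m in build_models().items()}` ranks the candidates; with a log-transformed SalePrice, these values are on the same scale as the leaderboard score. For the LASSO model selection step, `sklearn.linear_model.LassoCV` can choose `alpha` by cross-validation instead of the fixed value used here.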

Training:

  1. StackingCVRegressor on models: [Lasso, ElasticNet, XGB, LGBM]
  2. Weighted blend of predictions: 0.2 × ElasticNet + 0.25 × Lasso + 0.15 × LGBM + 0.4 × StackedModels
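The final weighted blend can be written as a small NumPy function. The weights are the ones listed above (they sum to 1.0); the dictionary keys, the function name `blend`, and the assumption that every model predicts log prices are illustrative. The stacked predictions would come from `StackingCVRegressor`, which is provided by the mlxtend package rather than scikit-learn; that step is not reproduced here.

```python
import numpy as np

# Blend weights from the README (they sum to 1.0).
BLEND_WEIGHTS = {"elasticnet": 0.20, "lasso": 0.25, "lgbm": 0.15, "stacked": 0.40}


def blend(predictions, weights=BLEND_WEIGHTS):
    """Weighted average of per-model predictions, all on the log-price scale."""
    total = sum(weights.values())
    if not np.isclose(total, 1.0):
        raise ValueError(f"blend weights sum to {total}, expected 1.0")
    return sum(w * np.asarray(predictions[name], dtype=float)
               for name, w in weights.items())


# Example with made-up log-price predictions for two houses.
preds = {
    "elasticnet": [12.0, 11.5],
    "lasso": [12.1, 11.6],
    "lgbm": [11.9, 11.4],
    "stacked": [12.0, 11.5],
}
log_price = blend(preds)         # approximately [12.01, 11.51]
sale_price = np.exp(log_price)   # back to prices if SalePrice was transformed with np.log
```

Blending on the log scale keeps the combination consistent with the competition metric; use `np.expm1` instead of `np.exp` if the target was transformed with `np.log1p`.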

Environment specification:

  • python 3.6.4
  • numpy 1.14.2
  • scipy 1.1.0rc1
  • seaborn 0.9.0
  • sklearn 0.20.1
  • pandas 0.22.0
  • xgboost 0.72
  • lightgbm 2.2.2
