epxjk82 / airbnkey

This capstone project aims to utilize machine learning to predict the earning power of a room rented out on Airbnb.

Python 13.69% JavaScript 0.98% HTML 2.79% Jupyter Notebook 82.53%
regression-models nlp boosting-algorithms bootstrapping python flask scikit-learn gradient-boosting feature-engineering imputation

airbnkey's Introduction

airbnKEY

Demo

Note on Confidentiality

This project was conducted in collaboration with Loftium (http://www.loftium.com/). Because the data and modeling methodology are confidential, only a subset of the project is publicly available in this repository.

If you are a recruiter who requires full access to the project or product demo for evaluation purposes, please send me an inquiry at [email protected].

Project Description

Home affordability continues to be a major challenge in many metropolitan areas. With housing prices continuing to outpace wage growth, it has become increasingly difficult for prospective home buyers to fulfill the American dream.

Fortunately, homeowners can now rent out spare rooms on Airbnb to help pay the mortgage.

This project aims to estimate the income, in dollars, that a homeowner can expect from renting out a spare room on Airbnb.

Repository Structure

  • app : Source files for Flask web app deployment
  • src : Python source files for data exploration and modeling
  • walkthroughs : Jupyter notebook walkthroughs for regression modeling and natural language processing

CRISP-DM Workflow

Data Understanding

Data Sources:

  • Partner data
  • Airbnb listing data

Obtaining the data:

  • Web scraping using Selenium
  • Available APIs

Data Exploration

  • Building familiarity using Pandas and NumPy
  • Calculating statistics on dataset
  • Visualizations using matplotlib

Data Preparation

  • Cleaning data for null values
  • Managing outliers
  • Feature engineering
  • Joining datasets
  • Storing data in MongoDB
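
The preparation steps above can be sketched with pandas. This is only an illustration under stated assumptions: the column names (`listing_id`, `bedrooms`, `nightly_price`, `sqft`), the toy values, median imputation, and the 1.5 × IQR clipping rule are all hypothetical choices, not the project's confidential pipeline. Storage in MongoDB is omitted.

```python
import numpy as np
import pandas as pd

# Hypothetical Airbnb listing data; column names and values are illustrative only.
listings = pd.DataFrame({
    "listing_id": [1, 2, 3, 4, 5],
    "bedrooms": [1.0, 2.0, np.nan, 1.0, 3.0],
    "nightly_price": [80.0, 120.0, 95.0, 5000.0, 150.0],
})

# Hypothetical partner data keyed on the same listing_id.
partner = pd.DataFrame({
    "listing_id": [1, 2, 3, 5],
    "sqft": [450, 700, 520, 900],
})

# Null handling: impute missing bedroom counts with the column median.
listings["bedrooms"] = listings["bedrooms"].fillna(listings["bedrooms"].median())

# Outlier handling: clip prices to the 1.5 * IQR fences (one common rule of thumb).
q1, q3 = listings["nightly_price"].quantile([0.25, 0.75])
iqr = q3 - q1
listings["nightly_price"] = listings["nightly_price"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Feature engineering: a simple derived ratio feature.
listings["price_per_bedroom"] = listings["nightly_price"] / listings["bedrooms"]

# Joining datasets: a left join keeps every listing, even without partner data.
prepared = listings.merge(partner, on="listing_id", how="left")
print(prepared)
```

A left join is used here so that listings without a partner record are kept and can be imputed later; an inner join would silently drop them.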

Modeling

This project employs both supervised and unsupervised methods, with the goal of minimizing the mean squared error of the final model.

Gradient Boosting Regression

Use gradient boosting regression models to predict the Airbnb income from a spare room in a given home.

The regression modeling process includes:

  • Estimation of income using least-squares loss function
  • Model iteration using feature sets of increasing complexity
  • Hyperparameter tuning using grid-search
  • Quantile regression to estimate the 10th and 90th percentiles of each prediction
  • Repeated cross-validation to detect and limit overfitting to the training set
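
The modeling steps listed above can be sketched with scikit-learn. The synthetic data, the small hyperparameter grid, and the helper name `fit_quantile` are assumptions made for illustration; the project's actual features and settings are not public.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold

# Synthetic stand-in for the confidential data: income depends mostly on feature 0.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 20 + 5 * X[:, 0] + rng.normal(0, 2, size=200)

# Repeated k-fold cross-validation: 5 folds, repeated twice with different splits.
cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=0)

# Grid search over a deliberately tiny, illustrative grid.
# GradientBoostingRegressor uses the least-squares loss by default.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    scoring="neg_mean_squared_error",
    cv=cv,
)
search.fit(X, y)
best = search.best_params_


def fit_quantile(alpha):
    """Fit a quantile-loss model (hypothetical helper) reusing the tuned settings."""
    model = GradientBoostingRegressor(loss="quantile", alpha=alpha, random_state=0, **best)
    return model.fit(X, y)


point = search.predict(X)             # least-squares point estimate
lower = fit_quantile(0.1).predict(X)  # 10th-percentile estimate
upper = fit_quantile(0.9).predict(X)  # 90th-percentile estimate

print("best params:", best)
print("cross-validated MSE:", -search.best_score_)
```

Reusing the tuned hyperparameters for the quantile models is a shortcut; tuning each quantile model separately against its own pinball loss would be a reasonable alternative.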

NLP

Use unsupervised natural language processing methods to extract latent features (topics) from unstructured data (listing descriptions) on Airbnb listing pages.

The NLP process includes:

  • Term frequency-inverse document frequency (TF-IDF) vectorization
  • Non-negative matrix factorization of TF-IDF vectors to identify additive topics
  • Calculation of the weighted presence of each latent topic in individual documents
  • Customization of the stop word dictionary for Airbnb descriptions
  • Topic count and n-gram optimization

Latent topics that are associated with expected income on Airbnb are included as features in the regression model above.

Evaluation

To select the best regression model, this project uses mean squared error (MSE) as the key evaluation metric.

The evaluation process includes:

  • Bootstrapping to estimate distribution of error scores from each model
  • Evaluation of feature importances and partial dependence plots to identify influential features for additional feature engineering
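
One way to sketch the bootstrap step is to resample the held-out residuals of a fitted model. This is an assumption about the procedure: the project may instead have refit models on resampled training sets. The synthetic data and the helper `bootstrap_mse` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: only feature 0 drives the target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = 20 + 5 * X[:, 0] + rng.normal(0, 2, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
preds = model.predict(X_test)


def bootstrap_mse(y_true, y_pred, n_boot=1000, seed=0):
    """Resample test points with replacement and return one MSE per resample."""
    boot_rng = np.random.default_rng(seed)
    n = len(y_true)
    idx = boot_rng.integers(0, n, size=(n_boot, n))
    return ((y_true[idx] - y_pred[idx]) ** 2).mean(axis=1)


scores = bootstrap_mse(y_test, preds)
low, high = np.percentile(scores, [2.5, 97.5])
print(f"test MSE: {mean_squared_error(y_test, preds):.2f}")
print(f"95% bootstrap interval: [{low:.2f}, {high:.2f}]")

# Impurity-based feature importances, normalized to sum to 1.
importances = model.feature_importances_
print("feature importances:", importances.round(3))
```

Comparing the bootstrap distributions of two candidate models, rather than their single test MSE values, shows whether an apparent improvement exceeds resampling noise.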

Deployment

The final product is a basic web app built with Python, Flask, JavaScript, and D3.

The app allows a user to look up expected daily income from a room based on various features.
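
A lookup endpoint of this kind might look like the Flask sketch below. The route name `/predict`, the query parameters, and the placeholder function `predict_daily_income` with its made-up coefficients are all hypothetical; the real app would call the trained regression model instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_daily_income(bedrooms, bathrooms):
    """Placeholder for the trained model; the coefficients are invented."""
    return 40.0 + 15.0 * bedrooms + 10.0 * bathrooms


@app.route("/predict")
def predict():
    # Read features from the query string, with simple defaults.
    bedrooms = request.args.get("bedrooms", default=1, type=float)
    bathrooms = request.args.get("bathrooms", default=1, type=float)
    income = predict_daily_income(bedrooms, bathrooms)
    return jsonify({"expected_daily_income": round(income, 2)})


# Exercise the endpoint without starting a server.
with app.test_client() as client:
    response = client.get("/predict?bedrooms=2&bathrooms=1")
    print(response.get_json())
```

A front end written in JavaScript and D3 could request this JSON and chart the result.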

See product_demo2.gif for a demo of the web app.
