wskwon / airbnb_scraping

This project forked from adodd202/airbnb_scraping

Scraping Airbnb with Scrapy Splash and performing EDA in Python and R.

Jupyter Notebook 98.32% R 0.52% Python 1.16%

Airbnb_Scraping

Presentation can be found here: https://docs.google.com/presentation/d/1dShCpHl9UuVHzVdXqi681p0b0J8rCQKT6qhkjXF5H_8/edit?usp=sharing


File structure:

  1. airbnb folder: the scraper, built with Scrapy Splash
  2. Python_EDA.ipynb: exploratory data analysis in Python (plus some machine learning); also prepares the data for use in R
  3. R_EDA.R: exploratory data analysis in R

Project Overview:

This project scrapes Manhattan Airbnb listings via Scrapy Splash, at a rate of ~500 listings per hour. The main speed bottleneck is that Airbnb bans scrapers that request pages faster than about one page every 6 seconds.
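A 6-second gap between requests caps a crawl at 3600 / 6 = 600 page loads per hour, which fits the observed ~500 listings per hour. The fragment below is a minimal sketch of how such a limit could be expressed in a Scrapy `settings.py`. It is not taken from this repository: the values, including the local Splash URL, are assumptions, and the middleware wiring follows the scrapy-splash README.

```python
# settings.py (sketch): throttling a Scrapy + Splash crawl.
# All values are illustrative assumptions, not this project's actual settings.

# Assumed: a Splash instance listening locally on its default port.
SPLASH_URL = "http://localhost:8050"

# Wait 6 seconds between consecutive requests to the same site.
DOWNLOAD_DELAY = 6

# By default Scrapy randomizes the delay between 0.5x and 1.5x of
# DOWNLOAD_DELAY; disable that so the gap never drops below 6 seconds.
RANDOMIZE_DOWNLOAD_DELAY = False

# Keep a single request in flight per domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Middleware wiring as documented in the scrapy-splash README.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

Scrapy's AutoThrottle extension is an alternative, but a fixed delay is easier to reason about when the limit is a known threshold.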

The exploratory data analysis (EDA) begins in Python with some data cleaning, followed by visualization of correlations and histograms.

I attempted to fit a random forest to predict Airbnb listing prices, first in Python and then again in R. The mean squared error (MSE) of my model was occasionally around 2,500, but usually closer to 10,000. I did not understand what changed between the two models, though it may have been related to removing outlier values (high-rent homes).

I was particularly interested in the effect of longitude and latitude (i.e., location) on price, so I experimented extensively with modeling it and plotting it on a map in R. I used KNN to bring out trends, but still did not see any correlations. I believe this would be worth exploring further with a larger data set (more values and parameters, with better feature engineering) and/or better methods.
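For scale: MSE is measured in squared price units, so if prices are in dollars, an MSE of 2,500 corresponds to a root mean squared error (RMSE) of about $50 and an MSE of 10,000 to about $100. The sketch below uses made-up prices and predictions (not the project's data or model) to show how a single expensive listing can dominate the MSE, which is one way that removing outliers could produce a large swing.

```python
import math


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up nightly prices and predictions; the last listing is an outlier.
actual = [100, 120, 150, 180, 200, 1000]
predicted = [110, 110, 160, 170, 210, 500]

mse_all = mse(actual, predicted)

# Hypothetical price cap: drop listings above it before scoring.
price_cap = 500
kept = [(a, p) for a, p in zip(actual, predicted) if a <= price_cap]
mse_trimmed = mse([a for a, _ in kept], [p for _, p in kept])

print(f"all listings:  MSE={mse_all:.0f}, RMSE={math.sqrt(mse_all):.1f}")
print(f"without outlier: MSE={mse_trimmed:.0f}, RMSE={math.sqrt(mse_trimmed):.1f}")
```

Reporting RMSE or mean absolute error alongside MSE makes runs with and without outliers easier to compare.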

In the code itself, the EDA starts in Python and then moves to R, which also contains the location modeling mentioned above. It ends with some more R EDA showing the effect of variables such as the number of rooms on price.
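The location modeling used KNN on latitude and longitude. As a rough illustration of that idea only, here is a minimal KNN regression sketch written in plain Python (the project's version is in R). The function name `knn_predict`, the coordinates, and the prices are all hypothetical.

```python
import math


def knn_predict(train, query, k=3):
    """Predict a price as the mean price of the k nearest listings.

    train: list of (latitude, longitude, price) tuples
    query: (latitude, longitude) tuple
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of listings")
    # Euclidean distance in degrees: a crude approximation that ignores the
    # fact that a degree of longitude is shorter than a degree of latitude.
    by_distance = sorted(
        train,
        key=lambda row: math.hypot(row[0] - query[0], row[1] - query[1]),
    )
    nearest = by_distance[:k]
    return sum(price for _, _, price in nearest) / k


# Made-up listings clustered in two areas of Manhattan.
listings = [
    (40.7580, -73.9855, 250.0),
    (40.7590, -73.9845, 270.0),
    (40.7570, -73.9860, 230.0),
    (40.8116, -73.9465, 110.0),
    (40.8100, -73.9450, 90.0),
    (40.8120, -73.9480, 100.0),
]

south_estimate = knn_predict(listings, (40.7585, -73.9850), k=3)
north_estimate = knn_predict(listings, (40.8110, -73.9460), k=3)
print(south_estimate, north_estimate)
```

With real data, the choice of `k` and a held-out test set would matter; this sketch only shows the mechanics of averaging nearby prices.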

Thanks for reading.

Andrew
