GithubHelp home page GithubHelp logo

airbnb's Introduction

Notes on Airbnb business in New York and elsewhere

Disclaimer

The script scrapes the Airbnb web site to collect data about the shape of the company's business. No guarantees are made about the quality of data obtained using this script, statistically or about an individual page. So please check your results.

Changelog

2014-12-02 Tom Slee More robustness fixes.

2014-09-23 Tom Slee Bug fixes that solve problems where over-eager exception handling caused the script to exit too early.

2014-08-26 Tom Slee

Version 2.1 is updated to be able to collect data from Airbnb's updated web site. Not all cities have the new format, but the script should handle both versions. It will not, however, handle cities without neighborhoods.

2014-05-26 Tom Slee

Version 2 (May 2014) is much more thorough and efficient about searching Airbnb's web site for a given city and has more options. I have moved it to python 3 for better handling of unicode multi-lingual data. It is also ported to SAP SQL Anywhere to allow more flexible reporting and better concurrency than SQLite can provide. A free developer edition is available from the SAP web site. You may need to configure the python driver following the instructions given in http://dcx.sybase.com/index.html#sa160/en/dbprogramming/pg-python.html.

  • airbnb.py is the python script to collect data.
  • plot.py just produces some charts.

airbnb.db is the data. The basic data is in the table room. A complete search of a given city's listings is a "survey" and the surveys are tracked in table survey.

Using the script

To create the database: python airbnb.py -dbi. This command does two things:

  • initializes a database file (dbnb.db in the current directory)
  • runs the reload.sql script against the database to create the tables, views, and stored procedures that make up the database. No data is added.

On Windows, the reload.sql script does not always run. If that fails, try this to create the database tables:

  • dbisql -c "uid=dba;pwd=sql;dbf=dbnb.db;eng=db"
  • From Interactive SQL, click File > Open and choose reload.sql from the current directory. Hit F5 to execute the script and create the tables.

Test that you can connect to the database file: run python airbnb.py --dbping and confirm that there are no errors. If there are errors, check the database file setting near the top of the script and change its location.

To run a survey:

  • add a city (search area) to the database, by running ./airbnb.py -asa "city-name". It scans the Airbnb web site and adds the neighborhoods for the city.
  • add a survey to the database by running ./airbnb.py -asv "city-name". The command lists the survey_id value that was created.
  • collect the room_ids for the survey by running ./airbnb.py -s survey_id. The survey_id can be seen by running ./airbnb -ls. This search loops over neighborhoods, property types, and pages of listings in the Airbnb search pages.
  • fill in the details of the rooms by running ./airbnb -f.

If any step fails:

  • If the -s step or the -f step fails (say because the internet connection was lost), you can just run it again, and it will pick up from where it left off without losing data. Continue until the script completes.

airbnb's People

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.