tcg85 / airbnb-hotel-in-berlin

This project forked from valsophie/airbnb-hotel-in-berlin

Group Project Ironhack Week 3 - Hotel vs. AirBnB

Jupyter Notebook 100.00%

Around the corner

Data Analysis January 2019, Berlin, 2020-01-25

Project Description

In this project we wanted to focus on a topic related to tourism. We chose to compare Airbnb listings with hotels in Berlin, Germany. As Airbnb, and with it the number of apartments available through its service, has grown tremendously in recent years, it has become an alternative for tourists and business travelers who need a place to stay overnight. We therefore compared hotels and Airbnb listings to find differences and similarities between the two.

Questions & Hypotheses

As mentioned above, the number of Airbnb listings has grown in recent years. Judging by the number of available rooms in Berlin alone, booking a room on Airbnb now seems to be a genuine alternative to booking a hotel room. Our questions and assumptions are:

  1. How does the number of Airbnb listings compare to the number of hotels across the city's different areas? We assume that Airbnbs are more likely to be found in residential areas, whereas hotels also cover business districts and more central districts.

  2. How do prices compare across areas? Do higher hotel prices in an area also mean higher prices per night for Airbnb apartments there? We assume that expensive hotels are mostly located in areas with high rents, which would also raise the average price per night for an Airbnb apartment in those areas.

  3. Overall, one might expect a night in an Airbnb apartment to cost less than a night in a hotel, since you get less service. But is that true? Our hypothesis is that the average price per night for an Airbnb apartment is lower than the average price per night for a hotel room.
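All three questions boil down to grouping rows by area and by type of accommodation. The following is a minimal plain-Python sketch, not the project's notebook code; it assumes rows already in the merged four-column format (name, price per night, area, source) with hypothetical dictionary keys `name`, `price`, `area` and `source`. In pandas the same grouping would typically be a `groupby` on area and source.

```python
from collections import Counter, defaultdict
from statistics import mean

# Assumes rows like {"name": ..., "price": 120.0, "area": "Mitte", "source": "expedia.de"};
# the key names are hypothetical.

def kind(row):
    """Booking.com and Expedia rows are hotels; Airbnb rows are apartments."""
    return "airbnb" if row["source"] == "airbnb" else "hotel"

def count_per_area(rows):
    """Question 1: number of Airbnb listings vs. hotels per area."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["area"]][kind(row)] += 1
    return counts

def mean_price_per_area(rows):
    """Question 2: average price per night per (area, kind) pair."""
    prices = defaultdict(list)
    for row in rows:
        prices[(row["area"], kind(row))].append(row["price"])
    return {key: mean(values) for key, values in prices.items()}

def overall_mean(rows, hotel):
    """Question 3: overall average price per night for hotels or for Airbnbs."""
    wanted = "hotel" if hotel else "airbnb"
    return mean(row["price"] for row in rows if kind(row) == wanted)
```

Comparing `mean_price_per_area` for the hotel and Airbnb entries of the same area addresses the second question; `overall_mean(rows, hotel=True)` against `overall_mean(rows, hotel=False)` addresses the third.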

Dataset

To answer these questions, we worked with data from different sources that provide:

  1. Price per night (for 2 persons)
  2. Area/district in Berlin, for both Airbnb listings and hotels

We used the following three sources:

Airbnb listings (API): Airbnb listings worldwide, which we reduced to Berlin. The data include price per night, area, name of owner, geolocation, cleaning fee, number of persons, description of the apartment and more:

Link to the API used
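Reducing the worldwide listings to Berlin is a filter followed by a projection onto the columns needed later. A minimal sketch, assuming each listing arrives as a dictionary; the field names `id`, `city`, `neighbourhood` and `price` are hypothetical and may not match the actual API response.

```python
# Hypothetical field names ("id", "city", "neighbourhood", "price");
# the real API response may name them differently.

def reduce_to_berlin(listings):
    """Keep only Berlin listings and map them to the common four-column format."""
    berlin = []
    for listing in listings:
        # Case- and whitespace-insensitive city match.
        if listing.get("city", "").strip().casefold() != "berlin":
            continue
        berlin.append({
            "name": str(listing["id"]),        # Airbnb listings are identified by ID
            "price": float(listing["price"]),  # price per night
            "area": listing["neighbourhood"],
            "source": "airbnb",
        })
    return berlin
```

Filtering early keeps the later cleaning and merging steps small, since only the Berlin subset is carried forward.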

Expedia listings (web scraping): hotels in Berlin with price per night for 2 persons and area. Method: Selenium, scraping the following site:

Link used to scrape Expedia
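A Selenium scraper of this kind opens the results page, iterates over the hotel cards and turns each card's price text into a number. The sketch below uses the Selenium 4 API (`webdriver.Chrome`, `Service`, `find_elements` with `By.CSS_SELECTOR`); the CSS selectors and the function names `parse_price` and `scrape_expedia` are hypothetical and will not match Expedia's real markup. It also assumes prices are displayed in whole euros.

```python
import re

def parse_price(text):
    """Turn a scraped price such as '€1,234' or '89 €' into whole euros.

    Assumes prices are shown without cents, so every non-digit is dropped.
    """
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no price found in {text!r}")
    return int(digits)

def scrape_expedia(url, driver_path="./chromedriver"):
    """Collect name, price and area from each hotel card on a results page."""
    # Imported lazily so parse_price can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome(service=Service(executable_path=driver_path))
    try:
        driver.get(url)
        rows = []
        # Hypothetical CSS selectors: Expedia's real markup differs and changes often.
        for card in driver.find_elements(By.CSS_SELECTOR, "li.hotel-card"):
            rows.append({
                "name": card.find_element(By.CSS_SELECTOR, "h3").text,
                "price": parse_price(card.find_element(By.CSS_SELECTOR, ".price").text),
                "area": card.find_element(By.CSS_SELECTOR, ".neighborhood").text,
                "source": "expedia.de",
            })
        return rows
    finally:
        driver.quit()  # always close the browser, even if a selector fails
```

Results pages that load content dynamically usually need an explicit wait (for example Selenium's `WebDriverWait`) before the cards are read.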

Booking.com data (gathered with Octoparse): hotels in Berlin with price per night for 2 persons and area.

No link: we worked inside the Octoparse app and narrowed the results according to our specifications.
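Although the Booking.com data needed no scraping code, the Octoparse CSV export still has to be brought into the common format. A minimal sketch, assuming hypothetical export column names (`Hotel_Name`, `Price`, `District`) that depend on how the Octoparse task was configured, and whole-euro prices. `csv.DictReader` accepts an open file or any iterable of lines.

```python
import csv

# Hypothetical Octoparse column names; the real export depends on how the
# scraping task was configured in the app.
COLUMN_MAP = {"Hotel_Name": "name", "Price": "price", "District": "area"}

def load_octoparse_export(lines):
    """Read an Octoparse CSV export into the common four-column format."""
    rows = []
    for raw in csv.DictReader(lines):
        row = {new: (raw.get(old) or "").strip() for old, new in COLUMN_MAP.items()}
        digits = "".join(ch for ch in row["price"] if ch.isdigit())
        if not row["name"] or not digits:  # skip rows the scraper left incomplete
            continue
        row["price"] = int(digits)  # assumes whole-euro prices such as "€ 120"
        row["source"] = "booking.com"
        rows.append(row)
    return rows
```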

Database

We created three tables from the three different sources. After cleaning, each table was reduced to the same four columns so that the formats match, and the tables were then merged:

  1. name (of hotel/ID of airbnb listing)
  2. price per night
  3. area
  4. source (airbnb/ booking.com/ expedia.de)

(Before the two hotel tables were merged, duplicates within those data were removed.)
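The merge step can be sketched as: deduplicate the two hotel tables, then concatenate everything restricted to the four shared columns. This is an illustration under stated assumptions, not the project's merging notebook: it assumes a hotel listed on both Booking.com and Expedia can be recognized by its normalized name, and it keeps the first occurrence (Booking.com). The function name `merge_tables` is hypothetical.

```python
COLUMNS = ("name", "price", "area", "source")

def merge_tables(airbnb, booking, expedia):
    """Merge the cleaned tables after removing duplicate hotels.

    Assumption: a hotel listed on both Booking.com and Expedia is detected by
    its normalized name, and the first occurrence (Booking.com) is kept.
    """
    seen = set()
    hotels = []
    for row in booking + expedia:
        key = " ".join(row["name"].casefold().split())  # ignore case and extra spaces
        if key not in seen:
            seen.add(key)
            hotels.append(row)
    # Keep only the four shared columns so all rows have the same format.
    return [{col: row[col] for col in COLUMNS} for row in airbnb + hotels]
```

Matching on names alone is fragile (two different hotels can share a name, and sites may spell names differently), so a combination of name and area would be a stricter key.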

Workflow

The workflow was as follows:

  1. Definition of topic and gathering possible data sources
  2. Definition of questions that can be asked/topics that can be analyzed
  3. Comparing data sources and deciding which ones to use
  4. Extracting the data through the API/web scraping (where possible, storing the data in the GitHub repository)
  5. Cleaning data
  6. Merging data
  7. Running the analysis to gain insights into our questions, including plots
  8. Preparing a presentation based on our insights (Google Slides)
  9. Finalizing folder and file structure

Organization

For communication we mainly used:

  1. Slack

After defining the topic, we set up:

  1. GitHub repository
  2. Kanban board (using Trello)

Each of us searched for possible data sources independently. Extracting the data required a lot of collaboration, so most of the time at least two people worked on the same topic/data source. Once we had the data, we split up the work, defined tasks and used Trello intensively.

The repository is set up as follows:

Output folder:

Merging, Analysis and Plotting

Readme

Subfolder with the following content:

1. Data Sourcing:

AirBnB

Expedia

Booking.com - we used Octoparse, so no coding was needed

2. Data Cleaning:

AirBnB (same file as for data sourcing)

Expedia

Booking.com

3. Export files (.csv or .pkl):

Export from Octoparse:

Octoparse csv - see above; this is the same file as referenced under data sourcing for Booking.com

Export from AirBnB data extracting and cleaning:

AirBnB csv

AirBnB pkl - used for final analysis

Export from Expedia data scraping:

Expedia results csv

Export from Expedia data cleaning:

Expedia data csv - used for final analysis

Export from Booking.com data cleaning:

Booking.com csv

Booking.com pkl - used for final analysis
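Each cleaned table is exported both as .csv and as .pkl, with the .pkl files used for the final analysis. One practical reason for this split is that pickle preserves Python types (a price stays a number), while CSV stores every value as text. A minimal standard-library sketch; the function names are hypothetical, and in pandas the equivalents are `DataFrame.to_csv`, `DataFrame.to_pickle` and `pd.read_pickle`.

```python
import csv
import pickle
from pathlib import Path

COLUMNS = ("name", "price", "area", "source")

def export(rows, stem):
    """Write rows to <stem>.csv (human-readable) and <stem>.pkl (keeps Python types)."""
    csv_path, pkl_path = Path(f"{stem}.csv"), Path(f"{stem}.pkl")
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    with pkl_path.open("wb") as f:
        pickle.dump(rows, f)
    return csv_path, pkl_path

def load_pickle(pkl_path):
    """Load rows saved by export(); only unpickle files you created yourself."""
    with Path(pkl_path).open("rb") as f:
        return pickle.load(f)

def load_csv(csv_path):
    """Load rows from CSV; every value comes back as a string."""
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```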

Further files in folder:

chromedriver and chromedriver.exe - those files are needed to run the Expedia web scraper on Windows/Mac/Linux
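Because the folder ships one driver binary for Windows (`chromedriver.exe`) and one for Mac/Linux (`chromedriver`), the scraper has to pick the right one at runtime. A minimal sketch, assuming both files sit in the same folder; the helper name `chromedriver_path` is hypothetical.

```python
import platform
from pathlib import Path

def chromedriver_path(folder=".", system=None):
    """Return the bundled chromedriver for the current OS.

    Assumes both binaries sit in `folder`: chromedriver.exe for Windows,
    chromedriver for Mac/Linux.
    """
    system = system or platform.system()  # e.g. "Windows", "Darwin" or "Linux"
    name = "chromedriver.exe" if system == "Windows" else "chromedriver"
    return Path(folder) / name
```

With Selenium 4 the result can be passed as `webdriver.Chrome(service=Service(executable_path=str(chromedriver_path())))`. The driver must match the installed Chrome version, and Mac and Linux need different chromedriver builds; newer Selenium releases (4.6 and later) include Selenium Manager, which can locate or download a matching driver automatically.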

Links

Repository

Slides

Trello

Contributors

senonino, tcg85, valsophie
