Data Analysis January 2019, Berlin, 2020-01-25
In this project we wanted to focus on a topic related to tourism. We chose to compare Airbnb listings with hotels in Berlin, Germany. As Airbnb, and with it the number of apartments available through its service, has grown tremendously over recent years, it has become an alternative for tourists and business travelers who need a place to stay overnight. We therefore compared hotels and Airbnb listings to find differences and similarities between the two.
As noted above, the number of Airbnb listings has grown over recent years. Judging by the number of available rooms alone, booking a room on Airbnb in Berlin now appears to be a real alternative to booking a hotel room. Our questions and assumptions are:
- How does the number of Airbnb listings compare to the number of hotels in the different areas of the city? We assume that Airbnb listings are more likely to be found in residential areas, whereas hotels also cover business districts and more central districts.
- How do the prices per area compare? Do higher hotel prices in an area also mean higher prices per night for an Airbnb apartment there? We assume that expensive hotels are most likely located in areas with high rents, which in turn raises the average price per night for an Airbnb apartment in those areas.
- Overall, you might assume that a night in an Airbnb apartment costs less on average than a night in a hotel, as you get less service. But is that true? Our assumption is that the average price per night in an Airbnb apartment is lower than in a hotel.
To answer those questions, we work with data from different sources that provide, for both Airbnb listings and hotels:
- Price per night (for 2 persons)
- Area/district in Berlin
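With only price and area per listing, the three questions reduce to counting listings and averaging prices per area, split into Airbnb and hotels. The following is a minimal sketch of that comparison, not the project's actual analysis code. It assumes a pandas table with the columns `area`, `price` and `source`; the function name `compare_by_area` and the sample rows are hypothetical.

```python
import pandas as pd


def compare_by_area(listings):
    """Return (counts, mean prices) per area, split into Airbnb and hotel."""
    # Everything that is not from Airbnb is treated as a hotel.
    kind = listings["source"].where(listings["source"] == "airbnb", "hotel")
    grouped = listings.assign(kind=kind).groupby(["area", "kind"])["price"]
    counts = grouped.count().unstack("kind", fill_value=0)
    means = grouped.mean().unstack("kind")
    return counts, means


# Hypothetical sample rows, for illustration only (not real project data).
sample = pd.DataFrame(
    {
        "area": ["Mitte", "Mitte", "Mitte", "Pankow", "Pankow", "Pankow"],
        "price": [80.0, 120.0, 100.0, 50.0, 90.0, 70.0],
        "source": ["airbnb", "expedia.de", "airbnb",
                   "airbnb", "booking.com", "airbnb"],
    }
)

counts, means = compare_by_area(sample)
# Difference between the mean hotel price and the mean Airbnb price per area.
price_gap = means["hotel"] - means["airbnb"]
print(counts)
print(means)
print(price_gap)
```

The `counts` table addresses the first question, `means` the second, and the sign of `price_gap` the third.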
We used the following three sources:
- Airbnb listings - API - data include: Airbnb listings worldwide (we reduced them to Berlin) with price per night, area, name of owner, geolocation, cleaning fee, number of persons, description of the apartment and more:
- Expedia listings - web scraping - data include: hotels in Berlin, price per night for 2 persons, area. Method used: Selenium, scraping the following site:
- Booking.com data - gathered through Octoparse - data include: hotels in Berlin, price per night for 2 persons, area. Link: we worked inside the Octoparse app and narrowed the results according to our specifications.
We created three tables from the three sources. After cleaning, each table was reduced to the same four columns so that the formats match, and the tables were then merged:
- name (of hotel/ID of airbnb listing)
- price per night
- area
- source (airbnb/ booking.com/ expedia.de)
(Before merging the two tables with hotel data, duplicates within those data were removed.)
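The merge described above could look roughly like the sketch below. It is an illustration under assumptions, not the code from the repository: the column name `price`, the helper `merge_sources`, the sample rows, and the rule that two hotel rows with the same name and area count as duplicates are all hypothetical.

```python
import pandas as pd

# The four shared columns (names assumed for this sketch).
COLUMNS = ["name", "price", "area", "source"]


def merge_sources(airbnb, expedia, booking):
    """Combine the three cleaned tables into one four-column table."""
    # Combine the two hotel tables first and drop hotels that appear in both.
    # Assumption: same name and same area means the same hotel.
    hotels = pd.concat([expedia[COLUMNS], booking[COLUMNS]], ignore_index=True)
    hotels = hotels.drop_duplicates(subset=["name", "area"], keep="first")
    # Append the hotels to the Airbnb listings.
    return pd.concat([airbnb[COLUMNS], hotels], ignore_index=True)


# Hypothetical sample rows, for illustration only.
airbnb = pd.DataFrame({"name": ["1001", "1002"], "price": [55.0, 70.0],
                       "area": ["Pankow", "Mitte"], "source": "airbnb",
                       "cleaning_fee": [20.0, 15.0]})
expedia = pd.DataFrame({"name": ["Hotel A", "Hotel B"], "price": [110.0, 95.0],
                        "area": ["Mitte", "Mitte"], "source": "expedia.de"})
booking = pd.DataFrame({"name": ["Hotel B", "Hotel C"], "price": [99.0, 130.0],
                        "area": ["Mitte", "Pankow"], "source": "booking.com"})

merged = merge_sources(airbnb, expedia, booking)
print(merged)
```

With `keep="first"`, a hotel listed on both sites keeps its Expedia row here. Which row to keep is a design choice, since the two sites may quote different prices for the same hotel.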
The workflow was as follows:
- Definition of topic and gathering possible data sources
- Definition of questions that can be asked/topics that can be analyzed
- Comparing data sources and deciding which ones to use
- Extracting the data through API/web scraping (if possible, storing the data in the GitHub repository)
- Cleaning data
- Merging data
- Running analysis with those data to gain insights (to our questions), incl. use of plots
- Preparing presentation based on our insights (Google slides)
- Finalizing folder and file structure
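As an illustration of the cleaning step in this workflow, scraped prices often arrive as text and have to be converted to numbers before the tables can be merged and analyzed. The sketch below is an assumption about what such a step might involve. The function `parse_price`, the example strings and the German number format ("." for thousands, "," for decimals) are hypothetical and not taken from the project.

```python
import re


def parse_price(text):
    """Return the first number in a scraped price string as a float, or None."""
    match = re.search(r"\d[\d.,]*", text)
    if match is None:
        return None
    digits = match.group().rstrip(".,")
    # Assumption: German formatting, "." groups thousands and "," marks decimals.
    digits = digits.replace(".", "").replace(",", ".")
    try:
        return float(digits)
    except ValueError:
        # Anything that still is not a valid number is treated as missing.
        return None


# Hypothetical raw values, for illustration only.
raw_prices = ["€ 120", "1.234 €", "89,50 € pro Nacht", "ausgebucht"]
cleaned = [parse_price(p) for p in raw_prices]
print(cleaned)
```

Rows for which the function returns `None` would then need to be dropped or checked by hand before merging.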
For communication we mainly used:
- Slack
After defining the topic we set up:
- GitHub repository
- Kanban board (using Trello)
Everyone searched for possible data sources on their own. Extracting the data required a lot of collaboration, so most of the time at least two people worked on the same topic/data source. Once we had the data, we split up the work, defined tasks and used Trello intensively.
The repository is set up as follows:
Merging, Analysis and Plotting
Booking.com - we used Octoparse, so no coding was needed
AirBnB (same file as for data sourcing)
Export from Octoparse:
Export from AirBnB data extracting and cleaning:
AirBnB pkl - used for final analysis
Export from Expedia data scraping:
Export from Expedia data cleaning:
Expedia data csv - used for final analysis
Export from Booking.com data cleaning:
Booking.com pkl - used for final analysis
chromedriver and chromedriver.exe - those files are needed to run the Expedia web scraper on Windows/Mac/Linux