Get ready to collect data from ImmoWeb, the most popular real estate website in Belgium! 🇧🇪
The main goal of the project is to create a dataset for a machine learning model that predicts the prices of real estate sales in Belgium. The dataset holds the following columns:
- Locality
- Type of property (House/apartment)
- Subtype of property (Bungalow, Chalet, Mansion, ...)
- Price
- Type of sale (excluding life annuity sales)
- Number of rooms
- Area
- Fully equipped kitchen (Yes/No)
- Furnished (Yes/No)
- Open fire (Yes/No)
- Terrace (Yes/No)
  - If yes: area
- Garden (Yes/No)
  - If yes: area
- Surface of the land
- Surface area of the plot of land
- Number of facades
- Swimming pool (Yes/No)
- State of the building (New, to be renovated, ...)
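The column list can be pinned down as a typed record so that every scraped listing has the same shape. The sketch below is a hypothetical illustration, not the code in `web_scraping.py`: the class name `PropertyRecord` and all snake_case field names are assumptions. It also enforces the "If yes: area" rule by rejecting a terrace or garden area when the feature itself is absent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyRecord:
    """One row of the dataset (hypothetical field names for the columns above)."""
    locality: str
    property_type: str              # "house" or "apartment"
    property_subtype: str           # e.g. "bungalow", "chalet", "mansion"
    price: Optional[int]            # None when the listing hides the price
    sale_type: str
    rooms: Optional[int]
    area: Optional[float]           # living area in m²
    equipped_kitchen: bool
    furnished: bool
    open_fire: bool
    terrace: bool
    terrace_area: Optional[float]   # only meaningful when terrace is True
    garden: bool
    garden_area: Optional[float]    # only meaningful when garden is True
    land_surface: Optional[float]
    plot_surface: Optional[float]
    facades: Optional[int]
    swimming_pool: bool
    building_state: Optional[str]   # e.g. "new", "to be renovated"

    def __post_init__(self) -> None:
        # "If yes: area": an area without the feature points to a scraping error.
        if not self.terrace and self.terrace_area is not None:
            raise ValueError("terrace_area is set but terrace is False")
        if not self.garden and self.garden_area is not None:
            raise ValueError("garden_area is set but garden is False")
```

Validating at construction time means a bad row fails while scraping, instead of surfacing later as noise in the training data.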
All the data is saved in the property_data.csv file.
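A minimal sketch of the saving step, assuming each scraped listing has already been turned into a dictionary of column values. The helper name `save_properties` is hypothetical; it uses the standard-library `csv` module to stay dependency-free. With pandas, the equivalent would be `pd.DataFrame(rows).to_csv("property_data.csv", index=False)`.

```python
import csv
from typing import Iterable, Mapping


def save_properties(rows: Iterable[Mapping[str, object]],
                    path: str = "property_data.csv") -> int:
    """Write scraped rows to a CSV file and return the number of rows written."""
    rows = list(rows)
    if not rows:
        return 0
    # Collect every column seen in any row, keeping first-seen order,
    # because listings do not all expose the same fields.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        # restval="" writes an empty cell when a row lacks a column.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```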
To collect and save the data, we used the following Python libraries:
- from bs4 import BeautifulSoup
- from selenium import webdriver
- import time
- import json
- import pandas as pd
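The `json` import suggests that listing details are read from JSON embedded in the page rather than only from visible HTML. The sketch below works under that assumption: the variable name `window.classified` and the key paths in `to_row` (`property`, `location`, `price.mainValue`, `bedroomCount`, `netHabitableSurface`) are hypothetical and must be checked against a real ImmoWeb page. In the project, BeautifulSoup would typically locate the `<script>` tag in the HTML returned by Selenium; here `re` and `json.JSONDecoder.raw_decode` do the extraction so the example runs without extra packages.

```python
import json
import re
from typing import Any, Dict, Optional

# Hypothetical: assumes the page assigns its listing data to this JS variable.
_ASSIGNMENT = re.compile(r"window\.classified\s*=\s*")


def extract_embedded_json(html: str) -> Optional[Dict[str, Any]]:
    """Return the JSON object assigned to `window.classified`, or None if absent."""
    match = _ASSIGNMENT.search(html)
    if match is None or not html.startswith("{", match.end()):
        return None
    try:
        # raw_decode parses one JSON value and reports where it stopped,
        # so trailing text such as ";</script>" is ignored.
        obj, _end = json.JSONDecoder().raw_decode(html[match.end():])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def to_row(classified: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the nested object into dataset columns (key paths are assumptions)."""
    prop = classified.get("property") or {}
    location = prop.get("location") or {}
    price = classified.get("price") or {}
    return {
        "locality": location.get("locality"),
        "property_type": prop.get("type"),
        "property_subtype": prop.get("subtype"),
        "price": price.get("mainValue"),
        "rooms": prop.get("bedroomCount"),
        "area": prop.get("netHabitableSurface"),
    }
```

Missing keys come back as `None` rather than raising, so a listing that omits a field still yields a row with an empty value.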
The project was designed and built by four collaborators: Ujjwal Kandel, Reena Koshta, Nichelle Pinto Machado and Yusuf Akcakaya.
- Pull requests are welcome.
- Or clone the repository:
git clone https://github.com/UjjwalKandel2000/challenge-collecting-data.git
challenge-collecting-data
│
├── README.md            : explains the project
├── property_data.csv    : keeps all data for properties
│
├── driver               : directory containing chromedriver
│   └── chromedriver     : standalone server executable that Selenium WebDriver uses to control Chrome
│
└── immo_scraping        : directory containing web_scraping.py
    └── web_scraping.py  : Python script for web scraping
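`web_scraping.py` controls Chrome through the bundled `chromedriver`. With Selenium 4, that binary would typically be passed as `webdriver.Chrome(service=Service("driver/chromedriver"))`, with `Service` imported from `selenium.webdriver.chrome.service`. The loop below is a hypothetical sketch of polite page collection using the `time` module: `collect_pages` and its `fetch` parameter are assumptions, and `fetch` stands in for a function that calls `driver.get(url)` and returns `driver.page_source`.

```python
import time
from typing import Callable, Dict, Iterable


def collect_pages(urls: Iterable[str],
                  fetch: Callable[[str], str],
                  delay: float = 1.0) -> Dict[str, str]:
    """Fetch each URL with `fetch`, pausing `delay` seconds between requests.

    With Selenium, `fetch` could call driver.get(url) and then return
    driver.page_source. Failed pages are skipped so that one bad listing
    does not stop the whole crawl.
    """
    pages: Dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # spread requests out to avoid hammering the site
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # broad on purpose: network and driver errors vary
            print(f"skipping {url}: {exc}")
    return pages
```

Keeping the fetcher pluggable lets the loop be tested with a fake function, without launching a browser.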
- Output of our dataset
- Repository: challenge-collecting-data
- Type of challenge: Consolidation
- Duration: 3 days
- Deadline: 13/10/2021 16:30
- Team challenge: 3