
challenge-collecting-data

WEB SCRAPING

Ready to collect data from ImmoWeb, the most popular real estate website in Belgium? 🇧🇪


The main goal of the project is to create a dataset for a machine learning model that predicts real estate sale prices in Belgium. The dataset holds the following columns:

  • Locality
  • Type of property (House/apartment)
  • Subtype of property (Bungalow, Chalet, Mansion, ...)
  • Price
  • Type of sale (life annuity sales excluded)
  • Number of rooms
  • Area
  • Fully equipped kitchen (Yes/No)
  • Furnished (Yes/No)
  • Open fire (Yes/No)
  • Terrace (Yes/No)
    • If yes: Area
  • Garden (Yes/No)
    • If yes: Area
  • Surface of the land
  • Surface area of the plot of land
  • Number of facades
  • Swimming pool (Yes/No)
  • State of the building (New, to be renovated, ...)
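As a sketch of how one scraped listing could be represented with these columns, the dataclass below uses hypothetical field names (the actual headers in property_data.csv may differ). It normalizes Yes/No values to booleans, keeps terrace and garden areas only when the feature exists, and drops life annuity sales; the sale-type labels checked in `keep` are assumptions, not values confirmed from ImmoWeb.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical sale-type labels treated as life annuity sales.
LIFE_SALE_LABELS = {"life annuity", "life_annuity", "viager"}


def yes_no(value) -> Optional[bool]:
    """Map a scraped Yes/No-style value to True/False, or None if unknown."""
    if value is None or isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in {"yes", "true", "1"}:
        return True
    if text in {"no", "false", "0"}:
        return False
    return None


@dataclass
class Property:
    # Hypothetical field names mirroring the column list above.
    locality: str
    property_type: str              # "house" or "apartment"
    property_subtype: str           # e.g. "bungalow", "chalet", "mansion"
    price: Optional[float]
    sale_type: str
    rooms: Optional[int]
    area: Optional[float]
    equipped_kitchen: Optional[bool]
    furnished: Optional[bool]
    open_fire: Optional[bool]
    terrace: Optional[bool]
    terrace_area: Optional[float]   # only kept when terrace is not False
    garden: Optional[bool]
    garden_area: Optional[float]    # only kept when garden is not False
    land_surface: Optional[float]
    plot_surface: Optional[float]
    facades: Optional[int]
    swimming_pool: Optional[bool]
    building_state: Optional[str]   # e.g. "new", "to be renovated"

    def __post_init__(self):
        # "If yes: Area": discard an area when the feature is known to be absent.
        if self.terrace is False:
            self.terrace_area = None
        if self.garden is False:
            self.garden_area = None


def keep(record: Property) -> bool:
    """Return False for life annuity sales, which the dataset excludes."""
    return record.sale_type.strip().lower() not in LIFE_SALE_LABELS
```

`asdict(record)` then gives a plain dict per property, ready to be written out as one CSV row.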

All collected data is saved in the property_data.csv file.
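A minimal way to write collected rows to property_data.csv is sketched below with the standard library `csv` module rather than pandas, so it runs without extra dependencies. With pandas, a comparable one-liner is `pd.DataFrame(rows).to_csv("property_data.csv", index=False)`. The helper name `save_rows` is hypothetical.

```python
import csv


def save_rows(rows, path="property_data.csv"):
    """Write a list of dicts to CSV, one row per property.

    Columns are the union of all keys in first-seen order, so a property
    missing an optional field (e.g. garden area) still gets a row, with an
    empty cell for the missing value.
    """
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)  # restval defaults to ""
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames
```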

To collect and save the data, we used the following Python libraries:

  • from bs4 import BeautifulSoup
  • from selenium import webdriver
  • import time
  • import json
  • import pandas as pd
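Since `json` is among the libraries, one plausible approach is to parse data that a listing page embeds as JSON instead of scraping each field from the HTML. The sketch below assumes the page contains a `window.classified = {...};` assignment and that the key paths in `to_row` match its structure; both are assumptions about ImmoWeb's markup that may not hold today. In the project, BeautifulSoup would typically locate the `<script>` tag; this version searches the raw HTML string so it needs only the standard library. The function names are hypothetical.

```python
import json

MARKER = "window.classified"  # assumed name of the embedded JS variable


def extract_classified(html: str) -> dict:
    """Return the JSON object assigned to window.classified in the page."""
    start = html.find(MARKER)
    if start == -1:
        raise ValueError("no embedded classified data found")
    brace = html.find("{", start)
    if brace == -1:
        raise ValueError("embedded classified data has no JSON object")
    # raw_decode parses exactly one JSON value and ignores what follows (";").
    data, _end = json.JSONDecoder().raw_decode(html, brace)
    return data


def get_path(data, *keys):
    """Follow nested keys, returning None if any level is missing."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def to_row(data: dict) -> dict:
    # Key paths below are assumptions about the embedded JSON, not a documented API.
    return {
        "locality": get_path(data, "property", "location", "locality"),
        "property_type": get_path(data, "property", "type"),
        "property_subtype": get_path(data, "property", "subtype"),
        "price": get_path(data, "price", "mainValue"),
        "rooms": get_path(data, "property", "bedroomCount"),
        "area": get_path(data, "property", "netHabitableSurface"),
        "garden": get_path(data, "property", "hasGarden"),
        "garden_area": get_path(data, "property", "gardenSurface"),
        "swimming_pool": get_path(data, "property", "hasSwimmingPool"),
    }
```

Missing keys come back as `None` rather than raising, which keeps one incomplete listing from stopping a long scraping run.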

Collaborators

The project was designed and built by four collaborators: Ujjwal Kandel, Reena Koshta, Nichelle Pinto Machado and Yusuf Akcakaya.

Installation

  • Clone the repository: git clone https://github.com/UjjwalKandel2000/challenge-collecting-data.git
  • Pull requests are welcome.

Repo Architecture

challenge-collecting-data
│
├── README.md             : explains the project
├── property_data.csv     : holds the data for all properties
│
├── driver                : directory containing chromedriver
│   └── chromedriver      : standalone executable that Selenium WebDriver uses to control Chrome
│
└── immo_scraping         : directory containing web_scraping.py
    └── web_scraping.py   : Python script for web scraping
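The sketch below illustrates how a script like web_scraping.py might gather listing URLs from search-result pages, with `time.sleep` pausing between page loads. The function name, the `fetch` parameter and the `/classified/` URL pattern are assumptions; passing `fetch` in lets the same code run against Selenium or against a stub.

```python
import re
import time
from typing import Callable, Iterable, List

# Assumption: individual listings live under URLs containing "/classified/".
LISTING_RE = re.compile(r'href="(https://www\.immoweb\.be/[^"]*/classified/[^"]+)"')


def collect_listing_urls(fetch: Callable[[str], str],
                         search_urls: Iterable[str],
                         delay: float = 1.0) -> List[str]:
    """Fetch each search-results page, extract listing links, de-duplicate.

    `fetch` takes a URL and returns the rendered HTML of that page.
    Order of first appearance is preserved.
    """
    seen = set()
    urls = []
    for page_url in search_urls:
        html = fetch(page_url)
        for url in LISTING_RE.findall(html):
            if url not in seen:
                seen.add(url)
                urls.append(url)
        time.sleep(delay)  # be polite: pause between page loads
    return urls
```

With Selenium 4 and the chromedriver from driver/, `fetch` could wrap `driver = webdriver.Chrome(service=Service("driver/chromedriver"))` (with `Service` imported from `selenium.webdriver.chrome.service`) in a small function that calls `driver.get(url)` and returns `driver.page_source`. A browser is useful here because search results may be rendered by JavaScript.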

Visuals

  • Output of our dataset

[Screenshot 2021-10-13 at 15:54:27]

Timeline

  • Repository: challenge-collecting-data
  • Type of Challenge: Consolidation
  • Duration: 3 days
  • Deadline: 13/10/2021 16:30
  • Team challenge: 3

Good Luck!
