jcmeunier77 / prediction_api

Prediction_API

To develop an API that builds on real-estate data to provide the following functionalities:
1. modeling a house in 3D from lidar satellite images (GeoTIFF files) using only a home address as input. This part is an extension of a previous project
2. locating the house on a map from its address
3. forecasting the price of buildings (i.e. houses or apartments) from multiple features (postal code, number of rooms, living space, surface area, etc.)
To deploy the API on Azure (using, among others, Docker and Travis)

Consolidate knowledge of Python, specifically NumPy, Pandas, Sklearn, Matplotlib, ...
To be able to search for and implement new libraries
Consolidate knowledge of data science and machine learning algorithms for developing an accurate regression prediction model
To be able to structure the project with object-oriented programming (OOP)
To be able to implement the whole project, and make it work, through an API (using Flask)
To be able to deploy the API in a web-based environment (in this case Azure)

The API is to be deployed in a web-based environment (e.g. Heroku, Azure, etc.)
Optimize your solution to return the result as fast as possible.
The API searches for as much information as possible on its own. (For example, area => cadastre)
Better visualization
You provide a 3d representation of the house

All the work was done during BeCode's AI/data science bootcamp 2020-2021

Research and understand the terms, concepts and requirements of the project.
Discover new libraries that can serve the project's purposes
Developing, using and testing machine learning algorithms (among others sklearn with linear, SVM and decision tree regression, XGBoost, ...)

for 3D house reconstruction
for real-estate data
- Data collection was done in the context of a previous project whose aim was to develop a scraping bot, written in Python, to scrape data (50,000+) from the real-estate website "Zimmo.be", for a challenge given by BeCode.

Data cleaning: this included, among others, removing outliers and features with too many missing values (>15%), and conducting multivariate feature imputation for the features with fewer missing values (using sklearn.impute.IterativeImputer)
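The thresholding and imputation steps described above could look roughly like the sketch below. It is only an illustration: the column names, the toy data and the helper `drop_sparse_and_impute` are hypothetical, outlier removal is left out, and the project's actual code lives in the notebooks.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental in scikit-learn and has to be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def drop_sparse_and_impute(df: pd.DataFrame, max_missing: float = 0.15) -> pd.DataFrame:
    """Drop columns whose share of missing values exceeds max_missing, impute the rest."""
    missing_share = df.isna().mean()
    kept = missing_share[missing_share <= max_missing].index
    imputer = IterativeImputer(random_state=0)
    imputed = imputer.fit_transform(df[kept])
    return pd.DataFrame(imputed, columns=kept, index=df.index)


# Hypothetical toy data (numeric columns only); names are illustrative.
toy = pd.DataFrame({
    "living_area": [80.0, 120.0, 150.0, 95.0, 200.0, 110.0, 130.0, 175.0, 90.0, 160.0],
    # 10% missing: kept and imputed
    "bedrooms": [2.0, 3.0, np.nan, 2.0, 5.0, 3.0, 3.0, 4.0, 2.0, 4.0],
    # 60% missing: dropped
    "terrace_area": [np.nan, 12.0, np.nan, np.nan, 30.0, np.nan, 8.0, np.nan, np.nan, 20.0],
})
cleaned = drop_sparse_and_impute(toy)
print(cleaned)
```

IterativeImputer models each feature with missing values as a function of the other features, which is why it is applied only after the very sparse columns have been removed.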
Feature engineering: location (postal code) is not readily amenable to integration in a quantitative model but nonetheless has a huge impact on real-estate prices, so a ranking index was computed from the average house price of each entity in Belgium. As shown, this index is well associated with house prices, and its third-degree polynomial best explained the target (more than 25% of the 'house price' variance explained by this single feature, based on the r_square coefficient).
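One possible reading of this ranking index is sketched below: each postal code is replaced by the rank of its average price, and a third-degree polynomial of that rank is fitted to the price. The rank direction (1 = cheapest), the function names and the toy listings are assumptions made for illustration, not the project's actual implementation.

```python
from collections import defaultdict
from statistics import mean

import numpy as np


def location_rank_index(postal_codes, prices):
    """Map each postal code to the rank of its average price (1 = cheapest)."""
    by_code = defaultdict(list)
    for code, price in zip(postal_codes, prices):
        by_code[code].append(price)
    averages = {code: mean(values) for code, values in by_code.items()}
    ordered = sorted(averages, key=averages.get)
    return {code: rank for rank, code in enumerate(ordered, start=1)}


def polynomial_r2(x, y, degree=3):
    """R-squared of a least-squares polynomial fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)


# Hypothetical listings: postal code and asking price in euros.
codes = ["1000", "1000", "4000", "4000", "9000", "9000", "6000", "6000", "3000", "3000"]
prices = [400_000, 420_000, 210_000, 230_000, 330_000, 310_000,
          150_000, 170_000, 280_000, 260_000]

index = location_rank_index(codes, prices)
ranks = [index[code] for code in codes]
r2 = polynomial_r2(ranks, prices, degree=3)
print(index, round(r2, 3))
```

In a real pipeline the index would have to be computed on the training data only and then applied to new addresses, so that the target does not leak into the test set.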

Features :
- type of building: house/apartment
- living area: square meters
- field's surface: square meters
- number of facades
- number of bedrooms
- garden: yes/no
- terrace: yes/no
- terrace area: square meters
- equipped kitchen: yes/no
- fireplace: yes/no
- swimming pool: yes/no
- state of the building: as new, just renovated, good, to refresh, to renovate, to restore (one hot encoding)
Target:
- House price: euros
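A minimal sketch of how the features listed above might be turned into a numeric matrix, assuming yes/no columns are mapped to 1/0 and the building state is one-hot encoded with pandas. The column names and the helper `encode_features` are hypothetical.

```python
import pandas as pd

STATES = ["as new", "just renovated", "good", "to refresh", "to renovate", "to restore"]
YES_NO = ["garden", "terrace", "equipped_kitchen", "fireplace", "swimming_pool"]


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Turn yes/no and categorical columns into numeric model inputs."""
    out = df.copy()
    for col in YES_NO:
        out[col] = out[col].map({"yes": 1, "no": 0})
    out["is_house"] = (out.pop("building_type") == "house").astype(int)
    # Fixed categories guarantee one column per state, even for states absent from the data.
    state = out.pop("state").astype(pd.CategoricalDtype(categories=STATES))
    dummies = pd.get_dummies(state, prefix="state", dtype=int)
    return pd.concat([out, dummies], axis=1)


# Hypothetical listings.
toy = pd.DataFrame({
    "building_type": ["house", "apartment"],
    "living_area": [150, 80],
    "garden": ["yes", "no"],
    "terrace": ["yes", "yes"],
    "equipped_kitchen": ["no", "yes"],
    "fireplace": ["no", "no"],
    "swimming_pool": ["no", "no"],
    "state": ["good", "to renovate"],
})
encoded = encode_features(toy)
print(encoded.columns.tolist())
```

Declaring the six states up front keeps the column layout identical between training and prediction time, which matters when a single address is encoded inside the API.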
Machine learning model:
- Multiple models using an increasing number of features and based on various algorithms (among others linear, SVM, decision tree, XGBoost) were trained and evaluated.
- The best model was based on the XGBoost algorithm (n_estimators=700, max_depth=4, learning_rate=0.3) and achieved an r_square coefficient of .82 on the training set and .76 on the test set
- The best-fitting model was saved as a pickle file, which was integrated into the API for price estimation
- Examples of Python code for data manipulation and algorithm development are stored in the notebook folder of the current repository