Istanbul House Rent Prediction

This project analyzes the factors behind house rents in Istanbul and predicts rents using regression techniques. The evaluation metric is mean absolute error (MAE).
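As a quick illustration of the metric, MAE is the average of the absolute differences between actual and predicted rents. The sketch below uses invented rent values and a hypothetical helper named `mae`; it is not taken from the project's code.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented example values (monthly rents), for illustration only.
actual = [10000, 15000, 20000]
predicted = [11000, 14000, 23000]

# Absolute errors are 1000, 1000 and 3000, so the mean is 5000 / 3.
score = mae(actual, predicted)
```

Because MAE is expressed in the same unit as the target, a score can be read directly as the typical size of a rent prediction error.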

Data Collection

The data were scraped from hepsiemlak.com at the end of January 2023, so they may no longer be representative of the current market.

Preprocessing

  • The distribution of rents is heavily skewed and contains large outliers. These were dropped by setting a threshold on the target before the analysis.
  • To deal with rare categories and high cardinality, the columns konut_tipi, isinma_tipi, yapi_tipi, yapinin_durumu, yakit_tipi and kullanim_durumu were binned. The binning rule combined domain knowledge with data-specific aspects of the features: the categories were hierarchically clustered on their mean target values, with a presumption that categories close to each other belong in the same bin. The final decisions, however, were made based on domain knowledge.
  • The feature site_icerisinde had high cardinality, and each of its categories had only a small number of observations. It was therefore transformed into a binary feature: observations that originally carried a value for this feature are True, and the rest (the None ones) are False.
  • The cephe column held lists such as [kuzey, güney, batı], [batı, güney], [], [doğu, kuzey], ... drawn from four unique values. It was expanded into four separate binary features, and a new column called cephe_sayisi was added based on the lengths of these lists.
  • bulundugu_kat contained both numeric and categorical values. It was treated as a continuous feature, and the categories that can be interpreted as numbers were converted using domain knowledge. The remaining ones were hard to merge into bins, so they were set to None and left for the tree-based models to handle.

Models

Initially, four regressors were trained: XGBoost, LightGBM, random forest and support vector regressor. XGBoost performed best in terms of MAE. An ensemble model was then trained with the stacking technique. The entire process used 5-fold cross-validation:

  • The dataset was first split into train and test sets.
  • The train set was further split into five folds for the CV procedure.
  • For each of the five folds in the train set, the base models were trained on the other four folds and made predictions on the held-out validation fold. After the CV procedure, they were fitted on the whole train set and then predicted the test set.
  • The final model was then trained on the train set using the predictions of the base models, and was validated on the test set.
  • The ensemble model improved the overall performance slightly.
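The stacking procedure above can be sketched with scikit-learn. This is an illustration under stated assumptions rather than the project's implementation: it uses synthetic data from `make_regression`, substitutes scikit-learn's `GradientBoostingRegressor` for XGBoost and LightGBM so that the example has no extra dependencies, and assumes a `LinearRegression` final model, which the description above does not specify.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_predict, train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the rent data (assumption, not the real dataset).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Step 1: split into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Stand-in base models; the project used XGBoost, LightGBM, random forest and SVR.
base_models = {
    "gbr": GradientBoostingRegressor(random_state=0),
    "rf": RandomForestRegressor(n_estimators=50, random_state=0),
    "svr": SVR(),
}

# Step 2: five folds on the train set.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

oof_preds, test_preds = [], []
for name, model in base_models.items():
    # Step 3a: out-of-fold predictions, each made by a model fitted on the
    # other four folds.
    oof_preds.append(cross_val_predict(model, X_train, y_train, cv=cv))
    # Step 3b: refit on the whole train set, then predict the test set.
    model.fit(X_train, y_train)
    test_preds.append(model.predict(X_test))

meta_train = np.column_stack(oof_preds)  # one column per base model
meta_test = np.column_stack(test_preds)

# Step 4: train the final model on base-model predictions, validate on test.
final_model = LinearRegression().fit(meta_train, y_train)
stacked_mae = mean_absolute_error(y_test, final_model.predict(meta_test))
base_mae = {
    name: mean_absolute_error(y_test, meta_test[:, i])
    for i, name in enumerate(base_models)
}
```

Training the final model on out-of-fold predictions means it never sees a base-model prediction for a row that the base model was fitted on, which helps limit leakage. Comparing `stacked_mae` against the values in `base_mae` shows whether stacking helps on a given dataset; no particular outcome is implied for this synthetic example.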

Contributors

utkuboyar, cengizakr
