
51-callahan-mlproj-realestate's Introduction

51-callahan-mlproj-realestate

Goal

This project seeks to create a price prediction mechanism for real estate listings using historical transaction data. A secondary goal is to identify pricing markets based on property features and, through data enrichment, to reveal new insights into real estate markets. This analysis will focus on residential real estate.

Because complete data sources are hard to find, this analysis will incorporate historical data to the extent it can be obtained. Data is currently available back to about 2001 in some markets, and this project will incorporate it after normalizing for inflation. Although it is not possible to account for all market dynamics (and dysfunction), converting price data to 2022 dollars will provide more data points with which to train a prediction model.
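A minimal sketch of the inflation adjustment, assuming a price index keyed by year (for example an annual CPI series such as the U.S. Bureau of Labor Statistics CPI-U, which is one common choice, not one the project has confirmed). The function name `to_2022_dollars` is hypothetical, and the index values in the usage line are placeholders, not real CPI figures.

```python
from __future__ import annotations


def to_2022_dollars(
    price: float,
    sale_year: int,
    cpi_by_year: dict[int, float],
    base_year: int = 2022,
) -> float:
    """Rescale a nominal sale price into base-year dollars.

    Assumes `cpi_by_year` maps a year to a price-index level; the caller
    supplies the real index series.
    """
    if sale_year not in cpi_by_year or base_year not in cpi_by_year:
        raise KeyError(f"CPI value missing for {sale_year} or {base_year}")
    # Inflation adjustment: price * (index in base year / index in sale year)
    return price * cpi_by_year[base_year] / cpi_by_year[sale_year]


# Hypothetical index values for illustration only, NOT real CPI figures.
example_cpi = {2001: 100.0, 2022: 170.0}
print(to_2022_dollars(200_000, 2001, example_cpi))  # 340000.0
```

Keeping the index as a parameter means the same function works for any base year or regional index without code changes.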

Metrics:

Predicted Price

The model should predict the value (price) of an individual property from the property's attributes. Because price is a continuous value, prediction accuracy can be checked using root mean squared error (RMSE). Predicting price matters because it identifies pricing opportunities: open listings priced below the model's estimate. In the business context, these underpriced listings represent opportunities for my fictional real estate business. The model will be considered "good enough" when normalized RMSE is below 20%. Normalized RMSE is discussed in more detail here: https://www.statology.org/what-is-a-good-rmse/
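A sketch of the "good enough" check, assuming normalized RMSE is computed as RMSE divided by the range (max minus min) of the observed prices, the normalization the linked Statology article describes. Dividing by the mean price is another common convention and would give a different number. The function names and sample prices are illustrative only.

```python
import numpy as np


def normalized_rmse(y_true, y_pred) -> float:
    """RMSE divided by the range (max - min) of the observed prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    value_range = y_true.max() - y_true.min()
    if value_range == 0:
        raise ValueError("Observed prices have zero range; NRMSE is undefined.")
    return float(rmse / value_range)


def is_good_enough(y_true, y_pred, threshold: float = 0.20) -> bool:
    """Project criterion: normalized RMSE below 20%."""
    return normalized_rmse(y_true, y_pred) < threshold


actual = [250_000, 300_000, 450_000]
predicted = [260_000, 290_000, 450_000]
print(normalized_rmse(actual, predicted), is_good_enough(actual, predicted))
```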

Market Clusters

Evaluating clustering accuracy will be difficult because there is no ground truth against which to compare the model. Instead, the insights from clustering will need to make sense given real-world examples. For example, clusters should identify neighborhood boundaries that are not captured formally (for instance, by zip codes). Researchers could map these clusters against areas where known boundaries exist to see whether the model recovers them. Examples include areas where physical obstacles (train tracks, highways) separate neighborhoods with different demographics, or large areas with homogeneous political and economic demographics.
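One way to quantify the boundary comparison described above is sketched below. It swaps in k-means as the clustering algorithm (the project has not named one) and uses the adjusted Rand index to score agreement between cluster labels and a known boundary labeling. The function names, the toy coordinates, and the "north"/"south" labels are hypothetical, not project data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler


def cluster_properties(features, n_clusters: int, seed: int = 0) -> np.ndarray:
    """Assign each property to a market cluster with k-means on scaled features."""
    scaled = StandardScaler().fit_transform(np.asarray(features, dtype=float))
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(scaled)


def boundary_agreement(cluster_labels, known_labels) -> float:
    """Adjusted Rand index between clusters and a known boundary labeling.

    1.0 means the clusters reproduce the known boundary exactly; values near
    0.0 mean agreement is no better than chance.
    """
    return float(adjusted_rand_score(known_labels, cluster_labels))


# Toy data: coordinate-like points on either side of a hypothetical boundary.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
side_of_tracks = ["north", "north", "north", "south", "south", "south"]
labels = cluster_properties(points, n_clusters=2)
print(boundary_agreement(labels, side_of_tracks))
```

The adjusted Rand index ignores how clusters are numbered, so cluster "0" does not need to line up with any particular boundary label.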

Appendix:

  1. https://www.kaggle.com/datasets?search=real+estate
  2. https://catalog.data.gov/dataset?q=Housing+for+sale&sort=score+desc%2C+name+asc&tags=housing
  3. https://realtyna.com/blog/what-are-best-public-sources-real-estate-data/

51-callahan-mlproj-realestate's People

Contributors

peter-callahan, daniel-schroeder-28

Watchers

Jesse Spencer-Smith; Charreau Bell, Ph.D.

51-callahan-mlproj-realestate's Issues

Basic Exploratory Analysis (EDA)

Questions that should be addressed during EDA:

  1. Are there enough data points to perform a prediction in the geographical area of interest?
  2. What data cleaning tasks are necessary?
  3. Is missing data a problem for any particular columns?
  4. Do any basic patterns emerge that increase/decrease trust in the dataset?
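Questions 1 and 3 can be started with a short pandas summary. This is a sketch assuming the data loads into a DataFrame with an area column; the column names (`zip_code`, `sqft`, `price`), the function name, and the `min_rows` threshold are hypothetical.

```python
import pandas as pd


def eda_summary(df: pd.DataFrame, area_col: str, min_rows: int = 30):
    """Return missing-value fractions, row counts per area, and sparse areas.

    `min_rows` is an illustrative threshold, not a project requirement.
    """
    missing_frac = df.isna().mean().sort_values(ascending=False)  # question 3
    rows_per_area = df[area_col].value_counts()  # question 1
    sparse_areas = rows_per_area[rows_per_area < min_rows]
    return missing_frac, rows_per_area, sparse_areas


sales = pd.DataFrame({
    "zip_code": ["area_a", "area_a", "area_b"],
    "sqft": [1500, None, 900],
    "price": [300_000, 350_000, 250_000],
})
missing, counts, sparse = eda_summary(sales, "zip_code", min_rows=2)
print(missing, counts, sparse, sep="\n\n")
```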

Perform Train Test Split

Outline the train/test split (TTS) proportions, include holdout data, and incorporate any other strategies needed to ensure a proper TTS.
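A minimal sketch of a three-way split into train, test, and holdout sets. The 70/15/15 proportions, the seed, and the function name are assumptions for illustration; scikit-learn's `train_test_split` applied twice is a common alternative.

```python
import numpy as np


def train_test_holdout_split(n_rows: int, test_frac: float = 0.15,
                             holdout_frac: float = 0.15, seed: int = 42):
    """Shuffle row indices and cut them into train, test, and holdout sets."""
    if not 0 < test_frac + holdout_frac < 1:
        raise ValueError("test_frac + holdout_frac must be between 0 and 1")
    rng = np.random.default_rng(seed)  # fixed seed keeps the split reproducible
    order = rng.permutation(n_rows)
    n_holdout = int(round(n_rows * holdout_frac))
    n_test = int(round(n_rows * test_frac))
    holdout = order[:n_holdout]  # set aside; untouched until final evaluation
    test = order[n_holdout:n_holdout + n_test]
    train = order[n_holdout + n_test:]
    return train, test, holdout


train_idx, test_idx, holdout_idx = train_test_holdout_split(1_000)
print(len(train_idx), len(test_idx), len(holdout_idx))  # 700 150 150
```

Because the data spans roughly 2001 to 2022, a time-based holdout (reserving the most recent sales) may be worth considering instead of a purely random one; that is a suggestion, not something the project has specified.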

Update data dictionary to use new data source

The initial data source lacked sufficient detail to be useful. A new data source has been found, so the data dictionary will be updated to reflect it and to detail the specifics of each column (data type, meaning, etc.).
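A starting point for the updated data dictionary can be generated from the new data itself, leaving only the "meaning" column to fill in by hand. This is a sketch; the function name and the sample columns are hypothetical.

```python
import pandas as pd


def draft_data_dictionary(df: pd.DataFrame) -> pd.DataFrame:
    """Build a skeleton data dictionary: one row per column of `df`."""
    return pd.DataFrame({
        "column": df.columns,
        "dtype": [str(t) for t in df.dtypes],
        "non_null": df.notna().sum().to_numpy(),
        # First non-missing value, as a concrete example of the column's content
        "example": [df[c].dropna().iloc[0] if df[c].notna().any() else None
                    for c in df.columns],
        "meaning": "",  # to be written by hand for each column
    })


listings = pd.DataFrame({"price": [300_000, 250_000], "sqft": [1500.0, None]})
print(draft_data_dictionary(listings))
```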

Feature Engineering

Based on your findings from basic EDA, briefly describe your plan for feature engineering (e.g., what transformations you plan to apply to the features, whether you plan to drop any features, etc.). If you have multiple complex features, or features that may require trial and error, create one issue for each of them.
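The project has not yet specified its transformations, so the following is only an illustration of typical ones for housing data: a derived property age, a log-transformed price target, and one-hot encoding of a categorical column. All column names and the function name are hypothetical.

```python
import numpy as np
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Apply example transformations; column names are hypothetical."""
    out = df.copy()
    # Age of the property at the time of sale
    out["property_age"] = out["sale_year"] - out["year_built"]
    # Log-transform the (typically right-skewed) price target
    out["log_price"] = np.log1p(out["price"])
    # One-hot encode the categorical property type; drops the original column
    out = pd.get_dummies(out, columns=["property_type"], prefix="type")
    return out


sales = pd.DataFrame({
    "sale_year": [2020, 2015],
    "year_built": [1990, 2015],
    "price": [300_000.0, 250_000.0],
    "property_type": ["condo", "single_family"],
})
print(engineer_features(sales))
```

Features computed from the target (for example, price per square foot) should not be used as model inputs when price is what the model predicts, since they leak the answer.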
