
loulai / zestimate


A $1.2 million competition to estimate home values, run by Zillow and Kaggle.

Home Page: https://www.kaggle.com/c/zillow-prize-1



$1.2 Million Zestimate Competition

Zillow’s Home Value Prediction Competition

Course Name: Predictive Analytics | Section: CSCI-GA 3033-011 | Instructor: Prof. Anasse Bari | Semester: Fall 2017 | Team Members: Bofeng Hu, Yuanxu Wu, Guangli Jiang, Mengyao Zhu, Louise Y. Lai, Ziwei Wang

Abstract

A home is often the most expensive purchase a person makes in a lifetime, so its estimated value can shape major life decisions. To give homeowners a trusted way to monitor this asset, Zillow offers the Zestimate, which helps consumers make decisions with more information about homes and the housing market. In this project, we use home value data provided by Zillow to develop an algorithm that predicts the future sale prices of homes, with the aim of pushing the accuracy of the Zestimate even further.

For decades, researchers have looked for ways to estimate and predict housing prices. Zillow estimates the value of a house (the Zestimate) [1] from data including physical attributes, tax assessments, and prior transactions, and aggregates individual Zestimates to describe neighborhoods. To predict the future price of a house [2], Zillow first predicts its price 12 months ahead from the forecast for the whole county [3], adjusted for the individual characteristics of the house, and then draws a path between the current and forecast values using a cubic spline.

Our goal is to predict the log-error between the Zestimate and the actual sale price (defined by the competition as log(Zestimate) - log(SalePrice)). To do so, we select multiple features from the housing data and apply regression models to them. We assume that a house's price combines a latent land desirability with a regression on the house's attributes. First, a clustering algorithm such as K-means may be used to group houses into several classes based on their locations. Then we build the following single regression models to predict house prices from the attributes of other houses:

1) Gradient Boosted Trees (GBT), an ensemble of regression trees that produces predictions through gradually improved estimates, each new tree correcting the errors of the trees before it.
2) Logistic Regression (LR) [4], which models the relationship by converting the dependent variable into probability scores.
3) Neural Networks (NN), or Artificial Neural Networks (ANN) [7], which are commonly used to model complex relationships between inputs and outputs or to find patterns in data.
4) Support Vector Regression (SVR) [6], the regression form of support vector machines, which are supervised learning models used for both classification and regression analysis.
5) k-Nearest Neighbors (k-NN) [5], a simple algorithm that stores all available cases and predicts a numerical target from the most similar cases under a similarity measure (e.g., a distance function).

Finally, we will use ensemble modeling to combine these models and minimize the prediction error.
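The two-stage idea described above (group houses by location, then fit a regressor within each group) can be sketched in plain Python. This is a minimal illustration, not the team's implementation: it assumes each house is reduced to a hypothetical (latitude, longitude) pair, treats Euclidean distance on those coordinates as a rough proxy for geographic distance, and uses k-NN as the within-cluster regressor. The names `kmeans`, `knn_regress`, and `predict_within_cluster` are made up for this example.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (latitude, longitude) pair


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: returns (centroids, cluster label for each point)."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # k randomly chosen houses as starting centroids
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: each house joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels


def knn_regress(train_x: Sequence[Point], train_y: Sequence[float],
                query: Point, k: int = 3) -> float:
    """Predict the mean target of the k training houses nearest to the query."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def predict_within_cluster(points, targets, centroids, labels, query, k=3):
    """Route the query to its nearest centroid, then run k-NN on that cluster only."""
    c = min(range(len(centroids)), key=lambda j: math.dist(query, centroids[j]))
    idx = [i for i, lab in enumerate(labels) if lab == c]
    if not idx:  # fall back to all houses if the chosen cluster is empty
        idx = list(range(len(points)))
    return knn_regress([points[i] for i in idx], [targets[i] for i in idx], query, k)


if __name__ == "__main__":
    houses = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    log_errors = [0.01, 0.02, 0.03, -0.05, -0.04, -0.03]  # toy targets
    cents, labs = kmeans(houses, k=2)
    print(labs, predict_within_cluster(houses, log_errors, cents, labs, (10.5, 10.5)))
```

On the toy data, the query at (10.5, 10.5) is routed to the cluster of the three nearby houses, so the prediction is the mean of their targets, roughly -0.04. Any of the other listed regressors could replace `knn_regress` inside the cluster step.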
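The "gradually improved estimates" behind Gradient Boosted Trees can be shown with a deliberately small sketch: gradient boosting under squared-error loss, where each round fits a depth-1 regression tree (a "stump") to the current residuals and adds a shrunken copy of it to the model. This is an illustrative assumption, not the team's code; real GBT implementations grow deeper trees and add regularization, and the names and defaults here (`fit_gbt`, `n_rounds`, `learning_rate`) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def fit_stump(X: Sequence[Sequence[float]], residuals: Sequence[float]) -> Optional[Stump]:
    """Exhaustively pick the single split that minimizes squared error on the residuals."""
    best: Optional[Stump] = None
    best_sse = float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if sse < best_sse:
                best, best_sse = (j, t, lv, rv), sse
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_gbt(X, y, n_rounds: int = 50, learning_rate: float = 0.1):
    """Gradient boosting: each new stump is fit to the residuals of the current model."""
    base = sum(y) / len(y)  # round 0: predict the mean target
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For (1/2)(y - pred)^2 loss, the negative gradient is exactly the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no valid split (e.g. every row has identical features)
            break
        stumps.append(stump)
        # Shrink each correction by the learning rate before adding it.
        pred = [p + learning_rate * predict_stump(stump, row) for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict_gbt(model, row: Sequence[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)
```

A smaller `learning_rate` makes each round's correction more cautious and usually needs more rounds; that trade-off is the main tuning knob in boosted-tree models.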
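The competition scores predictions of the log-error, log(Zestimate) - log(SalePrice), by mean absolute error (MAE). One simple form of the ensemble step is a weighted average of two models' predicted log-errors, with the weight chosen on a held-out validation set to minimize MAE. The sketch below assumes the natural logarithm and a plain grid search over the weight; it is one illustrative blending scheme, not the method the team settled on, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def log_error(zestimate: float, sale_price: float) -> float:
    """Competition target: log(Zestimate) - log(SalePrice); natural log assumed.

    Positive values mean the Zestimate was above the sale price.
    """
    return math.log(zestimate) - math.log(sale_price)


def mae(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between predicted and actual log-errors."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def best_blend_weight(pred_a: Sequence[float], pred_b: Sequence[float],
                      actual: Sequence[float], steps: int = 100) -> Tuple[float, float]:
    """Grid-search w in [0, 1] so that w*pred_a + (1-w)*pred_b minimizes validation MAE."""
    best_w, best_loss = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        loss = mae(blended, actual)
        if loss < best_loss:  # keep the first weight that achieves the lowest loss
            best_w, best_loss = w, loss
    return best_w, best_loss
```

With more than two models, the same idea extends to a vector of weights, or to a stacked model trained on the base models' validation predictions.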

The data sets we obtained from kaggle.com consist of one training set and two test sets, covering the full list of real estate properties in three California counties (Los Angeles, Orange, and Ventura). The training data contains all transactions before October 15, 2016. The first test set contains the transactions from October 15, 2016, to December 31, 2016, and the second covers all properties for the period from October 15, 2017, to December 15, 2017.
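Because the training and test sets are separated by transaction date, a local validation set is more faithful to the competition setup when it is also carved out by date, so that a model is always evaluated on transactions later than those it was fit on. The sketch below encodes the date windows described above; the row layout, a hypothetical (parcel_id, transaction_date, logerror) tuple, and the function names are assumptions for illustration.

```python
from datetime import date
from typing import Dict, List, Tuple

Row = Tuple[str, date, float]  # hypothetical (parcel_id, transaction_date, logerror)

# Date windows as described in the text (inclusive bounds).
WINDOWS = [
    ("train", date.min, date(2016, 10, 14)),  # all transactions before October 15, 2016
    ("test_2016", date(2016, 10, 15), date(2016, 12, 31)),
    ("test_2017", date(2017, 10, 15), date(2017, 12, 15)),
]


def split_by_window(rows: List[Row]) -> Dict[str, List[Row]]:
    """Bucket transactions into the windows above; anything else is 'unassigned'."""
    buckets: Dict[str, List[Row]] = {name: [] for name, _, _ in WINDOWS}
    buckets["unassigned"] = []
    for row in rows:
        for name, start, end in WINDOWS:
            if start <= row[1] <= end:
                buckets[name].append(row)
                break
        else:  # no window matched this transaction date
            buckets["unassigned"].append(row)
    return buckets


def time_holdout(rows: List[Row], cutoff: date) -> Tuple[List[Row], List[Row]]:
    """Fit on transactions before `cutoff`; validate on those on or after it."""
    fit = [r for r in rows if r[1] < cutoff]
    validate = [r for r in rows if r[1] >= cutoff]
    return fit, validate
```

For example, `time_holdout(split_by_window(rows)["train"], date(2016, 7, 1))` would hold out the last few months of the training window for validation.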

Our team consists of six members: Bofeng Hu, Guangli Jiang, Louise Lai, Mengyao Zhu, Yuanxu Wu, and Ziwei Wang. Our tentative timeline is as follows:

  • Week 1-2: Business understanding (Full-TEAM): learning the business domain of the project, framing the problem, developing initial analytics hypotheses.
  • Week 3-4: Data understanding (Full-TEAM): defining data storage and analytics paradigms, predictive models.
  • Week 5-8: Data preparation (ZHU / LAI / WANG): preparing the analytics server for the project; designing, implementing, and documenting ETL (Extract, Transform, Load) jobs; conditioning the data; reducing the dimensionality of the data sets.
  • Week 9-11: Modeling (HU / JIANG / WU / LAI): feature selection, iterating over models, and model selection.
  • Week 12: Model evaluation and deployment (HU / JIANG / WU / LAI).
  • Week 13: Results presentation (Full-TEAM).

References

[1] https://www.zillow.com/zestimate/
[2] https://www.zillow.com/research/zestimate-forecast-methodology/
[3] https://www.zillow.com/research/zillow-home-value-forecast-methodology-2-3740/
[4] Freedman, David A. Statistical Models: Theory and Practice. Cambridge University Press, 2009.
[5] Altman, Naomi S. "An introduction to kernel and nearest-neighbor nonparametric regression." The American Statistician 46.3 (1992): 175-185.
[6] http://chem-eng.utoronto.ca/~datamining/dmc/support_vector_machine_reg.htm
[7] https://en.wikipedia.org/wiki/Artificial_neural_network
