eng-jonathan / datascience_machinelearning_and_statisticalmodeling

Queens College - Math 390.4/342W: Data Science via Machine Learning and Statistical Modeling (R)

machine-learning data-science r zillow prediction

Data Science via Machine Learning and Statistical Modeling


  • Prompt | Report
  • Creates a mathematical model to numerically estimate what makes a successful marriage. It covers feature selection, data training methods, and possible output errors.

  • Prompt | Report | R Code
  • Uses supervised machine learning to beat Zillow.com’s “zestimates”
  • Developed in R; incorporates data manipulation techniques (data removal, munging, and imputation) and modeling with linear and random forest regressions

Results:

✓ The Random Forest model predicts within $27,000 (the average home price is $315,000).

✗ The model requires more observations to predict accurately when extrapolating; however, it performs well within the included zip codes.
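
To put the reported error in context, the two figures above can be compared directly. This is a minimal sketch in R, assuming the $27,000 figure is a typical prediction error (the README does not say which metric it is); the variable names are illustrative only.

```r
# Figures quoted in the results above
typical_error <- 27000   # reported prediction error, in dollars
average_price <- 315000  # reported average home price, in dollars

# Error as a fraction of the average home price
relative_error <- typical_error / average_price

# Expressed as a percentage, rounded to one decimal place
relative_error_pct <- round(100 * relative_error, 1)
print(relative_error_pct)  # about 8.6
```

Under that assumption, the error is roughly 8.6% of the average home price.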


Course Overview:

  • Syllabus
  • Philosophy of modeling and learning using data
  • Prediction via the ordinary linear model, including orthogonal projections, the sum of squares identity, R² and RMSE
  • Polynomial and interaction regressions
  • Prediction with machine learning including neural nets (the perceptron), support vector machines and the tree methods CART, bagged trees and Random Forests
  • Probability estimation using logistic regression, asymmetric cost classifiers and the ROC / DET performance curves
  • Underfitting vs. overfitting and the bias-variance decomposition / tradeoff
  • Model validation, including out-of-sample techniques such as cross-validation and bootstrap validation
  • Correlation vs. causation, causal models, lurking variables and interpretations of linear model coefficients
  • Extrapolation
  • The R language will be taught formally from the ground up, along with visualization using the ggplot2 library and data manipulation using the dplyr and data.table libraries.
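
Several of the topics above (the ordinary linear model, the sum of squares identity, R² and RMSE, and out-of-sample validation) can be illustrated together. The following is a minimal sketch in base R on simulated data, not code from the course or this repository; the data-generating model, the 160/40 split, and all variable names are assumptions made for illustration.

```r
set.seed(1)

# Simulated data (hypothetical): y depends linearly on x plus noise
n <- 200
x <- runif(n, min = 0, max = 10)
y <- 3 + 2 * x + rnorm(n, sd = 1.5)
dat <- data.frame(x = x, y = y)

# Hold out 40 observations for out-of-sample validation
n_train <- 160
train_idx <- sample(n, size = n_train)
train <- dat[train_idx, ]
test <- dat[-train_idx, ]

# Ordinary least squares fit on the training set
fit <- lm(y ~ x, data = train)

# Sum of squares identity (holds for OLS with an intercept): SST = SSR + SSE
y_bar <- mean(train$y)
sst <- sum((train$y - y_bar)^2)       # total sum of squares
ssr <- sum((fitted(fit) - y_bar)^2)   # regression (explained) sum of squares
sse <- sum(residuals(fit)^2)          # error (residual) sum of squares

# In-sample R^2 and RMSE. RMSE here is the square root of the mean squared
# residual; some texts divide by the residual degrees of freedom instead.
r2 <- 1 - sse / sst
rmse_in <- sqrt(mean(residuals(fit)^2))

# Out-of-sample RMSE on the held-out observations
pred <- predict(fit, newdata = test)
rmse_oos <- sqrt(mean((test$y - pred)^2))

print(c(r2 = r2, rmse_in = rmse_in, rmse_oos = rmse_oos))
```

Comparing `rmse_in` with `rmse_oos` is the basic check for overfitting: a model that fits the training data much better than the held-out data is likely too flexible.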

Incorporated Topics

  • Basic Probability Theory: axioms, conditional probability, in/dependence
  • Modeling with discrete random variables: Bernoulli, Hypergeometric, Binomial, Poisson, Geometric, Negative Binomial, Uniform Discrete and others
  • Expectation and variance
  • Modeling with continuous random variables: Exponential, Uniform and Normal
  • Frequentist confidence intervals and hypothesis testing for one-sample proportions
  • Basic visualization of data: plots, histograms, bar charts
  • Linear algebra: Vectors, matrices, rank, transpose
  • Programming: basic data types, vectors, arrays, control flow (for, while, if, else), functions
