

Second-Multi-variable-regression

Boston house price estimation using the Boston house price dataset in sklearn

Date: 2021-07-29
Author: Hamid Rashkiany
Description: This exercise uses the Boston Housing dataset. The dataset contains 13 features and 506 samples and is available in scikit-learn.

The first step of exploratory data analysis is visualizing the data. We need to know how the values of our target are distributed, in other words, how many data points fall at each value. For example, we might want to know how many houses in the dataset have a price of 200. One way is to check the house prices one by one and sort them into groups: prices between 0-100, 100-200, 200-300, and so on. That is impractical for a large dataset. The practical method is to draw a histogram of the data.

A histogram shows the frequency, or the probability, of the data. It splits the range of values into intervals (bins) and shows how many data points fall into each one; dividing each count by the total number of samples gives the probability of that bin. For example, if we plot the histogram with bins of width 10, it tells us that the probability of data between 0-10 is equal to P1, the probability of data between 10-20 is equal to P2, and so on.

The shape of a histogram is commonly described in terms of the normal distribution, skewness and kurtosis. You can find more about each of them at: https://tekmarathon.com/2015/11/13/importance-of-data-distribution-in-training-machine-learning-models/ It is important to know that the normal distribution is one of the most significant data distributions in machine learning models. All normal distributions are symmetric and have bell-shaped curves with a single peak (aka Gaussian distribution). A histogram of a normally distributed variable (variable values on the x-axis and their frequency on the y-axis) shows this bell shape. A normal distribution obeys the 68-95-99.7% rule, which means:

  • 68% of data points/observations fall within -1 to +1 standard deviations of the mean
  • 95% of data points/observations fall within -2 to +2 standard deviations of the mean
  • 99.7% of data points/observations fall within -3 to +3 standard deviations of the mean
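The binning idea and the 68-95-99.7% rule described above can be checked with a few lines of Python. This is a minimal sketch, not the author's code: the helper names `histogram` and `fraction_within` are hypothetical, and the "prices" are synthetic values drawn from a normal distribution (mean 200, standard deviation 30, both chosen arbitrarily) rather than the Boston data, partly because the `load_boston` loader was removed from scikit-learn in version 1.2. Only the standard library is used.

```python
import random
import statistics
from collections import Counter


def histogram(values, bin_width):
    """Count how many values fall into each [start, start + bin_width) bin."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def fraction_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= k * std)
    return inside / len(values)


# Synthetic, normally distributed "prices" (an assumption, NOT the Boston data).
rng = random.Random(0)
prices = [rng.gauss(200, 30) for _ in range(10_000)]

# Histogram with bins of width 10: count and relative frequency per bin.
bins = histogram(prices, bin_width=10)
total = len(prices)
for start, count in bins.items():
    print(f"{start:>4}-{start + 10:<4} count={count:5d} "
          f"relative frequency={count / total:.3f}")

# Empirical check of the 68-95-99.7% rule on the same sample.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(prices, k):.3f}")
```

The relative frequencies printed per bin play the role of P1, P2, ... in the text. For a roughly normal sample the three fractions should come out close to 0.68, 0.95 and 0.997, while a clearly skewed target would deviate from them. For a graphical version, the same list could be passed to a plotting library's histogram function.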
