jasonzelaya / insurance-forecast

An analysis that predicts individual health insurance costs charged by health insurance companies based on age, sex, BMI, children, smoking, and region using predictive modeling and machine learning.

Health Insurance Cost Forecast

-- Status: Completed

Purpose

The purpose of this analysis is to predict individual health insurance costs charged by health insurance companies based on age, sex, BMI, children, smoking, and region.

Methods Used

  • Supervised Machine Learning
  • Inferential Statistics
  • Descriptive Statistics
  • Machine Learning
  • Data Visualization
  • Predictive Modeling
  • Regression Analysis
  • Factor Analysis
  • Random Forest

Technologies

  • Python
  • R
  • Jupyter Notebook
  • Pandas
  • NumPy
  • Matplotlib
  • Scikit-learn
  • Graphviz
  • Seaborn
  • Yellowbrick
  • Pydot

Needs of this project

  • Data exploration/descriptive statistics
  • Data processing/cleaning
  • Statistical modeling
  • Writeup/reporting

Data Source

Kaggle: https://www.kaggle.com/mirichoi0218/insurance

Data Content

  • Age: Age of the beneficiary in years.
  • Sex: Whether the beneficiary is male or female.
  • BMI: Body mass index, derived from an individual's weight and height. A healthy BMI is generally considered to be between 18.5 and 24.9.
  • Children: Number of dependents covered by health insurance.
  • Smoker: Whether or not the beneficiary smokes.
  • Region: The beneficiary's residential area in the US. The categories are northeast, southeast, southwest, northwest.
  • Charges: The price the beneficiary pays the health insurance companies in USD.

Note: The individual paying for the health insurance is referred to as the "beneficiary" in the definitions.
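
As a rough illustration of the schema described above, the sketch below builds a tiny table with the same columns and prepares it for regression. The rows are made up for illustration and are not taken from the Kaggle file; the lowercase column names and the file name `insurance.csv` mentioned in the comment are assumptions.

```python
import pandas as pd

# Illustrative rows only (made up, NOT taken from the Kaggle data set);
# they mirror the columns and value types described above.
sample = pd.DataFrame({
    "age": [19, 45, 33, 60],
    "sex": ["female", "male", "male", "female"],
    "bmi": [27.9, 22.5, 31.2, 25.8],
    "children": [0, 2, 1, 0],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "southeast", "northwest", "northeast"],
    "charges": [16884.92, 7200.00, 4500.50, 28000.75],
})
# With the real data this would be something like
# pd.read_csv("insurance.csv")  (file name assumed).

# One-hot encode the categorical columns so a regression model can use them.
# drop_first=True drops one level per variable to avoid redundant columns.
encoded = pd.get_dummies(
    sample, columns=["sex", "smoker", "region"], drop_first=True
)

# Flag BMI values that fall inside the 18.5 to 24.9 band mentioned above.
encoded["healthy_bmi"] = sample["bmi"].between(18.5, 24.9)

print(encoded.columns.tolist())
```

Dropping the first level of each categorical variable is one common choice for linear models; whether the original notebook encodes the variables this way is not stated here.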

Underlying Assumptions

The model should conform to the assumptions of linear regression to be usable in practice. To confirm this, we examined the data set and checked the following:

  • The regression model is linear in parameters
  • The mean of residuals is zero
  • Homoscedasticity of residuals or equal variance
  • Normality of residuals
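
The residual checks listed above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's own code: it fits ordinary least squares with NumPy, and the variance-ratio and skewness/kurtosis checks are simple stand-ins for formal tests. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the insurance data):
# y is linear in x plus constant-variance normal noise.
n = 500
x = rng.uniform(18, 64, size=n)  # hypothetical age-like predictor
y = 250.0 * x + 1000.0 + rng.normal(0.0, 500.0, size=n)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# 1. Mean of residuals: essentially zero for OLS with an intercept.
resid_mean = residuals.mean()

# 2. Homoscedasticity (crude check): compare the residual variance for
#    the lower and upper halves of the fitted values. A ratio far from 1
#    suggests unequal variance.
order = np.argsort(fitted)
low = residuals[order[: n // 2]]
high = residuals[order[n // 2:]]
variance_ratio = high.var(ddof=1) / low.var(ddof=1)

# 3. Normality (crude check): sample skewness and excess kurtosis,
#    both of which are close to 0 for normally distributed residuals.
z = (residuals - resid_mean) / residuals.std()
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3.0

print(f"mean={resid_mean:.2e}, variance ratio={variance_ratio:.2f}")
print(f"skewness={skewness:.2f}, excess kurtosis={excess_kurtosis:.2f}")
```

In practice, residual plots (for example with Yellowbrick or Matplotlib, both listed under Technologies) are usually examined alongside numeric summaries like these.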

ML Algorithm

  • Multiple linear regression (supervised learning)
  • Use pandas.crosstab on the categorical variables (sex, smoker, region) to confirm their values
  • Check for typos
  • Check dollar amounts and round decimals
  • Check the range of age
  • Check for incorrect entries
  • Data validation is part of both exploratory data analysis and cleaning the data
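
The validation checks listed above might look something like the sketch below. The rows are made up and contain deliberate problems (a misspelled category and an impossible age). The age bounds of 0 to 120 and the rounding to two decimals are assumptions made for illustration, not values stated by the project.

```python
import pandas as pd

# Made-up rows with deliberate errors: "femle" is a typo, 160 is not a valid age.
df = pd.DataFrame({
    "age": [19, 45, 33, 160],
    "sex": ["female", "male", "male", "femle"],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "southeast", "northwest", "northeast"],
    "charges": [16884.924, 7200.0, 4500.5049, 28000.75],
})

# Expected categories, taken from the data description.
allowed = {
    "sex": {"female", "male"},
    "smoker": {"yes", "no"},
    "region": {"northeast", "southeast", "southwest", "northwest"},
}

# Cross-tabulate two categorical columns; an unexpected label such as a
# typo shows up as an extra row or column in the table.
counts = pd.crosstab(df["sex"], df["smoker"])

# Collect any values that fall outside the expected categories.
bad_values = {}
for col, ok in allowed.items():
    unexpected = sorted(set(df[col]) - ok)
    if unexpected:
        bad_values[col] = unexpected

# Range check on age (bounds are an assumption for this sketch).
bad_age = df[~df["age"].between(0, 120)]

# Round dollar amounts to cents (two decimals assumed).
df["charges_rounded"] = df["charges"].round(2)

print(counts)
print(bad_values)
print(bad_age[["age"]])
```

Rows flagged this way would then be corrected or dropped during cleaning, before fitting the regression.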

Contributors

  • cewtycats
  • jasonzelaya
  • skwc224
