In this Jupyter Notebook I investigate the feasibility of predicting systolic and diastolic blood pressure based on the National Health and Nutrition Examination Survey (NHANES) data set from 2013-2014 found on kaggle.com. The primary questions I wish to answer are:
- Which survey variables are empirically most predictive, and do they correspond to what the mainstream literature identifies as key factors in blood pressure levels?
- Which of scikit-learn’s SGDRegressor, MultiTaskLasso, and RandomForestRegressor offers the best predictions on this data set?
- Does the best model perform well enough to serve as a possible supplementary or alternative way of “measuring” blood pressure?
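
As a rough illustration of the model comparison in the second question, the following is a minimal sketch and not the notebook's actual pipeline. It assumes synthetic data in place of NHANES (the arrays `X` and `Y`, the feature count, and the hyperparameters are all placeholders), and it wraps SGDRegressor in `MultiOutputRegressor` because SGDRegressor fits only one target at a time, while the other two models accept both targets directly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import MultiTaskLasso, SGDRegressor
from sklearn.model_selection import cross_val_score
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the survey data (NOT NHANES): 300 rows, 8 features,
# and two targets playing the role of systolic and diastolic pressure.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
W = rng.normal(size=(8, 2))
Y = X @ W + rng.normal(scale=0.5, size=(300, 2))

# Placeholder hyperparameters; the linear models are scaled first.
models = {
    "SGDRegressor": make_pipeline(
        StandardScaler(), MultiOutputRegressor(SGDRegressor(random_state=0))
    ),
    "MultiTaskLasso": make_pipeline(StandardScaler(), MultiTaskLasso(alpha=0.1)),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validated R^2, averaged uniformly over the two targets.
    cv_scores = cross_val_score(model, X, Y, cv=5, scoring="r2")
    scores[name] = cv_scores.mean()
    print(f"{name}: mean R^2 = {scores[name]:.3f}")
```

Because the synthetic targets are linear in the features, the ranking printed here says nothing about how the models rank on the real survey data; the sketch only shows one way to score all three on the same cross-validation folds.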
To run the notebook, download and install the Anaconda distribution.
Open a terminal in the root directory of this project and enter:
jupyter notebook