This is a multivariate analysis project in R. Here I have used the Life Expectancy dataset, which covers 193 countries, and applied data cleaning, exploratory data analysis (EDA), Principal Component Analysis, Factor Analysis, Cluster Analysis and Regression to it.
The dataset is taken from the Global Health Observatory (GHO) data repository of the World Health Organization (WHO), which keeps track of the health status of all countries as well as many other related factors. The data covers the years 2000 to 2015 for all 193 countries.
Problem Statement:
Identifying the various factors that affect life expectancy, such as demographic variables, income composition, mortality rates, immunization, the human development index, and social and economic factors.
Questions to be addressed:
1. How does immunization affect life expectancy?
2. Which countries should be given priority in order to improve their life expectancy?
3. Does life expectancy have any correlation with eating habits, lifestyle, exercise, smoking or drinking alcohol?
4. What measures should a country take to increase its healthcare expenditure and thereby improve its average lifespan?
Dataset Dictionary:
Variable Name
Description
Datatype
Accepts Null Values
Country
Country Name
Object
N
Year
Year
Object
N
Status
Developed or Developing
Object
N
Life Expectancy
Life expectancy in years
Object
N
Adult Mortality
Probability of dying between 15 and 60 years per 1000 population
Object
N
infant deaths
Number of infant deaths per 1000 population
Object
N
Alcohol
Recorded per capita consumption (in litres)
Object
N
percentage expenditure
Expenditure on health as a percentage of GDP (%)
Object
N
Hepatitis B
Immunization coverage among 1-year-olds (%)
Object
N
Measles
Number of reported cases per 1000 population
Object
N
BMI
Average BMI of the entire population
Object
N
under-five deaths
Number of under five deaths per 1000 population
Object
N
Polio
Immunization coverage among 1-year-olds (%)
Object
N
Total Expenditure
Government expenditure on health as a percentage of total government expenditure (%)