Visualization of diabetes data using box plots, histograms, etc. Includes data normalization and handling of missing data.
- Write a Python program to calculate the mean and max of each column. Then handle the missing data in two ways: a) remove all entries that contain NA, and b) replace every NA with the mean of its column. After each treatment, calculate the mean and max of each column again.
From this question onward, use the dataset with all entries containing NA removed.
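
One possible approach to the missing-data task is sketched below. The real dataset file is not named here, so the example uses a small made-up DataFrame as a stand-in. The column names and the helper `mean_and_max` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the diabetes data (values are made up for illustration).
df = pd.DataFrame({
    "Glucose": [148.0, 85.0, np.nan, 89.0, 137.0],
    "BloodPressure": [72.0, np.nan, 64.0, 66.0, 40.0],
    "Outcome": [1, 0, 1, 0, 1],
})


def mean_and_max(frame):
    """Return a table holding the mean and max of every column."""
    # pandas skips NaN values by default when aggregating.
    return frame.agg(["mean", "max"])


print("Original data:")
print(mean_and_max(df))

# a) Remove every row that contains at least one NA.
df_dropped = df.dropna()
print("After dropping rows with NA:")
print(mean_and_max(df_dropped))

# b) Replace every NA with the mean of its own column.
df_filled = df.fillna(df.mean())
print("After filling NA with column means:")
print(mean_and_max(df_filled))
```

Filling with the column mean leaves each column's mean unchanged, while dropping rows can shift it, which is worth checking in the printed tables.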
- Read the attached dataset and write only the Pregnancies, Glucose and Outcome columns to a file called ‘pre_glu_outcome.csv’. Read the table into a pandas DataFrame and find the shape and size of the DataFrame. Also, print the first 5 rows and the last 5 rows.
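
A minimal sketch of the column-selection and inspection steps follows. Since the attached dataset is not available here, a tiny made-up DataFrame stands in for it. In practice `df` would come from reading the real file, and the `BloodPressure` column name is an assumption.

```python
import pandas as pd

# Toy stand-in for the attached dataset (values are made up for illustration).
df = pd.DataFrame({
    "Pregnancies": [6, 1, 8, 1, 0, 5],
    "Glucose": [148, 85, 183, 89, 137, 116],
    "BloodPressure": [72, 66, 64, 66, 40, 74],
    "Outcome": [1, 0, 1, 0, 1, 0],
})

# Write only the three requested columns; index=False avoids an extra index column.
df[["Pregnancies", "Glucose", "Outcome"]].to_csv("pre_glu_outcome.csv", index=False)

# Read the file back into a DataFrame and inspect it.
subset = pd.read_csv("pre_glu_outcome.csv")
print("Shape (rows, columns):", subset.shape)
print("Size (rows * columns):", subset.size)
print(subset.head(5))  # first 5 rows
print(subset.tail(5))  # last 5 rows
```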
- Calculate the max, mean, standard deviation, median, and 75th percentile of the Glucose column of the dataset.
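
The summary statistics can be computed with built-in pandas methods, as in this sketch. The Glucose values are made up for illustration, and the `stats` dictionary is just one convenient way to collect the results.

```python
import pandas as pd

# Toy Glucose values (made up for illustration).
glucose = pd.Series([148, 85, 183, 89, 137, 116], name="Glucose")

stats = {
    "max": glucose.max(),
    "mean": glucose.mean(),
    "std": glucose.std(),  # sample standard deviation (ddof=1 by default)
    "median": glucose.median(),
    "75th percentile": glucose.quantile(0.75),  # linear interpolation by default
}

for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Note that `Series.std()` uses the sample formula (dividing by n - 1), whereas NumPy's `std` divides by n unless told otherwise, so the two can differ slightly.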
- Display a histogram and a bar graph of the Glucose and Blood pressure columns of this data. Also, draw a scatter plot of Glucose against Blood pressure. Can you decipher any relationship between these two variables from the scatter plot?
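
The plots could be produced with matplotlib roughly as follows. This is a sketch on made-up data. The column name `BloodPressure`, the output file names, and the reading of "bar graph" as one bar per row are all assumptions. The Pearson correlation at the end is an optional numeric complement to the visual impression from the scatter plot.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove to use an interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the cleaned dataset (values are made up for illustration).
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137, 116],
    "BloodPressure": [72, 66, 64, 66, 40, 74],
})

# One row of panels per column: histogram on the left, bar graph on the right.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for row, col in enumerate(["Glucose", "BloodPressure"]):
    axes[row, 0].hist(df[col], bins=5)
    axes[row, 0].set_title(f"Histogram of {col}")
    axes[row, 1].bar(df.index, df[col])  # one bar per data row
    axes[row, 1].set_title(f"Bar graph of {col} by row")
fig.tight_layout()
fig.savefig("glucose_bp_hist_bar.png")

# Scatter plot of one variable against the other.
fig_scatter, ax = plt.subplots()
ax.scatter(df["Glucose"], df["BloodPressure"])
ax.set_xlabel("Glucose")
ax.set_ylabel("BloodPressure")
ax.set_title("Glucose vs. BloodPressure")
fig_scatter.savefig("glucose_bp_scatter.png")

# Pearson correlation: a value near 0 suggests no clear linear relationship.
r = df["Glucose"].corr(df["BloodPressure"])
print(f"Pearson correlation: {r:.3f}")
```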
- Plot a boxplot of each attribute (every column except the last). Then perform min-max scaling of the attached data and plot the boxplots again after scaling. Apply the scaling only to the features, not to the labels (last column: Outcome).
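
Min-max scaling maps each feature to the range [0, 1] via (x - min) / (max - min). The sketch below applies it by hand with pandas on made-up data and draws boxplots before and after. The column names and output file name are assumptions, and the label is assumed to be the last column.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove to use an interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the cleaned dataset (values are made up for illustration).
df = pd.DataFrame({
    "Pregnancies": [6, 1, 8, 1, 0, 5],
    "Glucose": [148, 85, 183, 89, 137, 116],
    "BloodPressure": [72, 66, 64, 66, 40, 74],
    "Outcome": [1, 0, 1, 0, 1, 0],
})

features = df.iloc[:, :-1]  # every column except the last
labels = df.iloc[:, -1]     # the label column is left unscaled

# Column-wise min-max scaling. A constant column would give 0/0 = NaN here.
scaled = (features - features.min()) / (features.max() - features.min())

fig, (ax_raw, ax_scaled) = plt.subplots(1, 2, figsize=(12, 4))
features.boxplot(ax=ax_raw)
ax_raw.set_title("Before min-max scaling")
scaled.boxplot(ax=ax_scaled)
ax_scaled.set_title("After min-max scaling")
fig.tight_layout()
fig.savefig("boxplots_before_after_scaling.png")
```

After scaling, all boxes share the same [0, 1] axis, which makes the spread of features with very different units easier to compare. scikit-learn's `MinMaxScaler` performs the same transformation and would be a reasonable alternative.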