Visualizing data using Python, matplotlib, and Seaborn
Pymaceuticals, Inc.
Analysis
OBSERVED TREND 1: Capomulin was the only drug that displayed a downward trend in tumor volume over time, with a decrease of 19% (mean tumor volume of 36.2 mm³ ± 1.2 after 45 days); the other three treatments displayed an upward trend. Placebo showed an increase in tumor volume of 51% (mean of 68.1 mm³ ± 1.4 after 45 days). Infubinol performed slightly better than placebo at +46% (mean of 65.8 mm³ ± 1.1 after 45 days), while Ketapril performed slightly worse than placebo at +57% (mean of 70.7 mm³ ± 1.5 after 45 days).
OBSERVED TREND 2: Capomulin showed the best results with respect to metastatic growth over time, with a mean of 1.48 ± 0.2 metastatic sites after 45 days. Infubinol performed better than placebo, with a mean of 2.1 ± 0.31 metastatic sites after 45 days. Placebo and Ketapril performed approximately the same, with means of 3.27 ± 0.3 and 3.36 ± 0.28 metastatic sites after 45 days, respectively. Coupled with the tumor volume results, it appears that Ketapril may be promoting tumor growth and metastasis rather than discouraging them. The shape of the Ketapril graph, however, suggests that the drug may suppress metastasis early in treatment: its number of metastatic sites did not catch up to and surpass that of the placebo group until the end of the 45-day test cycle.
OBSERVED TREND 3: With regard to survival rates, each drug group began the test cycle at timepoint 0 with 25 mice. Capomulin had the best survival rate; at the end of 45 days, 21 mice survived in the Capomulin group (84%). Both the placebo and Ketapril groups had 11 survivors at the end of 45 days (44%). The group with the lowest survival rate was the Infubinol group, with 9 mice surviving at the 45-day timepoint (36%). Of note in the Infubinol group was a marked downturn in survival beginning around timepoint 30; half the remaining mice in the group died between timepoints 25 and 45. This finding is concerning because, in terms of tumor growth and metastasis over time, Infubinol was the second most promising drug in this analysis.
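The percentages quoted in the three trends can be checked with a few lines of arithmetic. The sketch below uses only the figures reported above, plus one assumption that the text does not state: that every group started from a common mean tumor volume of 45.0 mm³ at timepoint 0. That baseline is consistent with the reported changes when the result is truncated to a whole number, as the notebook's bar labels are.

```python
# Mean tumor volumes (mm^3) after 45 days, copied from the analysis text.
final_volume = {"Capomulin": 36.2, "Infubinol": 65.8, "Ketapril": 70.7, "Placebo": 68.1}

# ASSUMPTION: common baseline volume at timepoint 0 (not stated in the text).
BASELINE_MM3 = 45.0

# Percent change relative to the assumed baseline.
pct_change = {
    drug: (volume - BASELINE_MM3) / BASELINE_MM3 * 100
    for drug, volume in final_volume.items()
}

# Survivors at day 45, copied from the analysis text; each group started with 25 mice.
survivors = {"Capomulin": 21, "Infubinol": 9, "Ketapril": 11, "Placebo": 11}
STARTING_MICE = 25
survival_rate = {drug: count * 100 / STARTING_MICE for drug, count in survivors.items()}

for drug in final_volume:
    print(drug, int(pct_change[drug]), survival_rate[drug])
```

Truncating with `int()` rather than rounding matters for Capomulin, whose change under this assumption falls between -19% and -20%.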
Initial ETL (Extract, Transform, and Load)
import os  # os library
import numpy as np  # numpy library
import pandas as pd  # pandas library
import matplotlib.pyplot as plt  # pyplot module from matplotlib
import seaborn as sns  # seaborn library
filename = 'clinicaltrial_data.csv'  # Clinical trial data file
csv_clinical = os.path.join(".", "raw_data", filename)  # Creates path to read data
clin_trial_df = pd.read_csv(csv_clinical)  # Reads data from file
filename = 'mouse_drug_data.csv'  # Mouse drug data file
csv_mousedrug = os.path.join(".", "raw_data", filename)  # Creates path to read data
mou_drug_df = pd.read_csv(csv_mousedrug)  # Reads data from file
# Combines data from the two dataframes -> Adds mouse data to clinical trial data on Mouse ID
# Didn't worry about the one mouse that was treated with two drugs (g989), because mouse was
# excluded from targeted drug list. Trap? Mistake? Pet that someone was worried about?
combined_df = pd.merge(clin_trial_df, mou_drug_df, how='inner', left_on='Mouse ID', right_on='Mouse ID')
combined_df.sort_values(by=['Timepoint'], inplace=True)  # Sorts dataframe on Timepoint
combined_df = combined_df.reset_index(drop=True)  # Resets index on sorted dataframe
targeted_only_df = combined_df[(combined_df["Drug"] == 'Capomulin') |  # Dataframe for targeted drugs only
                               (combined_df["Drug"] == 'Infubinol') |  # Pipe (|) represents logical or
                               (combined_df["Drug"] == 'Ketapril') |
                               (combined_df["Drug"] == 'Placebo')
                               ]
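The chained `|` comparisons above work, but the same filter can be written more compactly with `Series.isin`. The sketch below is an alternative, not the notebook's own code; it runs on a toy stand-in for `combined_df`, and the drug name `OtherDrug` is made up for illustration.

```python
import pandas as pd

# Toy stand-in for combined_df; "OtherDrug" is a hypothetical non-targeted drug.
combined_df = pd.DataFrame({
    "Mouse ID": ["a1", "b2", "c3", "d4", "e5"],
    "Drug": ["Capomulin", "Infubinol", "Ketapril", "Placebo", "OtherDrug"],
})

# The four treatments analyzed in this notebook.
targeted_drugs = ["Capomulin", "Infubinol", "Ketapril", "Placebo"]

# Keep only rows whose Drug value is one of the targeted treatments.
targeted_only_df = combined_df[combined_df["Drug"].isin(targeted_drugs)]
print(targeted_only_df)
```

Keeping the drug names in a list also makes it easy to reuse them later, for example as legend labels.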
Tumor Response to Treatment
# Create two-column groupby dataframe for means
# (the column is selected before aggregating so that non-numeric columns such as Mouse ID are not averaged)
vals_trt_df = pd.DataFrame(targeted_only_df.groupby(['Drug', 'Timepoint'])['Tumor Volume (mm3)'].mean())
# Create two-column groupby dataframe for standard error of means
errs_trt_df = pd.DataFrame(targeted_only_df.groupby(['Drug', 'Timepoint'])['Tumor Volume (mm3)'].sem())
# Create two-column groupby dataframe for means
vals_mrt_df = pd.DataFrame(targeted_only_df.groupby(['Drug', 'Timepoint'])['Metastatic Sites'].mean())
# Create two-column groupby dataframe for standard error of means
errs_mrt_df = pd.DataFrame(targeted_only_df.groupby(['Drug', 'Timepoint'])['Metastatic Sites'].sem())
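The plotting loop that follows iterates over the dataframe's columns and treats each one as a drug, which implies the grouped results were pivoted so that each drug has its own column. That step is not shown in this excerpt. A minimal sketch of one way to do it, using `unstack` on toy data (the values and the names `vals_wide` and `errs_wide` are illustrative, not from the study):

```python
import pandas as pd

# Toy data: two drugs, two timepoints, two mice per group (values are made up).
toy_df = pd.DataFrame({
    "Drug": ["Capomulin"] * 4 + ["Placebo"] * 4,
    "Timepoint": [0, 0, 5, 5] * 2,
    "Metastatic Sites": [0, 0, 0, 1, 0, 0, 1, 2],
})

# Group once, then aggregate the single column of interest.
grouped = toy_df.groupby(["Drug", "Timepoint"])["Metastatic Sites"]

# unstack moves the Drug level of the index into the columns:
# rows become timepoints, columns become drugs.
vals_wide = grouped.mean().unstack(level="Drug")
errs_wide = grouped.sem().unstack(level="Drug")

print(vals_wide)
print(errs_wide)
```

With this layout, the dataframe index can serve directly as the x axis and each column as one plotted series.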
for item in vals_mrt_df.columns:  # Creates plots for each drug
    plt.errorbar(x_axis,
                 vals_mrt_df[item],
                 errs_mrt_df[item],
                 linestyle='--',
                 marker=markers[counter],  # Defined in previous section
                 color=colors[counter],
                 capthick=1,
                 capsize=3)
    counter += 1
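The loop above depends on `x_axis`, `markers`, `colors`, and `counter`, which are defined in a part of the notebook not included here. The following self-contained sketch shows the same error-bar technique under stated assumptions: the marker and color lists are hypothetical placeholders, the data values are invented, and `zip` stands in for the manual counter.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented means and standard errors: rows are timepoints, columns are drugs.
vals = pd.DataFrame({"Capomulin": [0.0, 0.2, 0.3], "Placebo": [0.0, 0.4, 0.7]},
                    index=[0, 5, 10])
errs = pd.DataFrame({"Capomulin": [0.0, 0.05, 0.06], "Placebo": [0.0, 0.08, 0.10]},
                    index=[0, 5, 10])

# HYPOTHETICAL style lists; the notebook's actual values are not shown in this excerpt.
markers = ["o", "s"]
colors = ["red", "blue"]

fig, ax = plt.subplots()
# zip pairs each drug column with its marker and color, replacing the manual counter.
for drug, marker, color in zip(vals.columns, markers, colors):
    ax.errorbar(vals.index, vals[drug], yerr=errs[drug],
                linestyle="--", marker=marker, color=color,
                capthick=1, capsize=3, label=drug)
ax.set_xlabel("Timepoint")
ax.set_ylabel("Metastatic Sites")
ax.legend()
```

Passing `label=drug` lets `ax.legend()` build the legend from the column names without a separate list.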
# Counts mice through study and displays dataframe
survivors_df = targeted_only_df.groupby(['Drug', 'Timepoint']).count()['Mouse ID']
survivors_df = survivors_df.unstack(level='Drug')
survivors_df
x_axis = survivors_df.index  # Sets x axis to df index
counter = 0  # Counter for iterable items in loop
for survivor in survivors_df:  # Creates plots for each drug
    y_values = survivors_df[survivor] / survivors_df.loc[0, survivor] * 100  # Calculates survival rate per timepoint
    plt.plot(x_axis,
             y_values,
             linestyle='--',
             marker=markers[counter],  # Defined in previous section
             color=colors[counter]
             )
    counter += 1
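The bar chart code that follows relies on a Series named `summary`, indexed by drug name, whose construction is not shown in this excerpt. One plausible way to build it, assuming a table of mean tumor volumes with one row per timepoint and one column per drug, is to compare the last row with the first. The table name `vals_wide` and its values are illustrative only.

```python
import pandas as pd

# Toy table of mean tumor volumes: rows are timepoints, columns are drugs.
vals_wide = pd.DataFrame(
    {"Capomulin": [45.0, 40.5], "Placebo": [45.0, 67.5]},
    index=pd.Index([0, 45], name="Timepoint"),
)

first = vals_wide.iloc[0]   # Mean volume per drug at the first timepoint
last = vals_wide.iloc[-1]   # Mean volume per drug at the last timepoint

# Percent change per drug; the result is a Series indexed by drug name.
summary = (last - first) / first * 100
print(summary)
```

Because the result is indexed by drug name, `summary.index` can supply the x tick labels directly.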
plt.title('Tumor Change Over 45 Day Treatment', fontdict={'fontsize': 16})  # Sets plot options
plt.ylabel('% Tumor Volume Change')
plt.axhline(y=0, color='black')  # Adds horizontal line @ 0
xlabels = summary.index
plt.xticks(np.arange(len(xlabels)), xlabels)  # Creates labels from drug names
plt.bar(np.arange(4),  # Creates plots for each drug
        summary,
        # Colors each bar based on positive or negative values
        # (iterates over the values directly instead of indexing a label-indexed Series by position)
        color=['red' if value > 0 else 'green' for value in summary]
        )
counter = 0  # Counter for iterable items in loop
for value in summary:  # Creates values in plot
    if value < 0:  # Positions values in plot
        y_point = -6.5
    else:
        y_point = 5
    plt.text(counter, y_point, str(int(value)) + '%', ha='center', color='white')
    counter += 1