plot-challenge's Introduction

Pymaceuticals, Inc.

Analysis

  • OBSERVED TREND 1: Capomulin was the only treatment that displayed a downward trend in tumor volume over time, with a decrease of 19% (mean tumor volume of 36.2 ± 1.2 mm³ after 45 days; ± values are standard errors of the mean). The other three groups displayed an upward trend. Placebo showed an increase in tumor volume of 51% (mean of 68.1 ± 1.4 mm³ after 45 days). Infubinol performed slightly better than placebo at +46% (mean of 65.8 ± 1.1 mm³ after 45 days), while Ketapril performed slightly worse than placebo at +57% (mean of 70.7 ± 1.5 mm³ after 45 days).
  • OBSERVED TREND 2: Capomulin showed the best results with respect to metastatic spread over time, with a mean of 1.48 ± 0.20 metastatic sites after 45 days. Infubinol performed better than placebo, with a mean of 2.11 ± 0.31 metastatic sites after 45 days. Placebo and Ketapril performed approximately the same, with means of 3.27 ± 0.30 and 3.36 ± 0.28 metastatic sites after 45 days, respectively. Coupled with the tumor volume results, it appears that Ketapril may be promoting tumor growth and metastasis rather than discouraging them, although at 45 days the Ketapril and placebo means differ by less than the sum of their standard errors for both measures. The shape of the graph for Ketapril also suggests that it may be suppressing metastasis early in treatment, as its number of metastatic sites did not catch up to and surpass that of the placebo group until the end of the 45-day test cycle.
  • OBSERVED TREND 3: With regard to survival rates, each group began the test cycle at timepoint 0 with 25 mice. Capomulin had the best survival rate; at the end of 45 days, 21 mice survived in the Capomulin group (84%). Both the placebo and Ketapril groups had 11 survivors at the end of 45 days (44%). The group with the lowest survival rate was the Infubinol group, with 9 mice surviving at the 45-day timepoint (36%). Of note in the Infubinol group was a marked downturn in survival beginning around timepoint 30; half of the 18 mice still alive at timepoint 25 had died by timepoint 45. This finding is concerning because, in terms of tumor growth and metastasis over time, Infubinol was the second most promising drug in this analysis.
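
The percentages quoted in the trends above can be rechecked by hand. The following is a minimal sketch that recomputes them from the 45-day means and survivor counts reported in the tables later in this document; the helper name `percent_change` and the dictionary names are introduced here for illustration only.

```python
# Endpoint values copied from the tables in this document.
start_volume = 45.0  # mean tumor volume (mm3) at timepoint 0, identical for all groups
end_volumes = {
    "Capomulin": 36.236114,
    "Infubinol": 65.755562,
    "Ketapril": 70.662958,
    "Placebo": 68.084082,
}
survivors_day_45 = {"Capomulin": 21, "Infubinol": 9, "Ketapril": 11, "Placebo": 11}
starting_mice = 25  # mice per group at timepoint 0


def percent_change(start, end):
    """Relative change from start to end, expressed in percent."""
    return (end - start) / start * 100


volume_change = {drug: percent_change(start_volume, end) for drug, end in end_volumes.items()}
survival_rate = {drug: count / starting_mice * 100 for drug, count in survivors_day_45.items()}

for drug in end_volumes:
    print(f"{drug:<10} volume change {volume_change[drug]:+6.1f}%   survival {survival_rate[drug]:.0f}%")
```

The analysis text reports these volume changes as whole percentages (19%, 46%, 57%, 51%).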

Initial ETL (Extract, Transform, and Load)

import os                                                                    # os library
import numpy as np                                                           # numpy library
import pandas as pd                                                          # pandas library
import matplotlib.pyplot as plt                                              # pyplot module from matplotlib
import seaborn as sns                                                        # seaborn library
filename = 'clinicaltrial_data.csv'                                          # Clinical trial data file
csv_clinical = os.path.join(".", "raw_data", filename)                       # Creates path to read data
clin_trial_df = pd.read_csv(csv_clinical)                                    # Reads data from file
filename = 'mouse_drug_data.csv'                                             # Mouse drug data file
csv_mousedrug = os.path.join(".", "raw_data", filename)                      # Creates path to read data
mou_drug_df = pd.read_csv(csv_mousedrug)                                     # Reads data from file
# Combines data from the two dataframes -> Adds mouse drug data to clinical trial data on Mouse ID
# Didn't worry about the one mouse that was treated with two drugs (g989), because that mouse was
# excluded from the targeted drug list. Trap? Mistake? Pet that someone was worried about?
combined_df = pd.merge(clin_trial_df, mou_drug_df, how='inner', on='Mouse ID')
combined_df.sort_values(by=['Timepoint'], inplace=True)                      # Sorts dataframe on Timepoint
combined_df = combined_df.reset_index(drop=True)                             # Resets index on sorted dataframe
combined_df.head()                                                           # Displays first 5 rows
  Mouse ID  Timepoint  Tumor Volume (mm3)  Metastatic Sites       Drug
0     b128          0                45.0                 0  Capomulin
1     i635          0                45.0                 0   Propriva
2     g791          0                45.0                 0   Ramicane
3     w746          0                45.0                 0   Propriva
4     r107          0                45.0                 0   Propriva
targeted_drugs = ['Capomulin', 'Infubinol', 'Ketapril', 'Placebo']           # Drugs targeted in this analysis
targeted_only_df = combined_df[combined_df['Drug'].isin(targeted_drugs)]     # Dataframe for targeted drugs only
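
The merge comment above mentions one mouse (g989) that appears under two drugs. As a rough sketch of how such cases could be detected before merging, the example below counts distinct drugs per mouse on a small invented table. The toy rows, the mouse IDs, and the names `toy_mouse_drug_df`, `multi_drug_mice`, and `clean_mouse_drug_df` are assumptions for illustration and are not taken from the original data files.

```python
import pandas as pd

# Toy stand-in for mouse_drug_data.csv; these rows are invented for illustration only.
toy_mouse_drug_df = pd.DataFrame({
    "Mouse ID": ["a001", "a002", "a002", "a003"],
    "Drug": ["Capomulin", "Placebo", "Ketapril", "Infubinol"],
})

# Count distinct drugs per mouse and keep any mouse listed under more than one drug.
drugs_per_mouse = toy_mouse_drug_df.groupby("Mouse ID")["Drug"].nunique()
multi_drug_mice = drugs_per_mouse[drugs_per_mouse > 1].index.tolist()
print(multi_drug_mice)

# Optionally drop those mice so that each Mouse ID maps to exactly one drug.
clean_mouse_drug_df = toy_mouse_drug_df[~toy_mouse_drug_df["Mouse ID"].isin(multi_drug_mice)]
print(clean_mouse_drug_df)
```

Whether to drop such a mouse or leave it in place is a judgment call; in this analysis it makes no difference, since the mouse is outside the targeted drug list.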

Tumor Response to Treatment

# Create dataframe of means indexed by Drug and Timepoint
# (the column is selected before aggregating so the non-numeric Mouse ID column is never averaged)
vals_trt_df = targeted_only_df.groupby(['Drug', 'Timepoint'])['Tumor Volume (mm3)'].mean().to_frame()
# Create dataframe of standard errors of the mean indexed by Drug and Timepoint
errs_trt_df = targeted_only_df.groupby(['Drug', 'Timepoint'])['Tumor Volume (mm3)'].sem().to_frame()
vals_trt_df.head()                                                           # Displays dataframe
                     Tumor Volume (mm3)
Drug      Timepoint
Capomulin 0                   45.000000
          5                   44.266086
          10                  43.084291
          15                  42.064317
          20                  40.716325
errs_trt_df.head()                                                           # Displays dataframe
                     Tumor Volume (mm3)
Drug      Timepoint
Capomulin 0                    0.000000
          5                    0.448593
          10                   0.702684
          15                   0.838617
          20                   0.909731
vals_trt_df = vals_trt_df.unstack(level = 'Drug')                             # Pivots on drug
vals_trt_df.columns = vals_trt_df.columns.get_level_values(1)                 # Column names from level 1 values
vals_trt_df                                                                   # Displays dataframe
Drug       Capomulin  Infubinol   Ketapril    Placebo
Timepoint
0          45.000000  45.000000  45.000000  45.000000
5          44.266086  47.062001  47.389175  47.125589
10         43.084291  49.403909  49.582269  49.423329
15         42.064317  51.296397  52.399974  51.359742
20         40.716325  53.197691  54.920935  54.364417
25         39.939528  55.715252  57.678982  57.482574
30         38.769339  58.299397  60.994507  59.809063
35         37.816839  60.742461  63.371686  62.420615
40         36.958001  63.162824  66.068580  65.052675
45         36.236114  65.755562  70.662958  68.084082
errs_trt_df = errs_trt_df.unstack(level = 'Drug')                            # Pivots on drug
errs_trt_df.columns = errs_trt_df.columns.get_level_values(1)                # Column names from level 1 values
errs_trt_df                                                                  # Displays dataframe
Drug       Capomulin  Infubinol  Ketapril   Placebo
Timepoint
0           0.000000   0.000000  0.000000  0.000000
5           0.448593   0.235102  0.264819  0.218091
10          0.702684   0.282346  0.357421  0.402064
15          0.838617   0.357705  0.580268  0.614461
20          0.909731   0.476210  0.726484  0.839609
25          0.881642   0.550315  0.755413  1.034872
30          0.934460   0.631061  0.934121  1.218231
35          1.052241   0.984155  1.127867  1.287481
40          1.223608   1.055220  1.158449  1.370634
45          1.223977   1.144427  1.453186  1.351726
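
The error values in the table above come from the pandas `sem()` method, which by default reports the sample standard deviation divided by the square root of the number of observations. The sketch below checks that relationship on a handful of made-up tumor volumes; the numbers in `volumes` are hypothetical and do not come from the study data.

```python
import numpy as np
import pandas as pd

# Hypothetical tumor volumes (mm3) for one drug at one timepoint.
volumes = pd.Series([44.1, 45.3, 43.8, 46.0, 44.8])

# Standard error of the mean: sample standard deviation (ddof=1) over sqrt(n).
manual_sem = volumes.std(ddof=1) / np.sqrt(len(volumes))
pandas_sem = volumes.sem()  # pandas also uses ddof=1 by default

print(manual_sem, pandas_sem)
```

This also explains the zero errors at timepoint 0: every mouse starts at the same tumor volume of 45.0 mm3, so the standard deviation there is zero.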
sns.set()                                                                    # Switches to seaborn default display
plt.figure(figsize = (12,8))                                                 # Sets plot options
plt.title('Tumor Response to Treatment', fontdict = {'fontsize': 18})
plt.xlabel('Time (Days)')
plt.ylabel('Tumor Volume (mm3)')
plt.xticks(np.arange(0, vals_trt_df.index.max()+3 , 5))
plt.xlim(0, vals_trt_df.index.max())
plt.ylim(20, 80)                                                             # ylim takes only lower and upper limits
markers = ['o', '^', 's', 'd']                                               # Specifies marker shape
colors = ['r', 'b', 'g', 'k']                                                # Specifies line colors
x_axis = vals_trt_df.index                                                   # sets x axis to df index
for item, marker, color in zip(vals_trt_df.columns, markers, colors):        # Creates plots for each drug
    plt.errorbar(x_axis,
                 vals_trt_df[item],
                 yerr = errs_trt_df[item],
                 label = item,                                               # Names the series for the legend
                 linestyle = '--',
                 marker = marker,
                 color = color,
                 capthick = 1,
                 capsize = 3)
lg = plt.legend(numpoints = 2,                                               # Sets legend options
                frameon = True, 
                markerscale = 1.5, 
                edgecolor = 'black', 
                fontsize = '16', 
                framealpha = 1)
plt.show()                                                                   # Displays plot 

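[Figure: Tumor Response to Treatment. Mean tumor volume (mm3) vs. time (days) for each treatment, with standard-error bars.]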

Metastatic Response to Treatment

# Create dataframe of means indexed by Drug and Timepoint
# (the column is selected before aggregating so the non-numeric Mouse ID column is never averaged)
vals_mrt_df = targeted_only_df.groupby(['Drug', 'Timepoint'])['Metastatic Sites'].mean().to_frame()
# Create dataframe of standard errors of the mean indexed by Drug and Timepoint
errs_mrt_df = targeted_only_df.groupby(['Drug', 'Timepoint'])['Metastatic Sites'].sem().to_frame()
vals_mrt_df.head()                                                            # Displays dataframe
                     Metastatic Sites
Drug      Timepoint
Capomulin 0                  0.000000
          5                  0.160000
          10                 0.320000
          15                 0.375000
          20                 0.652174
errs_mrt_df.head()                                                            # Displays dataframe
                     Metastatic Sites
Drug      Timepoint
Capomulin 0                  0.000000
          5                  0.074833
          10                 0.125433
          15                 0.132048
          20                 0.161621
vals_mrt_df = vals_mrt_df.unstack(level = 'Drug')                             # Pivots on drug
vals_mrt_df.columns = vals_mrt_df.columns.get_level_values(1)                 # Column names from level 1 values
vals_mrt_df                                                                   # Displays dataframe
Drug       Capomulin  Infubinol  Ketapril   Placebo
Timepoint
0           0.000000   0.000000  0.000000  0.000000
5           0.160000   0.280000  0.304348  0.375000
10          0.320000   0.666667  0.590909  0.833333
15          0.375000   0.904762  0.842105  1.250000
20          0.652174   1.050000  1.210526  1.526316
25          0.818182   1.277778  1.631579  1.941176
30          1.090909   1.588235  2.055556  2.266667
35          1.181818   1.666667  2.294118  2.642857
40          1.380952   2.100000  2.733333  3.166667
45          1.476190   2.111111  3.363636  3.272727
errs_mrt_df = errs_mrt_df.unstack(level = 'Drug')                             # Pivots on drug
errs_mrt_df.columns = errs_mrt_df.columns.get_level_values(1)                 # Column names from level 1 values
errs_mrt_df                                                                   # Displays dataframe
Drug       Capomulin  Infubinol  Ketapril   Placebo
Timepoint
0           0.000000   0.000000  0.000000  0.000000
5           0.074833   0.091652  0.098100  0.100947
10          0.125433   0.159364  0.142018  0.115261
15          0.132048   0.194015  0.191381  0.190221
20          0.161621   0.234801  0.236680  0.234064
25          0.181818   0.265753  0.288275  0.263888
30          0.172944   0.227823  0.347467  0.300264
35          0.169496   0.224733  0.361418  0.341412
40          0.175610   0.314466  0.315725  0.297294
45          0.202591   0.309320  0.278722  0.304240
plt.figure(figsize = (12,8))                                                 # Sets plot options
plt.title('Metastatic Spread During Treatment', fontdict = {'fontsize': 18})
plt.xlabel('Treatment Duration (Days)')
plt.ylabel('Met. Sites')
plt.xticks(np.arange(0, vals_mrt_df.index.max()+3 , 5))
plt.xlim(0, vals_mrt_df.index.max())
plt.ylim(0, 4.0)                                                             # ylim takes only lower and upper limits
x_axis = vals_mrt_df.index                                                   # sets x axis to df index
for item, marker, color in zip(vals_mrt_df.columns, markers, colors):        # Creates plots for each drug
    plt.errorbar(x_axis,
                 vals_mrt_df[item],
                 yerr = errs_mrt_df[item],
                 label = item,                                               # Names the series for the legend
                 linestyle = '--',
                 marker = marker,                                            # markers and colors defined in previous section
                 color = color,
                 capthick = 1,
                 capsize = 3)
lg = plt.legend(numpoints = 2,                                               # Sets legend options
                frameon = True, 
                markerscale = 1.5, 
                edgecolor = 'black', 
                fontsize = '16', 
                framealpha = 1)
plt.show()                                                                   # Displays plot 

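[Figure: Metastatic Spread During Treatment. Mean number of metastatic sites vs. treatment duration (days) for each treatment, with standard-error bars.]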

Survival Rates

# Counts mice through study and displays dataframe
survivors_df = targeted_only_df.groupby(['Drug', 'Timepoint']).count()['Mouse ID']
survivors_df = survivors_df.unstack(level = 'Drug')
survivors_df
Drug       Capomulin  Infubinol  Ketapril  Placebo
Timepoint
0                 25         25        25       25
5                 25         25        23       24
10                25         21        22       24
15                24         21        19       20
20                23         20        19       19
25                22         18        19       17
30                22         17        18       15
35                22         12        17       14
40                21         10        15       12
45                21          9        11       11
plt.figure(figsize = (12,8))                                                  # Sets plot options
plt.title('Survival During Treatment', fontdict = {'fontsize': 18})
plt.xlabel('Time (Days)')
plt.ylabel('Survival Rate (%)')
plt.xlim(0, survivors_df.index.max())
plt.ylim(30, 100)                                                            # ylim takes only lower and upper limits
x_axis = survivors_df.index                                                  # Sets x axis to df index
for survivor, marker, color in zip(survivors_df.columns, markers, colors):   # Creates plots for each drug
    y_values = survivors_df[survivor]/survivors_df.loc[0,survivor] * 100     # Calculates survival rate per timepoint
    plt.plot(x_axis,
             y_values,
             label = survivor,                                               # Names the series for the legend
             linestyle = '--',
             marker = marker,                                                # markers and colors defined in previous section
             color = color
             )
lg = plt.legend(numpoints = 2,                                               # Sets legend options
                frameon = True, 
                markerscale = 1.5, 
                edgecolor = 'black', 
                fontsize = '16', 
                framealpha = 1)
plt.show()                                                                   # Displays plot 

[Figure: Survival During Treatment. Survival rate (%) vs. time (days) for each treatment.]
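
The plotting loop above converts counts to survival percentages one column at a time. As an alternative sketch, the same percentages can be computed for the whole table in one step by dividing every row by the timepoint-0 row. The counts below are copied from the survivors table in this section; the name `survival_pct` is introduced here for illustration.

```python
import pandas as pd

# Survivor counts copied from the table in this section.
survivors_df = pd.DataFrame(
    {
        "Capomulin": [25, 25, 25, 24, 23, 22, 22, 22, 21, 21],
        "Infubinol": [25, 25, 21, 21, 20, 18, 17, 12, 10, 9],
        "Ketapril": [25, 23, 22, 19, 19, 19, 18, 17, 15, 11],
        "Placebo": [25, 24, 24, 20, 19, 17, 15, 14, 12, 11],
    },
    index=pd.Index(range(0, 50, 5), name="Timepoint"),
)

# Divide each row by the starting counts (aligned on column names), then scale to percent.
survival_pct = survivors_df.div(survivors_df.loc[0], axis="columns") * 100
print(survival_pct.round(1))
```

With the result in a single dataframe, `survival_pct.plot(style='--')` would be one possible way to draw all four lines at once, though it would not reproduce the per-drug markers and colors used above.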

Summary Bar Graph

# Calculates tumor volume change for each targeted drug
summary = (vals_trt_df.loc[45, :] - vals_trt_df.loc[0, :])/vals_trt_df.loc[0, :] * 100
summary                                                                      # Displays series
Drug
Capomulin   -19.475303
Infubinol    46.123472
Ketapril     57.028795
Placebo      51.297960
dtype: float64
plt.title('Tumor Change Over 45 Day Treatment', fontdict = {'fontsize': 16}) # Sets plot options
plt.ylabel('% Tumor Volume Change')
plt.axhline(y=0, color = 'black')                                            # Adds horizontal line @ 0
xlabels = summary.index
plt.xticks(np.arange(len(xlabels)), xlabels)                                 # Creates labels from drug names
plt.bar(np.arange(len(xlabels)),                                             # Creates plots for each drug
        summary,
        # Colors each bar based on positive or negative values
        color = ['red' if value > 0 else 'green' for value in summary]
       )
for position, value in enumerate(summary):                                   # Creates values in plot
    y_point = -6.5 if value < 0 else 5                                       # Positions values in plot
    plt.text(position, y_point, str(int(value)) + '%', ha = 'center', color = 'white')
plt.show()                                                                   # Displays plot 

[Figure: Tumor Change Over 45 Day Treatment. Bar chart of % tumor volume change for each treatment.]
