wayscience / cfret_data

Image-based profiling and machine learning to predict failing vs. non-failing cardiac fibroblasts

License: Creative Commons Zero v1.0 Universal

Shell 0.01% Python 0.82% Jupyter Notebook 98.76% R 0.41%

cell-painting cellprofiler image-processing image-based-analysis machine-learning pycytominer

cfret_data's Introduction

Cell Painting predicts cardiac fibrosis

In this repository, we perform image analysis, image-based profiling, and machine learning to predict failing and non-failing cardiac fibroblasts.

Goals

The goals of this project are to:

Comprehensively define cell-state differences between failing and non-failing cardiac fibroblast (CF) populations.
Accurately predict CF phenotype with a model that generalizes and can be applied to a large drug screen to find hits that make failing cells look healthy.

Cell Painting

We performed a modified Cell Painting assay on cardiac fibroblasts from non-failing and failing human hearts.

In this modified Cell Painting, there are five channels:

d0 (Nuclei)
d1 (Endoplasmic Reticulum)
d2 (Golgi/Plasma Membrane)
d3 (Mitochondria)
d4 (F-actin)

Data

For training a logistic regression classifier, we extracted single-cell morphology profiles from CFs of different patients with the same heart failure type and patients with non-failing hearts. We label wells from each patient as either "Failing" (failed heart) or "Healthy" (non-failing heart). There was no treatment or perturbation applied to any of the wells in this plate.

localhost231120090001

We utilized a secondary plate to assess the generalizability of the model. This plate contains two different hearts, one from a non-failing patient and the other from a failing patient. Wells are labelled as either "Failing" (failing heart) or "Healthy" (non-failing heart). There are three different treatments:

DMSO (control)
drug_x
TGRFi (TGFB inhibitor)

We applied the model to this dataset to evaluate its accuracy in predicting control cells and to observe how its predictions change in response to different treatments.

localhost230405150001

Additionally, this repository includes the pilot plates below, which were not prepared using an optimized protocol. They are intended for further analysis that is not included in the manuscript.

localhost220512140003_KK22-05-198
localhost220513100001_KK22-05-198_FactinAdjusted

See our platemaps for more details.

Repository Structure

Module	Purpose	Description
0.download_data	Download CFReT pilot data	Download pilot images for the CFReT project
1.preprocessing_data	Perform Illumination Correction (IC)	We use CellProfiler to perform IC on images per channel for all plates
2.cellprofiler_processing	Apply feature extraction pipeline	We use CellProfiler to extract hundreds of morphology features per imaging channel
3.process_cfret_features	Get morphology features analysis ready	Apply cytotable and pycytominer to perform single-cell merging, annotation, normalization, and feature selection
4.analyze_data	Analyze the single cell profiles to achieve goals listed above	Several independent analyses to describe data and test hypotheses
5.machine_learning	Generate binary logistic regression model	Train model to predict healthy or failing cells and evaluate performance

Create CellProfiler conda environment

For modules 1 and 2, we use a single CellProfiler environment that includes all required packages, including CellProfiler v4.2.4.

To create the environment, run the below code block:

# Run this command in terminal to create the conda environment
conda env create -f cfret_cp_env.yml

Make sure that the conda environment is activated before running CellProfiler related notebooks or scripts:

conda activate cfret_data

Python and R analysis

There are two different environments, one for Python (preprocessing and analysis steps) and one for R (visualization):

python_analysis_env: This environment is for Python-specific notebooks, such as those in modules 3 (feature processing) and 4 (data analysis).
R_analysis_env: This environment is for R-specific notebooks, specifically for visualizing data.

NOTE: Module 5, where we train a machine learning model, uses its own environment; installation is described in the README.

You can create the environments using the code block below:

# create environment for analysis
mamba env create -f python_analysis_env.yml
mamba env create -f R_analysis_env.yml

Make sure that the conda environment is activated before running related notebooks or scripts:

conda activate python_analysis_cfret
# OR
conda activate R_analysis_cfret

cfret_data's People

Contributors

Stargazers

Forkers

jenna-tomkinson gwaybio axiomcura hasihays

cfret_data's Issues

Add cell count estimation in feature extraction step

In #12 I created a cell count dataframe. We have to do this often, for many different visualizations.

Consider adding a cell count module as a standard step in preprocessing so that we only have to do this once.
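
As a rough illustration of what a reusable cell count step could look like, here is a minimal pure-Python sketch. It assumes single-cell profiles are available as rows keyed by a well identifier; the column name `Metadata_Well` and the helper `count_cells_per_well` are hypothetical and not taken from the repository.

```python
from collections import Counter


def count_cells_per_well(single_cells, well_column="Metadata_Well"):
    """Return a {well: cell_count} mapping from an iterable of single-cell rows."""
    return dict(Counter(row[well_column] for row in single_cells))


# Toy single-cell rows (made-up values, for illustration only)
cells = [
    {"Metadata_Well": "A01", "Nuclei_Area": 110.0},
    {"Metadata_Well": "A01", "Nuclei_Area": 95.5},
    {"Metadata_Well": "B02", "Nuclei_Area": 130.2},
]

cell_counts = count_cells_per_well(cells)
```

Computing the counts once in preprocessing and saving them alongside the profiles would let each visualization load the same table instead of recomputing it.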

Create platemap visualization for all plates

In #12 I only added a platemap for plate 2 (Factinadjusted). We should add a platemap for plate 1 (and all subsequent plates as well!)

Drop na columns in feature selection step

I am trying to fit UMAPs but cannot because there are missing values in the data frame. Please add drop_na_columns with na_cutoff=0 in 3.process_cfret_features.
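
To make the request concrete, the sketch below shows the effect of dropping NA columns with a cutoff of 0: any column containing at least one missing value is removed. This is a pure-Python stand-in written for illustration, not pycytominer's implementation; the function names and the toy feature names are assumptions.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_na_columns(rows, na_cutoff=0.0):
    """Drop columns whose fraction of missing values exceeds na_cutoff."""
    if not rows:
        return []
    n_rows = len(rows)
    keep = [
        col
        for col in rows[0]
        if sum(is_missing(row[col]) for row in rows) / n_rows <= na_cutoff
    ]
    return [{col: row[col] for col in keep} for row in rows]


# Toy profiles (made-up feature names and values)
profiles = [
    {"Cells_Area": 10.0, "Nuclei_Texture": float("nan")},
    {"Cells_Area": 12.5, "Nuclei_Texture": 0.3},
]

cleaned = drop_na_columns(profiles, na_cutoff=0)
```

With `na_cutoff=0`, only fully populated columns survive, which is what UMAP fitting needs.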

Update analysis conda environments

Problem

Issue brought up by @roshankern in PR #24.

The two environments in the 4.analyze_data module look to have packages that aren't used/cause issues, and the naming isn't clear which is the R conda env and the Python conda env.

Solution

We will need to refactor this in the future to pin package versions, confirm which packages are needed, and make the environment names clearer.

Document data quality issues

We have noticed some relatively minor (and common) issues with data quality in this dataset.

Some issues include blurriness, debris, and high levels of channel bleed-through.

@jenna-tomkinson - please make sure to make detailed notes of these issues. These notes likely belong in the preprocessing module alongside IC (and elsewhere TBD)!

Mint DOI

We need to mint a Zenodo DOI for this repo, as we are nearing submission.

We need to perform the following prior to DOI minting:

Review issues and determine if any are critical to resolve
Review READMEs to confirm a) clarity and b) that our results align with what we present in the paper (that they are all up to date and that they don't contradict)
Perform any software gardening to make sure repo is maximally reproducible

Thanks!

Open Data Access

@gwaybio For the CFReT data, do we know of any restrictions that prevent it from being made public?

If not, there are two options for where to put this data. Currently, before any removal of images, it is a little over 2GB with 990 images. We can:

Add it to DVC:

I do not see this as viable, since the documentation on their website does not list DropBox support (see https://dvc.org/doc/command-reference/remote).
There is a GitHub issue requesting DropBox support. It is marked as resolved, but I am not confident that it was fully addressed, since the issue states that there isn't much demand for it.

Add to Figshare:

This seems like the better option but I am not sure about the process so I have a few questions:
1. Is there a Way Lab figshare login or do I create my own?
2. Is it like GitHub where there is a Way Science organization and then the datasets within it?

Based on the answers to my questions, we can then proceed in one direction.

Plate 4 `plate` metadata not extracted due to regular expression in CP analysis pipeline

Problem

The Metadata_Plate column for the new Plate 4 data has None in every row because the regular expression is hard-coded to match only one specific plate name.

Solution

Update regular expression to be more flexible.
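
One possible shape for a more flexible pattern is sketched below in Python, whose named-group syntax CellProfiler's metadata extraction also uses. It assumes the plate name is a folder in the image path that starts with `localhost` followed by digits and optional underscore-separated suffixes, as in the plate names listed in the README. The pattern, the helper `extract_plate`, and the example paths are hypothetical, not the pipeline's actual settings.

```python
import re

# Hypothetical pattern: "localhost" + digits, then zero or more "_suffix" parts
PLATE_PATTERN = re.compile(r"(?P<Plate>localhost\d+(?:_[A-Za-z0-9-]+)*)")


def extract_plate(path):
    """Return the plate name found in a path, or None if nothing matches."""
    match = PLATE_PATTERN.search(path)
    return match.group("Plate") if match else None


# Example paths are made up; only the plate folder names come from the README
plate_names = [
    extract_plate("images/localhost231120090001/example_image.tif"),
    extract_plate(
        "images/localhost220513100001_KK22-05-198_FactinAdjusted/example_image.tif"
    ),
    extract_plate("images/unrelated_folder/example_image.tif"),
]
```

Matching on the shared `localhost` prefix rather than one literal plate name means new plates are picked up without editing the pipeline, while unrelated folders still yield no match.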

Utilize a themes.r file for some figure making

For some of these theme items, you could set up an R theme file and import it to reduce code lines.
An example of one of those files: https://github.com/WayScience/phenotypic_profiling/blob/main/7.figures/themes.r

and the implementation: https://github.com/WayScience/phenotypic_profiling/blob/main/7.figures/f1_score_visualization.ipynb

Originally posted by @MikeLippincott in #42 (comment)

Refactor UMAP and LM analysis to be more automated

Problem

In the two analysis modules, there is a lot of manual work needed to change names and variables every time you need to perform an analysis with a different subset of the data.

Solution

At some point, it would be best to take time to refactor and make the work more automated by using dictionaries or other methodologies.
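
A dictionary-driven approach might look like the sketch below, where each named analysis is described by its filter criteria and the same code runs for every entry. The metadata column names, analysis names, and toy rows are assumptions made for illustration.

```python
def filter_rows(rows, criteria):
    """Keep rows whose metadata match every key/value pair in criteria."""
    return [
        row for row in rows if all(row.get(k) == v for k, v in criteria.items())
    ]


# Hypothetical analysis configurations: name -> metadata filter
ANALYSES = {
    "dmso_only": {"Metadata_treatment": "DMSO"},
    "failing_dmso": {
        "Metadata_treatment": "DMSO",
        "Metadata_cell_type": "Failing",
    },
}

# Toy rows (made-up values)
rows = [
    {"Metadata_treatment": "DMSO", "Metadata_cell_type": "Failing"},
    {"Metadata_treatment": "DMSO", "Metadata_cell_type": "Healthy"},
    {"Metadata_treatment": "drug_x", "Metadata_cell_type": "Failing"},
]

# Run the same subsetting step for every configured analysis
subsets = {name: filter_rows(rows, criteria) for name, criteria in ANALYSES.items()}
subset_sizes = {name: len(subset) for name, subset in subsets.items()}
```

Adding a new subset then only requires a new dictionary entry rather than edits to names and variables throughout a notebook.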

Current PyBaSiC code converts images to 8-bit instead of keeping to same bit-size as raw images

I am unsure why we decided to convert the images to 8-bit. As described in issue #18, we will continue to use this method, but based on my knowledge of bit depth and data loss, I do not think it is best practice.
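
The data-loss concern can be illustrated with simple arithmetic. The sketch below assumes 16-bit raw images and a plain linear rescale to 8-bit; both are assumptions, and this is not the repository's conversion code.

```python
def to_8bit(value_16bit):
    """Rescale a 16-bit intensity (0-65535) to 8-bit (0-255) by integer division."""
    return value_16bit // 256


levels_16bit = 2 ** 16  # distinct intensities available in a 16-bit image
levels_8bit = 2 ** 8  # distinct intensities available in an 8-bit image
values_per_bin = levels_16bit // levels_8bit  # 16-bit values merged per 8-bit value

# Two distinguishable 16-bit intensities become identical after conversion
a, b = 1030, 1200
collapsed = to_8bit(a) == to_8bit(b)
```

Under these assumptions, every 8-bit value absorbs 256 distinct 16-bit values, so small intensity differences in the raw images cannot be recovered after conversion.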

Refactor to use CellProfiler for IC instead of PyBaSiC

With the update of PyBaSiC to BaSiCPy, we have discovered some issues with using the package for static images that were not present in the previous version (i.e., PyBaSiC).

Since it would take time to test BaSiCPy further or to switch to an entirely different IC method with CellProfiler, we will continue to use the current iteration of PyBaSiC (which we clone into the repo at a specific commit hash) for the next plates.

The goal is to either get BaSiCPy working for static images or refactor to CellProfiler to reflect the standard pipeline.
