wayscience / cfret_data

Image-based profiling and machine learning to predict failing vs. non-failing cardiac fibroblasts

License: Creative Commons Zero v1.0 Universal

Shell 0.01% Python 0.82% Jupyter Notebook 98.76% R 0.41%
cell-painting cellprofiler image-processing image-based-analysis machine-learning pycytominer

cfret_data's Introduction

Cell Painting predicts cardiac fibrosis

In this repository, we perform image analysis, image-based profiling, and machine learning to predict failing and non-failing cardiac fibroblasts.

Goals

The goals of this project are to:

  1. Comprehensively define cell-state differences between failing and non-failing cardiac fibroblast (CF) populations.
  2. Accurately predict CF phenotype with a model that generalizes and can be applied to a large drug screen to find hits that make failing cells look healthy.

Cell Painting

We performed a modified Cell Painting assay on cardiac fibroblasts from non-failing and failing human hearts.

In this modified Cell Painting, there are five channels:

  • d0 (Nuclei)
  • d1 (Endoplasmic Reticulum)
  • d2 (Golgi/Plasma Membrane)
  • d3 (Mitochondria)
  • d4 (F-actin)
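
The channel list above can be captured as a small lookup table. The following is a minimal sketch, not code from this repository: the `channel_name` helper and the assumption that an image file name ends with the channel suffix (for example `example_d3.tif`) are hypothetical and may not match the actual file naming.

```python
# Channel suffix -> stained compartment, as listed in the README.
CHANNELS = {
    "d0": "Nuclei",
    "d1": "Endoplasmic Reticulum",
    "d2": "Golgi/Plasma Membrane",
    "d3": "Mitochondria",
    "d4": "F-actin",
}


def channel_name(filename: str) -> str:
    """Return the compartment for an image file name.

    Assumption (hypothetical): the file stem ends with the two-character
    channel suffix, e.g. 'example_d3.tif'.
    """
    stem = filename.rsplit(".", 1)[0]  # drop the file extension
    suffix = stem[-2:]                 # last two characters of the stem
    if suffix not in CHANNELS:
        raise ValueError(f"No known channel suffix in {filename!r}")
    return CHANNELS[suffix]
```

For example, `channel_name("example_d3.tif")` looks up the `d3` entry in the table.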

Composite_Figure.png

Data

For training a logistic regression classifier, we extracted single-cell morphology profiles from CFs of different patients with the same heart failure type and patients with non-failing hearts. We label wells from each patient as either "Failing" (failed heart) or "Healthy" (non-failing heart). There was no treatment or perturbation applied to any of the wells in this plate.

  • localhost231120090001

localhost231120090001_platemap_figure.png


We utilized a secondary plate to assess the generalizability of the model. This plate contains two different hearts, one from a non-failing patient and the other from a failing patient. Wells are labelled as either "Failing" (failing heart) or "Healthy" (non-failing heart). There are three different treatments:

  • DMSO (control)
  • drug_x
  • TGRFi (TGFB inhibitor)

We applied the model to this dataset to evaluate its accuracy in predicting control cells and to observe how its predictions change in response to different treatments.

  • localhost230405150001

localhost230405150001_platemap_figure.png
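
One way to summarize how predictions differ across treatments is to compute accuracy separately for each treatment group. The sketch below is illustrative only and is not the repository's evaluation code: the `accuracy_by_treatment` function and the record layout (treatment, true label, predicted label) are assumptions, and the example records are made up rather than taken from the plate.

```python
from collections import defaultdict


def accuracy_by_treatment(records):
    """Compute the fraction of correct predictions per treatment.

    records: iterable of (treatment, true_label, predicted_label) tuples.
    This layout is a hypothetical choice for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for treatment, true_label, predicted_label in records:
        total[treatment] += 1
        correct[treatment] += int(true_label == predicted_label)
    return {treatment: correct[treatment] / total[treatment] for treatment in total}


# Made-up records for illustration only; these are not real results.
example_records = [
    ("DMSO", "Failing", "Failing"),
    ("DMSO", "Healthy", "Healthy"),
    ("DMSO", "Failing", "Healthy"),
    ("drug_x", "Failing", "Healthy"),
]
per_treatment = accuracy_by_treatment(example_records)
```

For treated wells, a drop in "accuracy" against the heart-of-origin label is not necessarily an error: a failing cell predicted as "Healthy" after treatment is the kind of shift a screen would look for.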

Additionally, this repository includes the pilot plates below, which were not prepared using an optimized protocol. They are intended for further analysis that is not included in the manuscript.

  • localhost220512140003_KK22-05-198
  • localhost220513100001_KK22-05-198_FactinAdjusted

See our platemaps for more details.

Repository Structure

| Module | Purpose | Description |
|--------|---------|-------------|
| 0.download_data | Download CFReT pilot data | Download pilot images for the CFReT project |
| 1.preprocessing_data | Perform Illumination Correction (IC) | We use CellProfiler to perform IC on images per channel for all plates |
| 2.cellprofiler_processing | Apply feature extraction pipeline | We use CellProfiler to extract hundreds of morphology features per imaging channel |
| 3.process_cfret_features | Get morphology features analysis ready | Apply cytotable and pycytominer to perform single-cell merging, annotation, normalization, and feature selection |
| 4.analyze_data | Analyze the single cell profiles to achieve goals listed above | Several independent analyses to describe data and test hypotheses |
| 5.machine_learning | Generate binary logistic regression model | Train model to predict healthy or failing cells and evaluate performance |

Create CellProfiler conda environment

For modules 1 and 2, we use a single CellProfiler environment for the repository, which includes all required packages, including CellProfiler v4.2.4.

To create the environment, run the below code block:

# Run this command in terminal to create the conda environment
conda env create -f cfret_cp_env.yml

Make sure that the conda environment is activated before running CellProfiler related notebooks or scripts:

conda activate cfret_data

Python and R analysis

There are two different environments, one for Python (preprocessing and analysis steps) and one for R (data visualization):

  • python_analysis_env: This environment is for Python-specific notebooks, such as those in the preprocessing and analysis modules (modules 3 and 4).
  • R_analysis_env: This environment is for R-specific notebooks, specifically for visualizing data.

NOTE: Module 5, where we train a machine learning model, has its own environment. Installation is described in the module's README.

You can create the environments using the code block below:

# create environment for analysis
mamba env create -f python_analysis_env.yml
mamba env create -f R_analysis_env.yml

Make sure that the conda environment is activated before running related notebooks or scripts:

conda activate python_analysis_cfret
# OR
conda activate R_analysis_cfret
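
To catch a missing `conda activate` early, a notebook or script could check the active environment before doing any work. This is a minimal sketch and not part of the repository: the `require_conda_env` helper is hypothetical, and it relies on the `CONDA_DEFAULT_ENV` environment variable that conda sets when an environment is activated.

```python
import os


def require_conda_env(expected: str) -> None:
    """Raise an error if the active conda environment is not `expected`.

    Reads CONDA_DEFAULT_ENV, which conda sets on activation. The variable
    is absent (None here) when no conda environment is active.
    """
    active = os.environ.get("CONDA_DEFAULT_ENV")
    if active != expected:
        raise RuntimeError(
            f"Expected conda environment {expected!r}, but the active one is {active!r}"
        )
```

A Python notebook in modules 3 or 4 could call `require_conda_env("python_analysis_cfret")` in its first cell.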

cfret_data's People

Contributors

axiomcura, gwaybio, jenna-tomkinson

cfret_data's Issues

Update analysis conda environments

Problem

Issue brought up by @roshankern in PR #24.

The two environments in the 4.analyze_data module appear to include packages that are unused or cause issues, and the names do not make clear which is the R conda env and which is the Python conda env.

Solution

We will need to refactor this in the future to pin package versions, confirm which packages are needed, and make the names clearer.

Document data quality issues

We have noticed some relatively minor (and common) issues with data quality in this dataset.

Some issues include blurriness, debris, and high levels of channel bleed-through.

@jenna-tomkinson - please make sure to make detailed notes of these issues. These notes likely belong in the preprocessing module alongside IC (and elsewhere TBD)!

Mint DOI

We need to mint a Zenodo DOI for this repo, as we are nearing submission.

We need to perform the following prior to DOI minting:

  • Review issues and determine if any are critical to resolve
  • Review READMEs to confirm a) clarity and b) that our results align with what we present in the paper (that they are all up to date and that they don't contradict)
  • Perform any software gardening to make sure repo is maximally reproducible

Thanks!

Open Data Access

@gwaybio For the CFReT data, do we know if there is anything preventing the data from being made public?

If not, there are two options for where to put this data. Currently, before any removal of images, it is a little over 2GB with 990 images. We can:

  1. Add it to DVC:
  • I do not see this as viable since the documentation on their website does not list DropBox support (see https://dvc.org/doc/command-reference/remote)
  • There is a GitHub issue requesting DropBox support. It is marked as resolved, but I am not confident that it was fully addressed, since the issue states that there isn't much demand for it.
  2. Add to Figshare:
  • This seems like the better option but I am not sure about the process so I have a few questions:
    1. Is there a Way Lab figshare login or do I create my own?
    2. Is it like GitHub where there is a Way Science organization and then the datasets within it?

Based on the answers to my questions, we can then proceed in one direction.

Refactor UMAP and LM analysis to be more automated

Problem

In the two analysis modules, there is a lot of manual work needed to change names and variables every time you need to perform an analysis with a different subset of the data.

Solution

At some point, it would be best to take time to refactor and make the work more automated by using dictionaries or other methodologies.
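
As one possible shape for the dictionary-based refactor, each data subset could be declared once as a set of column filters and every analysis run over all of them in a loop. The sketch below is an assumption about how this might look, not existing repository code: the subset names, the column names (`treatment`, `cell_type`), and the helpers `select_rows` and `run_all` are all hypothetical.

```python
# Hypothetical subset definitions: subset name -> {column: required value}.
SUBSETS = {
    "dmso_only": {"treatment": "DMSO"},
    "failing_dmso": {"treatment": "DMSO", "cell_type": "Failing"},
}


def select_rows(rows, criteria):
    """Keep the rows (dicts) whose values match every column in `criteria`."""
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in criteria.items())
    ]


def run_all(rows, subsets, analysis):
    """Apply `analysis` to each named subset and collect results by name."""
    return {
        name: analysis(select_rows(rows, criteria))
        for name, criteria in subsets.items()
    }
```

Adding a new subset then means adding one dictionary entry rather than editing names and variables throughout a notebook. For example, `run_all(rows, SUBSETS, len)` returns the number of rows in each subset.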

Refactor to use CellProfiler for IC instead of PyBaSiC

With the update of PyBaSiC to BaSiCPy, we have discovered some issues when using the package for static images that were not present in the previous version (PyBaSiC).

Since it would take time to test BaSiCPy further or to switch to an entirely different IC method with CellProfiler, we will continue to use the current iteration of PyBaSiC (which we clone into the repo at a specific commit hash) for the next plates.

The goal is to either get BaSiCPy working for static images or refactor to CellProfiler to reflect the standard pipeline.
