wayscience / cell-health-data

Data processing for the Cell Painting data from the Cell Health experiments

Jupyter Notebook 98.31% Python 1.65% Shell 0.04%

cell-health-data's People

Contributors

axiomcura, gwaybio, jenna-tomkinson, mattsoncam, roshankern

cell-health-data's Issues

Normalization Issues

While attempting to normalize cells with the DeepProfiler tools from https://github.com/cytomining/pycytominer, @jenna-tomkinson and I ran out of memory. The Cell Health dataset is large enough that the single-cell dataframe PyCytominer attempts to compile and normalize does not fit into the 64 GB of memory on our machine. We are therefore pursuing a couple of different alterations to PyCytominer so that it can normalize the dataset.

Our information about the choice of normalization population comes from "Data-analysis strategies for image-based cell profiling" by Caicedo et al.

Normalization by all samples:
Caicedo et al. say the following about choosing all samples as the normalization population:

Ideally, features are normalized across an entire screen in which batch effects are absent

In a perfect world, we would normalize across all single cells. This is not viable, however, because the full single-cell dataframe does not fit in memory.

Normalization by plates:
Caicedo et al. say the following about choosing plates as the normalization population:

normalization within plates is generally performed to correct for batch effects (described in 'Batch-effect correction')... all samples on a plate can be used as the normalizing population when negative controls are unavailable, too few, or unsuitable for some reason, and when samples on each plate are expected to not be enriched in dramatic phenotypes.

We are not attempting to correct for batch effects in Cell Health data. Whether our data is "not enriched in dramatic phenotypes" is somewhat subjective, which makes it a questionable basis for choosing this normalization method.
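
For comparison, within-plate normalization with all samples on a plate as the normalizing population can be sketched in plain pandas as below. This is only an illustration, not PyCytominer's implementation: the function name, the `Metadata_Plate` column, the `feature_a` column, and the toy values are all assumptions made for the example.

```python
import pandas as pd


def normalize_within_plates(cells, feature_cols, plate_col="Metadata_Plate"):
    """Z-score each feature using the mean and std of all cells on the same plate.

    Hypothetical helper for illustration; column names are assumptions.
    """
    grouped = cells.groupby(plate_col)[feature_cols]
    normalized = cells.copy()
    # transform() returns per-plate statistics aligned to the original rows
    plate_mean = grouped.transform("mean")
    plate_std = grouped.transform("std")  # sample std (ddof=1)
    normalized[feature_cols] = (cells[feature_cols] - plate_mean) / plate_std
    return normalized


# Toy data: two plates with very different raw intensity scales
cells = pd.DataFrame(
    {
        "Metadata_Plate": ["A", "A", "B", "B"],
        "feature_a": [1.0, 3.0, 10.0, 30.0],
    }
)
by_plate = normalize_within_plates(cells, ["feature_a"])
print(by_plate)
```

Each plate ends up centered on zero regardless of its raw scale, which is also why a plate enriched in dramatic phenotypes would have its own statistics shifted by those phenotypes.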

Normalization by controls:
Caicedo et al. say the following about choosing controls as the normalization population:

When choosing the normalizing population, we suggest the use of control samples (assuming that they are present in sufficient quantity), because the presence of dramatic phenotypes may confound results. This procedure is good practice regardless of the normalization being performed within plates or across the screen.

Given the abundance of controls in Cell Health data, this seems like the most viable method of normalization.
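
A minimal sketch of control-based normalization, assuming the control cells alone fit in memory: compute the mean and standard deviation from control rows only, then apply those fixed statistics to the rest of the data one chunk at a time. This is plain pandas rather than PyCytominer's own code, and the helper names, the `Metadata_is_control` column, and the toy values are hypothetical.

```python
import pandas as pd


def control_stats(controls, feature_cols):
    """Return per-feature mean and std computed from control cells only."""
    mean = controls[feature_cols].mean()
    # Population std; replace zeros so constant features do not divide by zero
    std = controls[feature_cols].std(ddof=0).replace(0, 1.0)
    return mean, std


def apply_stats(chunk, feature_cols, mean, std):
    """Z-score one chunk of cells with precomputed control statistics."""
    out = chunk.copy()
    out[feature_cols] = (chunk[feature_cols] - mean) / std
    return out


# Toy data: three control cells and one treated cell (assumed column names)
cells = pd.DataFrame(
    {
        "Metadata_is_control": [True, True, True, False],
        "feature_a": [1.0, 2.0, 3.0, 10.0],
    }
)
features = ["feature_a"]

mean, std = control_stats(cells[cells["Metadata_is_control"]], features)

# Stand-in for reading the dataset piece by piece (e.g. one plate at a time)
chunks = [cells.iloc[:2], cells.iloc[2:]]
normalized = pd.concat([apply_stats(c, features, mean, std) for c in chunks])
print(normalized)
```

Because the statistics depend only on the controls, each chunk can be normalized and written out independently, so the full single-cell dataframe never has to be held in memory at once. Whether this is enough for the 64 GB limit depends on how large the control population itself is.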
