
persist's People

Contributors

iancovert, rhngla

persist's Issues

Speed?

Hello,

I'm running PERSIST on my scRNA-seq dataset, but I have more than 100K cells and training takes quite a while. What can I do to speed up training? For reference, I have 48 GB of GPU memory.

Best,
Chang
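One generic way to cut training time on a dataset this large is to train on a class-balanced subsample of cells. This is not a PERSIST feature or a recommendation from its authors, just a minimal sketch assuming cell-type labels are available; `stratified_subsample` and `max_per_class` are hypothetical names.

```python
import numpy as np

def stratified_subsample(labels, max_per_class, seed=0):
    """Return sorted indices that keep at most max_per_class cells per label (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        # Downsample only the classes that exceed the cap; rare classes are kept whole.
        if len(idx) > max_per_class:
            idx = rng.choice(idx, size=max_per_class, replace=False)
        keep.append(idx)
    return np.sort(np.concatenate(keep))
```

With AnnData, the result can index cells directly, e.g. `adata_train[stratified_subsample(adata_train.obs['cell_types_25_codes'], 2000)].copy()`, where the cap of 2000 is an arbitrary example value. Capping per class rather than sampling uniformly keeps rare cell types represented.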

Binarization threshold

Hi!

Really cool package, and easy to use. I've already compared it to some prior selections made with geneBasis and think I'd like to switch over. One thing I'd like to make sure I understand is how the binarization thresholds are set. In the "Expression quantization" section, you recommend a threshold-matching approach: a threshold of zero is applied to the scRNA-seq measurements, the corresponding quantile is found in the scRNA-seq data, and the matching threshold is then identified in the FISH data.

I'm starting from available scRNA-seq data and plan to select a 100-gene panel for smFISH. Is the following an appropriate procedure?

  1. Preprocess the scRNA-seq data, including quality control and filtering steps.
  2. Normalize and transform the scRNA-seq data using an appropriate method such as log CPM or SCTransform.
  3. Set the threshold value to 0 for the transformed scRNA-seq data.

All the best,
Petar
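The threshold-matching idea described above can be sketched per gene: binarize scRNA-seq at zero, record the fraction of cells at or below that threshold, and take the same quantile of the FISH values as the FISH threshold. This is one reading of the approach, not the package's own code; the function names are hypothetical, and it assumes dense cells-by-genes arrays with genes in the same column order.

```python
import numpy as np

def binarize(x, threshold):
    """1.0 where expression exceeds the threshold, else 0.0."""
    return (x > threshold).astype(np.float32)

def matched_fish_thresholds(scrna, fish, scrna_threshold=0.0):
    """Per-gene FISH thresholds at the quantile implied by the scRNA-seq threshold (hypothetical helper)."""
    # Fraction of scRNA-seq cells at or below the threshold, for each gene (column).
    q = (scrna <= scrna_threshold).mean(axis=0)
    # Same quantile of each gene's FISH distribution, so the "on" fractions roughly agree.
    return np.array([np.quantile(fish[:, j], q[j]) for j in range(fish.shape[1])])
```

Because `np.quantile` interpolates and FISH counts can contain ties, the binarized "on" fractions will match only approximately on real data.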

Is it REALLY necessary to blow up the sparse data to dense?

I assume the title says it all.

In your example 01_persist_supervised.ipynb you have this statement:

# Initialize the dataset for PERSIST
# Note: Here, adata_train.layers['bin'] is a sparse array
# adata_train.layers['bin'].A converts it to a dense array
train_dataset = ExpressionDataset(adata_train.layers['bin'].A, adata_train.obs['cell_types_25_codes'])
val_dataset = ExpressionDataset(adata_val.layers['bin'].A, adata_val.obs['cell_types_25_codes'])

And of course, if you don't do that, it doesn't work.

Couldn't this tool support sparse data instead?
Densifying everything doesn't feel state of the art, sorry.
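For comparison, a map-style dataset can keep the matrix in CSR form and densify one row only when it is requested. This is a minimal sketch, not PERSIST's `ExpressionDataset`; `SparseRowDataset` is a hypothetical name, and whether PERSIST can consume such an object without changes to its training code is not confirmed here. It only relies on `__len__` and `__getitem__`, which is what `torch.utils.data.DataLoader` expects from a map-style dataset.

```python
import numpy as np
import scipy.sparse as sp

class SparseRowDataset:
    """Stores a CSR matrix and densifies a single row per access (hypothetical class)."""

    def __init__(self, X, y):
        self.X = sp.csr_matrix(X)  # CSR makes row slicing cheap
        self.y = np.asarray(y)
        if self.X.shape[0] != len(self.y):
            raise ValueError("X and y must have the same number of rows")

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        # Only this one row is converted to a dense float32 vector.
        row = self.X[idx].toarray().ravel().astype(np.float32)
        return row, self.y[idx]
```

Memory then scales with the number of nonzeros plus one batch of dense rows, at the cost of a small conversion on every access.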

Runtime Error in 01_persist_supervised.ipynb

Hi all,

I came across this very promising gene marker selection method and want to try it on some example data I have.

In the block where PERSIST is run, the following line:

candidates, model = selector.eliminate(target=500, max_nepochs=250)

raises a RuntimeError. More details are in the attached file.

error.txt

Has anyone come across this bug? Could it be forcing us to use a GPU?

Thanks in advance for looking into this.
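A common source of RuntimeErrors in PyTorch notebooks on CPU-only machines is code that targets a CUDA device unconditionally. A quick check is sketched below; `pick_device` is a hypothetical helper, and whether the notebook or the PERSIST constructor lets you pass the resulting device is an assumption not confirmed here.

```python
import importlib.util

def pick_device():
    """Return 'cuda' if PyTorch is installed and sees a GPU, otherwise 'cpu' (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"
```

If this returns `'cpu'` while the notebook hard-codes a CUDA device (e.g. `torch.device('cuda', ...)`), replacing that with `torch.device(pick_device())` is a reasonable first thing to try.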

LightGBM for Cell Type Classification

Hi, @iancovert

In your paper, you mention using LightGBM models for cell type classification with a learning rate of 0.05 and 10,000 boosting rounds (with early stopping). However, I couldn't find the complete LightGBM code in the materials you provided. Since LightGBM has many hyperparameters, I'd like details on the parameters beyond the learning rate and number of boosting rounds, specifically:

  1. Max Depth of Trees (max_depth)
  2. Number of Classes (num_class)
  3. Maximum Number of Leaves (num_leaves)
  4. Feature Fraction (feature_fraction)
  5. Early Stopping Rounds (early_stopping_round)

If possible, could you please share the code or provide guidance on how these parameters were configured in your experiments?

Thank you for your time and consideration.
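Until the authors share their configuration, here is a sketch of how the stated settings map onto LightGBM's training API. Only the learning rate of 0.05 and the 10,000 rounds come from the issue; every other value is either a LightGBM default or an assumption (notably the early-stopping patience of 100), and none of it should be read as the paper's setup. `lgbm_params` and `train_lgbm` are hypothetical names.

```python
import numpy as np

NUM_BOOST_ROUND = 10_000  # "10,000 boosters" from the issue, read as boosting rounds

def lgbm_params(num_class, learning_rate=0.05, early_stopping_round=100):
    """Multiclass LightGBM params; only learning_rate comes from the issue."""
    return {
        "objective": "multiclass",
        "num_class": num_class,
        "learning_rate": learning_rate,
        "num_leaves": 31,           # LightGBM default
        "max_depth": -1,            # LightGBM default: no depth limit
        "feature_fraction": 1.0,    # LightGBM default: use all features
        "early_stopping_round": early_stopping_round,  # assumed patience
        "verbosity": -1,
    }

def train_lgbm(X_train, y_train, X_val, y_val):
    """Train with early stopping on the validation set (requires lightgbm)."""
    import lightgbm as lgb
    params = lgbm_params(num_class=int(np.max(y_train)) + 1)
    train_set = lgb.Dataset(X_train, label=y_train)
    val_set = lgb.Dataset(X_val, label=y_val, reference=train_set)
    # Early stopping needs at least one validation set.
    return lgb.train(params, train_set, num_boost_round=NUM_BOOST_ROUND, valid_sets=[val_set])
```

Labels are assumed to be integer codes starting at 0, as in `cell_types_25_codes`.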
