iancovert / persist

License: MIT License

Python 100.00%

persist's People

pvtodorov kniekamp19

persist's Issues

Speed?

Hello,

I'm running PERSIST on my scRNA-seq dataset, but it has more than 100K cells and training is taking quite a while. What can I do to speed up training? For reference, I have 48 GB of GPU memory.

Best,
Chang

Binarization threshold

Hi!

Really cool package that's easy to use. I've already compared it to some gene panels I previously selected with geneBasis, and I think I'd like to switch over. One thing I'd like to make sure I understand is how the binarization thresholds are set. In the "Expression quantization" section, you recommend a threshold-matching approach, where a threshold of zero is used for the scRNA-seq measurements. You then suggest finding the corresponding quantile in the scRNA-seq data and identifying the matching threshold in the FISH data.

I'm starting from available scRNA-seq data and plan to select a 100-gene panel for smFISH. Is the following an appropriate procedure?

Preprocess the scRNA-seq data, including quality control and filtering steps.
Normalize and transform the scRNA-seq data using an appropriate method such as log CPM or SCTransform.
Set the threshold value to 0 for the transformed scRNA-seq data.

All the best,
Petar
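The threshold-matching procedure described in this issue can be sketched roughly as follows. This is an illustrative NumPy sketch, not PERSIST's own code: the function names `match_fish_thresholds` and `binarize` are hypothetical, and it assumes both matrices are dense, have the same gene order, and that "binarize" means "expression strictly above the threshold". For each gene, it takes the fraction of scRNA-seq cells at or below the scRNA-seq threshold (0 here), then reads off the FISH value at that same quantile.

```python
import numpy as np


def match_fish_thresholds(scrna, fish, scrna_threshold=0.0):
    """Hypothetical helper: per-gene FISH thresholds matched by quantile.

    scrna: (n_scrna_cells, n_genes) transformed scRNA-seq expression (dense)
    fish:  (n_fish_cells, n_genes) FISH measurements, same gene order as scrna
    """
    scrna = np.asarray(scrna, dtype=float)
    fish = np.asarray(fish, dtype=float)
    # Fraction of scRNA-seq cells at or below the threshold, for each gene
    q = (scrna <= scrna_threshold).mean(axis=0)
    # FISH value sitting at that same quantile, gene by gene
    return np.array([np.quantile(fish[:, g], q[g]) for g in range(fish.shape[1])])


def binarize(x, thresholds):
    """1.0 where expression exceeds the gene's threshold, else 0.0."""
    return (np.asarray(x) > thresholds).astype(np.float32)


# Toy data (made up for illustration): 4 cells x 2 genes in each modality
scrna = np.array([[0, 1], [0, 2], [3, 0], [4, 5]])
fish = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
thresholds = match_fish_thresholds(scrna, fish)  # per-gene thresholds 2.5 and 17.5
print(binarize(fish, thresholds))
```

The FISH cells do not need to be the same cells as the scRNA-seq cells; only the per-gene quantile is carried across. `np.quantile` interpolates linearly by default, which is why the toy thresholds land between observed FISH values.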

Is it REALLY necessary to blow up the sparse data to dense?

I assume the title is self-explanatory.

In your example notebook 01_persist_supervised.ipynb, you have this code:

from persist import ExpressionDataset

# Initialize the dataset for PERSIST
# Note: here, adata_train.layers['bin'] is a sparse matrix;
# .toarray() converts it to a dense array (.A is an older, deprecated shorthand)
train_dataset = ExpressionDataset(adata_train.layers['bin'].toarray(), adata_train.obs['cell_types_25_codes'])
val_dataset = ExpressionDataset(adata_val.layers['bin'].toarray(), adata_val.obs['cell_types_25_codes'])

And of course, if you don't do that, it doesn't work.

Couldn't this tool support sparse data instead?
This doesn't feel state of the art, sorry.
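The cost being objected to is easy to estimate: a hypothetical 100,000-cell by 20,000-gene float32 matrix takes 100,000 × 20,000 × 4 bytes = 8 GB once densified. One general pattern, sketched here as an assumption and not as PERSIST's API, is a dataset that keeps the matrix in CSR form and densifies only the row being requested. The class name `SparseRowDataset` is hypothetical; it implements `__len__` and `__getitem__`, the protocol a map-style `torch.utils.data.DataLoader` uses. Whether PERSIST's training code would accept such an object in place of its own `ExpressionDataset` has not been checked.

```python
import numpy as np
import scipy.sparse as sp


class SparseRowDataset:
    """Hypothetical dataset that keeps expression sparse and densifies per row.

    It implements __len__ and __getitem__, which is what a map-style
    torch.utils.data.DataLoader relies on. It is NOT part of PERSIST.
    """

    def __init__(self, X, y):
        self.X = sp.csr_matrix(X)  # CSR makes single-row slicing cheap
        self.y = np.asarray(y)
        if self.X.shape[0] != len(self.y):
            raise ValueError("X and y must have the same number of rows")

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, i):
        # Only this one row is converted to a dense float32 vector
        row = self.X[i].toarray().ravel().astype(np.float32)
        return row, self.y[i]


X = sp.csr_matrix(np.array([[0, 1, 0, 0], [2, 0, 0, 3], [0, 0, 4, 0]]))
ds = SparseRowDataset(X, [0, 1, 2])
row, label = ds[1]
print(row, label)
```

The trade-off is extra per-row conversion work during training in exchange for never holding the full dense matrix in memory.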

Runtime Error in 01_persist_supervised.ipynb

Hi all,

I came across this very promising gene marker selection method and want to try it on some example data I have.

In the block where PERSIST is run, the following line:

candidates, model = selector.eliminate(target=500, max_nepochs=250)

raises a RuntimeError. More details are in the attached file.

error.txt

Has anyone come across this bug? Could it be forcing us to use a GPU?

Thanks in advance for looking into this.
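The traceback in error.txt is not reproduced here, so the cause is unknown. If the error does turn out to be device-related (an assumption), a common PyTorch pattern is to choose the device explicitly and fall back to the CPU when CUDA is unavailable. The helper `pick_device` below is hypothetical; whether and how PERSIST lets you pass a device is not shown in this issue, so check its constructor before relying on this.

```python
import torch


def pick_device(prefer_gpu: bool = True) -> torch.device:
    """Hypothetical helper: CUDA when it is usable, otherwise the CPU."""
    if prefer_gpu and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device()
print(f"Using device: {device}")

# Models and the tensors fed to them must live on the same device;
# PyTorch raises a RuntimeError ("Expected all tensors to be on the same
# device") when CPU and CUDA tensors are mixed in one operation.
x = torch.zeros(4, 3, device=device)
```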

LightGBM for Cell Type Classification

Hi, @iancovert

In your paper, you mentioned using LightGBM models for cell type classification with a learning rate of 0.05 and 10,000 boosting rounds (with early stopping). However, I couldn't find the complete LightGBM code in the materials you provided. Since LightGBM has many hyperparameters, I'd like to know how the others were set, specifically:

Max Depth of Trees (max_depth)
Number of Classes (num_class)
Maximum Number of Leaves (num_leaves)
Feature Fraction (feature_fraction)
Early Stopping Rounds (early_stopping_round)

If possible, could you please share the code or provide guidance on how these parameters were configured in your experiments?

Thank you for your time and consideration.
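For reference, here is a sketch of what a LightGBM setup with only the values stated in this issue might look like. Only the learning rate (0.05), the 10,000 boosting rounds, and the use of early stopping come from the text above; the early-stopping patience of 50 and the `multi_logloss` metric are hypothetical placeholders, and the function names are made up for illustration. Labels are assumed to be integer codes (like `cell_types_25_codes`) in the range `0..num_class-1`. The unanswered parameters are deliberately left unset so LightGBM uses its defaults.

```python
# Values stated in the paper (per the issue)
LEARNING_RATE = 0.05
NUM_BOOST_ROUND = 10_000

# Hypothetical placeholder: the paper's early-stopping patience is not given
EARLY_STOPPING_ROUNDS = 50


def lightgbm_params(num_class: int) -> dict:
    """Multiclass LightGBM parameters: paper values plus LightGBM defaults."""
    return {
        "objective": "multiclass",
        "num_class": num_class,      # number of distinct cell-type labels
        "learning_rate": LEARNING_RATE,
        "metric": "multi_logloss",   # assumption: metric not stated in the paper
        "verbosity": -1,
        # max_depth, num_leaves and feature_fraction are left unset, so
        # LightGBM falls back to its defaults (-1, 31 and 1.0).
    }


def train_classifier(X_train, y_train, X_val, y_val, num_class: int):
    """Train with early stopping on a validation set (needs the lightgbm package)."""
    import lightgbm as lgb  # imported here so lightgbm_params() works without it

    train_set = lgb.Dataset(X_train, label=y_train)
    val_set = lgb.Dataset(X_val, label=y_val, reference=train_set)
    return lgb.train(
        lightgbm_params(num_class),
        train_set,
        num_boost_round=NUM_BOOST_ROUND,
        valid_sets=[val_set],
        callbacks=[lgb.early_stopping(stopping_rounds=EARLY_STOPPING_ROUNDS)],
    )
```

For a multiclass model, `booster.predict(X)` returns one probability per class, so `.argmax(axis=1)` gives the predicted cell-type code.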
