Comments (5)
kept and read_count_all should have the same order as the input data.
from higashi.
OK, thanks!
Hi Ruochi,
Sorry to bother you again. When we say "order of input data", does it mean the order of barcodes (from 0 to n, which I record in an array cellid), or the order in which barcodes first appear in the input data (could be 0, 32, 5, 11...)?
I tried to filter out the cells that didn't pass QC with cellid[kept.astype(bool)] and then regenerated the input file so that it only contains the barcodes that passed QC. But when I load the new input file, not all of the cells are marked as passing QC, and the ratio of "good cells" is similar to that of the unfiltered data.
Hey, what you did with cellid[kept.astype(bool)] is correct. The behavior is also expected: in the current code, the good_qc status of a cell is defined based on a quantile, so things like this can happen.
The logic goes like this:
- Calculate n_counts per chromosome. If more than 50% of the cells have n_counts > number of bins in this chromosome, we use n_counts > n_bin as the metric. Otherwise, we use n_counts > np.quantile(n_counts, 0.5) as the metric. This metric is per chromosome.
- Good QC cells are cells that pass the metric for all chromosomes.
So, if fewer than 50% of the cells have n_counts > n_bins on one or more chromosomes, the metric for those chromosomes is based on the quantile, which means cells can be filtered further even after an earlier round of filtering.
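The logic described above can be sketched in NumPy as follows. This is only an illustration of the rule as stated in this comment, not the actual Higashi code: the function name good_qc_mask, the (cells x chromosomes) layout of n_counts, and the toy numbers are all assumptions made for the example.

```python
import numpy as np


def good_qc_mask(n_counts, n_bins):
    """Hypothetical re-implementation of the QC rule described in the thread.

    n_counts: array of shape (n_cells, n_chroms), contact counts per cell and chromosome
    n_bins:   array of shape (n_chroms,), number of bins in each chromosome
    Returns a boolean array of shape (n_cells,), True for cells that pass QC.
    """
    n_counts = np.asarray(n_counts)
    n_bins = np.asarray(n_bins)
    passed = np.ones(n_counts.shape[0], dtype=bool)

    for c in range(n_counts.shape[1]):
        counts = n_counts[:, c]
        above_bins = counts > n_bins[c]
        if above_bins.mean() > 0.5:
            # More than 50% of cells exceed the bin count: use the absolute threshold.
            metric = above_bins
        else:
            # Otherwise fall back to the median of the cells currently loaded.
            metric = counts > np.quantile(counts, 0.5)
        # A good cell has to pass the metric on every chromosome.
        passed &= metric
    return passed


# Toy data (made up): 6 cells, 2 chromosomes with 100 bins each.
n_counts = np.array([
    [150, 50],
    [200, 60],
    [120, 70],
    [300, 80],
    [90, 120],
    [110, 130],
])
n_bins = np.array([100, 100])

kept = good_qc_mask(n_counts, n_bins)
print(kept)  # [False False False  True False  True]

# Re-running QC on only the kept cells recomputes the median on the smaller
# set, so one of the two previously good cells is filtered out again.
kept_again = good_qc_mask(n_counts[kept], n_bins)
print(kept_again)  # [False  True]
```

In the toy data, the second chromosome has too few cells above the bin count, so it uses the median threshold. Because that median is recomputed from whatever cells are in the input, regenerating the input from the kept cells does not make every remaining cell pass.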
Thanks for the clarification!