Hi,
This is a very interesting and complex question. I will try to give an answer/discussion.
Let me rephrase your questions into smaller ones:
a) How to distinguish actual cell types with very little open chromatin from low-quality barcodes (they may not even be cells, which is why I am not calling them cells here)?
b) How to identify these low-depth cell types when the first principal components seem to be driven by library size?
Technically low-depth barcodes/cells can be caused by two things:
how many reads were sequenced from the library;
how many insertions the transposase actually managed to make in a cell.
Biologically low depth can be due to:
a very closed chromatin state in the cell type;
a cell type/nucleus that does not withstand the protocol, or that is harder for the transposase to access.
About a):
All cells, independently of cell type, have some regions that should systematically be open (e.g. RNA Pol II genes and other ubiquitously expressed genes), so we can expect a minimum number of insertions per cell.
This can be explored by looking at the QC, for example by checking TSS enrichment at housekeeping genes.
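A minimal sketch of this check, assuming you already have a cells-by-regions count matrix and a boolean mask marking regions expected to be open in every cell (e.g. housekeeping-gene promoters). The function name, the input layout and the `min_constitutive` cutoff are hypothetical illustrations, not episcanpy API. The idea is that barcodes with almost no insertions even in these regions are more likely technical failures than genuinely closed cell types.

```python
import numpy as np

def flag_low_quality_barcodes(counts, constitutive_mask, min_constitutive=5):
    """Flag barcodes with too few insertions in regions expected to be open everywhere.

    counts: (n_cells, n_regions) array of insertion counts.
    constitutive_mask: boolean array (n_regions,) marking regions assumed to be
        open in all cells, e.g. housekeeping-gene promoters (assumption).
    min_constitutive: hypothetical cutoff; tune it on your own QC plots.
    """
    counts = np.asarray(counts)
    mask = np.asarray(constitutive_mask, dtype=bool)
    total = counts.sum(axis=1)
    in_constitutive = counts[:, mask].sum(axis=1)
    # Fraction of each cell's insertions in constitutive regions;
    # empty barcodes get 0 instead of a division-by-zero warning.
    frac = np.divide(in_constitutive, total,
                     out=np.zeros(len(total), dtype=float), where=total > 0)
    likely_technical = in_constitutive < min_constitutive
    return in_constitutive, frac, likely_technical
```

A cell type with little open chromatin overall should still show a reasonable fraction of its (few) insertions in these regions, whereas a failed barcode usually does not.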
About b):
Usually, if you are using peaks, they will be identified mostly from highly covered cells, so you will lose the low-depth cells as noisy (having more than x percent of their reads outside of peaks). You can try to use an annotation-based feature space to keep some of the biological signal.
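The two ideas in this paragraph can be sketched with plain NumPy, assuming window-level counts and annotation intervals (for example promoters or gene bodies) are already loaded. All function names, input formats and the `max_outside` threshold are hypothetical. The first function mimics the kind of peak-based filter that drops low-depth cells; the second sums window counts into annotation-based features so those cells can keep some signal.

```python
import numpy as np

def fraction_outside_peaks(total_fragments, fragments_in_peaks):
    """Per-cell fraction of fragments falling outside peaks (1 for empty cells)."""
    total = np.asarray(total_fragments, dtype=float)
    in_peaks = np.asarray(fragments_in_peaks, dtype=float)
    frip = np.divide(in_peaks, total, out=np.zeros_like(total), where=total > 0)
    return 1.0 - frip

def keep_by_peak_fraction(total_fragments, fragments_in_peaks, max_outside=0.7):
    """Keep cells with at most `max_outside` of their fragments outside peaks."""
    return fraction_outside_peaks(total_fragments, fragments_in_peaks) <= max_outside

def annotation_feature_matrix(counts, windows, features):
    """Sum window-level counts into annotation-based features.

    counts: (n_cells, n_windows) array.
    windows: list of (chrom, start, end), one per column of counts.
    features: list of (name, chrom, start, end) from a gene annotation.
    A window counts toward every feature it overlaps (half-open intervals).
    The double loop is quadratic; fine for a sketch, not for a whole genome.
    """
    counts = np.asarray(counts)
    mapping = np.zeros((len(windows), len(features)))
    for j, (_, f_chrom, f_start, f_end) in enumerate(features):
        for i, (w_chrom, w_start, w_end) in enumerate(windows):
            if w_chrom == f_chrom and w_start < f_end and f_start < w_end:
                mapping[i, j] = 1.0
    names = [f[0] for f in features]
    return counts @ mapping, names
```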
You can also focus on the lowly covered cells and use a different feature space, such as promoter regions or small windows, to see whether some regions are enriched in the lowly covered cells and might be cell-type specific. Once you have done that, you can define a feature space containing the cell-type-specific regions and look at all the cells together.
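One simple way to look for regions enriched in the lowly covered cells, sketched under the assumption of a cells-by-regions count matrix over promoters or small windows. This is a plain fold-change ranking on depth-normalized counts, not a formal differential accessibility test, and the function name and `depth_quantile` split are hypothetical choices.

```python
import numpy as np

def regions_enriched_in_low_depth(counts, depth_quantile=0.25, pseudocount=1e-3, top_n=10):
    """Rank regions by how much more accessible they are in low-depth cells.

    counts: (n_cells, n_regions) array over promoters or small windows.
    depth_quantile: cells at or below this depth quantile form the
        low-depth group (hypothetical split; adjust to your data).
    Returns region indices (most enriched first) and their log2 fold changes.
    """
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=1)
    low = depth <= np.quantile(depth, depth_quantile)
    if low.all() or not low.any():
        raise ValueError("depth split left one group empty")
    # Normalize each cell to the fraction of its insertions per region,
    # so raw sequencing depth does not dominate the comparison.
    frac = np.divide(counts, depth[:, None],
                     out=np.zeros_like(counts), where=depth[:, None] > 0)
    mean_low = frac[low].mean(axis=0)
    mean_rest = frac[~low].mean(axis=0)
    log2fc = np.log2((mean_low + pseudocount) / (mean_rest + pseudocount))
    order = np.argsort(-log2fc)[:top_n]
    return order, log2fc[order]
```

The top-ranked regions are candidates for the cell-type-specific feature space; they still need to be checked against known markers before being trusted.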
So, to some extent, you can salvage the biologically low-depth cells from among the technically low-coverage ones. However, the library size effect will remain: it is a big technical artifact, and it does not disappear when excluding PC1 and/or applying library size correction.
You can check the relationship between library size (or any other technical artifact) and the principal components using the function correlation_pc. This is very useful for identifying artifacts in the data. However, we would not recommend removing the first four PCs, as doing so would also remove a lot of the biological variation present in the data. As you can see, library size is mainly correlated with PC1; to check how much library size explains the other PCs, you can use correlation_pc.
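To illustrate what such a check measures, here is a stand-alone NumPy sketch; it is not episcanpy's implementation of correlation_pc, and the function name and normalization scheme are assumptions. It scales each cell to the median library size, applies log1p, runs PCA via SVD and reports the Pearson correlation between log10 library size and each PC.

```python
import numpy as np

def library_size_pc_correlation(counts, n_pcs=10):
    """Pearson correlation between log10 library size and each PC score."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1)
    # Library-size normalization: scale every cell to the median depth, then log1p.
    scale = np.median(lib_size) / np.maximum(lib_size, 1.0)
    X = np.log1p(counts * scale[:, None])
    X -= X.mean(axis=0)  # center features before PCA
    # PCA via SVD: PC scores are U * S.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    n = min(n_pcs, S.size)
    scores = U[:, :n] * S[:n]
    log_lib = np.log10(np.maximum(lib_size, 1.0))
    return np.array([np.corrcoef(log_lib, scores[:, k])[0, 1] for k in range(n)])
```

Even after the normalization step, a PC can remain strongly correlated with depth, which is the residual library size effect described above.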