Comments (10)
Hi, thanks for trying out the package.
Can I just confirm that "AML_ALL_merged_peaks.txt" is the file that you're using for CountPeaks for all of your data-sets? Theoretically all the peak identifiers should be in the count matrices as well (i.e. in the code you pointed to I would expect all row names of this.data to be in the peak file used for counting). That said, I should add a check, so thanks for pointing it out.
Regarding your alternative workflow with Seurat merge, that should work fine.
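For illustration, a check of the kind described above could look like the sketch below. It is not Sierra's actual code: the toy peak table, its `polyA_ID` column, and the peak names are made up for the example, and only `this.data` follows the naming used in the comment.

```r
# Toy peak table standing in for the merged peak file (hypothetical contents).
peak.table <- data.frame(
  polyA_ID = c("GeneA:1:100-200:1", "GeneB:2:300-400:-1", "GeneC:3:500-600:1"),
  stringsAsFactors = FALSE
)

# Toy count matrix; one row name is deliberately absent from the peak table.
this.data <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("GeneA:1:100-200:1", "GeneB:2:300-400:-1", "GeneX:9:1-50:1"),
                  c("cell1", "cell2"))
)

# Peak identifiers present in the counts but missing from the peak file.
missing.peaks <- setdiff(rownames(this.data), peak.table$polyA_ID)

if (length(missing.peaks) > 0) {
  message(length(missing.peaks),
          " peak(s) in the count matrix are not in the peak file: ",
          paste(missing.peaks, collapse = ", "))
}
```

If `missing.peaks` is non-empty, the count matrix was probably generated against a different peak file than the one being passed in.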
from sierra.
Yes, "AML_ALL_merged_peaks.txt" is the peak file I used for all of my datasets, @reprobate.
Thanks again, I will move on using your pipeline.
Hello,
I found one of the graphs in your paper very useful. Could you share the code used to generate it?
I was actually planning to incorporate that functionality into the R package anyway. I'll let you know when it is ready; hopefully I can make that update this coming week.
Hi @pangxueyu233,
I've added the functionality for performing the 3'UTR length analysis and generating the above visualisations. The details are in the updated vignette under section 5. Let me know if you're able to run the new functions.
Thanks for maintaining the package, @reprobate! I tried your new function as follows:
Idents(peaks.seurat) <- peaks.seurat$new_anno3
res.table = DetectUTRLengthShift(peaks.object = peaks.seurat,
                                 gtf_gr = gtf_gr,
                                 gtf_TxDb = gtf_TxDb,
                                 population.1 = "Neutrophil like",
                                 population.2 = NULL, ncores = 25)
But I got an error:
> sel_clu <- unique(as.character(peaks.seurat$new_anno3))[i]
> res.table = DetectUTRLengthShift(peaks.object = peaks.seurat,
+ gtf_gr = gtf_gr,
+ gtf_TxDb = gtf_TxDb,
+ population.1 = sel_clu,
+ population.2 = NULL, ncores = 25)
[1] "1204 expressed peaks in feature types UTR3"
[1] "1152 peaks after filtering out A-rich annotations"
[1] "111 genes detected with multiple peak sites expressed"
[1] "243 Individual peak sites to test"
converting counts to integer mode
[1] "Running DEXSeq test..."
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
[1] "Detecting shifts in 3'UTR length usage"
Error in data.frame(SiteLocation = site.diff, NumSites = num.sites, row.names = diff.site, :
arguments imply differing number of rows: 0, 1
In addition: Warning message:
In vst(exp(alleffects), object) :
Dispersion function not parametric, applying log2(x+ 1) instead of vst...
It seems that my data cannot be normalised with log2(x + 1), and I don't know how to avoid this error.
When I replace "Neutrophil like" with another cluster, the function runs fine for some clusters but fails for others. However, I specifically want to know the APA events in the "Neutrophil like" cluster. Is there a way to achieve that?
Thanks.
There are a couple of things going on here. The log2(x+1) message isn't an error - this comes from DEXSeq trying to fit its dispersion function. Is your "Neutrophil like" population a minor cluster? Having a small number of cells will result in a lower 'sequencing depth' of the pseudo-bulk profiles provided to DEXSeq and can cause this issue to occur. One thing you could try would be to decrease the expression threshold for a peak - e.g. try setting exp.thresh to 0.05 and see what happens.
The actual error seems to be in the post-processing step, but I might need a bit more info to resolve it. Can you tell me what output you get when you run the function below?
res.table = DUTest(peaks.object = peaks.seurat,
                   population.1 = sel_clu,
                   population.2 = NULL,
                   feature.type = c("UTR3"),
                   filter.pA.stretch = TRUE)
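To make the point about pseudo-bulk 'sequencing depth' concrete, here is a minimal base R sketch of building pseudo-bulk profiles by randomly splitting cells and summing their counts. It is an illustration under assumptions, not Sierra's implementation: the count matrix is simulated, and `num.splits`, `split.id` and `pseudo.bulk` are names chosen for the example.

```r
set.seed(1)

# Toy peak-by-cell count matrix: 4 peaks, 60 cells (made-up data).
counts <- matrix(rpois(4 * 60, lambda = 0.3), nrow = 4,
                 dimnames = list(paste0("peak", 1:4), paste0("cell", 1:60)))

# Randomly assign each cell to one of num.splits pseudo-replicates.
num.splits <- 6
split.id <- sample(rep(seq_len(num.splits), length.out = ncol(counts)))

# Sum counts across the cells in each pseudo-replicate.
pseudo.bulk <- sapply(seq_len(num.splits), function(k) {
  rowSums(counts[, split.id == k, drop = FALSE])
})
colnames(pseudo.bulk) <- paste0("rep", seq_len(num.splits))

# Library size of each pseudo-bulk profile.
colSums(pseudo.bulk)
```

With fewer cells per split, each column sum shrinks, which is the lower 'depth' that can make the dispersion fit unstable.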
I have tried your code, @reprobate, and the results were as follows:
> res.table = DUTest(peaks.object = peaks.seurat,
+ population.1 = "Neutrophil like",
+ population.2 = NULL,
+ feature.type = c("UTR3"),
+ filter.pA.stretch = TRUE)
[1] "1204 expressed peaks in feature types UTR3"
[1] "1152 peaks after filtering out A-rich annotations"
[1] "111 genes detected with multiple peak sites expressed"
[1] "243 Individual peak sites to test"
converting counts to integer mode
[1] "Running DEXSeq test..."
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
Warning message:
In vst(exp(alleffects), object) :
Dispersion function not parametric, applying log2(x+ 1) instead of vst...
And my "Neutrophil like" cluster is the second-largest population, as you can see:
> table(peaks.seurat$new_anno3)
           HSPC  Macrophages II   Macrophages I             MEP    Erythrocytes
           1032             943            2385             322             368
       GMP like    Erythroblast      Neutrophil             GMP        Mono pro
          11495            1772            5619             964            1042
Neutrophil like
           7405
Thanks @pangxueyu233.
The error that needs to be resolved is:
Error in data.frame(SiteLocation = site.diff, NumSites = num.sites, row.names = diff.site, :
arguments imply differing number of rows: 0, 1
Which is occurring in the post-processing steps. I haven't been able to generate that error myself though, which makes debugging a bit tricky. Are you able to show me what is in res.table?
In the meantime, I've added a check to the code that should allow you to run the function without that error occurring. However, from your output, it seems that there are a small number of peaks being detected as expressed so I wouldn't expect there to be many detected examples of APA.
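For reference, base R raises this error whenever `data.frame()` is given one zero-length column alongside a column of length one. The sketch below reproduces it and shows the kind of guard that avoids it. It is not the check added to Sierra: the values are made up, and `build_row` is a hypothetical helper that only borrows the variable names from the error message.

```r
# Made-up values mirroring the names in the error message.
site.diff <- character(0)   # an upstream step returned nothing
num.sites <- 1
diff.site <- character(0)

# Building the data frame directly triggers the same kind of error.
err <- tryCatch(
  data.frame(SiteLocation = site.diff, NumSites = num.sites),
  error = function(e) conditionMessage(e)
)
print(err)

# Hypothetical guard: return NULL when any input is empty.
build_row <- function(site.diff, num.sites, diff.site) {
  if (length(site.diff) == 0 || length(num.sites) == 0 || length(diff.site) == 0) {
    return(NULL)
  }
  data.frame(SiteLocation = site.diff, NumSites = num.sites, row.names = diff.site)
}

build_row(site.diff, num.sites, diff.site)     # NULL instead of an error
build_row("distal", 2, "GeneA:1:100-200:1")    # one-row data frame
```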
As I mentioned above, you could try reducing the stringency of the expression threshold to potentially increase the number of peaks considered and see if that makes a difference, for example:
res.table = DUTest(peaks.object = peaks.seurat,
                   population.1 = sel_clu,
                   population.2 = NULL,
                   exp.thresh = 0.05,
                   feature.type = c("UTR3"),
                   filter.pA.stretch = TRUE)
or
res.table = DetectUTRLengthShift(
    peaks.object = peaks.seurat,
    gtf_gr = gtf_gr,
    gtf_TxDb = gtf_TxDb,
    population.1 = sel_clu,
    population.2 = NULL,
    exp.thresh = 0.05,
    feature.type = c("UTR3"),
    filter.pA.stretch = TRUE)
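As a rough picture of why lowering the threshold increases the number of peaks tested, the sketch below filters toy peaks by the fraction of cells in which they are detected. This assumes `exp.thresh` acts as a minimum fraction of expressing cells, which is an interpretation rather than something confirmed in this thread; the data are simulated and 0.10 is only an assumed baseline for comparison.

```r
set.seed(42)

# Toy peak-by-cell counts for one population: 5 peaks, 200 cells (made-up data).
detect.rate <- c(0.30, 0.12, 0.08, 0.06, 0.02)
counts <- t(sapply(detect.rate, function(p) rbinom(200, size = 1, prob = p)))
rownames(counts) <- paste0("peak", 1:5)

# Fraction of cells in which each peak has a non-zero count.
frac.expressed <- rowMeans(counts > 0)

# Peaks retained at the assumed baseline and at the relaxed threshold.
kept.baseline <- names(frac.expressed)[frac.expressed >= 0.10]
kept.relaxed  <- names(frac.expressed)[frac.expressed >= 0.05]

length(kept.baseline)
length(kept.relaxed)
```

Every peak kept at the stricter cut-off is also kept at the relaxed one, so the relaxed setting can only add candidate peak sites.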
Thanks a lot! I got the right results, @reprobate.
Related Issues (20)
- umap coordinates for NewPeakSeurat HOT 3
- 3'end database HOT 2
- Using data with batch effects in Sierra HOT 2
- Error in peak calling HOT 2
- peak discrepancy HOT 1
- Error when running CountPeaks HOT 8
- CoveragePlot error with 'zoom_3UTR=TRUE ' HOT 1
- Which alignment method and indexing options are suitable to use with Sierra? HOT 3
- Does the Sierra package have a detailed protocol, I would like to find. HOT 1
- Chromosome name in FindPeaks // Help with Output HOT 3
- Generate GitHub Releases HOT 2
- Cellranger mkref function parameters for Sierra HOT 1
- FindPeaks Error--'x' values larger than vector length 'sum(width)' HOT 2
- DUTest function Error HOT 1
- Sierra dataframe has 0 length HOT 1
- [E::hts_open_format] Failed to open file HOT 1
- is it possible to generate a plot that shows global 3'UTR length change?
- issues with generating splice junction file HOT 7
- MergePeakCoordinates takes long time? HOT 9
- Paired-end & PlotRelativeExpression functions. HOT 1