vccri / sierra
Discover differential transcript usage from polyA-captured single cell RNA-seq data
License: GNU General Public License v3.0
Hi,
I have data with cells from various individuals representing two different groups, and I'm interested in comparing the same cell types between the groups. I would like to use the individuals as replicates when running DEXSeq for DTU. Is there an easy way to modify the existing code so that the pseudo-bulk samples correspond to all the cells of interest in the available individuals?
Best,
Daniel
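For the pseudo-bulk question above, a minimal sketch of the aggregation step in base R might look like the following. It is not Sierra's own code: the matrix `counts`, the metadata table `cell_meta`, and its `individual` and `group` columns are hypothetical stand-ins for a peak count matrix and per-cell annotations.

```r
# Toy peak-by-cell count matrix (rows = peaks, columns = cells); all names are hypothetical.
set.seed(1)
counts <- matrix(rpois(4 * 6, lambda = 5), nrow = 4,
                 dimnames = list(paste0("peak", 1:4), paste0("cell", 1:6)))

# Hypothetical per-cell metadata: the individual each cell came from and its group.
cell_meta <- data.frame(
  individual = c("ind1", "ind1", "ind2", "ind2", "ind3", "ind3"),
  group      = c("A", "A", "A", "A", "B", "B"),
  row.names  = colnames(counts),
  stringsAsFactors = FALSE
)

# rowsum() aggregates rows by group, so transpose to cells-by-peaks,
# sum the cells of each individual, and transpose back to peaks-by-samples.
pseudo_bulk <- t(rowsum(t(counts), group = cell_meta[colnames(counts), "individual"]))

# One row per pseudo-bulk sample, ordered to match the columns of pseudo_bulk.
sample_table <- unique(cell_meta[, c("individual", "group")])
rownames(sample_table) <- sample_table$individual
sample_table <- sample_table[colnames(pseudo_bulk), ]
```

Each column of `pseudo_bulk` is then one replicate, and `sample_table` records which group it belongs to. Restricting `counts` to the cell type of interest before aggregating would give one pseudo-bulk sample per individual for that cell type.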
Hello, thank you very much for developing this package.
I have a few questions about how to interpret the table obtained from DUTest.
Thanks!
Hi,
I am trying to use this on human scRNA-seq data generated using the 10x platform.
I got this error after running
genome <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
AnnotatePeaksFromGTF(peak.sites.file = peak.merge.output.file,
gtf.file = reference.file,
output.file = "peak_annotations.txt",
genome = genome)
Error in asMethod(object) :
The character vector to convert to a GRanges object must contain
strings of the form "chr:start-end" or "chr:start-end:strand", with end
>= start - 1, or "chr:pos" or "chr:pos:strand". For example:
"chr1:2501-2900", "chr1:2501-2900:+", or "chr1:740". Note that ".." is
a valid alternate start/end separator. Strand can be "+", "-", "*", or
missing.
It turns out that some of the regions in the peak file have a start position greater than the end position. For example:
peaks.use.chr.update[250:260]
[1] "chr16:2853291-2856833:-" "chr16:2854597-2855173:-"
[3] "chr19:18857128-18867664:+" "chr19:18863469-18866739:+"
[5] "chr19:18867222-18868236:+" "chr19:18865704-18866607:+"
[7] "chr19:18867002-18867680:+" "chr19:18866058-18867445:+"
[9] "chr19:18857450-18860932:+" "chr19:18867147-18867135:+"
[11] "chr7:26864407-26864659:-"
Do you know why that might be happening?
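As a possible way to locate such entries before annotation, the sketch below parses the peak strings in base R and flags those that violate the `end >= start - 1` rule quoted in the error message. The three strings are copied from the output above; the parsing assumes chromosome names contain no `:` character.

```r
# Peak strings copied from the report above.
peaks <- c("chr19:18857450-18860932:+",
           "chr19:18867147-18867135:+",
           "chr7:26864407-26864659:-")

# Split "chr:start-end:strand" into its parts (assumes no ':' in chromosome names).
parts  <- do.call(rbind, strsplit(peaks, ":", fixed = TRUE))
ranges <- do.call(rbind, strsplit(parts[, 2], "-", fixed = TRUE))
start  <- as.numeric(ranges[, 1])
end    <- as.numeric(ranges[, 2])

# GRanges coercion requires end >= start - 1, so flag anything that violates it.
invalid <- end < start - 1
invalid_peaks <- peaks[invalid]
valid_peaks   <- peaks[!invalid]
```

Here `invalid_peaks` contains only `chr19:18867147-18867135:+`. Dropping such rows is a workaround only; it does not explain why the peak caller produced them.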
Hi,
I'm trying to re-analyze some already published data. The previously analyzed data with gene counts has 17473 features and 56902 cells, but my dataset only detects 394 peaks across 170706 cells.
This is odd, since the number of peaks should be higher than, or at least equal to, the number of gene features. Is there a step where something could have gone wrong that I should check? Thanks.
seuratPeaks <- NewPeakSeurat(
peak.counts,
peak.annotations,
min.cells = 0,
min.peaks = 0 )
Same data but with function:
peaks.seurat <- PeakSeuratFromTransfer(peak.data = peak.counts,
genes.seurat = published.seurat_object,
annot.info = peak.annotations,
min.cells = 0, min.peaks = 0)
Gives instead:
394 peaks and 56902 cells
I don't understand why the two functions give 2 different outputs.
Thanks!
I've been running through a lot of datasets with Sierra pretty smoothly. With a tumor dataset, I'm getting the following error with AnnotatePeaksFromGTF:
[1] "Annotating 60112 peak coordinates."
Annotating 3' UTRs
Annotating 5' UTRs
Annotating introns
Annotating exons
Annotating CDS
Analysing genomic motifs surrounding peaks (this can take some time)
|==================================================================================================================== | 88%
Error in if (start > stop) { : missing value where TRUE/FALSE needed
I'm using a custom GTF with transgene elements in it (EGFP, TdTomato, WPRE) but I've had no problems using CellRanger, Kallisto/Bustools, Seurat, Scanpy with it. Also, I got the same error when using the CellRanger mm10 reference (which is what I modified to add the transgenes). I also got some warnings related to the custom GTF:
1: In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon, exon, transcript. This information was
ignored.
2: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, WPRE, YAPRELApA, BFP2, GL456210.1, GL456211.1, GL456212.1, GL456216.1, GL456219.1, GL456221.1, GL456233.1, GL456239.1, GL456350.1, GL456354.1, GL456372.1, GL456381.1, GL456385.1, JH584292.1, JH584293.1, JH584294.1, JH584295.1, JH584296.1, JH584297.1, JH584298.1, JH584299.1, JH584303.1, JH584304.1, mGFP, mTom
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
3: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:- in 'x': M
- in 'y': MT, WPRE, YAPRELApA, BFP2, GL456210.1, GL456211.1, GL456212.1, GL456216.1, GL456219.1, GL456221.1, GL456233.1, GL456239.1, GL456350.1, GL456354.1, GL456372.1, GL456381.1, GL456385.1, JH584292.1, JH584293.1, JH584294.1, JH584295.1, JH584296.1, JH584297.1, JH584298.1, JH584299.1, JH584303.1, JH584304.1, mGFP, mTom
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
4: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:- in 'x': M
- in 'y': MT, WPRE, YAPRELApA, BFP2, GL456210.1, GL456211.1, GL456212.1, GL456216.1, GL456219.1, GL456221.1, GL456233.1, GL456239.1, GL456350.1, GL456354.1, GL456372.1, GL456381.1, GL456385.1, JH584292.1, JH584293.1, JH584294.1, JH584295.1, JH584296.1, JH584297.1, JH584298.1, JH584299.1, JH584303.1, JH584304.1, mGFP, mTom
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
5: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:- in 'x': M
- in 'y': MT, WPRE, YAPRELApA, BFP2, GL456210.1, GL456211.1, GL456212.1, GL456216.1, GL456219.1, GL456221.1, GL456233.1, GL456239.1, GL456350.1, GL456354.1, GL456372.1, GL456381.1, GL456385.1, JH584292.1, JH584293.1, JH584294.1, JH584295.1, JH584296.1, JH584297.1, JH584298.1, JH584299.1, JH584303.1, JH584304.1, mGFP, mTom
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
6: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:- in 'x': M
- in 'y': MT, WPRE, YAPRELApA, BFP2, GL456210.1, GL456211.1, GL456212.1, GL456216.1, GL456219.1, GL456221.1, GL456233.1, GL456239.1, GL456350.1, GL456354.1, GL456372.1, GL456381.1, GL456385.1, JH584292.1, JH584293.1, JH584294.1, JH584295.1, JH584296.1, JH584297.1, JH584298.1, JH584299.1, JH584303.1, JH584304.1, mGFP, mTom
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
7: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:- in 'x': M
- in 'y': MT, WPRE, YAPRELApA, BFP2, GL456210.1, GL456211.1, GL456212.1, GL456216.1, GL456219.1, GL456221.1, GL456233.1, GL456239.1, GL456350.1, GL456354.1, GL456372.1, GL456381.1, GL456385.1, JH584292.1, JH584293.1, JH584294.1, JH584295.1, JH584296.1, JH584297.1, JH584298.1, JH584299.1, JH584303.1, JH584304.1, mGFP, mTom
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
8: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:- in 'x': M
- in 'y': MT, WPRE, YAPRELApA, BFP2, GL456210.1, GL456211.1, GL456212.1, GL456216.1, GL456219.1, GL456221.1, GL456233.1, GL456239.1, GL456350.1, GL456354.1, GL456372.1, GL456381.1, GL456385.1, JH584292.1, JH584293.1, JH584294.1, JH584295.1, JH584296.1, JH584297.1, JH584298.1, JH584299.1, JH584303.1, JH584304.1, mGFP, mTom
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
Update: I tried another "normal" non-tumor sample with the custom reference and got the same error at the same place. Might there be a quick fix? I'm re-aligning one of my samples to the normal mm10 to see if that fixes it.
Hello everyone,
Thank you for developing Sierra. I want to use it on some scRNA-seq data, and I assume the alignment step is crucial. I believe STAR works fine here. Is it okay to use other aligners such as HISAT2?
Also, building the genome index with STAR involves many parameters, including the splice junction database parameters. Do these need to be set during indexing to get the best performance from Sierra?
Hi there,
Thank you for developing this useful tool for single cell dataset analysis.
I tried to use this on my dataset, which was processed with the Drop-seq pipeline. That pipeline tags the cell barcode as XC rather than CB. I replaced CB with XC in the source code and it works. I was wondering whether the cell barcode tag could be made an input parameter for convenience in future. Thank you in advance.
Y
Thanks again for all of the help.
Almost everything from the vignette now works on my dataset.
However, I can't use the function PeakSeuratFromTransfer, only NewPeakSeurat (which works fine and lets me proceed through the vignette).
Specifically, using PeakSeuratFromTransfer I get the error:
[1] "Creating Seurat object with 49254 peaks and 0 cells"
[1] "Preparing feature table for DEXSeq"
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Error in new.data[new.features, colnames(x = object), drop = FALSE] :
invalid or not-yet-implemented 'Matrix' subsetting
I used a standard Seurat V3 workflow (not using SCTransform though I tried that at first with no success) to make the Seurat Object from the 10X matrix and computed the tsne. Is the "0 cells" notation troubling?
NewPeakSeurat gives the following:
[1] "Creating Seurat object with 49254 peaks and 9128 cells"
[1] "Preparing feature table for DEXSeq"
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
[1] "No t-SNE coodinates included"
[1] "No UMAP coordinates included"
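One thing that may be worth checking for the "0 cells" message is whether the cell barcodes in the peak matrix match those in the gene-level object at all. The base R sketch below uses invented barcodes; with real data the two vectors would be the column names of the peak count matrix and of the Seurat object. The assumption that the two sets differ only by a trailing `-1` suffix is just one possible cause.

```r
# Hypothetical barcodes: the peak matrix uses a "-1" suffix, the gene-level object does not.
peak_barcodes <- c("AAACCTGAGAAACCAT-1", "AAACCTGAGAAACGAG-1", "AAACCTGCAATGGACG-1")
gene_barcodes <- c("AAACCTGAGAAACCAT", "AAACCTGAGAAACGAG", "AAACCTGTCCCTAATT")

# With no shared names, any subsetting by common barcodes leaves 0 cells.
n_shared_raw <- length(intersect(peak_barcodes, gene_barcodes))

# Strip a trailing "-<digits>" suffix (an assumption about how the two sets differ) and retry.
peak_stripped <- sub("-[0-9]+$", "", peak_barcodes)
shared <- intersect(peak_stripped, gene_barcodes)
```

If `n_shared_raw` is zero but `shared` is not empty, renaming the columns of one object so that both use the same barcode format would be a reasonable next step.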
Dear developing team,
I get the following error when I run AnnotatePeaksFromGTF:
This issue comes from BSgenome::getSeq. I managed to track down the row that caused it. Ultimately, the error is given by a function in the IRanges package. You can reproduce it by running the function with the following peaks file.
Gene Chr Strand MaxPosition Fit.max.pos Fit.start Fit.end mu sigma k exon.intron exon.pos polyA_ID
mt-Rnr1 chrM 1 636 634 115 1024 298.217592563921 173.813560963883 3632.99934014036 no-junctions NA mt-Rnr1:chrM:115-1024:1
Now, I am not sure why this error comes up. But, given the default parameters used in these functions, if I had to make a wild guess, it would be that the peak is too close to the start of the chromosome (closer than 250 bp). I could be wrong of course, but if that is the case, it would be good to adapt the code to handle these cases or to deliberately exclude them from the peaks file.
thanks,
amisios
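If the guess above is right, peaks of this kind could be screened out before annotation. The sketch below is only an illustration of that idea: the 250 bp flank is the value guessed in the report, not a confirmed Sierra default, and the second row of the toy table is hypothetical.

```r
# Toy peak table using column names from the peaks file above; the second row is hypothetical.
peaks <- data.frame(
  polyA_ID  = c("mt-Rnr1:chrM:115-1024:1", "GeneX:chr1:5000-5400:1"),
  Fit.start = c(115, 5000),
  Fit.end   = c(1024, 5400),
  stringsAsFactors = FALSE
)

# Assumed flank size: the 250 bp guess from the report, not a confirmed Sierra default.
flank <- 250

# A window starting `flank` bases upstream of Fit.start would begin before position 1.
too_close  <- peaks$Fit.start - flank < 1
peaks_kept <- peaks[!too_close, ]
```

The same check would also be needed at the chromosome end, which requires the chromosome lengths from the genome object.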
Hi,
Sierra is really a great tool for analyzing APA in scRNA-seq.
I tried to run Sierra using the vignette example,
extdata_path <- system.file("extdata",package = "Sierra")
reference.file <- paste0(extdata_path,"/Vignette_cellranger_genes_subset.gtf")
junctions.file <- paste0(extdata_path,"/Vignette_example_TIP_sham_junctions.bed")
bamfile <- c(paste0(extdata_path,"/Vignette_example_TIP_sham.bam"),
paste0(extdata_path,"/Vignette_example_TIP_MI.bam") )
whitelist.bc.file <- c(paste0(extdata_path,"/example_TIP_sham_whitelist_barcodes.tsv"),
paste0(extdata_path,"/example_TIP_MI_whitelist_barcodes.tsv"))
peak.output.file <- c("Vignette_example_TIP_sham_peaks.txt",
"Vignette_example_TIP_MI_peaks.txt")
FindPeaks(output.file = peak.output.file[1], # output filename
gtf.file = reference.file, # gene model as a GTF file
bamfile = bamfile[1], # BAM alignment filename.
junctions.file = junctions.file, # BED filename of splice junctions existing in BAM file.
ncores = 1) # number of cores to use
however, I got the following error:
Error in validObject(.Object) :
invalid class "GFF2File" object: undefined class for slot "resource" ("characterORconnection")
Do you know how to cope with this issue?
Best
Hello Sierra team,
I was hoping to get some help regarding an issue (see below) that I'm having when running Sierra CountPeaks with a specific sample (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR13043563 and SRR13043565). The pipeline works fine with both the vignette dataset and with other samples from the same experiment/patient (ex. SRR13043564).
I've tried some of the common suggestions found in other issues (ex. checking my gtf annotations, barcode file, bam file index and etc...) but so far nothing worked.
I'm using cellranger's annotation as a reference (refdata-gex-GRCh38-2020-A) and mapped my samples with STAR aligner (since the original experiment is MARS-seq) and followed STARsolo guidelines to make it as close to cellranger as possible.
Thanks in advance for any help you can provide.
Best regards,
Felipe
Error in {: task 32 failed - "1 elements in value to replace 0 elements"
Traceback:
Hi @reprobate
I tried to use Sierra on single cell long read data. I know it has not been tested on this kind of data. I came across the following error while running DUTest. Could this be caused by having too little data?
[1] "5145 expressed peaks in feature types exon"
[1] "1289 genes detected with multiple peak sites expressed"
[1] "4910 individual peak sites to test"
converting counts to integer mode
[1] "Running DEXSeq test..."
-- note: fitType='parametric', but the dispersion trend was not well captured by the function: y = a/x + b, and a local regression fit was automatically substituted. specify fitType='local' or 'mean' to avoid this message next time.
Error in countsThis[as.character(newMf[i, "exon"]), as.character(newMf[i, : subscript out of bounds
Hi Developers,
Thanks for this cool method.
I'm currently encountering an error when I run DUTest with a Seurat object. The error message is as follows.
[1] "Running DEXSeq test..."
Error in checkSlotAssignment(object, name, value) :
assignment of an object of class "DFrame" is not valid for slot 'elementMetadata' in an object of class "DEXSeqResults"; is(value, "DataTable_OR_NULL") is not TRUE
In addition: Warning message:
In DESeqDataSet(rse, design, ignoreRank = TRUE) :
some variables in design formula are characters, converting to factors
Can I get some advice to resolve it?
Thanks!
Best,
Soobeom
sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux
Matrix products: default
BLAS: /gpfs/share/apps/R/4.0.0/lib64/R/lib/libRblas.so
LAPACK: /gpfs/share/apps/R/4.0.0/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_1.4.0 Sierra_0.99.24 SeuratObject_4.0.0 Seurat_4.0.0
loaded via a namespace (and not attached):
[1] reticulate_1.18 tidyselect_1.1.0
[3] RSQLite_2.2.3 AnnotationDbi_1.52.0
[5] htmlwidgets_1.5.3 grid_4.0.0
[7] BiocParallel_1.24.1 Rtsne_0.15
[9] munsell_0.5.0 codetools_0.2-18
[11] ica_1.0-2 statmod_1.4.35
[13] future_1.21.0 miniUI_0.1.1.1
[15] colorspace_2.0-0 Biobase_2.50.0
[17] knitr_1.31 rstudioapi_0.13
[19] stats4_4.0.0 SingleCellExperiment_1.12.0
[21] ROCR_1.0-11 tensor_1.5
[23] listenv_0.8.0 MatrixGenerics_1.2.0
[25] GenomeInfoDbData_1.2.4 harmony_1.0
[27] hwriter_1.3.2 polyclip_1.10-0
[29] bit64_4.0.5 parallelly_1.23.0
[31] vctrs_0.3.6 generics_0.1.0
[33] xfun_0.20 biovizBase_1.38.0
[35] BiocFileCache_1.14.0 R6_2.5.0
[37] GenomeInfoDb_1.26.2 locfit_1.5-9.4
[39] AnnotationFilter_1.14.0 bitops_1.0-6
[41] spatstat.utils_2.0-0 cachem_1.0.1
[43] DelayedArray_0.16.1 assertthat_0.2.1
[45] promises_1.1.1 scales_1.1.1
[47] nnet_7.3-14 gtable_0.3.0
[49] globals_0.14.0 goftest_1.2-2
[51] ensembldb_2.14.0 rlang_0.4.10
[53] genefilter_1.72.1 splines_4.0.0
[55] rtracklayer_1.50.0 lazyeval_0.2.2
[57] dichromat_2.0-0 checkmate_2.0.0
[59] reshape2_1.4.4 abind_1.4-5
[61] GenomicFeatures_1.42.1 backports_1.2.1
[63] httpuv_1.5.5 Hmisc_4.4-2
[65] tools_4.0.0 ggplot2_3.3.3
[67] ellipsis_0.3.1 RColorBrewer_1.1-2
[69] BiocGenerics_0.36.0 ggridges_0.5.3
[71] Rcpp_1.0.6 plyr_1.8.6
[73] base64enc_0.1-3 progress_1.2.2
[75] zlibbioc_1.36.0 purrr_0.3.4
[77] RCurl_1.98-1.2 prettyunits_1.1.1
[79] rpart_4.1-15 openssl_1.4.3
[81] deldir_0.2-9 pbapply_1.4-3
[83] cowplot_1.1.1 S4Vectors_0.28.1
[85] zoo_1.8-8 SummarizedExperiment_1.20.0
[87] ggrepel_0.9.1 cluster_2.1.0
[89] magrittr_2.0.1 data.table_1.13.6
[91] scattermore_0.7 lmtest_0.9-38
[93] RANN_2.6.1 ProtGenerics_1.22.0
[95] fitdistrplus_1.1-3 matrixStats_0.57.0
[97] hms_1.0.0 patchwork_1.1.1
[99] mime_0.9 xtable_1.8-4
[101] XML_3.99-0.5 jpeg_0.1-8.1
[103] IRanges_2.24.1 gridExtra_2.3
[105] compiler_4.0.0 biomaRt_2.46.2
[107] tibble_3.0.6 KernSmooth_2.23-18
[109] crayon_1.3.4 htmltools_0.5.1.1
[111] mgcv_1.8-33 later_1.1.0.1
[113] Formula_1.2-4 geneplotter_1.68.0
[115] tidyr_1.1.2 DBI_1.1.1
[117] dbplyr_2.0.0 MASS_7.3-53
[119] rappdirs_0.3.2 Matrix_1.2-18
[121] parallel_4.0.0 Gviz_1.34.0
[123] igraph_1.2.6 GenomicRanges_1.42.0
[125] pkgconfig_2.0.3 GenomicAlignments_1.26.0
[127] foreign_0.8-80 plotly_4.9.3
[129] xml2_1.3.2 foreach_1.5.1
[131] annotate_1.68.0 XVector_0.30.0
[133] DEXSeq_1.36.0 VariantAnnotation_1.36.0
[135] digest_0.6.27 sctransform_0.3.2
[137] RcppAnnoy_0.0.18 spatstat.data_1.7-0
[139] Biostrings_2.58.0 leiden_0.3.6
[141] htmlTable_2.1.0 uwot_0.1.10
[143] curl_4.3 shiny_1.6.0
[145] Rsamtools_2.6.0 lifecycle_0.2.0
[147] nlme_3.1-151 jsonlite_1.7.2
[149] viridisLite_0.3.0 askpass_1.1
[151] BSgenome_1.58.0 pillar_1.4.7
[153] lattice_0.20-41 fastmap_1.1.0
[155] httr_1.4.2 survival_3.2-7
[157] glue_1.4.2 spatstat_1.64-1
[159] png_0.1-7 iterators_1.0.13
[161] bit_4.0.4 stringi_1.5.3
[163] blob_1.2.1 DESeq2_1.30.0
[165] latticeExtra_0.6-29 memoise_2.0.0
[167] dplyr_1.0.3 irlba_2.3.3
[169] future.apply_1.7.0
Hello,
I am trying to run FindPeaks on my data and am getting the following output. It appears to work fine but then after running for about 2 hours (using 32gb ram on 4 threads), I get an error that the line doesn't have 12 elements. The only potential problem I can think of is that the bam file has the tag for cell barcodes as BC:Z and UMI as U8:Z. Could that be the issue? The bam file was produced using long read single cell sequencing.
Hi,
I am using Sierra for 10x data. While executing FindPeaks, I get the following error:
Error in data.frame(EnsemblID = gtf_gr@elementMetadata@listData$gene_id, : arguments imply differing number of rows: 208940, 0
The GTF file I use here is not from Ensembl. It was downloaded from NCBI.
Any suggestion is appreciated.
Thanks in advance!
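The error message shows one column with 208940 rows and another with 0, which suggests that an attribute expected alongside `gene_id` is absent from this GTF. A quick way to check is to list the attribute keys actually present. The sketch below does this in base R on two invented attribute strings; treating `gene_name` as the missing key is an assumption, not something confirmed by the error.

```r
# Two toy GTF attribute strings, both invented for illustration:
# the first carries gene_name, the second does not.
attrs <- c('gene_id "ENSG00000000001"; gene_name "GENE1"; gene_biotype "protein_coding";',
           'gene_id "LOC000001"; gene "LOC000001"; gbkey "Gene";')

# Return the attribute keys present in one GTF attribute field.
attribute_keys <- function(x) {
  fields <- trimws(strsplit(x, ";", fixed = TRUE)[[1]])
  fields <- fields[nzchar(fields)]
  sub(" .*$", "", fields)
}

# Keys assumed to be required downstream (an assumption based on the error message).
required <- c("gene_id", "gene_name")
missing_keys <- lapply(attrs, function(a) setdiff(required, attribute_keys(a)))
```

With a real file, the ninth column of the GTF would take the place of `attrs`. If a key turns out to be missing, adding it to the GTF from an equivalent attribute is one possible workaround.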
Hi, I'm new to working with Seurat objects as I'm used to various scRNA-seq Python tools. I'm having trouble with both PeakSeuratFromTransfer and NewPeakSeurat if you could help me out. I think Sierra is a really cool package and I'm excited to use it :)
Using an existing Seurat object and running PeakSeuratFromTransfer, I get the following:
peaks_seurat <- PeakSeuratFromTransfer(peak.data = peak_counts,
genes.seurat = sdata,
annot.info = peak_annotations,
min.cells = 0, min.peaks = 0,
)
Error in if (gene.cov != "") { : missing value where TRUE/FALSE needed
Trying to make a new Seurat object from peak_counts and peak_annotations I get:
peaks.seurat <- NewPeakSeurat(peak.data = peak_counts,
annot.info = peak_annotations,
project.name = "cr_st5",
filter.gene.mismatch = FALSE,
verbose = TRUE)
[1] "Creating Seurat object with 0 peaks and 2921 cells"
Error in annot.info[peaks.use, feature.names] :
incorrect number of dimensions
If I set filter.gene.mismatch = TRUE I get a different error: Error in if (gene.cov != "") { : missing value where TRUE/FALSE needed, which is the same as the PeakSeuratFromTransfer error.
I've tried digging into the source code to see what might be wrong, but I'm not super familiar with R so I'm having a bit of trouble.
Thanks!
Ashley
I'm attempting to figure out the source of the following error:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'writeMM' for signature '"NULL"'
I just re-installed Sierra after finding the same error described here: #35
I have previously run Sierra successfully on 10x data from the same pipeline (STARsolo). The current dataset is substantially larger, about 1.5 billion reads per sample. Running via slurm with 128GB RAM provided.
Edit: subsetting the BAM to chr19 only did not resolve the error so I don't think this is a memory issue.
Command:
CountPeaks(peak.sites.file = peak.output.file, # FindPeaks runs fine
gtf.file = gtf, # ensembl build 99 gtf for mouse, filtered per CellRanger spec
bamfile = bam, # indexed bam file from STARsolo
whitelist.file = bcs, # text file containing one column with barcode sequences
output.dir = count.dir,
countUMI = TRUE,
chr.names = c("1","2","3"), # subset to three chromosomes to expedite testing
filter.chr=TRUE, # (your documentation describes this option incorrectly)
ncores = 8)
Full output below:
Warning messages:
1: replacing previous import 'GenomicRanges::union' by 'dplyr::union' when loading 'Sierra'
2: replacing previous import 'GenomicRanges::intersect' by 'dplyr::intersect' when loading 'Sierra'
3: replacing previous import 'GenomicRanges::setdiff' by 'dplyr::setdiff' when loading 'Sierra'
4: replacing previous import 'Gviz::tail' by 'utils::tail' when loading 'Sierra'
5: replacing previous import 'Gviz::head' by 'utils::head' when loading 'Sierra'
There are 8266 whitelist barcodes.
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
There are 168884 sites
Doing counting for each site...
Processing chr: 3
and strand 1
Processing chr: 2
and strand 1
Processing chr: 1
and strand 1
and strand -1
and strand -1
and strand -1
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'writeMM' for signature '"NULL"'
Calls: CountPeaks -> ->
In addition: Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
Execution halted
thank you,
Hi,
While trying to install the package I get the following error:
Error: package or namespace load failed for ‘Sierra’:
.onLoad failed in loadNamespace() for 'checkmate', details:
call: get0(oNam, envir = ns)
error: lazy-load database '/Users/samhas/Library/R/3.6/library/backports/R/backports.rdb' is corrupt
In addition: Warning messages:
1: In get0(oNam, envir = ns) : restarting interrupted promise evaluation
2: In get0(oNam, envir = ns) : internal error -3 in R_decompress1
Thanks in advance!
Hi!
Love the idea of Sierra, and am looking forward to applying the pipeline to our data.
I've run the Sierra vignette example using the pre-formatted inputs. However, before starting to work with our own larger data sets, I wanted to test generating the requisite splice junctions .bed file using the example data provided in the vignette.
I do want to preface that I'm unsure if the following is a Regtools error, or an issue with the Sierra data.
I pulled the current regtools docker image from their repo, and after downloading a fresh copy of the Vignette_example_TIP_sham.bam file from here, I ran the following:
PS C:> docker run griffithlab/regtools regtools junctions extract -s 1 C:\Vignette_example_TIP_sham.bam -o C:\testoutput.bed
Program: regtools
Version: 0.5.2
Minimum junction anchor length: 8
Minimum intron length: 70
Maximum intron length: 500000
Alignment: C:\Vignette_example_TIP_sham.bam
Output file: C:\testoutput.bed
[E::hts_open_format] fail to open file 'C:\Vignette_example_TIP_sham.bam'
Unable to open BAM/SAM file.
Is this a docker / regtools issue (eg, not Sierra)? Or is this a Sierra data issue? Any help or pointers on how to resolve this issue would be great! If I can reproducibly run your Vignette, then I am certain I can get it working with our data.
Andrew
E creating vignettes (36.4s)
--- re-building ‘Sierra_vignette.rmd’ using rmarkdown
Quitting from lines 297-298 (Sierra_vignette.rmd)
Error: processing vignette 'Sierra_vignette.rmd' failed with diagnostics:
argument is of length zero
--- failed re-building ‘Sierra_vignette.rmd’
Hi,
I am running Sierra CountPeaks() using ~80000 cells across 6 10x lanes, with a cell barcode structure like this: "R1-L1_NNNNNNNNNNNNNNNNNN" (R indicates replicate, L indicates lane). The parsed BAM is about 600GB and the number of peaks is ~180000.
Running
CountPeaks(peak.sites.file = peak.output.file, gtf.file = reference.file, bamfile = bamfile, whitelist.file = whitelist.bc.file, output.dir = count.dirs, countUMI = TRUE, ncores = 6)
leads to the following output and error:
There are 77835 whitelist barcodes.
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
There are 184034 sites
Doing counting for each site...
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'writeMM' for signature '"NULL"'
In addition: Warning messages:
1: In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon, exon. This information was ignored.
2: In mclapply(argsList, FUN, mc.preschedule = preschedule, mc.set.seed = set.seed, :
scheduled cores 1, 2, 3, 4, 5, 6 did not deliver results, all values of the jobs will be affected
Registered S3 method overwritten by 'spatstat':
method from
print.boxx cli
I get an error when using the CountPeaks function for one of my samples:
There are 14963 whitelist barcodes.
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
There are 7 sites
Doing counting for each site...
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'writeMM' for signature '"NULL"'
In addition: Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon. This information
was ignored.
Do you know what the problem could be? (I saw a similar issue posted by another user, but did not find the solution.)
Thanks !
Hello, I got the following error after the installation.
Thank you for the help in advance.
devtools::install_github("VCCRI/Sierra", build = TRUE, build_vignettes = TRUE, build_opts = c("--no-resave-data", "--no-manual"))
Downloading GitHub repo VCCRI/Sierra@HEAD
✔ checking for file ‘/private/var/folders/28/jd9d90s50mdctp6_0jk1wthw0000gq/T/RtmpRyVoSj/remotesa75d3f6bab38/VCCRI-Sierra-ef71a45/DESCRIPTION’ ...
─ preparing ‘Sierra’:
✔ checking DESCRIPTION meta-information ...
─ installing the package to build vignettes
✔ creating vignettes (34s)
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘Sierra_0.99.22.tar.gz’
library(Sierra)
Error: package or namespace load failed for ‘Sierra’ in namespaceImportFrom(ns, loadNamespace(j <- i[[1L]], c(lib.loc, :
lazy-load database '/Library/Frameworks/R.framework/Versions/3.6/Resources/library/magrittr/R/magrittr.rdb' is corrupt
In addition: Warning messages:
1: In namespaceImportFrom(ns, loadNamespace(j <- i[[1L]], c(lib.loc, :
restarting interrupted promise evaluation
2: In namespaceImportFrom(ns, loadNamespace(j <- i[[1L]], c(lib.loc, :
internal error -3 in R_decompress1
I'm having issues trying to perform peak calling.
Here are the error messages I got:
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
1 gene entries to process
There are 0 unfiltered sites and 0 filtered sites
Error in `$<-.data.frame`(`*tmp*`, "polyA_ID", value = "::-:") :
replacement has 1 row, data has 0
In addition: Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon. This information
was ignored.
Do you know what the problem could be?
Many thanks for your feedback
Best, Isa
Hello,
Can I use a different version of the GTF when running Sierra from the one used for sequence alignment?
For example, could I use the Ensembl GRCm38 release 98 GTF during sequence alignment and the Ensembl GRCm38 release 102 GTF when running Sierra?
Hello,
I have a custom dataset that has multiple read pairs that share the same UMI, would the counting algorithm take into account those multiple peaks as 1 count for each of those exons/introns?
Best,
Chang
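To make the question concrete, UMI-collapsed counting in general can be sketched as follows: reads sharing the same cell barcode, UMI, and peak are treated as one molecule. This is a generic illustration on invented data, and whether CountPeaks with countUMI = TRUE behaves exactly this way is not confirmed here.

```r
# Toy alignments: each row is one read assigned to a peak (all values hypothetical).
reads <- data.frame(
  cell = c("c1", "c1", "c1", "c2", "c2"),
  umi  = c("AAAA", "AAAA", "CCCC", "AAAA", "AAAA"),
  peak = c("p1", "p1", "p1", "p1", "p2"),
  stringsAsFactors = FALSE
)

# Collapse reads sharing cell, UMI and peak into one molecule, then count per peak and cell.
molecules  <- unique(reads)
umi_counts <- table(peak = molecules$peak, cell = molecules$cell)
```

Under this scheme the two `AAAA` reads of cell `c1` in peak `p1` count once, while the `AAAA` UMI of cell `c2` contributes one count to each of the two peaks it overlaps.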
Thank you for your great work and Sierra. I have a question. For multiple data sets with batch effect (for example two 10X data sets from different experiments or one 10X and one Smart-Seq2 data), is it possible to use Sierra? Do you think this kind of analysis creates a bias in results?
Thank you in advance.
Dear developing team,
When I run the mentioned function I get the following error.
Error in data.frame(PeakID = peaks.expressed, row.names = peaks.expressed.granges, : duplicate row.names: chr16:92085167-92085395:+
Btw, I'm using mm10 with the latest Ensembl (100) annotation, where three genes share this region:
"Mrps6:chr16:92085167-92085395:1" "Slc5a3:chr16:92085167-92085395:1" "Gm49711:chr16:92085167-92085395:1"
In fact I get multiple such errors. These genomic regions correspond to areas shared by more than one gene in the annotation; one of them can be a pseudogene, an intronic region, or something else. The errors come from peak count tables in which the same peak is annotated more than once (for a different gene each time). For example, two peaks:
"Lypla1:chr1:4845940-4847188:1" "Gm37988:chr1:4845940-4847188:1"
This probably classifies as a bug, since this situation can occur in annotations, yet the code does not account for it.
I'm not sure if there is a straightforward deterministic way to solve this. It would make sense for the algorithm to pick at least an exonic peak in this case, even if it picks a random exonic annotation.
best,
amisios
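A quick way to see which peaks collide is to strip the gene prefix from each peak ID and group by the remaining coordinates. The sketch below uses the IDs quoted above; keeping the first ID per coordinate is an arbitrary choice made for illustration, not a recommendation about which gene annotation is correct.

```r
# Peak IDs copied from the report above.
peak_ids <- c("Mrps6:chr16:92085167-92085395:1",
              "Slc5a3:chr16:92085167-92085395:1",
              "Gm49711:chr16:92085167-92085395:1",
              "Lypla1:chr1:4845940-4847188:1",
              "Gm37988:chr1:4845940-4847188:1")

# Drop the leading "Gene:" so only "chr:start-end:strand" remains.
coords <- sub("^[^:]+:", "", peak_ids)

# Group gene-level IDs by shared coordinates to see which peaks collide.
collisions <- split(peak_ids, coords)
collisions <- collisions[lengths(collisions) > 1]

# One possible (arbitrary) resolution: keep the first ID seen per coordinate.
deduplicated <- peak_ids[!duplicated(coords)]
```

Replacing the "first seen" rule with one that prefers an exonic annotation, as suggested above, would need the feature type of each peak from the annotation table.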
Running DUTest() on a dataset with gene names containing colons (:) or spaces results in:
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names':
Because DEXSeq removes spaces and colons from the gene names:
Warning messages:
1: In DEXSeq::DEXSeqDataSet(peak.matrix, sampleData = sampleTable, :
empty spaces or ':' characters were found either in your groupIDs or in your featureIDs, these will be removed from the identifiers
This causes mismatches between the gene names and the DEXSeq output.
It can be solved by changing line 671 of differential_usage.R to:
# removing colons and spaces to match output of DEXSeq
pid_gene_names <- gsub('[: ]', '', dexseq.feature.table$Gene_name)
rownames(dexseq.feature.table) <- paste0(pid_gene_names, ":", dexseq.feature.table$Peak_number)
Hi Ralph,
Hope everything is well,
It seems that multithreading is not working for functions that use apply_DEXSeq_test_seurat (the error only occurs when multithreading is enabled).
Here is an example of the error I'm getting. I would appreciate any comments, as I need to speed up the process because I'm running hundreds of these tests. Thanks.
'''
> apa.res <- DetectAEU(peaks.object = peaks_so,
+ gtf_gr = gtf_gr,
+ gtf_TxDb = gtf_TxDb,
+ do.MAPlot = T,
+ population.1 = cell1,
+ population.2 = cell2,
ncores = 20)
[1] "10082 expressed peaks in feature types UTR3"
[1] "2232 genes detected with multiple peak sites expressed"
[1] "6803 individual peak sites to test"
converting counts to integer mode
[1] "Running DEXSeq test..."
Error: $ operator is invalid for atomic vectors
In addition: Warning message:
In DESeqDataSet(rse, design, ignoreRank = TRUE) :
Error: $ operator is invalid for atomic vectors
'''
Hi there, thank you for creating this interesting package!
I am attempting to create my peak-level Seurat object. I initially received an error when using PeakSeuratFromTransfer saying that there were no matching barcodes, despite arranging them in the same order when aggregating peaks (the combined prepared Seurat object has barcode suffixes _1-1, _2-1, etc., which is how I understood the Sierra barcode appending to work as well, but maybe I am incorrect?).
When I tried NewPeakSeurat I encountered another error. It seems to want tsne coordinates even though I am supplying umap coordinates in the function.
Here is what I did along with the error
umap.coords <- query[["proj.umap"]]@cell.embeddings
peaks.seurat <- NewPeakSeurat(peak.data = peak.counts,
annot.info = peak.annotations,
cell.idents = populations,
umap.coords = umap.coords,
min.cells = 0, min.peaks = 0)
[1] "Creating Seurat object with 35146 peaks and 60227 cells"
Warning: The following arguments are not used: row.names
[1] "Preparing feature table for DEXSeq"
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
[1] "No t-SNE coodinates included"
Error in umap.coords[colnames(peaks.seurat), ] : subscript out of bounds
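A "subscript out of bounds" error of this kind is what base R reports when a matrix is indexed by row names it does not contain, so one possible cause (not confirmed here) is that the cell names in the peak counts differ from the row names of the embedding. A small base R sketch of that check, using invented barcodes and suffixes:

```r
# Hypothetical cell names: the peak matrix and the embedding use
# different barcode suffixes in this made-up example.
peak.cells <- c("AAACCTG-Sham", "AAACGGG-Sham", "AAAGATG-MI")
umap.coords <- matrix(seq_len(6), ncol = 2,
                      dimnames = list(c("AAACCTG_1-1", "AAACGGG_1-1", "AAAGATG_2-1"),
                                      c("UMAP_1", "UMAP_2")))

# Cells in the peak matrix that have no row in the embedding.
missing.cells <- setdiff(peak.cells, rownames(umap.coords))

# Indexing by the missing names reproduces the same class of error.
index.error <- tryCatch(umap.coords[peak.cells, ],
                        error = function(e) conditionMessage(e))

print(missing.cells)
print(index.error)
```

If `missing.cells` is non-empty, the barcodes need to be brought into the same naming scheme before the coordinates are passed on.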
Any help would be appreciated!
Thanks
Hi guys,
Thanks for being so responsive and fixing all my issues. Unfortunately I have now found a new issue in the DUTest function when I run it using the vignette code for the SingleCellExperiment way. I get the following error message:
res.table = DUTest(peaks.sce, population.1 = "BalloonCells", population.2 = "Astrocytes",
+ exp.thresh = 0.1, feature.type = c("exon"))
Error: $ operator not defined for this S4 class
This seems to be caused by line 1028 in differential_usage.R (get_expressed_peaks_sce function). I am pretty sure the line should be this.data <- peaks.sce.object@assays@data$counts.
Cheers,
Saskia
Hi,
Thanks for creating this great tool! I'm running into an error when using the zoom_3UTR=TRUE argument in the PlotCoverage function, e.g.:
PlotCoverage(genome_gr = gtf_gr,
geneSymbol = "CXCL8",
genome = "hg38",
pdf_output = FALSE,
bamfiles = c("Disease.CXCL8.bam", "Control.CXCL8.bam"),
zoom_3UTR=TRUE,
bamfile.tracknames=c("Disease", "Control"))
}
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'to': error in evaluating the argument 'query' in selecting a method for function 'findOverlaps': In range 1: at least two out of 'start', 'end', and 'width', must be supplied.
Do you know what could be causing this error? I am using R version 4.0.5 (2021-03-31), and Sierra_0.99.24.
Thanks
Hi, your program is very useful and a great tool. I was wondering if there's a way to provide a 3' end database from the start and only run the counting step, instead of calling peaks?
thanks
The function PlotRelativeExpressionBox looks for tSNE cell embeddings, and throws an error if no tSNE has been run (I've only run a UMAP):
Error in PlotRelativeExpressionBox(peaks.seurat, peaks.to.plot = peaks.to.plot) :
trying to get slot "cell.embeddings" from an object of a basic class ("NULL") with no slots
As far as I understand, a boxplot shouldn't require tSNE coordinates.
Hi Ralph,
Thanks for this awesome tool.
Can you please help me with this Error that happens in CountPeaks? >> "Error in x$.self$finalize() : attempt to apply non-function "
``
There are 3028 whitelist barcodes.
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
There are 118455 sites
Doing counting for each site...
Processing chr: chrX
and strand 1
Processing chr: chr20
and strand 1
Processing chr: chr1
and strand 1
Processing chr: chr6
and strand 1
Processing chr: chr3
and strand 1
Processing chr: chr7
and strand 1
Processing chr: chr12
and strand 1
Processing chr: chr11
and strand 1
Processing chr: chr4
and strand 1
Processing chr: chr17
and strand 1
Processing chr: chr2
and strand 1
Processing chr: chr16
and strand 1
Processing chr: chr8
and strand 1
Processing chr: chr19
and strand 1
Processing chr: chr9
and strand 1
Processing chr: chr13
and strand 1
Processing chr: chr14
and strand 1
Processing chr: chr5
and strand 1
Processing chr: chr22
and strand 1
Processing chr: chr10
and strand 1
Processing chr: chrY
and strand 1
Processing chr: chr18
and strand 1
Processing chr: chr15
and strand 1
Processing chr: chr21
and strand 1
Processing chr: chrM
and strand 1
Processing chr: KI270713.1
and strand 1
and strand -1
Processing chr: KI270711.1
and strand 1
and strand -1
Processing chr: GL000205.2
and strand 1
and strand -1
Processing chr: KI270728.1
and strand 1
and strand -1
Processing chr: GL000219.1
and strand 1
and strand -1
Processing chr: KI270727.1
and strand 1
and strand -1
Processing chr: GL000194.1
and strand 1
and strand -1
**Error in x$.self$finalize() : attempt to apply non-function**
In addition: Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
``
Cheers,
Hi,
I am trying out the package Sierra and I'd like to thank you for the really useful functions it has.
Nonetheless, I think I found a problem that I would like to bring to your attention.
Error in if (gene.cov != "") { : missing value where TRUE/FALSE needed
peaks_seurat <- PeakSeuratFromTransfer(peak.data = peak.counts, genes.seurat = seurat.object, annot.info = peak.annotations, min.cells = 1, min.peaks = 1, filter.gene.mismatch = T)
peak_annotations <- read.table("data/merged_annotated_peaks.txt", header = T, sep = "\t", row.names = 1, stringsAsFactors = FALSE) %>% filter(!is.na(gene_id))
Hope this is helpful, if this was just a mistake on my side please close and delete the issue.
Best,
Nicola
Tried the vignette and it worked perfectly and so I'm excited to try Sierra on other datasets. I was trying the 10X E18 v3 NextGem sample (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.2/5k_neuron_v3_nextgem) and got to the CountPeaks step, which stopped with the following error:
Processing chr: 1
and strand 1
[W::hts_idx_load2] The index file is older than the data file: 5k_neuron_v3_nextgem_possorted_genome_bam.bam.bai
Processing chr: 12
and strand 1
[W::hts_idx_load2] The index file is older than the data file: 5k_neuron_v3_nextgem_possorted_genome_bam.bam.bai
Processing chr: 19
and strand 1
[W::hts_idx_load2] The index file is older than the data file: 5k_neuron_v3_nextgem_possorted_genome_bam.bam.bai
Processing chr: MT
and strand 1
[W::hts_idx_load2] The index file is older than the data file: 5k_neuron_v3_nextgem_possorted_genome_bam.bam.bai
Error in { : task 1 failed - "1 elements in value to replace 0 elements"
In addition: Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon. This
information was ignored.
The previous step had seemingly completed normally:
Gene Chr Strand MaxPosition Fit.max.pos Fit.start Fit.end mu sigma k exon.intron
1 Gnai3 3 -1 108107503 108107434 108107280 108107683 231.8151 83.64921 4430.0914 non-juncs
2 Gnai3 3 -1 108112568 108112569 108111881 108118443 301.0178 101.11622 2918.5156 non-juncs
3 Gnai3 3 -1 108107807 108107849 108107726 108107972 342.2765 41.31000 570.5898 non-juncs
4 Gnai3 3 -1 108144083 108144086 108143879 108144293 303.7170 69.29031 451.9037 non-juncs
5 Gnai3 3 -1 108126299 108126308 108126083 108126533 309.5252 75.23088 376.4667 non-juncs
6 Gnai3 3 -1 108127845 108127856 108127652 108128060 311.5164 68.22383 326.3526 non-juncs
exon.pos
1
2 (108111881,108112087)(108112473,108112601)(108115763,108115890)(108118301,108118443)
3
4
5
6
There are 65746 unfiltered sites and 64181 filtered sites
There are 64132 sites following duplicate removal
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon. This
information was ignored.
Any suggestions on what is happening?
Hi everyone,
I'm trying to install Sierra and getting this Error.
"Error: Failed to install 'Sierra' from GitHub:
System command 'R' failed, exit status: 1, stdout + stderr (last 10 lines):
E> Quitting from lines 27-41 (Sierra_vignette.rmd)
E> Error: processing vignette 'Sierra_vignette.rmd' failed with diagnostics:
E> argument is of length zero
E> --- failed re-building ‘Sierra_vignette.rmd’
E>
E> SUMMARY: processing the following file failed:
E> ‘Sierra_vignette.rmd’
E>
E> Error: Vignette re-building failed.
E> Execution halted"
It seems the vignette file needs to be updated, or do you have any other idea?
Cheers
Aiden
Hi Sierra team,
I am trying to calculate the peak matrix from scRNA-seq data from a human tumor. The vignette runs fine, but I run into problems when annotating peaks. The reference genome and GTF file were downloaded from 10x Genomics. The junction file was generated following the vignette, except that -s was set to 0, since it is required. The error output and code are enclosed below.
Looking forward to your suggestion.
Regards,
Nelson
regtools junctions extract possorted_genome_bam.bam -o L28_junction.bed -s 0
> AnnotatePeaksFromGTF(peak.sites.file = peak.merge.output.file,
+ gtf.file = reference.file,
+ output.file = "TIP_merged_peak_annotations.txt",
+ genome = genome)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
[1] "Annotating 65378 peak coordinates."
Annotating 3' UTRs
Annotating 5' UTRs
Annotating introns
Annotating exons
Annotating CDSError in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'getSeq' for signature '"character"'
In addition: Warning messages:
1: In for (i in seq_len(n)) { :
closing unused connection 4 (/home/ot/R/x86_64-pc-linux-gnu-library/3.6/Sierra/extdatagenes.gtf)
2: In for (i in seq_len(n)) { :
closing unused connection 3 (/home/ot/R/x86_64-pc-linux-gnu-library/3.6/Sierra/extdatagenes.gtf)
3: In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
4: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
5: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
6: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
7: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
8: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
9: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
10: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
> library(Sierra)
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_HK.UTF-8 LC_NUMERIC=C LC_TIME=en_HK.UTF-8
[4] LC_COLLATE=en_HK.UTF-8 LC_MONETARY=en_HK.UTF-8 LC_MESSAGES=en_HK.UTF-8
[7] LC_PAPER=en_HK.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_HK.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] GenomicRanges_1.38.0 GenomeInfoDb_1.22.0 IRanges_2.20.1 S4Vectors_0.24.1
[5] BiocGenerics_0.32.0 Sierra_0.1.0
loaded via a namespace (and not attached):
[1] ProtGenerics_1.18.0 bitops_1.0-6 matrixStats_0.55.0
[4] flock_0.7 bit64_0.9-7 doParallel_1.0.15
[7] RColorBrewer_1.1-2 progress_1.2.2 httr_1.4.1
[10] tools_3.6.1 backports_1.1.5 R6_2.4.1
[13] rpart_4.1-15 Hmisc_4.3-0 DBI_1.0.0
[16] lazyeval_0.2.2 Gviz_1.30.0 colorspace_1.4-1
[19] nnet_7.3-12 tidyselect_0.2.5 gridExtra_2.3
[22] prettyunits_1.0.2 bit_1.1-14 curl_4.3
[25] compiler_3.6.1 Biobase_2.46.0 htmlTable_1.13.2
[28] DelayedArray_0.12.0 rtracklayer_1.46.0 scales_1.1.0
[31] checkmate_1.9.4 askpass_1.1 rappdirs_0.3.1
[34] stringr_1.4.0 digest_0.6.23 Rsamtools_2.2.1
[37] foreign_0.8-72 XVector_0.26.0 base64enc_0.1-3
[40] dichromat_2.0-0 pkgconfig_2.0.3 htmltools_0.4.0
[43] ensembldb_2.10.2 BSgenome_1.54.0 dbplyr_1.4.2
[46] htmlwidgets_1.5.1 rlang_0.4.2 rstudioapi_0.10
[49] RSQLite_2.1.3 BiocParallel_1.20.0 acepack_1.4.1
[52] dplyr_0.8.3 VariantAnnotation_1.32.0 RCurl_1.95-4.12
[55] magrittr_1.5 GenomeInfoDbData_1.2.2 Formula_1.2-3
[58] Matrix_1.2-18 Rcpp_1.0.3 munsell_0.5.0
[61] lifecycle_0.1.0 stringi_1.4.3 SummarizedExperiment_1.16.0
[64] zlibbioc_1.32.0 plyr_1.8.4 BiocFileCache_1.10.2
[67] grid_3.6.1 blob_1.2.0 crayon_1.3.4
[70] lattice_0.20-38 Biostrings_2.54.0 splines_3.6.1
[73] GenomicFeatures_1.38.0 hms_0.5.2 zeallot_0.1.0
[76] knitr_1.26 pillar_1.4.2 reshape2_1.4.3
[79] codetools_0.2-16 biomaRt_2.42.0 XML_3.98-1.20
[82] glue_1.3.1 biovizBase_1.34.0 latticeExtra_0.6-28
[85] BiocManager_1.30.10 data.table_1.12.6 foreach_1.4.7
[88] vctrs_0.2.0 gtable_0.3.0 openssl_1.4.1
[91] purrr_0.3.3 assertthat_0.2.1 ggplot2_3.2.1
[94] xfun_0.11 AnnotationFilter_1.10.0 survival_2.44-1.1
[97] SingleCellExperiment_1.8.0 tibble_2.1.3 iterators_1.0.12
[100] GenomicAlignments_1.22.1 AnnotationDbi_1.48.0 memoise_1.1.0
[103] cluster_2.1.0
library(Sierra)
peak.output.file <- c("Vignette_example_TIP_sham_peaks.txt",
"Vignette_example_TIP_MI_peaks.txt")
FindPeaks(output.file = peak.output.file[2], # output filename
gtf.file = "genes.gtf", # gene model as a GTF file
bamfile = "./L07/possorted_genome_bam.bam", # BAM alignment filename.
junctions.file = "./L07/L07_junction.bed", # BED filename of splice junctions existing in BAM file.
ncores = 4) # number of cores to use
FindPeaks(output.file = peak.output.file[1], # output filename
gtf.file = "genes.gtf", # gene model as a GTF file
bamfile = "./L28/possorted_genome_bam.bam", # BAM alignment filename.
junctions.file = "./L28/L28_junction.bed", # BED filename of splice junctions existing in BAM file.
ncores = 4)
### Merge data
### Read in the tables, extract the peak names and run merging ###
peak.dataset.table = data.frame(Peak_file = peak.output.file,
Identifier = c("TIP-example-Sham", "TIP-example-MI"),
stringsAsFactors = FALSE)
peak.merge.output.file = "TIP_merged_peaks.txt"
MergePeakCoordinates(peak.dataset.table, output.file = peak.merge.output.file, ncores = 1)
### Count Peak
count.dirs <- c("example_TIP_sham_counts", "example_TIP_MI_counts")
#sham data set
CountPeaks(peak.sites.file = peak.merge.output.file,
gtf.file = "genes.gtf",
bamfile = "./L28/possorted_genome_bam.bam",
whitelist.file = "./L28/barcodes.tsv.gz",
output.dir = count.dirs[1],
countUMI = TRUE,
ncores = 4)
# MI data set
CountPeaks(peak.sites.file = peak.merge.output.file,
gtf.file = "genes.gtf",
bamfile = "./L07/possorted_genome_bam.bam",
whitelist.file = "./L07/barcodes.tsv.gz",
output.dir = count.dirs[2],
countUMI = TRUE,
ncores = 4)
### Integration
peak.merge.output.file <- "TIP_merged_peaks.txt"
count.dirs <- c("example_TIP_sham_counts", "example_TIP_MI_counts")
# New definition
out.dir <- "example_TIP_aggregate"
# Now aggregate the counts for both sham and MI treatments
AggregatePeakCounts(peak.sites.file = peak.merge.output.file,
count.dirs = count.dirs,
exp.labels = c("Sham", "MI"),
output.dir = out.dir)
### Annotation peak
# As previously defined
peak.merge.output.file <- "TIP_merged_peaks.txt"
reference.file <- "genes.gtf"
# New definitions
genome <- "/media/ot/Data/Nelson/refdata-cellranger-GRCh38-3.0.0/fasta"
AnnotatePeaksFromGTF(peak.sites.file = peak.merge.output.file,
gtf.file = reference.file,
output.file = "TIP_merged_peak_annotations.txt",
genome = genome)
In trying to fix my other open issue, I wanted to reinstall Sierra. However on 4 different machines (3 windows and 1 ubuntu) I'm getting the same error (R version 3.6.1 and 3.6.2). On two of them, I have wiped my library directory clean, done a fresh install of R, and retried with the same error. Any suggestions?
Thanks again!
devtools::install_github("VCCRI/Sierra", build = TRUE, build_vignettes = TRUE, build_opts = c("--no-resave-data", "--no-manual"))
Downloading GitHub repo VCCRI/Sierra@master
✓ checking for file ‘/tmp/RtmprLptpC/remotes218c1f0f2761/VCCRI-Sierra-adca482/DESCRIPTION’ ...
─ preparing ‘Sierra’:
✓ checking DESCRIPTION meta-information ...
Warning: /tmp/Rtmper8hhG/Rbuild23fb72ed048e/Sierra/man/FindPeaks.Rd:25: unknown macro '\item'
Warning: /tmp/Rtmper8hhG/Rbuild23fb72ed048e/Sierra/man/FindPeaks.Rd:27: unknown macro '\item'
Warning: /tmp/Rtmper8hhG/Rbuild23fb72ed048e/Sierra/man/FindPeaks.Rd:29: unknown macro '\item'
Warning: /tmp/Rtmper8hhG/Rbuild23fb72ed048e/Sierra/man/FindPeaks.Rd:31: unknown macro '\item'
Warning: /tmp/Rtmper8hhG/Rbuild23fb72ed048e/Sierra/man/FindPeaks.Rd:33: unknown macro '\item'
Warning: /tmp/Rtmper8hhG/Rbuild23fb72ed048e/Sierra/man/FindPeaks.Rd:35: unexpected section header '\value'
Warning: /tmp/Rtmper8hhG/Rbuild23fb72ed048e/Sierra/man/FindPeaks.Rd:38: unexpected section header '\description'
Warning: /tmp/Rtmper8hhG/Rbuild23fb72ed048e/Sierra/man/FindPeaks.Rd:41: unexpected section header '\examples'
Warning: /tmp/Rtmper8hhG/Rbuild23fb72ed048e/Sierra/man/FindPeaks.Rd:45: unexpected END_OF_INPUT '
'
─ installing the package to build vignettes
✓ creating vignettes (30s)
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘Sierra_0.2.3.tar.gz’
Installing package into ‘/home/[username]/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
Hello,
I was wondering if it was possible to accept splice junctions generated from STAR/STARSolo?
I noticed it was slightly different from regtools in terms of junctions called. The data format is somewhat similar as well.
The format is outlined in the STAR manual.
Best,
Chang
Hi,
I am having problems annotating the peaks. I get the following error message:
AnnotatePeaksFromGTF(peak.sites.file = peak.merge.output.file,
+ gtf.file = reference.file,
+ output.file = "data/peaks/merged_peak_annotations.txt",
+ genome = genome)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
[1] "Annotating 264857 peak coordinates."
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).No samples matchedError: $ operator is invalid for atomic vectors
Here is my sessionInfo:
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /stornext/System/data/apps/R/R-3.6.1/lib64/R/lib/libRblas.so
LAPACK: /stornext/System/data/apps/R/R-3.6.1/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicRanges_1.38.0 GenomeInfoDb_1.22.0 IRanges_2.20.2 S4Vectors_0.24.3 BiocGenerics_0.32.0 Sierra_0.2.3
loaded via a namespace (and not attached):
[1] ProtGenerics_1.16.0 bitops_1.0-6 matrixStats_0.55.0 bit64_0.9-7
[5] RColorBrewer_1.1-2 progress_1.2.2 httr_1.4.1 rstan_2.19.2
[9] backports_1.1.5 tools_3.6.1 R6_2.4.1 rpart_4.1-15
[13] Hmisc_4.3-0 DBI_1.1.0 lazyeval_0.2.2 Gviz_1.28.3
[17] colorspace_1.4-1 nnet_7.3-12 tidyselect_1.0.0 gridExtra_2.3
[21] prettyunits_1.1.0 processx_3.4.1 curl_4.3 bit_1.1-15.2
[25] compiler_3.6.1 cli_2.0.1 Biobase_2.44.0 htmlTable_1.13.3
[29] DelayedArray_0.12.2 rtracklayer_1.46.0 checkmate_1.9.4 scales_1.1.0
[33] callr_3.4.0 stringr_1.4.0 digest_0.6.23 Rsamtools_2.2.2
[37] StanHeaders_2.21.0-1 foreign_0.8-75 XVector_0.26.0 dichromat_2.0-0
[41] htmltools_0.4.0 base64enc_0.1-3 jpeg_0.1-8.1 pkgconfig_2.0.3
[45] ensembldb_2.8.1 BSgenome_1.52.0 htmlwidgets_1.5.1 rlang_0.4.4
[49] rstudioapi_0.10 RSQLite_2.2.0 BiocParallel_1.20.1 acepack_1.4.1
[53] dplyr_0.8.3 VariantAnnotation_1.30.1 inline_0.3.15 RCurl_1.98-1.1
[57] magrittr_1.5 GenomeInfoDbData_1.2.2 Formula_1.2-3 loo_2.2.0
[61] Matrix_1.2-18 Rcpp_1.0.3 munsell_0.5.0 fansi_0.4.1
[65] lifecycle_0.1.0 stringi_1.4.5 SummarizedExperiment_1.16.1 zlibbioc_1.30.0
[69] plyr_1.8.5 pkgbuild_1.0.6 grid_3.6.1 blob_1.2.1
[73] crayon_1.3.4 lattice_0.20-38 Biostrings_2.54.0 splines_3.6.1
[77] GenomicFeatures_1.36.4 hms_0.5.3 BSgenome.Hsapiens.UCSC.hg38_1.4.1 knitr_1.24
[81] ps_1.3.0 pillar_1.4.3 codetools_0.2-16 biomaRt_2.40.5
[85] XML_3.99-0.3 glue_1.3.1 biovizBase_1.32.0 latticeExtra_0.6-29
[89] data.table_1.12.8 BiocManager_1.30.10 foreach_1.4.7 png_0.1-7
[93] vctrs_0.2.2 gtable_0.3.0 purrr_0.3.3 assertthat_0.2.1
[97] ggplot2_3.2.1 xfun_0.12 AnnotationFilter_1.8.0 survival_3.1-8
[101] SingleCellExperiment_1.8.0 tibble_2.1.3 iterators_1.0.12 GenomicAlignments_1.22.1
[105] AnnotationDbi_1.48.0 memoise_1.1.0 cluster_2.1.0
Hi,
I'm following the vignette in the wiki and running the code in chunks. The following chunk raised an error:
outdir = "bam_subsets/"
dir.create(outdir)
SplitBam(bamfile[1], cells.df, outdir)
and the error reads:
Error in Rsamtools::ScanBamParam(tag = bamTags, what = what, tag = bamTags) :
formal argument "tag" matched by multiple actual arguments
Looking at the code in split_bams.R, all the function calls to Rsamtools::ScanBamParam contain the tag argument twice. Shouldn't the argument be passed just once?
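The error message itself is standard R behaviour for any function call that names the same argument twice, independent of Rsamtools. A base R sketch with a stand-in function (`scan_param` and the tag values are hypothetical, not the Rsamtools API):

```r
# Stand-in function with the same two argument names as in the error above.
scan_param <- function(tag = character(0), what = character(0)) {
  list(tag = tag, what = what)
}

bamTags <- c("CB", "UB")      # example tag names, assumed for illustration
what    <- c("qname", "pos")

# Passing 'tag' twice reproduces the reported error message.
dup.error <- tryCatch(
  scan_param(tag = bamTags, what = what, tag = bamTags),
  error = function(e) conditionMessage(e)
)
print(dup.error)

# Passing each argument once works as expected.
ok <- scan_param(tag = bamTags, what = what)
print(ok$tag)
```

This supports the reading that each call should pass `tag` only once.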
Best,
Daniel
Hello,
This is a really impressive tool for broadening single-cell analysis. I want to use it in my current study, but when I run
peak.merge.output.file <- "AML_ALL_merged_peaks.txt"
out.dir <- "/mnt/data/user_data/xiangyu/workshop/scRNA/scAPA/sierra/AML_all_merge/AML_all_merge"
AggregatePeakCounts(peak.sites.file = peak.merge.output.file,
count.dirs = count.dirs,
exp.labels = c("WBM","HSPC","T0","T1_L","T1_R","T2_2R","T2_N","Tend_1L","Tend_2L"),
output.dir = out.dir)
I get an error:
Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) :
invalid character indexing
I have checked the source code
function (peak.sites.file, count.dirs, output.dir, exp.labels = NULL)
{
if (!is.null(exp.labels)) {
.......
peak.table <- read.table(peak.sites.file, sep = "\t", header = TRUE,
stringsAsFactors = FALSE)
all.peaks <- peak.table$polyA_ID
aggregate.counts <- c()
for (i in 1:length(count.dirs)) {
this.dir <- count.dirs[i]
this.data <- ReadPeakCounts(this.dir)
cell.names = colnames(this.data)
barcodes = sub("(.*)-\\d", "\\1", cell.names)
if (is.null(exp.labels)) {
cell.names.update = paste0(barcodes, "-", i)
}
else {
cell.names.update = paste0(barcodes, "-", exp.labels[i])
}
colnames(this.data) = cell.names.update
this.data <- this.data[all.peaks, ]
aggregate.counts <- cbind(aggregate.counts, this.data)
}
if (!dir.exists(output.dir)) {
...........
and I found that the row names of this.data are not all included in all.peaks.
I obtained all.peaks using your pipeline:
peak.dataset.table = data.frame(Peak_file = peak.output.file,
Identifier = c("WBM","HSPC","T0","T1_L","T1_R","T2_2R","T2_N","Tend_1L","Tend_2L"),
stringsAsFactors = FALSE)
peak.merge.output.file = "AML_ALL_merged_peaks.txt"
MergePeakCoordinates(peak.dataset.table, output.file = peak.merge.output.file, ncores = 30)
Could you help me work it out?
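One way to narrow this down is to compare the peak names in both directions before subsetting, since `this.data[all.peaks, ]` fails as soon as a single entry of all.peaks is absent from the row names. A base R sketch on a small dense matrix with invented peak IDs; the real objects are sparse matrices read by ReadPeakCounts, so this only illustrates the check:

```r
# Hypothetical merged peak list and one sample's count matrix.
all.peaks <- c("GeneA:1:100-200:1", "GeneB:1:300-400:1", "GeneC:2:500-600:-1")
this.data <- matrix(1:4, nrow = 2,
                    dimnames = list(c("GeneA:1:100-200:1", "GeneC:2:500-600:-1"),
                                    c("AAAC-1", "AAAG-1")))

# Peaks expected from the merged file but absent from this sample.
missing.in.sample <- setdiff(all.peaks, rownames(this.data))

# Peaks counted in this sample but absent from the merged file.
missing.in.merged <- setdiff(rownames(this.data), all.peaks)

# Subsetting on the intersection avoids the indexing error.
shared.peaks <- intersect(all.peaks, rownames(this.data))
subset.data  <- this.data[shared.peaks, , drop = FALSE]

print(missing.in.sample)
print(dim(subset.data))
```

A large `missing.in.sample` set may indicate that the counts were generated against a different peak file than the one passed to the aggregation step, although that is only one possible explanation.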
I also tried another approach, using the following code:
names <- c("Too_BAOHONG","Too_HSPC","","T1_L","T1_R","T2_2R","T2_N","Tend_1L","Tend_2L")
new_obj <- list()
for (i in 1:length(count.dirs)) {
this.dir <- count.dirs[i]
this.data <- ReadPeakCounts(this.dir)
cell.names = colnames(this.data)
barcodes = sub("(.*)-\\d", "\\1", cell.names)
cell.names.update = paste0(names[i],"_",barcodes)
colnames(this.data) = cell.names.update
new_obj[[i]] <- CreateSeuratObject(counts = this.data)
message(names[i]," is done")
}
all_data <- merge(x=new_obj[[1]],y=new_obj[2:length(names)])
aggregate.counts <- GetAssayData(all_data,slot="counts")
peak.sites.file = peak.merge.output.file
peak.table <- read.table(peak.sites.file, sep = "\t", header = TRUE,stringsAsFactors = FALSE)
all.peaks <- peak.table$polyA_ID
both_id <- intersect(all.peaks,rownames(aggregate.counts))
output.dir = out.dir
aggregate.counts <- aggregate.counts[both_id,]
Matrix::writeMM(aggregate.counts, file = paste0(output.dir,"/matrix.mtx"))
writeLines(colnames(aggregate.counts), paste0(output.dir,"/barcodes.tsv"))
writeLines(rownames(aggregate.counts), paste0(output.dir,"/sitenames.tsv"))
I just used the Seurat function merge() to merge the peak files of all samples, and I retained the peak positions present in both the MergePeakCoordinates() result and the Seurat merge() result.
I then continued with the next steps. Is this the right approach?
Thanks!
Hi! Very excited about this package!
I just have a question about whether the peak counts that Sierra gets from the PeakSeuratFromTransfer function are normalised by gene-level counts or not.
If not, is there any reason why the absolute peak counts are used instead of peak counts relative to each gene (basically a junction PSI value)?
Additionally, if the peak counts are not already normalised by gene expression, how would I go about doing the normalisation myself? I'm not very familiar with manipulating S4 objects.
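As a rough illustration of the normalisation asked about here, relative peak usage can be computed by dividing each peak's count by the summed counts of all peaks from the same gene in the same cell. This is a base R sketch on a small dense matrix with invented values, assuming row names of the form "Gene:chr:start-end:strand"; it is not a description of what Sierra does internally, and a sparse counts matrix extracted from a Seurat object would need to be converted or handled differently.

```r
# Hypothetical peak-by-cell count matrix.
peak.counts <- matrix(c(10, 0, 5,
                        30, 0, 5,
                         7, 2, 0),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("GeneA:1:100-200:1",
                                        "GeneA:1:900-950:1",
                                        "GeneB:2:50-80:-1"),
                                      c("cell1", "cell2", "cell3")))

# Gene name is everything before the first colon.
genes <- sub(":.*$", "", rownames(peak.counts))

# Per-gene totals in each cell (genes x cells).
gene.totals <- rowsum(peak.counts, group = genes)

# Divide each peak by its gene total; cells with no reads for the
# gene give 0/0, which R reports as NaN.
rel.usage <- peak.counts / gene.totals[genes, , drop = FALSE]
print(rel.usage)
```

Cells where a gene has zero counts carry no usage information, so the NaN entries are better treated as missing than as zero.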
Thank you! :)
Angel
Hi,
I'm using Sierra on single cell data of Drosophila, and the gtf contains gene symbols with a quote in the name (e.g. beta'COP).
This results in issues with the function FindPeaks. As a last step, the output table is read back in for filtering, but that results in warnings:
Warning messages:
1: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
2: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
number of items read is not a multiple of the number of columns
But more importantly, a truncated file; only the first ~4k lines of ~50k lines are read in.
I don't think you can expect quoted values in the input data table, so I think you can safely change (line 724 of count_polyA.R):
peak.sites <- read.table(peak.sites.file, header = T, sep = "\t",
stringsAsFactors = FALSE)
Into:
peak.sites <- read.table(peak.sites.file, header = T, sep = "\t", quote = '',
stringsAsFactors = FALSE)
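The effect of `quote = ''` can be checked on a small tab-separated file. The sketch below uses base R only and an invented three-row table containing one gene name with an apostrophe; with default quoting the outcome depends on the file, so only the `quote = ""` result is inspected.

```r
# Hypothetical peak table with an apostrophe in one gene name.
lines <- c("Gene\tChr\tStrand",
           "beta'COP\t2L\t1",
           "Act5C\tX\t1",
           "alphaCOP\t3L\t-1")
peak.file <- tempfile(fileext = ".txt")
writeLines(lines, peak.file)

# Default quoting treats the apostrophe as the start of a quoted string,
# so this call may warn, fail, or return fewer rows than expected.
default.read <- tryCatch(
  suppressWarnings(read.table(peak.file, header = TRUE, sep = "\t",
                              stringsAsFactors = FALSE)),
  error = function(e) NULL
)

# Disabling quoting reads every row and keeps the apostrophe in the name.
fixed.read <- read.table(peak.file, header = TRUE, sep = "\t", quote = "",
                         stringsAsFactors = FALSE)
print(fixed.read)
```

Comparing `nrow(fixed.read)` with the number of data lines in the file is a quick way to confirm that nothing was dropped.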
I've sent a pull request
hi,
I have successfully run DUTest() on my dataset and now I am trying to figure out what a called DTU peak means in terms of differential junction usage.
The results table (return.dexseq.res = FALSE) successfully gives the gene names and genomic intervals like "RPL12:9:127449689-127451405:-1".
However, I found that this does not seem to refer to any single splice junctions which were in the BAM file. Are these genomic intervals supposed to describe the regions of the genome where local splicing variations are taking place?
I am more interested in finding the splice junctions contributing to each DTU peak detected. However, the intermediate output files from Sierra don't seem to have the information that allows me to at least trace back each differential peak to the junctions. Is there any other way I can achieve this? Or is my interpretation of the output wrong...?
Many Thanks
Dear developer team,
I have two issues when I run AnnotatePeaksFromGTF.
Issue 1:
When I run the function with genome=NULL I get an error:
Error in `[.data.frame`(peak.table, peaks.keep.idx, "exon.intron") : object 'peaks.keep.idx' not found
I looked in the Annotate.R script (master). In line 138 you have annot.df$Junctions <- peak.table[peaks.keep.idx, "exon.intron"].
This is where the error is coming from. peaks.keep.idx is only defined inside if (!is.null(genome) & isS4(genome)).
Is this function intended to be run exclusively with a genome?
Issue 2:
I only encountered the previous issue because there is an issue when I use a genome. The error is the following:
Error in strsplit(coord, split = ":") : non-character argument
This is coming from the Annotate.R script again, from line 561.
This is strange, because the example of the vignette works.
And coord is a "row" from a GRanges object. So in my console strsplit does not work. (could it be I am using a different version of GenomicRanges than you?). Of course this seems to be solved when coord=as.character(gr[i]) (which you have somewhere in comments but not in the actual code).
I would appreciate any quick feedback on how to proceed.
Thanks,
amisios