I am trying to calculate the peak matrix from scRNA-seq data from a human tumor. The vignette runs fine, but I run into a problem when annotating peaks. The reference genome and GTF file were downloaded from 10x Genomics. The junction file was generated following the vignette, except that "-s" was set to 0, since that is a requirement. The error output and the code are enclosed below.
Looking forward to your suggestions.
> AnnotatePeaksFromGTF(peak.sites.file = peak.merge.output.file,
+ gtf.file = reference.file,
+ output.file = "TIP_merged_peak_annotations.txt",
+ genome = genome)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
[1] "Annotating 65378 peak coordinates."
Annotating 3' UTRs
Annotating 5' UTRs
Annotating introns
Annotating exons
Annotating CDS
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'getSeq' for signature '"character"'
In addition: Warning messages:
1: In for (i in seq_len(n)) { :
closing unused connection 4 (/home/ot/R/x86_64-pc-linux-gnu-library/3.6/Sierra/extdatagenes.gtf)
2: In for (i in seq_len(n)) { :
closing unused connection 3 (/home/ot/R/x86_64-pc-linux-gnu-library/3.6/Sierra/extdatagenes.gtf)
3: In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
4: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
5: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
6: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
7: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
8: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
9: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
10: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': M
- in 'y': MT, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
Make sure to always combine/compare objects based on the same reference
genome (use suppressWarnings() to suppress this warning).
> library(Sierra)
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_HK.UTF-8 LC_NUMERIC=C LC_TIME=en_HK.UTF-8
[4] LC_COLLATE=en_HK.UTF-8 LC_MONETARY=en_HK.UTF-8 LC_MESSAGES=en_HK.UTF-8
[7] LC_PAPER=en_HK.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_HK.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] GenomicRanges_1.38.0 GenomeInfoDb_1.22.0 IRanges_2.20.1 S4Vectors_0.24.1
[5] BiocGenerics_0.32.0 Sierra_0.1.0
loaded via a namespace (and not attached):
[1] ProtGenerics_1.18.0 bitops_1.0-6 matrixStats_0.55.0
[4] flock_0.7 bit64_0.9-7 doParallel_1.0.15
[7] RColorBrewer_1.1-2 progress_1.2.2 httr_1.4.1
[10] tools_3.6.1 backports_1.1.5 R6_2.4.1
[13] rpart_4.1-15 Hmisc_4.3-0 DBI_1.0.0
[16] lazyeval_0.2.2 Gviz_1.30.0 colorspace_1.4-1
[19] nnet_7.3-12 tidyselect_0.2.5 gridExtra_2.3
[22] prettyunits_1.0.2 bit_1.1-14 curl_4.3
[25] compiler_3.6.1 Biobase_2.46.0 htmlTable_1.13.2
[28] DelayedArray_0.12.0 rtracklayer_1.46.0 scales_1.1.0
[31] checkmate_1.9.4 askpass_1.1 rappdirs_0.3.1
[34] stringr_1.4.0 digest_0.6.23 Rsamtools_2.2.1
[37] foreign_0.8-72 XVector_0.26.0 base64enc_0.1-3
[40] dichromat_2.0-0 pkgconfig_2.0.3 htmltools_0.4.0
[43] ensembldb_2.10.2 BSgenome_1.54.0 dbplyr_1.4.2
[46] htmlwidgets_1.5.1 rlang_0.4.2 rstudioapi_0.10
[49] RSQLite_2.1.3 BiocParallel_1.20.0 acepack_1.4.1
[52] dplyr_0.8.3 VariantAnnotation_1.32.0 RCurl_1.95-4.12
[55] magrittr_1.5 GenomeInfoDbData_1.2.2 Formula_1.2-3
[58] Matrix_1.2-18 Rcpp_1.0.3 munsell_0.5.0
[61] lifecycle_0.1.0 stringi_1.4.3 SummarizedExperiment_1.16.0
[64] zlibbioc_1.32.0 plyr_1.8.4 BiocFileCache_1.10.2
[67] grid_3.6.1 blob_1.2.0 crayon_1.3.4
[70] lattice_0.20-38 Biostrings_2.54.0 splines_3.6.1
[73] GenomicFeatures_1.38.0 hms_0.5.2 zeallot_0.1.0
[76] knitr_1.26 pillar_1.4.2 reshape2_1.4.3
[79] codetools_0.2-16 biomaRt_2.42.0 XML_3.98-1.20
[82] glue_1.3.1 biovizBase_1.34.0 latticeExtra_0.6-28
[85] BiocManager_1.30.10 data.table_1.12.6 foreach_1.4.7
[88] vctrs_0.2.0 gtable_0.3.0 openssl_1.4.1
[91] purrr_0.3.3 assertthat_0.2.1 ggplot2_3.2.1
[94] xfun_0.11 AnnotationFilter_1.10.0 survival_2.44-1.1
[97] SingleCellExperiment_1.8.0 tibble_2.1.3 iterators_1.0.12
[100] GenomicAlignments_1.22.1 AnnotationDbi_1.48.0 memoise_1.1.0
[103] cluster_2.1.0
library(Sierra)
peak.output.file <- c("Vignette_example_TIP_sham_peaks.txt",
                      "Vignette_example_TIP_MI_peaks.txt")
FindPeaks(output.file = peak.output.file[2],                  # output filename
          gtf.file = "genes.gtf",                             # gene model as a GTF file
          bamfile = "./L07/possorted_genome_bam.bam",         # BAM alignment filename
          junctions.file = "./L07/L07_junction.bed",          # BED file of splice junctions existing in the BAM file
          ncores = 4)                                         # number of cores to use
FindPeaks(output.file = peak.output.file[1],                  # output filename
          gtf.file = "genes.gtf",                             # gene model as a GTF file
          bamfile = "./L28/possorted_genome_bam.bam",         # BAM alignment filename
          junctions.file = "./L28/L28_junction.bed",          # BED file of splice junctions existing in the BAM file
          ncores = 4)                                         # number of cores to use
### Merge data
### Read in the tables, extract the peak names and run merging ###
peak.dataset.table = data.frame(Peak_file = peak.output.file,
                                Identifier = c("TIP-example-Sham", "TIP-example-MI"),
                                stringsAsFactors = FALSE)
peak.merge.output.file = "TIP_merged_peaks.txt"
MergePeakCoordinates(peak.dataset.table, output.file = peak.merge.output.file, ncores = 1)
### Count Peak
count.dirs <- c("example_TIP_sham_counts", "example_TIP_MI_counts")
# Sham data set
CountPeaks(peak.sites.file = peak.merge.output.file,
           gtf.file = "genes.gtf",
           bamfile = "./L28/possorted_genome_bam.bam",
           whitelist.file = "./L28/barcodes.tsv.gz",
           output.dir = count.dirs[1],
           countUMI = TRUE,
           ncores = 4)
# MI data set
CountPeaks(peak.sites.file = peak.merge.output.file,
           gtf.file = "genes.gtf",
           bamfile = "./L07/possorted_genome_bam.bam",
           whitelist.file = "./L07/barcodes.tsv.gz",
           output.dir = count.dirs[2],
           countUMI = TRUE,
           ncores = 4)
### Integration
peak.merge.output.file <- "TIP_merged_peaks.txt"
count.dirs <- c("example_TIP_sham_counts", "example_TIP_MI_counts")
# New definition
out.dir <- "example_TIP_aggregate"
# Now aggregate the counts for both sham and MI treatments
AggregatePeakCounts(peak.sites.file = peak.merge.output.file,
                    count.dirs = count.dirs,
                    exp.labels = c("Sham", "MI"),
                    output.dir = out.dir)
### Peak annotation
# As previously defined
peak.merge.output.file <- "TIP_merged_peaks.txt"
reference.file <- "genes.gtf"
# New definitions
genome <- "/media/ot/Data/Nelson/refdata-cellranger-GRCh38-3.0.0/fasta"
AnnotatePeaksFromGTF(peak.sites.file = peak.merge.output.file,
                     gtf.file = reference.file,
                     output.file = "TIP_merged_peak_annotations.txt",
                     genome = genome)
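
One possible reading of the error above: `getSeq` is being dispatched on a character string, and in the script `genome` is set to a directory path rather than a genome object. The sketch below only checks the class of what is passed. Using the `BSgenome.Hsapiens.UCSC.hg38` package as a replacement is an assumption for illustration, not a confirmed fix; its chromosome naming may not match the 10x GTF, and the block is skipped if the package is not installed.

```r
# The value passed as `genome` in the call above is a plain character string.
genome <- "/media/ot/Data/Nelson/refdata-cellranger-GRCh38-3.0.0/fasta"
genome_is_path <- is.character(genome)
print(class(genome))

# Assumption: the annotation step wants a genome object instead of a path.
# BSgenome.Hsapiens.UCSC.hg38 is an illustrative stand-in only.
if (requireNamespace("BSgenome.Hsapiens.UCSC.hg38", quietly = TRUE)) {
  genome_obj <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
  print(class(genome_obj))
  # The annotation call would then be retried with the object, e.g.:
  # AnnotatePeaksFromGTF(peak.sites.file = peak.merge.output.file,
  #                      gtf.file = reference.file,
  #                      output.file = "TIP_merged_peak_annotations.txt",
  #                      genome = genome_obj)
}
```

If `class(genome)` prints "character", that matches the signature named in the error message.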