
tobitekath / dturtle

16.0 2.0 3.0 12.85 MB

Perform differential transcript usage (DTU) analysis of bulk or single-cell RNA-seq data. See documentation at:

Home Page: https://tobitekath.github.io/DTUrtle

License: GNU General Public License v3.0

R 100.00%
dtu rna-seq rnaseq single-cell isoforms transcriptomics

dturtle's People

Contributors

tobitekath

dturtle's Issues

hint for incomplete output

Hi,
I followed all the steps shown in the bullets and everything worked fine.
However, I want the significant transcripts themselves: not just the list of transcripts, but the list with the respective p-value or q-value for each transcript.
Is it possible to know which specific transcripts of each gene are significant after running the analysis?

Thanks
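A minimal base-R sketch of one way to get such a table. It assumes the stageR-adjusted p-values are available as `dturtle$FDR_table` with the columns `geneID`, `txID`, `gene` and `transcript` (these column names appear in the `plot_transcripts_view()` traceback of a later issue). The data frame below is a made-up stand-in, not real DTUrtle output.

```r
# Made-up stand-in for dturtle$FDR_table (assumed columns: geneID, txID,
# gene = gene-level adjusted p-value, transcript = transcript-level adjusted p-value)
fdr_table <- data.frame(
  geneID     = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB"),
  txID       = c("GeneA-201", "GeneA-202", "GeneA-203", "GeneB-201", "GeneB-202"),
  gene       = c(0.001, 0.001, 0.001, 0.2, 0.2),
  transcript = c(0.004, 0.03, NA, 1, 1),
  stringsAsFactors = FALSE
)

ofdr <- 0.05  # threshold, analogous to the ofdr argument of posthoc_and_stager()

# Keep transcripts with a non-missing transcript-level adjusted p-value below the threshold
sig <- fdr_table[!is.na(fdr_table$transcript) & fdr_table$transcript < ofdr, ]

# Significant transcripts with their adjusted p-values, sorted within each gene
print(sig[order(sig$geneID, sig$transcript), c("geneID", "txID", "transcript")])

# Significant transcripts grouped by gene
sig_by_gene <- split(sig$txID, sig$geneID)
print(sig_by_gene)
```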

plot_transcripts_view() error: subscript contains invalid names

Describe the bug
Hello, thank you for the amazing tool. I am running DTUrtle with my own transcriptome. All other analysis and visualisation commands worked, apart from plot_transcripts_view().
Error: subscript contains invalid names
The gene ids in the .gtf match the gene ids in the dturtle object, but there are no gene or transcript names in the file.

To Reproduce
gtf <- import_gtf("GTF_PATH", feature_type=NULL, out_df=FALSE)
plot_transcripts_view(dturtle = dturtle,
genes = "ENSG00000151067",
gtf = gtf,
genome = NULL,
one_to_one = TRUE)

Please complete the following information:

  • OS: darwin17.0
  • R-Version: R version 4.2.1 (2022-06-23)
  • DTUrtle Version: 1.0.2
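One thing that may be worth checking, sketched in base R under assumptions: the `plot_transcripts_view()` traceback in the next issue shows the function looking up each tested transcript ID in the GTF and using the `gene_name` and `transcript_name` attributes for labels. The IDs and the data frame below are hypothetical; with a real `GRanges` object the analogous fallback would be an assignment such as `gtf$gene_name <- gtf$gene_id`. Whether this resolves the error is not confirmed.

```r
# Hypothetical IDs: transcripts tested in the DTUrtle object vs. transcripts in the GTF
tested_tx  <- c("TX_0001", "TX_0002", "TX_0003")
gtf_tx_ids <- c("TX_0001", "TX_0003", "TX_0004")

# Tested transcripts the GTF does not contain (a lookup by these names would fail)
missing_tx <- setdiff(tested_tx, gtf_tx_ids)
print(missing_tx)

# Hypothetical GTF-like table without name columns
gtf_df <- data.frame(
  type          = c("gene", "transcript", "exon"),
  gene_id       = c("GENE_01", "GENE_01", "GENE_01"),
  transcript_id = c(NA, "TX_0001", "TX_0001"),
  stringsAsFactors = FALSE
)

# Fall back to the IDs when the name columns are absent
if (!"gene_name" %in% names(gtf_df)) gtf_df$gene_name <- gtf_df$gene_id
if (!"transcript_name" %in% names(gtf_df)) gtf_df$transcript_name <- gtf_df$transcript_id
print(gtf_df)
```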

Issues in plot_transcripts_view

Hi,
I have an issue plotting the transcript view: it works fine when I plot any single gene, but I can't plot all genes together. I got the isoform files from RSEM. I have finished the DTU analysis and plotted the barplot and heatmap, which all work perfectly. I'm not sure if this is a bug or if there is something wrong with my code. I would appreciate any suggestions.

table_UT_sig <- plot_transcripts_view(dturtle = table_UT_sig,
gtf = "gencode.vM25.annotation.gtf",
genome = 'mm10',
one_to_one = TRUE,
savepath = "images_ut_transcript",
add_to_table = "transcript_view")

Importing gtf file from disk.

Performing one to one mapping in gtf

Found gtf GRanges for 27 of 27 provided genes.

Fetching ideogram tracks ...
Creating 27 plots:
iteration: 1
Error: BiocParallel errors
1 remote errors, element index: 1
26 unevaluated and other errors
first remote error: QuartzBitmap_Output - unable to open file 'images_ut_transcript/Acbd5_transcripts.png'

This is the traceback:
traceback()

4: stop(.error_bplist(res))
3: BiocParallel::bplapply(valid_genes, function(gene) {
gene_gtf <- gtf[gtf@elementMetadata[[gtf_genes_column]] ==
gene, ]
gene_info <- as.data.frame(gene_gtf[gene_gtf$type == "gene",
])
if (tested_transcripts_only %in% c("mixed", TRUE)) {
tested_tx <- dturtle$FDR_table$txID[dturtle$FDR_table$geneID ==
gene & !is.na(dturtle$FDR_table$transcript)]
if (length(tested_tx) == 0) {
if (tested_transcripts_only == "mixed") {
tested_tx <- dturtle$FDR_table$txID[dturtle$FDR_table$geneID ==
gene]
}
else {
message("Skipping ", gene, " --- no transcripts to plot. You may want to adjust the tested_transcripts_only parameter.")
return()
}
}
}
else {
tested_tx <- gene_gtf@elementMetadata[[gtf_tx_column]][gene_gtf$type ==
"transcript"]
}
gtf_trans <- gene_gtf[gene_gtf@elementMetadata[[gtf_tx_column]] %in%
tested_tx & !gene_gtf$type %in% c("transcript", "gene")]
if (length(gtf_trans) == 0) {
message("Skipping ", gene, " --- no transcripts to plot. You may want to adjust the tested_transcripts_only parameter.")
return()
}
temp <- Gviz::seqlevels(gtf_trans)
temp[!startsWith(temp, "chr")] <- paste0("chr", temp[!startsWith(temp,
"chr")])
GenomeInfoDb::seqlevels(gtf_trans) <- temp
track_list <- list()
grtrack_list <- c()
if (!is.null(genome)) {
track_list <- append(track_list, ideoTracks[[gene_info$seqnames]])
}
if (reduce_introns) {
reduction_obj <- granges_reduce_introns(gtf_trans, reduce_introns_min_size)
gtf_trans <- reduction_obj$granges
}
else {
track_list <- append(track_list, Gviz::GenomeAxisTrack())
}
tx_ranges <- as.data.frame(GenomicRanges::ranges(gtf_trans))
gtf_trans <- split(gtf_trans, gtf_trans@elementMetadata[[gtf_tx_column]])
grouped_mean_df <- get_diff(gene, dturtle)
grouped_mean_df <- grouped_mean_df[tested_tx, , drop = FALSE]
rownames(grouped_mean_df) <- tested_tx
tested_tx <- rownames(grouped_mean_df)[order(abs(grouped_mean_df$diff),
decreasing = TRUE)]
for (tx_id in tested_tx) {
gtf_tx <- gtf_trans[[tx_id]]
if (all(c("CDS", "UTR") %in% unique(gtf_tx$type))) {
gtf_tx <- gtf_tx[gtf_tx$type != "exon"]
}
grtrack <- Gviz::GeneRegionTrack(gtf_tx, transcript = gtf_tx$transcript_id,
feature = gtf_tx$type, exon = gtf_tx$exon_id, gene = gtf_tx$gene_id,
symbol = gtf_tx$transcript_name, transcriptAnnotation = "symbol",
thinBoxFeature = c("UTR"), col = NULL, name = ifelse(tx_id %in%
dturtle$sig_tx, " Sig.", ""), rotation.title = 0,
background.title = ifelse(tx_id %in% dturtle$sig_tx,
"orangered", "transparent"), cex.group = fontsize_vec[[3]],
cex.title = fontsize_vec[[3]], cex.axis = fontsize_vec[[3]])
tx_fitted_mean <- grouped_mean_df[tx_id, "diff"]
if (!is.na(tx_fitted_mean)) {
anno_text_start <- ggplot2::unit(arrow_start, "npc")
grobs <- grid::grobTree(grid::textGrob(label = ifelse(tx_fitted_mean >
0, intToUtf8(11014), intToUtf8(11015)), name = "arrow",
x = anno_text_start, gp = grid::gpar(fontsize = ceiling(fontsize_vec[[1]] *
1.5), col = ifelse(tx_fitted_mean > 0, arrow_colors[[1]],
arrow_colors[[2]]))), grid::textGrob(label = paste0(" ",
scales::percent(tx_fitted_mean, accuracy = 0.01)),
name = "text", x = 2 * grid::grobWidth("arrow") +
anno_text_start, gp = grid::gpar(fontsize = fontsize_vec[[1]])))
track_annotation <- Gviz::CustomTrack(plottingFunction = function(GdObject,
prepare, ...) {
if (!prepare)
grid::grid.draw(GdObject@variables$grobs)
return(invisible(GdObject))
}, variables = list(grobs = grobs))
overlay <- Gviz::OverlayTrack(trackList = list(grtrack,
track_annotation))
}
else {
overlay <- Gviz::OverlayTrack(trackList = list(grtrack))
}
overlay@dp@pars <- utils::modifyList(overlay@dp@pars,
overlay@trackList[[1]]@dp@pars)
grtrack_list <- c(grtrack_list, overlay)
}
if (reduce_introns) {
new_intron_starts <- GenomicRanges::start(reduction_obj$reduced_regions) -
cumsum(c(0, GenomicRanges::width(reduction_obj$reduced_regions)[-length(reduction_obj$reduced_regions)])) +
cumsum(c(0, reduction_obj$reduced_regions$new_width[-length(reduction_obj$reduced_regions)]))
if (length(new_intron_starts) > 0) {
grtrack_list <- Gviz::HighlightTrack(trackList = grtrack_list,
start = new_intron_starts, width = reduction_obj$reduced_regions$new_width,
chromosome = gtf_trans[[1]]@seqnames@values,
fill = reduce_introns_fill, col = "white", inBackground = TRUE)
}
}
extension_front <- (max(tx_ranges$end) - min(tx_ranges$start)) *
max(nchar(tested_tx)) * extension_factors[[1]]
if (!all(is.na(grouped_mean_df$diff))) {
extension_back <- (max(tx_ranges$end) - min(tx_ranges$start)) *
extension_factors[[2]]
}
else {
extension_back <- 0
}
if (!is.null(savepath)) {
filename <- file.path(savepath, paste0(make.names(gene),
filename_ext))
if (endsWith(filename_ext, ".png")) {
args <- list(width = 900, height = 700, filename = filename,
res = 160)
args <- utils::modifyList(args, list(...))
do.call(grDevices::png, c(args))
}
else if (endsWith(filename_ext, ".pdf")) {
args <- list(filename = filename, width = 9)
args <- utils::modifyList(args, list(...))
if (capabilities("cairo")) {
do.call(grDevices::cairo_pdf, c(args))
}
else {
do.call(grDevices::pdf, c(args))
}
}
else {
args <- list(width = 900, height = 700, filename = filename,
quality = 100, res = 160)
args <- utils::modifyList(args, list(...))
do.call(grDevices::jpeg, c(args))
}
}
base_title <- paste0(gene_info$gene_name, ifelse(include_ID_in_title,
paste0(" (", gene_info$gene_id, ")"), ""))
p <- Gviz::plotTracks(append(track_list, grtrack_list), collapse = TRUE,
from = min(tx_ranges$start), to = max(tx_ranges$end),
extend.left = extension_front, extend.right = extension_back,
title.width = if (any(tested_tx %in% dturtle$sig_tx))
NULL
else 0, main = paste0(base_title, " --- ", levels(dturtle$group)[1],
" vs. ", levels(dturtle$group)[2]), cex.main = fontsize_vec[[2]])
if (!is.null(savepath)) {
grDevices::dev.off()
return(args$filename)
}
else {
return(p)
}
}, BPPARAM = BPPARAM)
2: BiocParallel::bplapply(valid_genes, function(gene) { ... }, BPPARAM = BPPARAM)
1: plot_transcripts_view(dturtle = table_UT_sig, gtf = "gencode.vM25.annotation.gtf",
genome = "mm10", one_to_one = TRUE, savepath = "images_ut_transcript",
add_to_table = "transcript_view")

For a specific gene, it works fine:
plot_transcripts_view(dturtle = table_UT_sig,
genes = "Gm14461",
gtf = "gencode.vM25.annotation.gtf",
genome = 'mm10',
one_to_one = TRUE)

Importing gtf file from disk.

Performing one to one mapping in gtf

Found gtf GRanges for 1 of 1 provided genes.

Fetching ideogram tracks ...
Creating 1 plots:
$Gm14461
$Gm14461$chr2
Ideogram track 'chr2' for chromosome 2 of the mm10 genome

$Gm14461$OverlayTrack
OverlayTrack 'OverlayTrack' containing 2 subtracks

$Gm14461$OverlayTrack
OverlayTrack 'OverlayTrack' containing 2 subtracks

$Gm14461$titles
An object of class "ImageMap"
Slot "coords":
x1 y1 x2 y2
chr2 6 55.20000 25.55391 80.28973
OverlayTrack 6 80.28973 25.55391 289.14487
OverlayTrack 6 289.14487 25.55391 498.00000

Slot "tags":
$title
chr2 OverlayTrack OverlayTrack
"chr2" "OverlayTrack" "OverlayTrack"

To Reproduce
Steps to reproduce the behavior:

#prepare gtf files
tx2gene <- import_gtf(gtf_file = "gencode.vM25.annotation.gtf")
tx2gene$gene_name <- one_to_one_mapping(name = tx2gene$gene_name, id = tx2gene$gene_id)
#> Changed 110 names.
tx2gene$transcript_name <- one_to_one_mapping(name = tx2gene$transcript_name, id = tx2gene$transcript_id)
tx2gene <- move_columns_to_front(df = tx2gene, columns = c("transcript_name", "gene_name"))
#import files
list.files("/Users/LiJin/Documents/nBox/Lijin/Manvendra/rsem/")
files <- Sys.glob("/Users/LiJin/Documents/nBox/Lijin/Manvendra/rsem/*isoforms.results")
files
names(files) <- gsub("/Users/LiJin/Documents/nBox/Lijin/Manvendra/rsem/","",files)
data <- import_counts(files, type = "rsem", tx2gene=tx2gene[,c("transcript_id", "gene_name")])
rownames(data) <- tx2gene$transcript_name[match(rownames(data), tx2gene$transcript_id)]
dim(data) #142604 8
head(data)
colnames(data) <- gsub(".isoforms.results","",colnames(data))
pd <- data.frame("id"=colnames(data),
"group"=c(rep("Ctr_LPS",4), rep("Ctr_Ut",4),rep("Mut_LPS",4),rep("Mut_Ut",4)),
stringsAsFactors = FALSE)
pd

#DTU analysis
dturtle_UT <- run_drimseq(counts = data, tx2gene = tx2gene, pd=pd, id_col = "id",
cond_col = "group", cond_levels = c("Mut_Ut","Ctr_Ut"), filtering_strategy = "bulk")
#Retain 28937 of 73504 features.
#Removed 44567 features.
head(dturtle_UT$meta_table_gene)
dturtle_UT$used_filtering_options

dturtle_UT_sig <- posthoc_and_stager(dturtle = dturtle_UT, ofdr = 0.05, posthoc = 0.1)
#Found 27 significant genes with 22 significant transcripts (OFDR: 0.05)
table_UT_sig <- create_dtu_table(dturtle = dturtle_UT_sig, add_gene_metadata = list("chromosome"="seqnames"),
add_tx_metadata = list("tx_expr_in_max" = c("exp_in", max)))
dim(table_UT_sig$dtu_table)

table_UT_sig <- plot_proportion_barplot(dturtle = table_UT_sig,
meta_gene_id = "gene_id.1",
savepath = "images_ut_barplot",
add_to_table = "barplot")

head(table_UT_sig$dtu_table$barplot)
head(list.files("images_ut_barplot"))

table_UT_sig <- plot_proportion_pheatmap(dturtle = table_UT_sig,
include_expression = TRUE,
treeheight_col=20,
savepath = "images_ut_heatmap",
add_to_table = "pheatmap")

Please complete the following information:
-R version 4.1.2 (2021-11-01)
-Platform: x86_64-apple-darwin17.0 (64-bit)
-Running under: macOS Catalina 10.15.7
-DTUrtle_1.0.1
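The first remote error says the PNG file could not be opened, and the single-gene call that works does not set `savepath`. One plausible cause (an assumption, not confirmed here) is that the `images_ut_transcript` directory did not exist when the plots were written. A minimal base-R sketch that creates the directory first; the temporary directory only keeps the example self-contained, in practice `savepath` would be `"images_ut_transcript"`. The file-name pattern mirrors the `file.path(savepath, paste0(make.names(gene), filename_ext))` line in the traceback.

```r
# Self-contained stand-in for savepath = "images_ut_transcript"
savepath <- file.path(tempdir(), "images_ut_transcript")

# Create the output directory (and any missing parents) before plotting
if (!dir.exists(savepath)) {
  dir.create(savepath, recursive = TRUE)
}

# Same naming pattern as in the traceback, for the gene from the error message
filename <- file.path(savepath, paste0(make.names("Acbd5"), "_transcripts.png"))

# Check that a file can now be created at that location
ok <- file.create(filename)
print(ok)
```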

Issues in plot_transcripts_view

Hi,
Sorry to bother you again, and thanks for your help getting the transcript plot working. However, my transcript plots show no arrows indicating the direction of change. Could you please help me fix this?

table_LPS_sig <- plot_transcripts_view(dturtle = table_LPS_sig,
gtf = "gencode.vM25.annotation.gtf",
genome = 'mm10',
one_to_one = TRUE,
savepath = "images_LPS_transcript",
add_to_table = "transcript_view")

Importing gtf file from disk.
Performing one to one mapping in gtf
Found gtf GRanges for 13 of 13 provided genes.
Fetching ideogram tracks ...
Creating 13 plots:
iteration: 1

Srsf5_transcripts_circle

This is the traceback:
traceback()

3: h(simpleError(msg, call))
2: .handleSimpleError(function (cond)
.Internal(C_tryCatchHelper(addr, 1L, cond)), "object 'plot_LPS_sig' not found",
base::quote(head(plot_LPS_sig$dtu_table$barplot)))
1: head(plot_LPS_sig$dtu_table$barplot)

To Reproduce
Steps to reproduce the behavior:
library(data.table)
#install.packages("apeglm")
library(apeglm)
library(DTUrtle)
setwd("/Users/LiJin/Documents/nBox/Lijin/Manvendra/rsem")
library(DESeq2)
######DTU############
#prepare gtf files
tx2gene <- import_gtf(gtf_file = "gencode.vM25.annotation.gtf")
tx2gene$gene_name <- one_to_one_mapping(name = tx2gene$gene_name, id = tx2gene$gene_id)
#> Changed 110 names.
tx2gene$transcript_name <- one_to_one_mapping(name = tx2gene$transcript_name, id = tx2gene$transcript_id)
tx2gene <- move_columns_to_front(df = tx2gene, columns = c("transcript_name", "gene_name"))
#import files
list.files("/Users/LiJin/Documents/nBox/Lijin/Manvendra/rsem/")
files <- Sys.glob("/Users/LiJin/Documents/nBox/Lijin/Manvendra/rsem/*isoforms.results")
files
names(files) <- gsub("/Users/LiJin/Documents/nBox/Lijin/Manvendra/rsem/","",files)
data <- import_counts(files, type = "rsem", tx2gene=tx2gene[,c("transcript_id", "gene_name")])
rownames(data) <- tx2gene$transcript_name[match(rownames(data), tx2gene$transcript_id)]
dim(data) #142604 8
head(data)
colnames(data) <- gsub(".isoforms.results","",colnames(data))
pd <- data.frame("id"=colnames(data),
"group"=c(rep("Ctr_LPS",4), rep("Ctr_Ut",4),rep("Mut_LPS",4),rep("Mut_Ut",4)),
stringsAsFactors = FALSE)
pd
dturtle_LPS <- run_drimseq(counts = data, tx2gene = tx2gene, pd=pd, id_col = "id",
cond_col = "group", cond_levels = c("Mut_LPS","Ctr_LPS"), filtering_strategy = "bulk")
#Retain 28420 of 71996 features.
#Removed 43576 features.
head(dturtle_LPS$meta_table_gene)
dturtle_LPS$used_filtering_options

dturtle_LPS_sig <- posthoc_and_stager(dturtle = dturtle_LPS, ofdr = 0.05, posthoc = 0.1)
#Posthoc filtered 15787 features.
#Found 13 significant genes with 13 significant transcripts (OFDR: 0.05)
table_LPS_sig <- create_dtu_table(dturtle = dturtle_LPS_sig, add_gene_metadata = list("chromosome"="seqnames"),
add_tx_metadata = list("tx_expr_in_max" = c("exp_in", max)))
dim(table_LPS_sig$dtu_table)
#13 8
table_LPS_sig <- plot_proportion_barplot(dturtle = table_LPS_sig,
meta_gene_id = "gene_id.1",
savepath = "images_LPS_barplot",
add_to_table = "barplot")

head(plot_LPS_sig$dtu_table$barplot)
head(list.files("./images/"))

table_LPS_sig <- plot_proportion_pheatmap(dturtle = table_LPS_sig,
include_expression = TRUE,
treeheight_col=20,
savepath = "images_LPS_heatmap",
add_to_table = "pheatmap")

table_LPS_sig <- plot_transcripts_view(dturtle = table_LPS_sig,
gtf = "gencode.vM25.annotation.gtf",
genome = 'mm10',
one_to_one = TRUE,
savepath = "images_LPS_transcript",
add_to_table = "transcript_view")

Please complete the following information:
-R version 4.1.2 (2021-11-01)
-Platform: x86_64-apple-darwin17.0 (64-bit)
-Running under: macOS Catalina 10.15.7
-DTUrtle_1.0.2

matrix of transcript counts

Hello,
Is it possible to import a matrix of transcript counts from a .txt or .csv file instead of the raw data?

Thanks
Yoav
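A minimal base-R sketch of reading such a file into a numeric matrix with transcripts as rows and samples as columns, the same shape as the `data` object passed as `counts` to `run_drimseq()` in the issues above. The file layout (first column = transcript IDs, header = sample names) is an assumption, and whether `run_drimseq()` accepts a plain matrix should be checked against the DTUrtle documentation.

```r
# Write a small example count table to a temporary CSV file (made-up values)
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "transcript_id,sample1,sample2,sample3",
  "TX_0001,10,12,0",
  "TX_0002,5,0,7",
  "TX_0003,0,3,9"
), csv_file)

# First column becomes the row names; check.names = FALSE keeps sample names unchanged
counts_df <- read.csv(csv_file, row.names = 1, check.names = FALSE)

# Convert to a numeric matrix: rows = transcripts, columns = samples
counts <- as.matrix(counts_df)
storage.mode(counts) <- "double"
print(dim(counts))
```

For a tab-separated .txt file, `read.delim()` takes the same `row.names` and `check.names` arguments.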

Df is not dataframe error

dturtle <- run_drimseq()

gives an error.

It works fine until the features are removed and then gives the error: df is not a dataframe

Please help me with this. Thanks in advance.
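The message suggests that one of the tabular inputs is not a plain `data.frame`. A minimal base-R sketch, assuming (this is not confirmed by the report) that the culprit is a sample table such as `pd` built as a matrix, for example via `cbind()`: check the input with `is.data.frame()` and coerce it with `as.data.frame()` before calling `run_drimseq()`. The same check applies to `tx2gene`.

```r
# Hypothetical sample table built with cbind(): this is a character matrix
pd <- cbind(id = c("s1", "s2", "s3", "s4"),
            group = c("ctrl", "ctrl", "mut", "mut"))
print(is.data.frame(pd))  # FALSE

# Coerce to a plain data.frame, keeping character columns as characters
pd <- as.data.frame(pd, stringsAsFactors = FALSE)
print(is.data.frame(pd))
str(pd)
```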

run_drimseq stops with some datasets

Describe the bug
Hi, thanks for your contribution. It is a very useful and well-documented package.
Unfortunately, running the DTU analysis on some datasets gives this error:
Error in constrOptim(theta = prop_init[-q], f = dm_likG, grad = dm_scoreG, : initial value is not in the interior of the feasible region.
In some cases, removing some samples and/or some transcripts solves the issue, but I couldn't work out exactly why.
As I understand it, constrOptim would work if a value were added to the theta parameter, but I am not sure where to add it.
Any hint?
Thanks
To Reproduce

dturtle <- run_drimseq(counts = cts_sub, tx2gene = tx2gene, pd=pd, id_col = "sample_file",
                    cond_col = "cond", cond_levels = c("cond1", "cond2"),
                    filtering_strategy = "bulk", BPPARAM = biocpar)

Setting pseudovalue=T still gives the error.

Please complete the following information:

  • OS: Ubuntu 22.04 LTS
  • R-Version 4.2 and 4.3
  • DTUrtle Version 1.0.2

plot_proportion_barplot

Hello, when running the plot_proportion_barplot function I keep getting the following error:
Creating 166 plots:
|==============================================================================| 100%

Error: BiocParallel errors
4 remote errors, element index: 1, 43, 85, 127
162 unevaluated and other errors
first remote error:
Error in xtfrm.data.frame(x): cannot xtfrm data frames

Any idea on how to solve it?

R version 4.3.1 (2023-06-16)
DTUrtle_1.0.2
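In current R versions, `xtfrm()`, which sorting functions such as `order()` rely on for classed objects, does not accept data frames, so code that passes a one-column data frame where a vector is expected can fail with this message. A base-R sketch of the difference between the two kinds of column selection; whether this is what happens inside `plot_proportion_barplot()` under R 4.3.1 is an assumption, not confirmed here.

```r
# Hypothetical table of transcript proportions
props <- data.frame(tx = c("tx1", "tx2", "tx3"), prop = c(0.2, 0.7, 0.1),
                    stringsAsFactors = FALSE)

# props["prop"] is a one-column data frame; props[["prop"]] (or props$prop) is a vector
print(class(props["prop"]))
print(class(props[["prop"]]))

# Sort using the vector, not the one-column data frame
ord <- order(props[["prop"]], decreasing = TRUE)
print(props[ord, ])
```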

No common cellnames in Seurat and transcript level counts

Hi, I am using the following code and consistently getting the same error:

library(DTUrtle)
biocpar <- BiocParallel::MulticoreParam(10)

tx2gene <- import_gtf(gtf_file = "/Users/rbronste/Downloads/gencode.vM24.annotation.gtf")

head(tx2gene, n=3)

tx2gene$gene_name <- one_to_one_mapping(name = tx2gene$gene_name, id = tx2gene$gene_id)

tx2gene$transcript_name <- one_to_one_mapping(name = tx2gene$transcript_name, id = tx2gene$transcript_id)

tx2gene <- move_columns_to_front(df = tx2gene, columns = c("transcript_name", "gene_name"))

head(tx2gene, n=5)

list.files("/Users/rbronste/Documents/DTUrtle/alevin")

files <- Sys.glob("/Users/rbronste/Documents/DTUrtle/alevin/alevin_output_*/alevin/quants_mat.gz")
names(files) <- basename(gsub("/alevin/quants_mat.gz", "", files))

cts_list <- import_counts(files = files, type = "alevin", tx2gene = tx2gene)

cts <- combine_to_matrix(tx_list = cts_list, cell_extension_side = "prepend")

dim(cts)

tiss <- combine_to_matrix(tx_list = cts_list, seurat_obj = Data.combined.RNA.sct,cell_extensions = c("WT_RNA", "KO_RNA"), tx2gene = tx2gene, cell_extension_side = "prepend")
Found overall duplicated cellnames. Trying cellname extension per sample.
Map extensions:
	alevin_output_sc151 --> 'WT_RNA_'
	alevin_output_sc161 --> 'KO_RNA_'
	
Merging matrices
Error: No common cellnames in Seurat and transcript level counts. Did you try specifyin a cellname extension?

I'm not sure why it's not finding any common cell names. Thanks.
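The error means that none of the extended transcript-level cell names occur in the Seurat object. A base-R sketch for checking the overlap before calling `combine_to_matrix()`; the barcodes and the `-1` suffix are hypothetical examples of a common kind of mismatch (a suffix present on only one side), not a diagnosis of this dataset. The sample names and prefixes are taken from the "Map extensions" output above.

```r
# Hypothetical cell names in the Seurat object and in the alevin (transcript-level) counts
seurat_cells <- c("WT_RNA_AAACCTGA-1", "WT_RNA_AAACGGGC-1", "KO_RNA_AAAGATGG-1")
alevin_cells <- list(
  alevin_output_sc151 = c("AAACCTGA", "AAACGGGC"),
  alevin_output_sc161 = c("AAAGATGG")
)
extensions <- c(alevin_output_sc151 = "WT_RNA_", alevin_output_sc161 = "KO_RNA_")

# Prepend each sample's extension, as in the 'Map extensions' output
tx_cells <- unlist(lapply(names(alevin_cells),
                          function(s) paste0(extensions[[s]], alevin_cells[[s]])))

# No overlap: the Seurat names carry an extra "-1" suffix
print(length(intersect(tx_cells, seurat_cells)))

# Full overlap once the suffix is removed from the Seurat names
seurat_stripped <- sub("-1$", "", seurat_cells)
print(length(intersect(tx_cells, seurat_stripped)))
```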

Allow import_counts to handle missing transcripts

Hello,
Thank you for creating DTUrtle, I have found it very useful and approachable to use. However, I was wondering if the following issue that I commonly run into can be resolved.
In brief, tximport natively handles transcripts that are missing from the tx2gene table when importing Salmon files, e.g.:

reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 
transcripts missing from tx2gene: 262
summarizing abundance
summarizing counts
summarizing length

However, import_counts cannot complete and throws an error for this use-case:

Reading in 12 salmon runs.
Using 'dtuScaledTPM' for 'countsFromAbundance'.
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 
Error in medianLengthOverIsoform(length4CFA, tx2gene, ignoreTxVersion,  : 
  all(txId %in% tx2gene$tx) is not TRUE

Could this be changed so that import_counts completes with an incomplete tx2gene input, perhaps via an additional argument to import_counts?

Please let me know if you need any additional information or have any questions.

Thanks a lot,
Kyle

Is your feature request related to a problem? Please describe.
The problem is with the "import_counts" function throwing an error:

Error in medianLengthOverIsoform(length4CFA, tx2gene, ignoreTxVersion,  : 
  all(txId %in% tx2gene$tx) is not TRUE

This happens when tx2gene does not contain all of the transcripts present in the Salmon quantification files.

Describe the solution you'd like
An argument to "import_counts" that turns this error into a warning instead.

Describe alternatives you've considered
I could re-run Salmon by removing the missing transcripts from the index, or remove the missing transcripts from each Salmon output file.
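A base-R sketch of the second workaround mentioned above: dropping transcripts that are absent from `tx2gene` from each Salmon output before import. The quant.sf content is made up; the column names (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`) follow Salmon's `quant.sf` format, and `transcript_id` is the `tx2gene` column name used in the other issues. Note that the remaining TPM values are not renormalised by this filtering.

```r
# Made-up Salmon quant.sf written to a temporary directory
quant_file <- file.path(tempdir(), "quant.sf")
writeLines(c(
  "Name\tLength\tEffectiveLength\tTPM\tNumReads",
  "TX_0001\t1500\t1300.0\t10.5\t120",
  "TX_0002\t900\t700.0\t3.2\t25",
  "TX_0099\t400\t250.0\t1.1\t4"
), quant_file)

# Hypothetical tx2gene lacking TX_0099
tx2gene <- data.frame(transcript_id = c("TX_0001", "TX_0002"),
                      gene_name = c("GeneA", "GeneA"), stringsAsFactors = FALSE)

quant <- read.delim(quant_file, stringsAsFactors = FALSE)

# Report and drop transcripts that tx2gene does not contain
missing_tx <- setdiff(quant$Name, tx2gene$transcript_id)
message("Transcripts missing from tx2gene: ", length(missing_tx))
quant_filtered <- quant[quant$Name %in% tx2gene$transcript_id, ]

# Write the filtered table to a new file, leaving the original untouched
filtered_file <- file.path(tempdir(), "quant_filtered.sf")
write.table(quant_filtered, filtered_file, sep = "\t", quote = FALSE, row.names = FALSE)
```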

combine_to_matrix error

Describe the bug
Hello, thanks a lot for the very useful tool. I am trying it for the first time and I am starting from the already analyzed Seurat object.
I am importing and transforming the gtf file for the annotation with:
tx2gene <- import_gtf(gtf_file = "Mus_musculus.GRCm39.109.gtf.gz")
tx2gene$gene_name <- one_to_one_mapping(name = tx2gene$gene_name, id = tx2gene$gene_id)
tx2gene$transcript_name <- one_to_one_mapping(name = tx2gene$transcript_name, id = tx2gene$transcript_id)
tx2gene <- move_columns_to_front(df = tx2gene, columns = c("transcript_name", "gene_name"))

Then I run:
cts<-seur_obj_sub@assays$RNA$counts

seur_obj_sub <- combine_to_matrix(tx_list = cts ,seurat_obj = seur_obj_sub, tx2gene = tx2gene)
Since I have a Seurat object, what tx_list input should I use?

Error:
Trying to infer cell extensions from Seurat object
Error in combine_to_matrix(tx_list = cts, seurat_obj = seur_obj_sub, tx2gene = tx2gene) :
Could not 1:1 map inferred seurat cellname extensions and tx file list.
Either provide explicit cell extensions or try subsetting the seurat object, if you do not want to provide tx information for all samples.

Thank you in advance for your time!
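Judging from the error, `combine_to_matrix()` tries to pair one inferred cell-name extension per sample with the elements of `tx_list`. The gene-level `counts` matrix taken from the Seurat object is a single matrix, not a list of per-sample transcript-level matrices such as the `cts_list` returned by `import_counts()` in the alevin issue above. The base-R sketch below only illustrates the kind of 1:1 check involved, under hypothetical cell and sample names and the assumption that extensions are `_`-separated prefixes; it is not DTUrtle's actual inference logic.

```r
# Hypothetical Seurat cell names carrying a per-sample prefix
seurat_cells <- c("sampleA_AAACCTGA", "sampleA_AAACGGGC", "sampleB_AAAGATGG")

# Hypothetical names of the per-sample transcript-level count list
tx_list_names <- c("sampleA", "sampleB")

# Infer the prefix of each cell name (text before the last "_")
inferred_ext <- unique(sub("_[^_]*$", "", seurat_cells))

# A 1:1 mapping needs exactly one extension per tx_list element
one_to_one <- length(inferred_ext) == length(tx_list_names) &&
  setequal(inferred_ext, tx_list_names)
print(inferred_ext)
print(one_to_one)
```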

barplot question

Hello,

Something on my barplot seems off: there are lots of thin, spread-out lines. Is this expected?

Also, where can I find the "proportions" information used for this plot (the mean proportion fit per subgroup, i.e. the red line value)?

plot_zoom_png
plot_zoom_png-2
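The red line is described as a fitted mean proportion, so it comes from the DRIMSeq model. The base-R sketch below only computes raw (unfitted) per-group mean transcript proportions, which can be useful for comparison but will generally not match the fitted values exactly. Counts, gene assignment and groups are made up; a gene with zero total counts in a sample would give `NaN` proportions for that sample.

```r
# Made-up counts: rows = transcripts, columns = samples
counts <- matrix(c(10, 30, 1, 3,
                   20, 20, 3, 1,
                    5, 15, 2, 2,
                    0, 10, 2, 6),
                 nrow = 4,
                 dimnames = list(c("tx1", "tx2", "tx3", "tx4"),
                                 c("s1", "s2", "s3", "s4")))
tx_gene <- c(tx1 = "geneA", tx2 = "geneA", tx3 = "geneB", tx4 = "geneB")
group <- c(s1 = "ctrl", s2 = "ctrl", s3 = "mut", s4 = "mut")

# Per-sample gene totals, then each transcript's share of its gene
gene_of_tx <- tx_gene[rownames(counts)]
gene_totals <- rowsum(counts, gene_of_tx)
props <- counts / gene_totals[gene_of_tx, , drop = FALSE]

# Raw mean proportion per group (rows = transcripts, columns = groups)
mean_props <- sapply(split(colnames(props), group),
                     function(cols) rowMeans(props[, cols, drop = FALSE]))
print(mean_props)
```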

How to run using Kallisto outputs?

Hi,

My colleagues and I are having some trouble running DTUrtle with counts from kallisto. We've tried adapting the presented code from the original Salmon format to work with kallisto output, but we can't seem to get it to work. Could you provide some sample code showing how to import and format kallisto data for this tool?

Thank you,
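Whether `import_counts()` accepts kallisto output directly should be checked in the DTUrtle documentation; tximport, whose `countsFromAbundance` option appears in the `import_counts()` log in a later issue, does support `type = "kallisto"`. As a fallback, a base-R sketch that builds a transcript-by-sample matrix of `est_counts` from kallisto's `abundance.tsv` files (columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`). The files and values are made up, and no `dtuScaledTPM` scaling is applied here.

```r
# Write two made-up kallisto abundance.tsv files into per-sample folders
base_dir <- file.path(tempdir(), "kallisto")
samples <- c("sample1", "sample2")
for (s in samples) dir.create(file.path(base_dir, s), recursive = TRUE, showWarnings = FALSE)
writeLines(c("target_id\tlength\teff_length\test_counts\ttpm",
             "TX_0001\t1500\t1320.5\t100\t50.1",
             "TX_0002\t900\t720.5\t20\t18.3"),
           file.path(base_dir, "sample1", "abundance.tsv"))
writeLines(c("target_id\tlength\teff_length\test_counts\ttpm",
             "TX_0001\t1500\t1320.5\t80\t45.0",
             "TX_0002\t900\t720.5\t35\t30.2"),
           file.path(base_dir, "sample2", "abundance.tsv"))

files <- file.path(base_dir, samples, "abundance.tsv")
names(files) <- samples

# Read est_counts per sample as a vector named by target_id
read_counts <- function(f) {
  ab <- read.delim(f, stringsAsFactors = FALSE)
  setNames(ab$est_counts, ab$target_id)
}
count_list <- lapply(files, read_counts)

# Align all samples on the union of transcript IDs; absent transcripts get 0
tx_ids <- sort(unique(unlist(lapply(count_list, names))))
counts <- sapply(count_list, function(x) x[tx_ids])
rownames(counts) <- tx_ids
counts[is.na(counts)] <- 0
print(counts)
```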

gene-level DGE columns in dtu_table output

Hi there,

I'm interested in including DGE results (as stored in dturtle$dge_analysis$results_sig) in the dtu_table object alongside the results of the DTU analysis. Is this possible?

Thanks!
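A minimal base-R sketch of a left join that adds gene-level DGE columns to a per-gene DTU table. The column names `gene_ID`, `gene_qvalue` and `gene` are hypothetical placeholders; the actual key columns of `dtu_table` and of `dturtle$dge_analysis$results_sig` need to be checked, and genes may first have to be mapped if one table uses names and the other IDs.

```r
# Hypothetical DTU table (one row per gene) and DGE results keyed by gene
dtu_table <- data.frame(gene_ID = c("GeneA", "GeneB", "GeneC"),
                        gene_qvalue = c(0.001, 0.02, 0.04),
                        stringsAsFactors = FALSE)
dge_results <- data.frame(gene = c("GeneA", "GeneC", "GeneD"),
                          log2FoldChange = c(1.5, -0.8, 2.1),
                          padj = c(0.003, 0.01, 0.2),
                          stringsAsFactors = FALSE)

# Left join: keep every DTU row, add DGE columns where available (NA otherwise)
merged <- merge(dtu_table, dge_results, by.x = "gene_ID", by.y = "gene",
                all.x = TRUE, sort = FALSE)

# merge() may reorder rows; restore the original DTU table order
merged <- merged[match(dtu_table$gene_ID, merged$gene_ID), ]
rownames(merged) <- NULL
print(merged)
```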

Map transcripts to gene clusters

Hello, I'm using the KB output and then Kallisto-quant-TCC to quantify my transcripts. I would like to plot the gene clusters and overlay how the transcripts of interest are expressed across all the clusters, or even a bar plot quantifying the expression of my transcript of interest across the various clusters. How should I approach this using the DTUrtle package?
I tried using the Seurat package but was unable to identify the clusters based on transcript IDs. With gene IDs it is easy to map and name the clusters, but converting the transcript IDs to gene names results in duplicate row names. Any suggestion is appreciated.
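Two common base-R ways to get unique row names, sketched with made-up data: sum transcripts of the same gene to gene level with `rowsum()` (which loses transcript resolution), or keep the transcript rows and label them with gene name plus transcript ID. How the resulting matrix is then loaded into Seurat or DTUrtle is not shown here.

```r
# Made-up transcript-level counts (rows = transcripts, columns = cells)
tx_counts <- matrix(c(1, 0, 3,
                      2, 5, 0,
                      0, 1, 1,
                      4, 0, 2),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("TX_0001", "TX_0002", "TX_0003", "TX_0004"),
                                    c("cell1", "cell2", "cell3")))
tx2gene <- c(TX_0001 = "GeneA", TX_0002 = "GeneA", TX_0003 = "GeneB", TX_0004 = "GeneC")
gene_names <- tx2gene[rownames(tx_counts)]

# Option 1: sum transcripts of the same gene -> one row per gene
gene_counts <- rowsum(tx_counts, group = gene_names)
print(gene_counts)

# Option 2: keep transcript rows, with labels that are unique by construction
tx_labels <- paste(gene_names, names(gene_names), sep = "_")
print(tx_labels)
```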

import_gtf doesn't work

Describe the bug
Hi, no matter what I try, I can't seem to load my GTF file into gtf.
To Reproduce
Steps to reproduce the behavior:
gtf <- import_gtf('./oy.gff3')

The object gtf actually gets created, however there is no data in the table.
Please complete the following information:

  • OS: MacOS Monterey 12.6
  • R-Version 4.2.1
  • DTUrtle Version 1.0.2

Thanks!
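The file passed here is a GFF3 file (`oy.gff3`), whereas the function name and the other issues use GTF files. The two formats write the ninth (attribute) column differently: GTF uses `key "value";` pairs such as `gene_id "G1"; transcript_id "T1";`, while GFF3 uses `key=value` pairs separated by `;` such as `ID=exon1;Parent=T1`. If `import_gtf()` relies on GTF-style `gene_id`/`transcript_id` attributes (an assumption here), a GFF3 file could import without the expected columns. A base-R sketch that classifies a line's attribute column; the sample lines are made up.

```r
# Classify the attribute column (9th tab-separated field) of a GTF/GFF line
attribute_style <- function(line) {
  attrs <- strsplit(line, "\t", fixed = TRUE)[[1]][9]
  if (grepl('^\\s*[A-Za-z_]+ "', attrs)) {
    "GTF"      # key "value"; pairs
  } else if (grepl("^[^=;]+=", attrs)) {
    "GFF3"     # key=value pairs
  } else {
    "unknown"
  }
}

gtf_line  <- 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
gff3_line <- "chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=exon1;Parent=T1"

print(attribute_style(gtf_line))
print(attribute_style(gff3_line))
```

For a real file, the function can be applied to the first non-comment lines, e.g. those returned by `grep("^#", readLines("oy.gff3", n = 100), value = TRUE, invert = TRUE)`.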

Further help needed with transcript mapping fonts

Hi, I'm still getting boxes instead of arrows when plotting the transcript view, as in issue #3. I'm on macOS 12.6.2, R 4.2.2, DTUrtle 1.0.2.

I used showtext to import Apple Symbol:

library(showtext)
font_add(family = "Apple_Symbol", regular = "/System/Library/Fonts/Apple Symbols.ttf")
showtext_auto()
Then the lines you suggested in the earlier issue work perfectly:
par(family = "Apple_Symbol")
plot(1:5, t = "n")
text(2, 2, c(intToUtf8(c(11014))), cex = 3)
text(4, 4, c(intToUtf8(c(11015))), cex = 3)
To produce:
AppleSymbolTest

But when I then try it for the transcript view plot, I still get the boxes:
plot_transcripts_view(dturtle = dturtle, genes = "HEY1", gtf = "../gencode.v34.annotation.gtf", genome = 'hg38', one_to_one = TRUE, arrow_colors = c("#7CAE00", "#D30000"), family="Apple_Symbol")
Transcript_view_eg

Any further help appreciated!

Hi @lijinw ,

so we have definitely found the culprit. To fix the issue, you need a font that supports at least these two glyphs.

You could have a look at the extrafont or showtext packages, which should allow you to import additional fonts.

Again, you might try the "Apple Symbols" font, which should be pre-installed on your system. According to Wikipedia, at least, it should support the needed glyphs (see here).

Originally posted by @TobiTekath in #3 (comment)
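The base-graphics test above can succeed while the transcript view still shows boxes, because par(family = ...) only applies to base plots, whereas the transcript view appears to be drawn with grid (the warnings in the "Conversion error for arrows" report come from grid.Call). A minimal sketch for testing the same two glyphs through grid, assuming the "Apple_Symbol" family was registered with font_add() as shown above; the output file name is hypothetical.

```r
library(grid)

# The two arrow glyphs from the base-graphics test: U+2B06 and U+2B07
arrows <- intToUtf8(c(11014, 11015), multiple = TRUE)

# Family name registered earlier with font_add() (assumption)
font_family <- "Apple_Symbol"
out_png <- file.path(tempdir(), "grid_arrow_test.png")

if (capabilities("cairo")) {
  # Cairo-backed PNG device, opened explicitly
  png(out_png, width = 400, height = 200, type = "cairo")
  grid.newpage()
  # grid does not read par(family = ...); the font is requested through gpar()
  grid.text(arrows, x = c(0.3, 0.7), y = 0.5,
            gp = gpar(fontfamily = font_family, cex = 3))
  invisible(dev.off())
}
```

If the arrows render in this PNG but still appear as boxes in the transcript view, the font setting is probably not reaching the grid-based plot.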

does # of cells matter when determining DTU between 2 clusters?

Hello, I was curious whether the number of cells used in a comparison matters for the DTU analysis. For example, when I compared a cluster of ~1000 cells to a cluster of ~100 cells, I got many more significant genes and transcripts (>300) than when I compared a cluster of ~1000 cells to a cluster of ~900 cells (<50 significant genes/transcripts). I can't tell whether the biology or the statistics is causing this. Thanks for any insight you can provide!

Conversion error for arrows in transcript view

Hi, I've been testing DTUrtle with some bulk RNA-seq data and really like it so far, but I'm running into an error when generating the transcript views. It may be a ggplot error, but I'm not sure how to resolve it.

Describe the bug
In the transcript views, the arrows next to the percentages for each transcript are replaced by a box in the PNG files, and by three dots if I export as PDF from the plot window (see attached: HEY1_transcripts, HEY1.pdf). The warnings are:
Warning messages:
1: In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), ... :
conversion failure on '⬇' in 'mbcsToSbcs': dot substituted for <e2>
2: In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), ... :
conversion failure on '⬇' in 'mbcsToSbcs': dot substituted for <ac>
3: In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), ... :
conversion failure on '⬇' in 'mbcsToSbcs': dot substituted for <87>
4: In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), ... :
conversion failure on '⬇' in 'mbcsToSbcs': dot substituted for <e2>
5: In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), ... :
conversion failure on '⬇' in 'mbcsToSbcs': dot substituted for <ac>
6: In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), ... :
conversion failure on '⬇' in 'mbcsToSbcs': dot substituted for <87>
7: In grid.Call(C_textBounds, as.graphicsAnnot(x$label), ... :
conversion failure on '⬇' in 'mbcsToSbcs': dot substituted for <e2>
8: In grid.Call(C_textBounds, as.graphicsAnnot(x$label), ... :
conversion failure on '⬇' in 'mbcsToSbcs': dot substituted for <ac>
9: In grid.Call(C_textBounds, as.graphicsAnnot(x$label), ... :
conversion failure on '⬇' in 'mbcsToSbcs': dot substituted for <87>
10: In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), ... :
conversion failure on '⬆' in 'mbcsToSbcs': dot substituted for <e2>
11: In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), ... :
conversion failure on '⬆' in 'mbcsToSbcs': dot substituted for <ac>
12: In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), ... :
conversion failure on '⬆' in 'mbcsToSbcs': dot substituted for <86>
13: In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), ... :
conversion failure on '⬆' in 'mbcsToSbcs': dot substituted for <e2>
14: In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), ... :
conversion failure on '⬆' in 'mbcsToSbcs': dot substituted for <ac>
15: In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), ... :
conversion failure on '⬆' in 'mbcsToSbcs': dot substituted for <86>
16: In grid.Call(C_textBounds, as.graphicsAnnot(x$label), ... :
conversion failure on '⬆' in 'mbcsToSbcs': dot substituted for <e2>
17: In grid.Call(C_textBounds, as.graphicsAnnot(x$label), ... :
conversion failure on '⬆' in 'mbcsToSbcs': dot substituted for <ac>
18: In grid.Call(C_textBounds, as.graphicsAnnot(x$label), ... :
conversion failure on '⬆' in 'mbcsToSbcs': dot substituted for <86>

To Reproduce
Standard steps as in the vignette
plot_transcripts_view(dturtle = dturtle, genes = "HEY1", gtf = "../gencode.v34.annotation.gtf", genome = 'hg38', one_to_one = TRUE)

Please complete the following information:

  • OS: 12.6.2
  • R-Version: 4.2.2
  • DTUrtle Version: 1.0.2

Any help appreciated! Thanks
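The 'mbcsToSbcs' warnings arise when text is converted to a single-byte encoding, a step R's pdf() device performs; each warning names one byte of the UTF-8 encoded arrow. A Cairo-based device such as cairo_pdf() hands UTF-8 text to Cairo instead. A minimal sketch, assuming the transcript view can simply be drawn into whichever device is open (not confirmed by the report); the file name is hypothetical, and the chosen font still has to contain both arrow glyphs, so boxes may persist if it does not.

```r
library(grid)

# The two arrows from the warnings: U+2B06 and U+2B07
arrows <- intToUtf8(c(11014, 11015), multiple = TRUE)
out_pdf <- file.path(tempdir(), "arrow_test_cairo.pdf")

if (capabilities("cairo")) {
  # cairo_pdf() renders UTF-8 text through Cairo rather than converting it
  # to a single-byte encoding as pdf() does
  cairo_pdf(out_pdf, width = 6, height = 3)
  grid.newpage()
  grid.text(paste(arrows, c("up", "down")), x = c(0.3, 0.7), y = 0.5)
  # In a real session, the plot_transcripts_view() call from the report
  # would replace grid.text() here (not run in this sketch)
  invisible(dev.off())
}
```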

plot transcript view table

Hi there,
after running plot_transcripts_view() on my dturtle object, is there a way to get a table with the mean fitted proportion changes for every transcript of each gene?
Thanks a lot.
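As a rough fallback that does not use any DTUrtle function, the per-group mean transcript proportions (each transcript's share of its gene's counts) and their differences can be recomputed from a count matrix in base R. These are empirical means of raw proportions, not the model-fitted values behind the plot, and counts, tx2gene, and group below are hypothetical toy data.

```r
# Hypothetical transcript counts: rows = transcripts, columns = samples
counts <- matrix(c(30, 20, 10,  5,
                   10, 20, 30, 15,
                    8,  2,  5,  5,
                    2,  8,  5,  5),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("tx1", "tx2", "tx3", "tx4"),
                                 c("s1", "s2", "s3", "s4")))
tx2gene <- data.frame(tx = c("tx1", "tx2", "tx3", "tx4"),
                      gene = c("geneA", "geneA", "geneB", "geneB"))
group <- c(s1 = "ctrl", s2 = "ctrl", s3 = "treat", s4 = "treat")

gene_of_tx <- tx2gene$gene[match(rownames(counts), tx2gene$tx)]

# Gene totals per sample, expanded back to one row per transcript
gene_totals <- rowsum(counts, gene_of_tx)[gene_of_tx, , drop = FALSE]
props <- counts / gene_totals  # within-gene proportion per sample

# Mean proportion per transcript within each group (columns: ctrl, treat)
mean_by_group <- sapply(split(names(group), group),
                        function(s) rowMeans(props[, s, drop = FALSE], na.rm = TRUE))

prop_table <- data.frame(
  gene = gene_of_tx,
  transcript = rownames(props),
  mean_ctrl = mean_by_group[, "ctrl"],
  mean_treat = mean_by_group[, "treat"],
  diff = mean_by_group[, "treat"] - mean_by_group[, "ctrl"],
  row.names = NULL
)
print(prop_table)
```

Genes with zero counts in a sample yield NaN proportions there, which na.rm = TRUE drops from the means.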

How to import .mtx files?

Hello,
thanks for the great tool! I am having trouble using my matrix, since the single-cell method I am using is a bit different.
The alignment is done with STAR, and the output is a DGE.mtx file, which I use for the Seurat analysis. Is it possible to use it as input to your package?

I have done the Seurat analysis and followed the vignette section on starting from a Seurat object, but if I understood correctly, combine_to_matrix still has to be run.

Do you have a suggestion on how to solve this?

Many thanks,
Vasiliki
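A .mtx file is a Matrix Market sparse matrix, which the Matrix package reads with readMM(); the file carries no row or column names, so those come from separate label files. One caveat, stated as an assumption: if DGE.mtx holds gene-level counts, as its name suggests, it would not by itself provide the transcript-level counts a DTU analysis needs. A minimal sketch with a toy stand-in file; the feature and barcode labels are hypothetical, and the orientation (cells as rows or columns) has to be checked for your pipeline.

```r
library(Matrix)

# Tiny stand-in for DGE.mtx: 3 features x 2 cells
toy <- sparseMatrix(i = c(1, 2, 3), j = c(1, 1, 2), x = c(4, 1, 7), dims = c(3, 2))
mtx_file <- tempfile(fileext = ".mtx")
writeMM(toy, mtx_file)

# Hypothetical labels; in practice read them from the pipeline's label files
features <- c("Transcript1", "Transcript2", "Transcript3")
barcodes <- c("CCGCAGTCAGGACTCA", "TCATACCACTTGAGTT")

# readMM() returns a sparse triplet matrix without dimnames
m <- as(readMM(mtx_file), "CsparseMatrix")

# Some pipelines store cells as rows; transpose so features are rows, cells columns
if (nrow(m) == length(barcodes) && ncol(m) == length(features)) m <- t(m)
dimnames(m) <- list(features, barcodes)
print(m)
```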

Transcripts matrix

Hello again,
I opened a new issue as a followup to: #6 (comment)

In my csv file, I have barcodes as columns and transcripts as rows:

TranscriptID,CCGCAGTCAGGACTCA,TCATACCACTTGAGTT,TGCGCGTTAATTACCA,AAGTAGAGTACTAAGT
Transcript1,0,0,0,0
Transcript2,0,0,0,0
Transcript3,0,0,3.93,0

I can import it into R using read.csv("transcript_matrix.csv"), but I am not sure what structure the matrix should have, or what you mean by "specify the columns with the data manually."

Any help will be appreciated.
Thanks
Yoav.
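A minimal sketch of one way to turn the CSV into a plain numeric matrix with transcript IDs as row names and barcodes as column names, assuming that is the layout the earlier suggestion refers to; whether further steps are needed afterwards (for example a tx2gene mapping) is not covered here. The inline text copies the rows shown above in place of transcript_matrix.csv.

```r
# Inline copy of the rows shown above (stand-in for transcript_matrix.csv)
csv_text <- "TranscriptID,CCGCAGTCAGGACTCA,TCATACCACTTGAGTT,TGCGCGTTAATTACCA,AAGTAGAGTACTAAGT
Transcript1,0,0,0,0
Transcript2,0,0,0,0
Transcript3,0,0,3.93,0"

# row.names = 1 turns the TranscriptID column into row names;
# check.names = FALSE keeps barcodes exactly as written (e.g. a "-1" suffix stays intact)
df <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)

# Numeric matrix: transcripts as rows, barcodes as columns
tx_mat <- as.matrix(df)

# Optional: sparse storage, since most entries are zero
tx_sparse <- Matrix::Matrix(tx_mat, sparse = TRUE)
str(tx_mat)
```

For the real file, read.csv("transcript_matrix.csv", row.names = 1, check.names = FALSE) follows the same pattern.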
