Comments (22)
I also apologize for the lack of responses from my side. I have been really busy lately.
In colnames(fusions)[colnames(fusions) %in% c("X.gene1", "strand1.gene.fusion.", :
This is likely the source of the error. I suspect, the column names of your file are not correct. You should check that they are identical to the ones that are generated by Arriba.
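One way to check this is to compare the header line of the file against the column names you expect. The following is only a rough sketch: the `EXPECTED_COLUMNS` list is an assumption covering the first few Arriba columns, and the example header is made up, so adjust both to your own files.

```python
import csv
import io

# Assumption: the first columns of an Arriba fusions file. Adjust this list to
# match the header of a file produced by your own Arriba version.
EXPECTED_COLUMNS = ["#gene1", "gene2", "strand1(gene/fusion)", "strand2(gene/fusion)"]

def compare_header(handle, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names for a tab-separated file."""
    header = next(csv.reader(handle, delimiter="\t"))
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected

# Hypothetical STAR-Fusion-like header line, used only for illustration.
example = io.StringIO("#FusionName\tJunctionReadCount\tLeftGene\tRightGene\n")
missing, unexpected = compare_header(example)
print("missing:", missing)
print("unexpected:", unexpected)
```

With a real file, pass `open(path, newline="")` instead of the `StringIO` object. Any name listed as missing is a column that the script would fail to find.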
I plan on making the draw_fusions.R script compatible with STAR-Fusion and could send you the script. But I probably will not get around to doing this in the next 1-2 weeks.
from arriba.
I already have a prototype of draw_fusions.R that is compatible with STAR-Fusion. I'll push it to GitHub next week probably. Still need to do some testing. Stay tuned.
from arriba.
Hi Krutika,
I just uploaded an enhanced version of draw_fusions.R to the development branch of the repository that should also be compatible with STAR-Fusion output (both with and without FusionInspector extra columns). You can download it from here:
https://raw.githubusercontent.com/suhrig/arriba/develop/draw_fusions.R
Feel free to give it a try if you find the time.
Regards,
Sebastian
from arriba.
Hi Krutika,
Arriba version 2.0.0 has just been released. It natively supports creating visualizations of fusion predictions made by STAR-Fusion. So this issue is finally resolved. Let me know if you still have questions. I am closing this thread for now.
Regards,
Sebastian
from arriba.
Assembly is not used to predict the transcribed strand (given after the slash). The transcribed strand is derived from either the strand of the supporting reads (if a stranded library was used) or from splice patterns of the supporting reads (if a non-stranded library was used).
Assembly is used to determine the transcript sequence around the breakpoints. Basically, Arriba performs a pileup of all the reads supporting a fusion and takes the most frequent base for each position.
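To illustrate the idea of "most frequent base per position", here is a minimal sketch. It is not Arriba's actual code: it assumes the reads have already been aligned and padded to a common coordinate system, with `-` marking positions a read does not cover, and the function name `consensus` is made up for this example.

```python
from collections import Counter

def consensus(aligned_reads, gap="-"):
    """Majority-vote consensus over reads padded to a common coordinate system.

    `gap` marks positions a read does not cover; such positions do not vote.
    """
    length = max(len(read) for read in aligned_reads)
    result = []
    for position in range(length):
        bases = [read[position] for read in aligned_reads
                 if position < len(read) and read[position] != gap]
        # Positions covered by no read are reported as "N".
        result.append(Counter(bases).most_common(1)[0][0] if bases else "N")
    return "".join(result)

# Made-up reads: the second read disagrees with the others at one position.
reads = [
    "ACGTAC--",
    "-CGTTCGA",
    "--GTACGA",
]
print(consensus(reads))  # prints "ACGTACGA"
```

In this sketch, ties go to the base that was seen first at that position. A real implementation would need a more deliberate rule.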
Does this answer your question?
from arriba.
Thank you that does help.
So to clarify: does the draw_fusions.R script read the "after the slash" strand information to gather/re-arrange the gene structure (and overlapping domains) to plot the fusion? That visualization looks amazing so I'm trying to understand it better.
from arriba.
draw_fusions.R could in theory use the strand information after the slash, but it does not need to, because Arriba arranges gene1 and gene2 to reflect the 5' and 3' end of the fusion. Transcription always starts in gene1 and ends in gene2 (with few exceptions where it's completely unclear). draw_fusions.R orients the fusion gene such that transcription goes from left to right. It does not need to evaluate the transcribed strand, because this information is implicit in the order of gene1 and gene2.
That visualization looks amazing so I'm trying to understand it better.
Thanks. Feel free to ask if you would like to customize some aspects. What may take an hour for you may be a matter of minutes for me. I can either send you a custom patch or - if the request is likely useful to many users - add a new feature.
from arriba.
Oh great! I'm actually trying to visualize STAR-Fusion calls the same way Arriba visualizes them, to have similar plots from both callers, which is why I had specific questions about how the draw_fusions.R script works. https://gist.github.com/kgaonkar6/d2efc3660d31c20217a9071c6244e70a is what I have now to format the files correctly, and it seems to work for creating similar plots. But if you can send it as a patch to the draw_fusions.R script itself, that would be really appreciated. I guess others might find it useful as well?
from arriba.
Hello @kgaonkar6
I'm trying to use your script to draw fusions from StarFusion. It seems to work for some fusions but not all. Did you encounter this kind of behavior?
Best,
from arriba.
Hi @Ephedria, I was able to plot all the fusions from my StarFusion file. However, I did notice that if the gene was not found in the GTF file provided, it would print "Error XYZ exon coordinate not found in gtf" on the PDF page.
Also, you might want to check the format/colnames of your StarFusion file; mine was output from STAR-Fusion 1.5.0.
Did you get any errors?
from arriba.
Hello Krutika,
I didn't get an error like yours; mine stopped the process right away:
Warning message:
In colnames(fusions)[colnames(fusions) %in% c("X.gene1", "strand1.gene.fusion.", :
number of items to replace is not a multiple of replacement length
Loading annotation
Loading protein domains
Drawing fusion #1: CALR:TANC2
Drawing fusion #2: SENP6:MYO6
Drawing fusion #3: IKZF2:ERBB4
Error in if ((codingLength1 == 0 || antisenseTranscription1) && (codingLength2 == :
missing value where TRUE/FALSE needed
Calls: drawProteinDomains
Execution halted
I tried to look at the line that causes the bug, but I didn't find anything relevant...
My output is from FusionInspector with the --examine_coding_effect option, which is the same as StarFusion output.
I hadn't thought of this possibility since your script worked fine. Could you send me an example of your StarFusion output so I can try to understand what causes the bug?
Best,
from arriba.
Sorry for the late reply, here you go:
testStarFusion.txt
from arriba.
Thanks for the link, I will try to find out what the problem is.
from arriba.
Hello
Sorry, I forgot to answer... The header doesn't look exactly like Arriba's, but it works on some fusions and not on others.
I can't figure out why...
I would love to see draw_fusions.R working with StarFusion. I hope this is not too much work for you...
Best,
from arriba.
Oh cool, I am eager to test it!
from arriba.
Thanks for the hard work.
I was testing it with FusionInspector output as input, but it didn't seem to work...
I got this error:
Loading annotation
Warning message:
In parseGtfAttribute("exon_number", exons) :
Failed to parse 'exon_number' attribute of 558 GTF record(s).
Loading protein domains
Can you send me the command you used to create your StarFusion output? I will try to stick to it.
Best,
Ephe
from arriba.
Warning message:
In parseGtfAttribute("exon_number", exons) :
This error is related to the GTF file and has nothing to do with the fusion file. What GTF are you using? Can you send me a link or upload it somewhere?
Moreover, this is only a warning and in view of the fact that only 558 records are affected (of probably tens of thousands), this should not even be a concern. Is a PDF file generated as output?
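If you want to see how many exon records in a GTF file lack a numeric `exon_number`, something like the following can be used. It is only a sketch and does not reproduce the parsing logic of draw_fusions.R. It assumes the usual nine tab-separated GTF columns and accepts the number with or without quotes. The example records are made up.

```python
import re

# Matches both `exon_number 1;` and `exon_number "1";`.
EXON_NUMBER = re.compile(r'exon_number "?(\d+)"?')

def count_unnumbered_exons(lines):
    """Return (exon_records, exon_records_without_numeric_exon_number)."""
    total = missing = 0
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GTF has nine tab-separated columns; column 3 is the feature type
        # and column 9 holds the attributes.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        total += 1
        if EXON_NUMBER.search(fields[8]) is None:
            missing += 1
    return total, missing

# Made-up records for illustration; with a real file, pass the open file handle.
example = [
    "#comment\n",
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number 1;\n',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\tgene\t100\t400\t.\t+\t.\tgene_id "G1";\n',
]
print(count_unnumbered_exons(example))  # prints (2, 1)
```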
from arriba.
Hmm, quite strange, because I tried with an output from Arriba and it worked well. I didn't change the GTF file.
I use: GRCh38_gencode_v31_CTAT_lib_Oct012019.plug-n-play/ctat_genome_lib_build_dir/ref_annot.gtf
It does create a PDF file, but it's empty...
from arriba.
The error about parsing the GTF file should appear regardless of whether you use Arriba or STAR-Fusion. Can you confirm? Please send me the full call to draw_fusions.R for both Arriba and STAR-Fusion.
When I use CTAT annotation, the GTF parsing error is shown with output from both Arriba and STAR-Fusion. So at least the error is definitely not related to the fusions file. And in both cases a non-empty PDF file is produced. I'm suspecting an issue with the content of your STAR-Fusion file that the script cannot handle. Can you share the content? Or at least the header line and one fusion prediction? Thanks.
from arriba.
I verified it and you were right: I do get the same error when trying with an Arriba output...
Here is the log with Arriba output:
Loading annotation
Warning message:
In parseGtfAttribute("exon_number", exons) :
Failed to parse 'exon_number' attribute of 558 GTF record(s).
Loading protein domains
Drawing fusion #1: RNASEH2B-AS1:RNASEH2B
Drawing fusion #2: AC004943.2:AC116667.2
....
And here is the log with FusionInspector output:
Loading annotation
Warning message:
In parseGtfAttribute("exon_number", exons) :
Failed to parse 'exon_number' attribute of 558 GTF record(s).
Loading protein domains
/mnt/go-docker/cmd.sh: line 8: 36 Killed /scratch2/tmp/mneou/Scripts/draw_fusions.R --fusions="/scratch2/tmp/mneou/FusionInspector/MAP634_WTS/MAP634_WTS.FusionInspector.fusions.abridged.tsv.coding_effect" --alignments="/scratch2/tmp/mneou/Arriba/MAP634_WTS/STAR/MAP634_WTSAligned.sortedByCoord.out.bam" --output="/scratch2/tmp/mneou/FusionInspector/MAP634_WTS/MAP634_WTS_fusions_CR.pdf" --annotation=/data/annotations/Human/GRCh38/index/star-fusion/v1.8/GRCh38_gencode_v31_CTAT_lib_Oct012019.plug-n-play/ctat_genome_lib_build_dir/ref_annot.gtf --cytobands=/database/cytobands_hg38_GRCh38_2018-02-23.tsv --proteinDomains=/database/protein_domains_hg38_GRCh38_2019-07-05.gff3
What annotation are you using? I will try to use the same one, if it can prevent this warning.
Sure, here it is:
https://github.com/Ephedria/FusionInspector/blob/master/Sample1_FusionInspector.tsv
Thanks again for the help.
Best
from arriba.
/mnt/go-docker/cmd.sh: line 8: 36 Killed
This is the real issue. The script got killed by an external process. The most likely explanation is that it ran out of memory. Typically, the script needs < 2GB. Does the script run inside a job with an artificial memory limit? Another explanation is that your job has a time limit and was killed after reaching that limit. Either way, this is not a bug in the script, but an operating system issue.
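To confirm that a process ended because of a signal rather than a normal exit, you can wrap the call and inspect the return code. This is a minimal sketch for POSIX systems. The function name `run_and_report` is made up, and the self-killing shell only stands in for a job that gets killed from outside.

```python
import signal
import subprocess

def run_and_report(command):
    """Run a command and describe how it ended."""
    completed = subprocess.run(command)
    code = completed.returncode
    if code < 0:
        # subprocess reports death by signal N as return code -N.
        name = signal.Signals(-code).name
        return f"terminated by {name}"
    return f"exited with status {code}"

# Illustration only: a shell that sends SIGKILL to itself.
print(run_and_report(["sh", "-c", "kill -9 $$"]))  # prints "terminated by SIGKILL"
```

Note that this only tells you that the process was killed, not why. Whether it was a memory limit or a time limit has to be checked in the logs of the job scheduler or the operating system.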
What annotation are you using? I will try to use the same one, if it can prevent this warning.
I use GENCODE >= v28. The warning is harmless, though. You can continue using your GTF file if you want. All this means is that about 1% of the genes will lack exon labels when they are drawn, i.e., there is no number inside the exon. So it's only cosmetic.
from arriba.
Oh, my bad, you were right: I did run out of memory...
I ran it again with more memory allocated and it works fine.
I will try to test on different samples, but I'm pretty sure it will turn out well.
Thanks again for this tool and the help.
Best,
from arriba.