GithubHelp home page GithubHelp logo

fusion strand about arriba HOT 22 CLOSED

suhrig avatar suhrig commented on June 15, 2024
fusion strand

from arriba.

Comments (22)

suhrig avatar suhrig commented on June 15, 2024 2

I also apologize for the lack of responses from my side. I have been really busy lately.

In colnames(fusions)[colnames(fusions) %in% c("X.gene1", "strand1.gene.fusion.", :

This is likely the source of the error. I suspect, the column names of your file are not correct. You should check that they are identical to the ones that are generated by Arriba.

I plan on making the draw_fusions.R script compatible with STAR-Fusion and could send you the script. But I probably will not come around to doing this in the next 1-2 weeks.

from arriba.

suhrig avatar suhrig commented on June 15, 2024 2

I already have a prototype of draw_fusions.R that is compatible with STAR-Fusion. I'll push it to GitHub next week probably. Still need to do some testing. Stay tuned.

from arriba.

suhrig avatar suhrig commented on June 15, 2024 1

Hi Krutika,

I just uploaded an enhanced version of draw_fusions.R to the development branch of the repository that should also be compatible with STAR-Fusion output (both with and without FusionInspector extra columns). You can download it from here:

https://raw.githubusercontent.com/suhrig/arriba/develop/draw_fusions.R

Feel free to give it a try if you find the time.

Regards,
Sebastian

from arriba.

suhrig avatar suhrig commented on June 15, 2024 1

Hi Krutika,

Arriba version 2.0.0 has just been released. It natively supports creating visualizations of fusion predictions made by STAR-Fusion. So this issue is finally resolved. Let know if you still have questions. I am closing this thread for now.

Regards,
Sebastian

from arriba.

suhrig avatar suhrig commented on June 15, 2024

Assembly is not used to predict the transcribed strand (given after the slash). The transcribed strand is derived from either the strand of the supporting reads (if a stranded library was used) or from splice patterns of the supporting reads (if a non-stranded library was used).

Assembly is used to determine the transcript sequence around the breakpoints. Basically, Arriba performs a pileup of all the reads supporting a fusion and takes the most frequent base for each position.

Does this answer your question?

from arriba.

kgaonkar6 avatar kgaonkar6 commented on June 15, 2024

Thank you that does help.

So to clarify in the draw_fusions.R the script reads the "after the slash" strand information to gather/re-arrange the gene structure (and overlapping domains) to plot the fusion? That visualization looks amazing so I'm trying to understand it better.

from arriba.

suhrig avatar suhrig commented on June 15, 2024

draw_fusions.R could in theory use the strand information after the slash, but it does not need to, because Arriba arranges gene1 and gene2 to reflect the 5' and 3' end of the fusion. Transcription always starts in gene1 and ends in gene2 (with few exceptions where it's completely unclear). draw_fusions.R orients the fusion gene such that transcription goes from left to right. It does not need to evaluate the transcribed strand, because this information is implicit in the order of gene1 and gene2.

That visualization looks amazing so I'm trying to understand it better.

Thanks. Feel free to ask if you would like to customize some aspects. What may take an hour for you may be a matter of minutes for me. I can either send you a custom patch or - if the request is likely useful to many users - add a new feature.

from arriba.

kgaonkar6 avatar kgaonkar6 commented on June 15, 2024

Oh great! I'm actually trying to visualize STAR-fusion calls the same way arriba visualizes them to have similar plots from both callers which is why I had specific questions as to how the draw_fusion.R script works. https://gist.github.com/kgaonkar6/d2efc3660d31c20217a9071c6244e70a is what I have now to format the files correctly which seems to work to create the similar plots. But if you can send it as a patch to the draw_fusion.R script itself that would be really appreciated. I guess others might find it useful as well..?

from arriba.

Ephedria avatar Ephedria commented on June 15, 2024

Hello @kgaonkar6

I'm trying to use your script to draw Fusions from StarFusion. It seem to work for some fusions but not all. Did you encounter this kind of behavior ?

Best,

from arriba.

kgaonkar6 avatar kgaonkar6 commented on June 15, 2024

Hi @Ephedria , I was able to plot all the fusions from my StarFusion file. However, I did notice if the gene was not found in the gtf file provided it would print out "Error XYZ exon coordinate not found in gtf" on the pdf page.
Also you might want to check the format/colnames of Starfusion, mine was output from STAR-Fusion 1.5.0

Did you get any errors?

from arriba.

Ephedria avatar Ephedria commented on June 15, 2024

Hello Krutika,
I didn't get an error like you, mine stopped the process right away :

Warning message:
In colnames(fusions)[colnames(fusions) %in% c("X.gene1", "strand1.gene.fusion.", :
number of items to replace is not a multiple of replacement length
Loading annotation
Loading protein domains
Drawing fusion #1: CALR:TANC2
Drawing fusion #2: SENP6:MYO6
Drawing fusion #3: IKZF2:ERBB4
Error in if ((codingLength1 == 0 || antisenseTranscription1) && (codingLength2 == :
missing value where TRUE/FALSE needed
Calls: drawProteinDomains
Execution halted

I tried to look at the line that cause the bug, I didn't found anything relevant...
My output is from a FusionInspector with the --examine_coding_effect which is the same as StarFusion.

I didn't thought of this possibility since your script worked fine. Could you send me an exemple of your StarFusion output so I may try to understand what cause the bug ?

Best,

from arriba.

kgaonkar6 avatar kgaonkar6 commented on June 15, 2024

Sorry for the late reply, here you go:
testStarFusion.txt

from arriba.

Ephedria avatar Ephedria commented on June 15, 2024

Thanks for the link, I will try to found out the problem.

from arriba.

Ephedria avatar Ephedria commented on June 15, 2024

Hello
Sorry I forgot to answer ... The header doesn't exactly look like Arriba's but it work on some fusions and not on others fusions.
I don't find out why ...
I will love to see draw_fusions.R working with StarFusion. I hope this is not too much work for you ...

Best,

from arriba.

Ephedria avatar Ephedria commented on June 15, 2024

Ho cool I will be eager to test it !

from arriba.

Ephedria avatar Ephedria commented on June 15, 2024

Thanks for the hard work.

I was testing it with an input of FusionInspector but I didn't seem to work ...
I had this error :

Loading annotation
Warning message:
In parseGtfAttribute("exon_number", exons) :
Failed to parse 'exon_number' attribute of 558 GTF record(s).
Loading protein domains

Can you send me the command used to create your output of StarFusion ? I will try to stick to it.

Best,
Ephe

from arriba.

suhrig avatar suhrig commented on June 15, 2024

Warning message:
In parseGtfAttribute("exon_number", exons) :

This error is related to the GTF file and has nothing to do with the fusion file. What GTF are you using? Can you send me a link or upload it somewhere?

Moreover, this is only a warning and in view of the fact that only 558 records are affected (of probably tens of thousands), this should not even be a concern. Is a PDF file generated as output?

from arriba.

Ephedria avatar Ephedria commented on June 15, 2024

Hum quite strange, because I try with an ouput from Arriba and it worked well. I didn't change the gtf file.
I use the : GRCh38_gencode_v31_CTAT_lib_Oct012019.plug-n-play/ctat_genome_lib_build_dir/ref_annot.gtf

It does create a PDF file, but it's empty ...

from arriba.

suhrig avatar suhrig commented on June 15, 2024

The error about parsing the GTF file should appear regardless of whether you use Arriba or STAR-Fusion. Can you confirm? Please send me the full call to draw_fusions.R for both Arriba and STAR-Fusion.

When I use CTAT annotation, the GTF parsing error is shown with output from both Arriba and STAR-Fusion. So at least the error is definitely not related to the fusions file. And in both cases a non-empty PDF file is produced. I'm suspecting an issue with the content of your STAR-Fusion file that the script cannot handle. Can you share the content? Or at least the header line and one fusion prediction? Thanks.

from arriba.

Ephedria avatar Ephedria commented on June 15, 2024

I did verified, you were right, I do have the same error when trying with an Arriba output...

Here is the output with Arriba output :
Loading annotation
Warning message:
In parseGtfAttribute("exon_number", exons) :
Failed to parse 'exon_number' attribute of 558 GTF record(s).
Loading protein domains
Drawing fusion #1: RNASEH2B-AS1:RNASEH2B
Drawing fusion #2: AC004943.2:AC116667.2
....

And here is the output with a FusionInspector output :

Loading annotation
Warning message:
In parseGtfAttribute("exon_number", exons) :
Failed to parse 'exon_number' attribute of 558 GTF record(s).
Loading protein domains
/mnt/go-docker/cmd.sh: line 8: 36 Killed /scratch2/tmp/mneou/Scripts/draw_fusions.R --fusions="/scratch2/tmp/mneou/FusionInspector/MAP634_WTS/MAP634_WTS.FusionInspector.fusions.abridged.tsv.coding_effect" --alignments="/scratch2/tmp/mneou/Arriba/MAP634_WTS/STAR/MAP634_WTSAligned.sortedByCoord.out.bam" --output="/scratch2/tmp/mneou/FusionInspector/MAP634_WTS/MAP634_WTS_fusions_CR.pdf" --annotation=/data/annotations/Human/GRCh38/index/star-fusion/v1.8/GRCh38_gencode_v31_CTAT_lib_Oct012019.plug-n-play/ctat_genome_lib_build_dir/ref_annot.gtf --cytobands=/database/cytobands_hg38_GRCh38_2018-02-23.tsv --proteinDomains=/database/protein_domains_hg38_GRCh38_2019-07-05.gff3

What annotation are you using ? I will try to use the same one, if it can prevent this warning ?

Sure here it is :
https://github.com/Ephedria/FusionInspector/blob/master/Sample1_FusionInspector.tsv

Thanks again for the help.

Best

from arriba.

suhrig avatar suhrig commented on June 15, 2024

/mnt/go-docker/cmd.sh: line 8: 36 Killed

This is the real issue. The script got killed by an external process. The most likely explanation is that it ran out of memory. Typically, the script needs < 2GB. Does the script run inside a job with an artificial memory limit? Another explanation is that your job has a time limit and was killed after reaching that limit. Either way, this is not a bug in the script, but an operating system issue.

What annotation are you using ? I will try to use the same one, if it can prevent this warning ?

I use GENCODE >= v28. The warning is harmless, though. You can continue using your GTF file if you want. All this means is that about 1% of the genes will lack exon labels when they are drawn, i.e., there is no number inside the exon. So it's only cosmetic.

from arriba.

Ephedria avatar Ephedria commented on June 15, 2024

Hoo my bad, you were right I did ran out of memory ...
I did again by expending the memory allocatated it works fine.

I will try to test on different samples, but I pretty sure it will turn out well.
Thanks again for this tool and the help.

Best,

from arriba.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.