Comments (5)

hoelzer commented on June 2, 2024

Hey @Ales-ibt - wow, thanks for the detailed report and visuals!

from emg-viral-pipeline.

Ales-ibt commented on June 2, 2024

Hello again 😄

After updating the Virify version we run locally, I now get the CheckV output and the GFF table. However, the pipeline reports a single prophage where I expect several (eight are predicted on this contig). Also, the coordinates of that prediction in the GFF file don't correspond to the location of any prophage in the prophages.fna file. Here is an example:

$ cat prophages_original.fasta | grep '>'

b5_04_1 prophage-181138:243992
b5_04_1 prophage-1725901:1775585
b5_04_1 prophage-2873420:2950084
b5_04_1 prophage-909689:958923
b5_04_1 prophage-690730:746532
b5_04_1 prophage-1508929:1577763
b5_04_1 prophage-3443947:3467484
b5_04_1 prophage-2728160:2758576

$ cat 08-final/gff/b5_04_mod_virify_contig_viewer_metadata.tsv
sequence_id contig virify_taxonomy start_of_first_viphog end_of_last_viphog checkv_provirus checkv_quality miuvig_quality
b5_04_1-start-6662-end-63413 b5_04_1 Caudovirales;;; 6662 63413 Yes High-quality High-quality
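To make this mismatch easy to check on other contigs, here is a minimal Python sketch (not part of Virify; the function names are hypothetical). It parses the prophage headers, assuming the "<contig> prophage-<start>:<end>" format shown above, reads the metadata TSV, assuming it is tab-separated with the header shown, and lists every row whose ViPhOG span falls inside no predicted prophage. It also assumes the TSV reports contig-level coordinates; if they are actually relative to the prophage subsequence, every row gets flagged, which points at the same coordinate problem.

```python
import csv
import io
import re

# Assumed header format, copied from the grep output above, e.g.
# "b5_04_1 prophage-181138:243992" (a leading ">" is optional).
HEADER_RE = re.compile(r"^>?(\S+)\s+prophage-(\d+):(\d+)")


def parse_prophage_headers(lines):
    """Return (contig, start, end) tuples for every prophage header line."""
    regions = []
    for line in lines:
        match = HEADER_RE.match(line.strip())
        if match:
            contig, start, end = match.groups()
            regions.append((contig, int(start), int(end)))
    return regions


def find_unmatched_rows(tsv_text, regions):
    """Return sequence_ids whose ViPhOG span lies inside no predicted prophage."""
    unmatched = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        start = int(row["start_of_first_viphog"])
        end = int(row["end_of_last_viphog"])
        inside = any(
            contig == row["contig"] and p_start <= start and end <= p_end
            for contig, p_start, p_end in regions
        )
        if not inside:
            unmatched.append(row["sequence_id"])
    return unmatched


if __name__ == "__main__":
    headers = [
        "b5_04_1 prophage-181138:243992",
        "b5_04_1 prophage-1725901:1775585",
        "b5_04_1 prophage-2873420:2950084",
        "b5_04_1 prophage-909689:958923",
        "b5_04_1 prophage-690730:746532",
        "b5_04_1 prophage-1508929:1577763",
        "b5_04_1 prophage-3443947:3467484",
        "b5_04_1 prophage-2728160:2758576",
    ]
    tsv_text = (
        "sequence_id\tcontig\tvirify_taxonomy\tstart_of_first_viphog\t"
        "end_of_last_viphog\tcheckv_provirus\tcheckv_quality\tmiuvig_quality\n"
        "b5_04_1-start-6662-end-63413\tb5_04_1\tCaudovirales;;;\t6662\t63413\t"
        "Yes\tHigh-quality\tHigh-quality\n"
    )
    regions = parse_prophage_headers(headers)
    print(len(regions))  # 8
    print(find_unmatched_rows(tsv_text, regions))  # ['b5_04_1-start-6662-end-63413']
```

On the example above, eight prophages are predicted and the single reported row lies inside none of them.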

My guess is that CheckV runs on the whole contig and defines new coordinates for a single prophage, instead of running on each predicted prophage.

I hope this info helps improve the Virify output.

Best,

Alejandra.

mberacochea commented on June 2, 2024

Perfect, thank you, Ale. I'll take a look.

Ales-ibt commented on June 2, 2024

Me again...

After looking through the output files and tracing back the CheckV results, here is a summary of the problems with how the outputs are integrated:

Problems with integrating the predicted phages and their quality estimates:

  1. CheckV runs on the full contig instead of on each predicted prophage
  2. When several phages occur on the same contig, only one is reported and the rest are missing
  3. Prophage coordinates in the GFF don't match the predicted coordinates
  4. Prophages with no CDS matching a ViPhOG are excluded from the GFF (is this intended as quality control?)
  5. Viral genomes spanning full contigs are missing from the GFF output
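For problems 1 and 2, one way to keep each prediction separate is to cut every prophage out of its contig as its own FASTA record, with an ID in the same style as the sequence_id column shown earlier, so each prophage can be evaluated on its own and its results joined back without collisions. This is a minimal sketch assuming 1-based, inclusive coordinates in the prophage headers; the function names and toy data are hypothetical, and this is not necessarily how Virify keys its results internally. The last lines only illustrate how keying results by contig name alone would collapse several prophages into one.

```python
def prophage_record_id(contig, start, end):
    """Build an ID in the style of the sequence_id column (contig-start-X-end-Y)."""
    return f"{contig}-start-{start}-end-{end}"


def extract_prophages(contig_id, contig_seq, regions):
    """Slice every prophage predicted on contig_id (1-based, inclusive coordinates)."""
    records = {}
    for contig, start, end in regions:
        if contig == contig_id:
            records[prophage_record_id(contig, start, end)] = contig_seq[start - 1:end]
    return records


def to_fasta(records, width=60):
    """Render {id: sequence} as FASTA text, wrapping sequences at `width`."""
    lines = []
    for record_id, seq in records.items():
        lines.append(f">{record_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy contig with two predictions on it.
    regions = [("contig_1", 2, 4), ("contig_1", 6, 9)]
    records = extract_prophages("contig_1", "ACGTACGTAC", regions)
    print(to_fasta(records), end="")
    # >contig_1-start-2-end-4
    # CGT
    # >contig_1-start-6-end-9
    # CGTA

    # Keying by contig name alone keeps only the last prediction per contig.
    by_contig = {contig: (start, end) for contig, start, end in regions}
    print(len(records), len(by_contig))  # 2 1
```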

Problems with the GFF file format (the output is not validated):

  1. Protein IDs are missing from the GFF file
  2. Add a mobile_genetic_element feature with its coordinates, then link each corresponding CDS to it using the 'Parent' key in the attributes field
  3. Report CDS coordinates in the context of the full contig instead of the prophage subsequence

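For the GFF points, here is a minimal sketch of the requested layout: one mobile_genetic_element feature spanning the prophage, and CDS children linked to it through Parent, with CDS coordinates shifted from the prophage subsequence onto the full contig (contig position = prophage start + local position - 1). The source value "VIRify", the protein_id attribute name, the CDS identifiers and the fixed CDS phase of 0 are assumptions for illustration, not what the pipeline currently emits.

```python
def to_contig_coords(prophage_start, local_start, local_end):
    """Shift 1-based coordinates on a prophage subsequence onto the full contig."""
    offset = prophage_start - 1
    return local_start + offset, local_end + offset


def gff3_lines(contig, prophage_start, prophage_end, cds_list, source="VIRify"):
    """Emit one mobile_genetic_element feature plus its CDS children as GFF3 lines.

    cds_list holds (protein_id, local_start, local_end, strand) tuples whose
    coordinates are relative to the prophage subsequence (hypothetical input).
    """
    element_id = f"{contig}-start-{prophage_start}-end-{prophage_end}"
    lines = [
        "##gff-version 3",
        "\t".join([contig, source, "mobile_genetic_element",
                   str(prophage_start), str(prophage_end), ".", ".", ".",
                   f"ID={element_id}"]),
    ]
    for protein_id, local_start, local_end, strand in cds_list:
        start, end = to_contig_coords(prophage_start, local_start, local_end)
        # Phase "0" assumes each CDS starts on the first base of a codon.
        attributes = f"ID={protein_id};Parent={element_id};protein_id={protein_id}"
        lines.append("\t".join([contig, source, "CDS", str(start), str(end),
                                ".", strand, "0", attributes]))
    return lines


if __name__ == "__main__":
    # Hypothetical CDS at local positions 1-300 of the first predicted prophage.
    for line in gff3_lines("b5_04_1", 181138, 243992,
                           [("b5_04_1_cds_1", 1, 300, "+")]):
        print(line)
```

Linking CDS features to the element through Parent lets GFF3-aware tools group each prophage's proteins, and keeping all coordinates in contig space lets the file be loaded against the original contig FASTA.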
Attached are some figures I prepared for this report.

Best,

Alejandra.
virify_bug_Feb2023.pdf

mberacochea commented on June 2, 2024

Fixed by #95 and #94
