Comments (5)
Hey @Ales-ibt - wow, thanks for the detailed report and visuals!
from emg-viral-pipeline.
Hello again 😄
After updating our local VIRify installation, I now get the CheckV output and the GFF table. However, the pipeline reports a single prophage where I expect several. Also, the coordinates of that prediction in the GFF table don't match the location of any prophage in the prophages.fna file. An example:
$ cat prophages_original.fasta | grep '>'
b5_04_1 prophage-181138:243992
b5_04_1 prophage-1725901:1775585
b5_04_1 prophage-2873420:2950084
b5_04_1 prophage-909689:958923
b5_04_1 prophage-690730:746532
b5_04_1 prophage-1508929:1577763
b5_04_1 prophage-3443947:3467484
b5_04_1 prophage-2728160:2758576
$ cat 08-final/gff/b5_04_mod_virify_contig_viewer_metadata.tsv
sequence_id contig virify_taxonomy start_of_first_viphog end_of_last_viphog checkv_provirus checkv_quality miuvig_quality
b5_04_1-start-6662-end-63413 b5_04_1 Caudovirales;;; 6662 63413 Yes High-quality High-quality
My guess is that CheckV is running on the whole contig and defining new coordinates for a single prophage prediction, instead of running on each predicted prophage.
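A quick way to confirm the mismatch is to compare the reported region against the prophage coordinates from the FASTA headers. The snippet below is only a minimal sketch: it assumes the headers follow the `contig prophage-start:end` pattern shown above, and the function names `parse_prophage_header` and `overlapping_prophages` are made up for illustration (they are not part of VIRify or CheckV).

```python
import re

# Assumed header layout: "<contig> prophage-<start>:<end>", with an optional leading ">".
HEADER_RE = re.compile(r"^>?(\S+)\s+prophage-(\d+):(\d+)$")


def parse_prophage_header(header):
    """Return (contig, start, end) parsed from one prophage FASTA header."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"Unexpected header format: {header!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def overlapping_prophages(headers, contig, start, end):
    """List the (start, end) of every prophage on `contig` that overlaps start..end."""
    hits = []
    for header in headers:
        p_contig, p_start, p_end = parse_prophage_header(header)
        if p_contig == contig and p_start <= end and start <= p_end:
            hits.append((p_start, p_end))
    return hits


# Headers copied from the grep output above.
headers = [
    "b5_04_1 prophage-181138:243992",
    "b5_04_1 prophage-1725901:1775585",
    "b5_04_1 prophage-2873420:2950084",
    "b5_04_1 prophage-909689:958923",
    "b5_04_1 prophage-690730:746532",
    "b5_04_1 prophage-1508929:1577763",
    "b5_04_1 prophage-3443947:3467484",
    "b5_04_1 prophage-2728160:2758576",
]

# Region reported in the contig viewer metadata table (6662 to 63413).
hits = overlapping_prophages(headers, "b5_04_1", 6662, 63413)
print(hits)
```

With the values from this report the list comes back empty, i.e. the reported region does not overlap any of the eight predicted prophages when both are read as contig coordinates.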
I hope this info is useful to improve Virify output.
Bests,
Alejandra.
Perfect, thank you Ale. I'll take a look
Me again...
After looking at the output files and tracing back the CheckV results, here is a summary of the problems I found in how the outputs are integrated:
Problems related to the integration of predicted phages and quality:
- CheckV runs on the full contig for prophages instead of on each predicted prophage region
- When several phages occur in the same contig, all but one are missing
- Prophage coordinates in the GFF don't correspond to the coordinates of the original predictions
- Prophages with no CDS matching a ViPhOG are excluded from the GFF (is this for quality control?)
- Viral genomes that span a full contig are missing from the GFF output
Problems related to the GFF file format (which does not validate):
- Protein IDs are missing from the GFF file
- The mobile_genetic_element feature and its coordinates should be added, with each corresponding CDS linked to it through the 'Parent' key in the attributes field
- CDS coordinates should be given relative to the full contig instead of the prophage subsequence
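To make the last two GFF points concrete, here is a rough sketch of what I have in mind. It is not the pipeline's code: the helper names, the ID scheme, the `VIRify` source label and the example CDS records are all hypothetical, and it assumes the prophage start is a 1-based position on the contig (as in GFF3).

```python
def to_contig_coords(region_start, local_start, local_end):
    """Shift 1-based coordinates on a prophage subsequence to 1-based contig coordinates."""
    offset = region_start - 1
    return local_start + offset, local_end + offset


def gff_line(seqid, source, ftype, start, end, strand, phase, attributes):
    """Build one tab-separated GFF3 line; the score column is left as '.'."""
    attr = ";".join(f"{key}={value}" for key, value in attributes.items())
    return "\t".join(
        [seqid, source, ftype, str(start), str(end), ".", strand, phase, attr]
    )


def prophage_gff(contig, region_start, region_end, cds_list, source="VIRify"):
    """Emit a mobile_genetic_element parent feature followed by its child CDS lines.

    cds_list holds (protein_id, local_start, local_end, strand) tuples with
    coordinates relative to the prophage subsequence.
    """
    region_id = f"{contig}_prophage_{region_start}_{region_end}"  # hypothetical ID scheme
    lines = [
        "##gff-version 3",
        gff_line(contig, source, "mobile_genetic_element",
                 region_start, region_end, ".", ".", {"ID": region_id}),
    ]
    for protein_id, local_start, local_end, strand in cds_list:
        start, end = to_contig_coords(region_start, local_start, local_end)
        lines.append(
            gff_line(contig, source, "CDS", start, end, strand, "0",
                     {"ID": protein_id, "Parent": region_id})
        )
    return lines


# Region taken from the first prophage header; the two CDS records are invented examples.
example_cds = [("b5_04_1_1", 1, 300, "+"), ("b5_04_1_2", 350, 900, "-")]
gff = prophage_gff("b5_04_1", 181138, 243992, example_cds)
print("\n".join(gff))
```

This way every CDS carries its protein ID, sits at contig coordinates, and points to its prophage through `Parent`, so several prophages on one contig stay distinguishable.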
Find attached some figures I prepared for this report.
Bests,
Alejandra.
virify_bug_Feb2023.pdf