Comments (2)
A single gene sequence can sometimes have multiple matches to the same HMM, especially if that gene sequence has multiple similar protein domains. Here is a random example I took from a test database (as reported in the HMMER output table):
992 - K23537 - 8.3e-84 276.7 0.1 2.9e-44 146.1 0.0 2.1 1 1 1 2 2 2 2 -
992 - K23547 - 9.4e-63 207.0 0.3 7.7e-34 111.6 0.1 3.0 2 1 0 2 2 2 2 -
The first column is the gene caller ID (992), and it has two separate hits for the same KOfam (K23537). Neither of them is a great hit: the bit score threshold for this KO is 578.67, while the bit scores in the 6th column of the table are only 276.7 and 207.0. Since this particular KO is an ATP-binding protein, I would venture a guess that the multiple hits are related to its ATP-binding domain(s).
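To make the column references concrete, here is a minimal Python sketch that pulls the gene caller ID, KO, E-value, and bit score out of the two rows above and compares the scores to the threshold quoted in this comment. It assumes the rows follow HMMER's whitespace-delimited `--tblout` layout (target name, accession, query name, accession, full-sequence E-value, full-sequence score, and so on). The names `Hit` and `parse_tblout_row` are hypothetical helpers for illustration, not anvi'o functions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene_callers_id: str
    ko: str
    evalue: float
    bit_score: float


def parse_tblout_row(line: str) -> Hit:
    """Parse one whitespace-delimited row (assumed HMMER --tblout layout)."""
    fields = line.split()
    # Column 1: target name (gene caller ID); column 3: query name (KO);
    # column 5: full-sequence E-value; column 6: full-sequence bit score.
    return Hit(fields[0], fields[2], float(fields[4]), float(fields[5]))


# Rows copied verbatim from the example above.
rows = [
    "992 - K23537 - 8.3e-84 276.7 0.1 2.9e-44 146.1 0.0 2.1 1 1 1 2 2 2 2 -",
    "992 - K23547 - 9.4e-63 207.0 0.3 7.7e-34 111.6 0.1 3.0 2 1 0 2 2 2 2 -",
]

# Bit score threshold quoted in the comment above.
THRESHOLD = 578.67

hits = [parse_tblout_row(row) for row in rows]
below_threshold = [hit for hit in hits if hit.bit_score < THRESHOLD]

for hit in hits:
    status = "below" if hit.bit_score < THRESHOLD else "at or above"
    print(f"gene {hit.gene_callers_id} {hit.ko}: {hit.bit_score} is {status} {THRESHOLD}")
```

Both rows fall well below the threshold, which is why neither would be annotated on bit score alone.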
The reason you don't see this in the kofamscan output is that kofamscan simply doesn't report multiple hits to the same KO for a given gene. I would say that you will rarely (if ever) see multiple annotations for the same KO for the same gene call in anvi'o, either. This situation only tends to happen for weaker matches, which are often below the bit score threshold. If such matches pass the annotation heuristic, we only add one annotation for the unique KO, so it never results in multiple hits being added to the database.
And to more directly answer your question, we look for unique hits as an additional confidence check since these are weak matches and we want to be careful to avoid introducing garbage. If the matching KOs are not all unique, then we know this is a case of 'non-specific' matching and we shouldn't annotate :)
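One way to read the check described above is sketched below in Python. This is a simplified illustration, not the actual anvi'o code: `pick_unique_ko` is a hypothetical function, the KO IDs in the second call are made up, and the real heuristic applies additional filters to the weak hits that are not shown here. The assumption is that a gene is annotated only when all of its weak hits point to a single KO, in which case one annotation is added no matter how many hits there were.

```python
from typing import List, Optional, Tuple


def pick_unique_ko(weak_hits: List[Tuple[str, float]]) -> Optional[str]:
    """Return the KO to annotate if all weak hits agree on one KO, else None.

    weak_hits holds (ko, bit_score) pairs for a single gene call, all of
    which are assumed to be below that KO's bit score threshold.
    """
    distinct_kos = {ko for ko, _score in weak_hits}
    if len(distinct_kos) == 1:
        # Several hits to the same KO collapse into a single annotation.
        return next(iter(distinct_kos))
    # No hits, or hits spread over several KOs: treat as non-specific.
    return None


# Two weak hits to the same KO yield one annotation.
print(pick_unique_ko([("K23537", 276.7), ("K23537", 207.0)]))

# Weak hits to different (made-up) KOs yield no annotation.
print(pick_unique_ko([("K00001", 100.0), ("K00002", 90.0)]))
```

Because the result is a single KO or nothing, this kind of check cannot produce two annotations for the same KO on one gene call.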
I hope that makes sense.
from anvio.
Hi @ivagljiva
Thank you so much for the detailed response. I believe that has cleared up my confusion.
Best,
James