Comments (15)
Hi @iwilkie, thanks for the question!
anvi-run-cazymes only runs the HMMs from dbCAN2 across the amino acid sequences in a contigs-db with hmmscan or hmmsearch. However, you can use anvi-import-functions to import annotations from any homology detection strategy you want into a contigs-db. For example, you can export the amino acid sequences from your contigs-db, search them against CAZy-DB with DIAMOND, and finally import those annotations back into the contigs-db.
Please let me know if you have any more questions or if I can clarify more!
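For anyone who wants to try the export / DIAMOND / import route, here is a rough sketch of the conversion step in the middle. It assumes the amino acid sequences were exported with anvi-get-sequences-for-gene-calls (with --get-aa-sequences) so that the FASTA deflines are gene caller IDs, that DIAMOND was run with its default tabular output (--outfmt 6), and that the resulting table is imported with anvi-import-functions. The function name, the "CAZy_DIAMOND" source label, and the choice to keep only the best-scoring hit per gene are my own assumptions, not anything anvi'o prescribes.

```python
import csv


def diamond_to_anvio_functions(diamond_tsv, out_tsv, source="CAZy_DIAMOND"):
    """Convert DIAMOND tabular hits into a table for anvi-import-functions.

    Assumes the 12 default --outfmt 6 columns, where column 1 is the query
    (here: the gene caller ID), column 2 the subject, column 11 the e-value
    and column 12 the bitscore. Keeps the highest-bitscore hit per gene.
    """
    best = {}  # gene caller ID -> (subject, e-value as text, bitscore)
    with open(diamond_tsv, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue
            gene_id, subject = row[0], row[1]
            evalue_text, bitscore = row[10], float(row[11])
            if gene_id not in best or bitscore > best[gene_id][2]:
                best[gene_id] = (subject, evalue_text, bitscore)

    with open(out_tsv, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["gene_callers_id", "source", "accession", "function", "e_value"])
        for gene_id, (subject, evalue_text, _) in best.items():
            # The subject ID doubles as accession and function text here;
            # swap in a richer description if your database provides one.
            writer.writerow([gene_id, source, subject, subject, evalue_text])
    return len(best)
```

The output file can then be given to anvi-import-functions together with the contigs-db. Depending on the search database, you may want to filter hits by identity or coverage before this step.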
from anvio.
dbCAN treats them as 2 separate HMM libraries so I'd say to do the same and if a prot has matches with both, it's used as stronger evidence for that prot to be an actual CAZyme.
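To make the "matches with both" idea concrete, here is a minimal sketch that splits genes by which library they hit. The inputs are assumed to be plain dictionaries mapping a gene ID to the matching profile name; how you parse them out of the HMMER output is up to you, and the function name is made up for illustration.

```python
def classify_cazyme_support(dbcan_hits, dbcan_sub_hits):
    """Group gene IDs by which HMM library produced a hit.

    Both arguments are dicts of gene ID -> profile name (assumed format).
    Genes found in both libraries carry the strongest evidence.
    """
    dbcan_genes = set(dbcan_hits)
    sub_genes = set(dbcan_sub_hits)
    return {
        "both": sorted(dbcan_genes & sub_genes),
        "dbCAN_only": sorted(dbcan_genes - sub_genes),
        "dbCAN_sub_only": sorted(sub_genes - dbcan_genes),
    }
```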
from anvio.
Thanks for pointing this out @xvazquezc! That definitely answers my question :)
from anvio.
Hi, I just spotted this function on the development version of anvi'o and am very excited to use it! I've been doing CAZyme annotations outside of anvi'o for a while now, but I think it would be great to incorporate them :)
I was wondering if there are any plans to incorporate e.g. CAZy-DB as a search database, or in general to follow dbCAN's recommendation of integrating different search tools (e.g. HMM and DIAMOND searches) to ensure more accurate annotations?
Thanks,
Isa
from anvio.
@mschecht Thanks for the quick and detailed reply! Yes, exporting the AA sequences, annotating them, and then bringing the annotations back into anvi'o works well; that's what I have been doing. I just stumbled upon this issue and was curious as to how much of this functionality would be incorporated into anvi'o.
Thanks again! :)
from anvio.
@iwilkie I started working on this feature request here and got confused about where I can find dbCAN3 files, in particular, the dbCAN-sub HMMs.
Is this the dbCAN3 dbCAN_sub file? It's under the dir /dbCAN2/, but that's what's hyperlinked from the dbCAN3 downloads page.
from anvio.
@mschecht as far as I know, the HMM dbs for dbCAN and dbCAN-sub are different. Not all prots with a CAZy domain (i.e. matching a dbCAN HMM profile) match dbCAN-sub profiles. Some dbCAN-sub profiles are not necessarily made from a subset of a dbCAN family domain, e.g. the subfamily PL6_e6 profile includes sequences matching PL6 and CBM16.
Coming back to the files, the one you indicate is the dbCAN-sub HMM, this one is the current dbCAN HMM (barely one month old).
In addition, this file can be used with the dbCAN-sub output to map EC/substrates to some of the subfamilies (which would be interesting to add too 😉 )
from anvio.
Thanks for the input @xvazquezc!
> Coming back to the files, the one you indicate is the dbCAN-sub HMM, this one is the current dbCAN HMM (barely one month old).
To clarify, is this file a dbCAN-sub HMM file? https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V12.txt
I thought this was the standard dbCAN and not the dbCAN-sub. anvi-setup-cazymes can download this no problem!
Could this be the dbCAN_sub file? https://bcb.unl.edu/dbCAN2/download/Databases/dbCAN_sub.hmm
> In addition, this file can be used with the dbCAN-sub output to map EC/substrates to some of the subfamilies (which would be interesting to add too 😉 )
That would be super cool! I think all this would take is a simple join with the dbCAN annotations in the contigs-db. What would be the best output file for you to leverage this data?
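As a rough illustration of what such a join could look like: the sketch below attaches EC/substrate information to each gene through its dbCAN-sub subfamily. The layout of the mapping (subfamily name to a dict with "ec" and "substrate" keys) is an assumption for the example, not the actual format of the dbCAN mapping file, and the function name is hypothetical.

```python
def join_substrates(gene_to_subfamily, subfamily_to_info):
    """Left-join gene annotations with a subfamily -> EC/substrate mapping.

    gene_to_subfamily: dict of gene ID -> dbCAN-sub subfamily name.
    subfamily_to_info: dict of subfamily name -> {"ec": ..., "substrate": ...}
                       (assumed layout; adapt to the real mapping file).
    Genes whose subfamily has no mapping are kept with empty values.
    """
    joined = []
    for gene_id, subfamily in gene_to_subfamily.items():
        info = subfamily_to_info.get(subfamily, {})
        joined.append({
            "gene_callers_id": gene_id,
            "subfamily": subfamily,
            "ec": info.get("ec"),
            "substrate": info.get("substrate"),
        })
    return joined
```

Keeping unmapped genes in the output (a left join) makes it easy to see which subfamilies have no EC/substrate assigned.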
from anvio.
I know it's confusing: they use a URL with dbCAN2 in it, but that's where the dbCAN3 server is located... the old dbCAN2 is at dbCAN2-obsolete. The new dbCAN3 is basically the same base dbCAN2 plus dbCAN-sub and substrate prediction, both through dbCAN-sub and dbCAN-PUL*.
About the files, this is the current dbCAN HMM: https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V12.txt, and this is the dbCAN-sub HMM: https://bcb.unl.edu/dbCAN2/download/Databases/dbCAN_sub.hmm
As for the substrate prediction, I think it needs to be matched based on the CAZy family and the EC predicted by dbCAN-sub (but tbh I'm not sure about the exact way the dbCAN server does it). Best would be to check the run_dbCAN repo: https://github.com/linnabrown/run_dbcan
dbCAN-sub and the substrate predictions might fit better with the metabolic prediction infrastructure... I never got to deal with anvi-estimate-metabolism and related stuff, so I'm not so confident about suggesting the best place this may go.
*dbCAN-PUL relies on CGCfinder (code here) and annotates experimentally validated Polysaccharide Utilization Loci (PULs) by searching for transcription factors and transporters among the genes surrounding CAZy-annotated ones. It seems to be the preferred method for substrate matching, but it is also a far more complex operation.
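To give a feel for the "surrounding genes" idea, here is a deliberately simplified sketch. It is not the CGCfinder algorithm: it only flags CAZyme genes that have a transcription factor ("TF") or transporter ("TC") within a fixed number of genes on the same contig. The input layout, the labels, and the default window size are all assumptions made for the example.

```python
def cazymes_with_signature_neighbors(genes, window=2):
    """Find CAZyme genes with a TF or transporter gene nearby.

    genes: list of (gene_id, kind) tuples in contig order, where kind is
           "CAZyme", "TF", "TC" or None (assumed labels).
    window: how many genes to inspect on each side of a CAZyme.
    Returns a dict of CAZyme gene ID -> sorted list of neighbor kinds found.
    """
    hits = {}
    for i, (gene_id, kind) in enumerate(genes):
        if kind != "CAZyme":
            continue
        start = max(0, i - window)
        neighbors = genes[start:i] + genes[i + 1:i + 1 + window]
        partners = sorted({k for _, k in neighbors if k in ("TF", "TC")})
        if partners:
            hits[gene_id] = partners
    return hits
```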
from anvio.
@xvazquezc thank you very much for breaking this down for the anvi'o community!
> About the files, this is the current dbCAN HMM: https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V12.txt, and this is the dbCAN-sub HMM: https://bcb.unl.edu/dbCAN2/download/Databases/dbCAN_sub.hmm
With this in mind, I will add the dbCAN-sub HMM file to anvi-run-cazymes so that users can access CAZyme HMMs.
- Add dbCAN_sub.hmm to anvi-setup-cazymes and anvi-run-cazymes in the branch upgrade-to-dbCAN3.
> dbCAN-sub and the substrate predictions might be better with metabolic prediction infrastructure... never got to deal with anvio-estimate-metabolism and related stuff so I'm not so confident about suggesting the best place this may go
That is a great point! @ivagljiva what are your thoughts?
from anvio.
> dbCAN-sub and the substrate predictions might be better with metabolic prediction infrastructure... never got to deal with anvio-estimate-metabolism and related stuff so I'm not so confident about suggesting the best place this may go
> That is a great point! @ivagljiva what are your thoughts?
Hey y'all :)
If I am understanding correctly, dbCAN-sub is another set of HMMs that provides more specific gene annotations?
If so, then it doesn't directly have a place in metabolism prediction and should still be used via anvi-run-cazymes to include annotations for these HMMs within the gene functions table.
However, users can then define their own metabolic pathways using the dbCAN-sub as a possible annotation source for the enzymes in the pathway :)
from anvio.
> If I am understanding correctly, dbCAN-sub is another set of HMMs that provides more specific gene annotations?
> If so, then it doesn't directly have a place in metabolism prediction and should still be used via anvi-run-cazymes to include annotations for these HMMs within the gene functions table.
Sounds good, thanks for the input!
> However, users can then define their own metabolic pathways using the dbCAN-sub as a possible annotation source for the enzymes in the pathway :)
I like that a lot! This could fill the niche where users are studying PULs that are not currently available via the CAZyme framework.
from anvio.
@meren to finish this feature request, I need to incorporate two HMM files into anvi-run-cazymes:
I am currently working on this branch upgrade-to-dbCAN3.
dbCAN-HMMdb-V12.txt is already integrated but I am not sure how to smoothly add in the extra set of HMMs from dbCAN_sub.hmm. My two thoughts are (1) I can concatenate dbCAN_sub.hmm to dbCAN-HMMdb-V12.txt or (2) run HMMER separately on both HMM datasets. Which direction makes the most sense?
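For reference, option (2) could look roughly like the sketch below, which only builds one hmmsearch command per HMM file so that each library keeps its own output table. This is an illustration of the idea rather than how anvi-run-cazymes is implemented; the function name, the output naming scheme, and the choice of --domtblout are my assumptions. Running the two libraries separately would also match the earlier point in this thread that dbCAN treats them as separate HMM libraries.

```python
import os


def build_hmmsearch_commands(hmm_files, proteins_fasta, out_dir, cpus=1):
    """Build one hmmsearch command per HMM library (nothing is executed).

    Returns a dict of library name -> argument list that can be passed to
    subprocess.run(). Each library writes its own per-domain table, so hits
    stay attributable to their source.
    """
    commands = {}
    for hmm in hmm_files:
        name = os.path.splitext(os.path.basename(hmm))[0]
        domtblout = os.path.join(out_dir, f"{name}.domtblout")
        commands[name] = [
            "hmmsearch",
            "--cpu", str(cpus),
            "--domtblout", domtblout,  # parseable per-domain hit table
            "-o", os.devnull,          # discard the human-readable report
            hmm,
            proteins_fasta,
        ]
    return commands
```

Each argument list can be run with subprocess.run(cmd, check=True) once HMMER is on the PATH, and the two tables can then be imported under two different source names.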
from anvio.
We can address #2148 in this branch :)
from anvio.
@mschecht sorry about not getting back to you on this! Somehow the notifications/emails didn't come through and I only just noticed.
I see that @xvazquezc was able to answer however, thank you :-)
from anvio.