mschecht commented on August 30, 2024

Hi @iwilkie, thanks for the question!

anvi-run-cazymes only runs the HMMs from dbCAN2 across the amino acid sequences in a contigs-db with hmmscan or hmmsearch. However, you can use anvi-import-functions to import annotations from any homology detection strategy you want into a contigs-db. For example, you can export the amino acid sequences from your contigs-db, search them against CAZy-DB with DIAMOND, and then import those annotations back into the contigs-db.
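
A minimal sketch of the last step of that round trip, assuming the DIAMOND search was written in the default 12-column tabular format (`--outfmt 6`), that query IDs are the gene caller IDs exported from the contigs-db, and that subject IDs carry the CAZy family after a `|` (e.g. `AAA1.1|GH5`). The function name, the `CAZy_DIAMOND` source label, and the subject-ID layout are illustrative assumptions, not part of anvi'o or dbCAN. It keeps the best hit per gene and builds the five-column table that anvi-import-functions expects:

```python
import csv
import io


def diamond_to_functions(diamond_tsv, source="CAZy_DIAMOND"):
    """Turn DIAMOND outfmt-6 text into rows for an anvi'o functions table.

    Only the hit with the lowest e-value is kept for each query gene.
    """
    best = {}
    for row in csv.reader(io.StringIO(diamond_tsv), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        # outfmt 6 columns: qseqid, sseqid, ..., evalue (index 10), bitscore
        gene_id, subject, evalue = row[0], row[1], float(row[10])
        if gene_id not in best or evalue < best[gene_id][1]:
            best[gene_id] = (subject, evalue)

    rows = [["gene_callers_id", "source", "accession", "function", "e_value"]]
    for gene_id, (subject, evalue) in best.items():
        # Assumption: subject IDs look like "<protein_accession>|<family>"
        parts = subject.split("|")
        accession = parts[1] if len(parts) > 1 else subject
        rows.append([gene_id, source, accession, subject, f"{evalue:g}"])
    return rows
```

The rows can be written out tab-delimited (e.g. with `csv.writer(handle, delimiter="\t")`) and the resulting file passed to anvi-import-functions.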

Please let me know if you have any more questions or if I can clarify anything further!

from anvio.

xvazquezc commented on August 30, 2024

dbCAN treats them as two separate HMM libraries, so I'd say do the same; if a prot has matches in both, that's taken as stronger evidence that it is an actual CAZyme.
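
As a rough illustration of that "both libraries" rule, here is a small sketch that labels each gene by which search it matched. The function name and the evidence labels are made up for this example, and the inputs are assumed to be plain collections of gene IDs parsed from the two HMMER outputs:

```python
def classify_cazyme_evidence(dbcan_hits, dbcan_sub_hits):
    """Label each gene by which HMM library (or both) it matched."""
    dbcan_hits, dbcan_sub_hits = set(dbcan_hits), set(dbcan_sub_hits)
    evidence = {}
    for gene in sorted(dbcan_hits | dbcan_sub_hits):
        if gene in dbcan_hits and gene in dbcan_sub_hits:
            evidence[gene] = "both"  # stronger support for a true CAZyme
        elif gene in dbcan_hits:
            evidence[gene] = "dbCAN only"
        else:
            evidence[gene] = "dbCAN-sub only"
    return evidence
```

Keeping the two result sets separate until this step is what makes the comparison possible in the first place.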

mschecht commented on August 30, 2024

Thanks for pointing this out @xvazquezc! That definitely answers my question :)

iwilkie commented on August 30, 2024

Hi, I just spotted this function on the development version of anvi'o and am very excited to use it! I've been doing CAZyme annotations outside of anvi'o for a while now, but I think it would be great to incorporate them :)

I was wondering if there are any plans to incorporate e.g. CAZy-DB as a search database, or more generally to follow dbCAN's recommendation of integrating different search tools (e.g. HMM and DIAMOND searches) to ensure more accurate annotations?

Thanks,
Isa

iwilkie commented on August 30, 2024

@mschecht Thanks for the quick and detailed reply! Yes, exporting the AA sequences, annotating them, and then bringing the results back into anvi'o works well; that's what I have been doing. I just stumbled upon this issue and was curious how far this function would be incorporated into anvi'o.

Thanks again! :)

mschecht commented on August 30, 2024

@iwilkie I started working on this feature request here and got confused about where I can find dbCAN3 files, in particular, the dbCAN-sub HMMs.

Is this the dbCAN3 dbCAN_sub file? It's under the dir /dbCAN2/, but that's what's hyperlinked from the dbCAN3 downloads page.

xvazquezc commented on August 30, 2024

@mschecht as far as I know, the HMM dbs for dbCAN and dbCAN-sub are different. Not all prots with a CAZy domain (i.e. matching a dbCAN HMM profile) match dbCAN-sub profiles. Some dbCAN-sub profiles are not necessarily made from a subset of a dbCAN family domain, e.g. the subfamily PL6_e6 profile includes sequences matching PL6 and CBM16.

Coming back to the files, the one you indicate is the dbCAN-sub HMM, this one is the current dbCAN HMM (barely one month old).

In addition, this file can be used with the dbCAN-sub output to map EC/substrates to some of the subfamilies (which would be interesting to add too 😉 )

mschecht commented on August 30, 2024

Thanks for the input @xvazquezc!

> Coming back to the files, the one you indicate is the dbCAN-sub HMM, this one is the current dbCAN HMM (barely one month old).

To clarify, is this file a dbCAN-sub HMM file? https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V12.txt
I thought this was the standard dbCAN and not the dbCAN-sub. anvi-setup-cazymes can download this no problem!

Could this be the dbCAN_sub file? https://bcb.unl.edu/dbCAN2/download/Databases/dbCAN_sub.hmm

> In addition, this file can be used with the dbCAN-sub output to map EC/substrates to some of the subfamilies (which would be interesting to add too 😉 )

That would be super cool! I think all this would take is a simple join with the dbCAN annotations in the contigs-db. What would be the best output file for you to leverage this data?
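
To make the "simple join" idea concrete, here is a minimal sketch, assuming the substrate mapping file has already been parsed into a dictionary keyed by dbCAN-sub subfamily. The function name, the dictionary layout, and the placeholder EC/substrate values are hypothetical; the real mapping file's columns are not described in this thread:

```python
def add_substrates(annotations, substrate_map):
    """Left-join (gene_id, subfamily) pairs with a subfamily -> (EC, substrate) lookup."""
    joined = []
    for gene_id, subfamily in annotations:
        # Subfamilies missing from the mapping keep the gene but get empty fields
        ec, substrate = substrate_map.get(subfamily, (None, None))
        joined.append(
            {
                "gene_callers_id": gene_id,
                "subfamily": subfamily,
                "ec": ec,
                "substrate": substrate,
            }
        )
    return joined


# Placeholder values only; not taken from the dbCAN mapping file
example_map = {"PL6_e6": ("EC_PLACEHOLDER", "SUBSTRATE_PLACEHOLDER")}
example = add_substrates([(101, "PL6_e6"), (102, "GH5_e1")], example_map)
```

A left join keeps genes whose subfamily has no substrate entry, which matches the point above that only some of the subfamilies can be mapped.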

xvazquezc commented on August 30, 2024

I know it's confusing: they use a URL with dbCAN2 in it, but that's where the dbCAN3 server is located... the old dbCAN2 is at dbCAN2-obsolete. The new dbCAN3 is basically the same base dbCAN2 plus dbCAN-sub and substrate prediction, both through dbCAN-sub and dbCAN-PUL*.

About the files, this is the current dbCAN HMM: https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V12.txt, and this is the dbCAN-sub HMM: https://bcb.unl.edu/dbCAN2/download/Databases/dbCAN_sub.hmm

As for the substrate prediction, I think it needs to be matched based on the CAZy family and the predicted EC by dbCAN-sub (but tbh I'm not sure about the exact way the dbCAN server does it). Best would be to check the run_dbCAN repo: https://github.com/linnabrown/run_dbcan

dbCAN-sub and the substrate predictions might be better with metabolic prediction infrastructure... never got to deal with anvi-estimate-metabolism and related stuff so I'm not so confident about suggesting the best place this may go

*dbCAN-PUL relies on CGCfinder (code here) and annotates experimentally validated Polysaccharide Utilization Loci (PULs) by searching for transcription factors and transporters among the genes surrounding CAZy-annotated ones. It seems to be the preferred method for substrate matching, but it is also far more complex to run.

mschecht commented on August 30, 2024

@xvazquezc thank you very much for breaking this down for the anvi'o community!

> About the files, this is the current dbCAN HMM: https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V12.txt, and this is the dbCAN-sub HMM: https://bcb.unl.edu/dbCAN2/download/Databases/dbCAN_sub.hmm

With this in mind, I will add the dbCAN-sub HMM file to anvi-run-cazymes so that users can access CAZyme HMMs.

> dbCAN-sub and the substrate predictions might be better with metabolic prediction infrastructure... never got to deal with anvi-estimate-metabolism and related stuff so I'm not so confident about suggesting the best place this may go

That is a great point! @ivagljiva what are your thoughts?

ivagljiva commented on August 30, 2024

> dbCAN-sub and the substrate predictions might be better with metabolic prediction infrastructure... never got to deal with anvi-estimate-metabolism and related stuff so I'm not so confident about suggesting the best place this may go

> That is a great point! @ivagljiva what are your thoughts?

Hey y'all :)
If I am understanding correctly, dbCAN-sub is another set of HMMs that provides more specific gene annotations?
If so, then it doesn't directly have a place in metabolism prediction and should still be used via anvi-run-cazymes to include annotations for these HMMs within the gene functions table.

However, users can then define their own metabolic pathways using the dbCAN-sub as a possible annotation source for the enzymes in the pathway :)

mschecht commented on August 30, 2024

> If I am understanding correctly, dbCAN-sub is another set of HMMs that provides more specific gene annotations?
> If so, then it doesn't directly have a place in metabolism prediction and should still be used via anvi-run-cazymes to include annotations for these HMMs within the gene functions table.

Sounds good, thanks for the input!

> However, users can then define their own metabolic pathways using the dbCAN-sub as a possible annotation source for the enzymes in the pathway :)

I like that a lot! This could fill the niche where users are studying PULs that are not currently available via the CAZyme framework.

mschecht commented on August 30, 2024

@meren to finish this feature request, I need to incorporate two HMM files into anvi-run-cazymes:

I am currently working on this branch upgrade-to-dbCAN3.

dbCAN-HMMdb-V12.txt is already integrated but I am not sure how to smoothly add in the extra set of HMMs from dbCAN_sub.hmm. My two thoughts are (1) I can concatenate dbCAN_sub.hmm to dbCAN-HMMdb-V12.txt or (2) run HMMER separately on both HMM datasets. Which direction makes the most sense?
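
For reference, option (2) could look roughly like the sketch below, which only builds one `hmmsearch` command per HMM file and leaves execution to the caller. This is not how anvi-run-cazymes is implemented; the function name and the output file naming are assumptions made for illustration, while `--cpu`, `--domtblout`, and `-o` are standard HMMER options:

```python
from pathlib import Path


def build_hmmsearch_commands(proteins_faa, hmm_files, threads=1):
    """Build one hmmsearch command per HMM file so results stay separate."""
    commands = []
    for hmm in hmm_files:
        stem = Path(hmm).stem  # e.g. "dbCAN_sub" for "dbCAN_sub.hmm"
        commands.append(
            [
                "hmmsearch",
                "--cpu", str(threads),
                "--domtblout", f"{stem}.domtblout",  # per-domain hits table
                "-o", f"{stem}.log",  # main human-readable output
                hmm,
                proteins_faa,
            ]
        )
    return commands


commands = build_hmmsearch_commands(
    "proteins.faa", ["dbCAN-HMMdb-V12.txt", "dbCAN_sub.hmm"], threads=4
)
```

Each command can then be run with `subprocess.run(cmd, check=True)`. Separate runs keep the hits from each library in their own file, so they can be stored as distinct annotation sources, which a concatenated HMM file would make harder.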

mschecht commented on August 30, 2024

We can address #2148 in this branch :)

iwilkie commented on August 30, 2024

@mschecht sorry about not getting back to you on this! Somehow the notifications/emails didn't come through and I only just noticed.

I see that @xvazquezc was able to answer, however. Thank you :-)
