Comments (4)

defleury commented on August 23, 2024

Dear Marco,

since these are only a handful of genomes, I'd suggest looking at the (interactive) output of gunc plot and then deciding based on that. This will give you an idea of where the CSS (clade separation score) signal comes from: which contigs are labelled as originating from a different source, and at which taxonomic level the conflict is introduced. In particular, this allows you to follow up on the "offending" contigs and see whether there is a systematic bias.

I see that the genomes in question are all in the Legionella group. The contamination pattern they share might be systematic, e.g. if there is a mislabelled Legionella genome elsewhere in the database, or another genome incorrectly labelled as Legionella. This would become visible in the plots. I know very little about Legionella biology, but cryptic extrachromosomal elements that were not detected by standard plasmid removal tools could be an alternative explanation.

In general, I would always value expert curation (in this case, your own assessment of the genomes, since I suppose you have worked with them in the past) over any tool's output.

from gunc.

mgabriell1 commented on August 23, 2024

Dear Sebastian,
Thanks for the suggestions. I initially used Anvio to spot potential sources of contamination, but I will also give gunc plot a try on the manually curated genomes.

mgabriell1 commented on August 23, 2024

Just to follow up briefly on this:
I noticed that all the RefSeq genomes I checked that showed a high CSS value reached it at the genus level, and the issue was that, besides Legionella, the genus Fluoribacter was also detected. However, Fluoribacter is a synonym of Legionella (https://lpsn.dsmz.de/genus/fluoribacter), so in my case the high CSS values were false positives.
On the one hand, this highlights the importance of using gunc plot to better understand the data; on the other, it possibly suggests taking synonyms into account in future db versions.
Thanks again for the great tool!
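To illustrate why an unresolved synonym looks like a genus-level conflict, here is a minimal sketch that collapses genus synonyms before counting how many distinct genera a genome's contigs are assigned to. It is not GUNC's CSS computation; the synonym table, the function names `normalise_genus` and `genus_counts`, and the toy contig assignments are all hypothetical, with only the Fluoribacter to Legionella mapping taken from the LPSN link above.

```python
from collections import Counter

# Hypothetical synonym table: synonym genus name -> accepted genus name.
# Only the Fluoribacter entry comes from the LPSN synonymy discussed above.
GENUS_SYNONYMS = {"Fluoribacter": "Legionella"}


def normalise_genus(name: str, synonyms: dict = GENUS_SYNONYMS) -> str:
    """Return the accepted genus name, or the input unchanged if no synonym is known."""
    return synonyms.get(name, name)


def genus_counts(contig_genera: dict, collapse_synonyms: bool = True) -> Counter:
    """Count how many contigs are assigned to each genus label."""
    labels = contig_genera.values()
    if collapse_synonyms:
        labels = [normalise_genus(g) for g in labels]
    return Counter(labels)


# Toy per-contig genus assignments (invented for illustration, not real GUNC output).
assignments = {
    "contig_1": "Legionella",
    "contig_2": "Legionella",
    "contig_3": "Fluoribacter",
}

raw = genus_counts(assignments, collapse_synonyms=False)
merged = genus_counts(assignments)
print(len(raw), len(merged))  # 2 1
```

Without the synonym table the genome appears to span two genera; after collapsing, all contigs fall into one, which matches the false positives described above.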

defleury commented on August 23, 2024

Dear Marco,

thanks for the update! Glad you could resolve the issue this way.

And thanks in particular for pointing out the Fluoribacter issue; we mostly inherited the taxonomy from NCBI via proGenomes2, although we already did some extensive curation for the first db release. We are currently finalising the release of a new, larger db that will use the GTDB taxonomy under the hood. I just checked, and GTDB indeed lists Fluoribacter genomes under Legionella, so this particular problem should hopefully no longer occur in the future.
