Comments (7)

ivagljiva commented on August 30, 2024

Hey @lexikazen, the bad news is that this looks like a bug in the coverage code. It would help me fix it if you were willing to send over some data so that I can reproduce the issue on my computer. But I'll take a look at the code regardless and see if I can figure out what is wrong.

The good news is that anvi-compute-metabolic-enrichment doesn't need the coverage data (and even if it's there, it won't do anything with it), so you can simply re-run anvi-estimate-metabolism without the --add-coverage flag, and you will get the output you need for the enrichment test.

meren commented on August 30, 2024

I am not sure why this is happening from the output alone, but the only way a gene caller id g that is recognized in the contigs-db could be missing from self.profile_db.gene_level_coverage_stats_dict is if coverages were recovered only for the genes known to the profile-db (e.g., because a cutoff discarded some contigs from profiling, or because a bin focused on only a subset of the genes in the contigs-db) and something upstream doesn't know about it. Constraining gene calls to the universe of genes known to the profile-db is handled internally when the profile-db is initialized with a collection/bin, but maybe there is a bug somewhere when it is used through anvi-estimate-metabolism. Just mentioning these as self notes :)

We can probably reproduce this error by creating a collection for any metagenome whose bins do not include all contigs in the contigs-db.

PS: over 1 million genes -- that's a nice dataset, @lexikazen :)
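To make the suspected mismatch concrete, here is a minimal plain-Python illustration: a gene caller id known to the contigs-db but absent from the per-gene coverage dictionary. The names `contigs_db_gene_ids`, `gene_level_coverage_stats`, `missing_coverage_genes` and `lookup_all` are hypothetical stand-ins, not anvi'o internals; only the idea of the mismatch comes from the comment above.

```python
# Hypothetical, simplified stand-ins (NOT anvi'o's real data structures):
# gene caller ids known to the contigs-db ...
contigs_db_gene_ids = {0, 1, 2, 3, 4, 5}

# ... and per-gene coverage stats that exist only for the profiled/binned subset.
gene_level_coverage_stats = {
    0: {"mean_coverage": 12.5},
    1: {"mean_coverage": 8.0},
    2: {"mean_coverage": 30.1},
}


def missing_coverage_genes(gene_ids, coverage_stats):
    """Return the gene caller ids that have no entry in the coverage dict."""
    return sorted(set(gene_ids) - set(coverage_stats))


def lookup_all(gene_ids, coverage_stats):
    """Naive lookup that assumes every gene has coverage (the failure mode)."""
    return {g: coverage_stats[g]["mean_coverage"] for g in sorted(gene_ids)}


print(missing_coverage_genes(contigs_db_gene_ids, gene_level_coverage_stats))
# [3, 4, 5]

try:
    lookup_all(contigs_db_gene_ids, gene_level_coverage_stats)
except KeyError as err:
    print(f"KeyError for gene caller id {err}")
# KeyError for gene caller id 3
```

The set difference is a cheap diagnostic: if it is non-empty, the two sides were built from different universes of genes.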

lexikazen commented on August 30, 2024

@ivagljiva I tried to attach my profile db and contigs db, but GitHub won't let me upload them. How should I share those files with you?

ivagljiva commented on August 30, 2024

Actually, it is okay @lexikazen :) I managed to find a dataset that reproduces the error. It works for me when specifying a collection without metagenome mode, but breaks when combining a collection with metagenome mode. I guess that is a test I forgot to do when developing :p I had assumed that people with collections would want to treat each bin in the collection as an individual genome rather than using the collection to split out a subset of contigs to estimate on individually, but clearly there is a need for the latter. So thank you very much for finding this edge case! I will use my test data to fix the bug :)

lexikazen commented on August 30, 2024

Great thank you! :)

ivagljiva commented on August 30, 2024

I managed to fix the bug :) Turns out we were loading gene calls from all splits in the DBs even when a collection name was passed, and this conflicted downstream when the gene coverages were loaded just for the collection.

PR #2242 addresses the issue in anvio-dev. @lexikazen, if you want, you could install the development branch and try your command again in that environment, and it should work :)
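The bug explanation above (gene calls loaded from all splits, coverages loaded only for the collection) can be sketched in plain Python, assuming a simplified data model in which each gene call records the split it sits on. The dictionaries, split names and helpers (`gene_calls_for_collection`, `add_coverage`) are hypothetical illustrations, not the actual change in PR #2242.

```python
# Hypothetical gene calls: each one records the split it sits on.
gene_calls = {
    0: {"split": "contig_A_split_00001"},
    1: {"split": "contig_A_split_00001"},
    2: {"split": "contig_B_split_00001"},
    3: {"split": "contig_C_split_00001"},  # this contig is not in the collection
}

# Splits belonging to the bins of the collection passed by the user.
collection_splits = {"contig_A_split_00001", "contig_B_split_00001"}

# Coverages were loaded only for genes on the collection's splits.
coverage_stats = {
    0: {"mean_coverage": 12.5},
    1: {"mean_coverage": 8.0},
    2: {"mean_coverage": 30.1},
}


def gene_calls_for_collection(calls, splits):
    """Keep only the gene calls whose split belongs to the collection."""
    return {g: info for g, info in calls.items() if info["split"] in splits}


def add_coverage(calls, coverage):
    """Attach mean coverage to each gene call; every id must have coverage."""
    return {g: coverage[g]["mean_coverage"] for g in calls}


# Loading gene calls from all splits reproduces the mismatch ...
try:
    add_coverage(gene_calls, coverage_stats)
except KeyError as err:
    print(f"unfiltered: no coverage for gene {err}")
# unfiltered: no coverage for gene 3

# ... while restricting gene calls to the collection keeps both sides consistent.
filtered = gene_calls_for_collection(gene_calls, collection_splits)
print(add_coverage(filtered, coverage_stats))
# {0: 12.5, 1: 8.0, 2: 30.1}
```

The design point is simply that both the gene calls and the coverages have to be drawn from the same set of splits; filtering either side to match the other removes the missing-key case.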

meren commented on August 30, 2024

Thank you very much, Iva! :)
