Comments (21)

zachary-foster commented on June 15, 2024

Yes, I think having both types would be useful. A type of object that holds individual taxa or perhaps a list of taxa, for which each taxon is self-contained. This would provide a good contrast with the taxmap objects, where it is assumed that there is data classified by taxa and taxa are not self-contained. Interconversion methods would be good to have. I have not thought about this much yet though.

from taxa.

zachary-foster commented on June 15, 2024

We should think about a few use-cases for the other types of classes. taxmap is designed to be as abstract as possible, not even specific to biology really, so it would be nice to have some classes that consider things like rank, like the taxa class currently does.

How do you envision these kinds of classes being used? In taxize?

zachary-foster commented on June 15, 2024

I was thinking about this just now and I am having trouble figuring out what information should be contained within a "Taxon".

  1. A single rank or a full classification? Is "Ascomycota" a taxon alone, or only in the context of "Eukaryota|Fungi|Ascomycota"? Should these two possibilities represent two different classes?
  2. Multiple valid names? Are "Animalia" and "Metazoa" the same taxon? Should a taxon object allow for multiple names to take this into account?
  3. Multiple valid IDs? Same taxon in different databases? Arbitrary ID and a database ID?
  4. User-defined data? Should this be included in the taxon object itself?

sckott commented on June 15, 2024

Thanks for all your thoughts, will respond soon, was driving all day 🚗

zachary-foster commented on June 15, 2024

No problem. sounds good

zachary-foster commented on June 15, 2024

Here is another quandary:

  1. What about "placeholder" "taxa", like "undefined", "incertae sedis", "spp", and "Root"? Are these taxa? When comparing taxa, an "undefined" basidiomycete is different from an "undefined" ascomycete, but if the whole classification hierarchy is not known or not contained within the taxon class, this is less clear.

sckott commented on June 15, 2024

> How do you envision these kinds of classes being used? In taxize?

I envision something similar to sp, where other pkgs use taxa for base taxonomy classes, either building on, reading from, or writing to them. In taxize, we could have all the get_*() fxns coerce to taxa classes as their output (which means we need a way to have multiple taxa together, of course).

sckott commented on June 15, 2024

> 1. A single rank or a full classification? Is "Ascomycota" a taxon alone, or only in the context of "Eukaryota|Fungi|Ascomycota"? Should these two possibilities represent two different classes?

I think both should be allowed. And different classes, yes. That is, a single name alone could be class taxon, and a hierarchy of names could be class hierarchy.
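A minimal sketch of that split, written in Python only for illustration. It is not the taxa package's R API, and the class and method names (Taxon, Hierarchy, tip) are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Taxon:
    """A single name that is meaningful on its own (hypothetical class)."""
    name: str
    rank: Optional[str] = None


@dataclass(frozen=True)
class Hierarchy:
    """An ordered classification, from the broadest taxon to the tip."""
    taxa: Tuple[Taxon, ...]

    def tip(self) -> Taxon:
        # The last element is the most specific taxon in the classification.
        return self.taxa[-1]

    def __str__(self) -> str:
        return "|".join(t.name for t in self.taxa)


alone = Taxon("Ascomycota", rank="phylum")
in_context = Hierarchy((
    Taxon("Eukaryota"),
    Taxon("Fungi", rank="kingdom"),
    Taxon("Ascomycota", rank="phylum"),
))

print(in_context)                 # Eukaryota|Fungi|Ascomycota
print(in_context.tip() == alone)  # True
```

Here the tip of the hierarchy compares equal to the standalone taxon. Whether that is the right behavior is exactly the open question above, so treat it as one possible choice.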

> 2. Multiple valid names? Are "Animalia" and "Metazoa" the same taxon? Should a taxon object allow for multiple names to take this into account?

We can't reasonably account for all these synonyms ourselves, so I guess we should assume the user or database has to supply the info. But if there is info on synonyms, then yeah, seems worth accounting for those.

> 3. Multiple valid IDs? Same taxon in different databases? Arbitrary ID and a database ID?

Right, and each database can have a somewhat different hierarchy, and a taxon can have a different ID in each. Perhaps a class for a single database reference and its taxonomic names, then another class that combines data from two or more databases.

> 4. User-defined data? Should this be included in the taxon object itself?

What do you mean by user-defined data? Like the kind of data in the taxmap examples? I think I was thinking of just taxonomy data in the taxon classes

sckott commented on June 15, 2024

> 1. What about "placeholder" "taxa", like "undefined", "incertae sedis", "spp", and "Root"? Are these taxa?

I think they have to be considered/included somehow. E.g., thinking about ecologists, there are often unknown species, where the lowest known name is a family.

> When comparing taxa, an "undefined" basidiomycete is different from an "undefined" ascomycete, but if the whole classification hierarchy is not known or not contained within the taxon class, this is less clear.

Right, they are different, but if the user doesn't supply the information, then it'd be hard for us to automatically pull that out. E.g., if they give "undefined basidiomycete", we could try to parse a taxonomic name out of it, but we'd need something more sophisticated than we currently have. They could give "undefined basidiomycete" as the name, then supply Basidiomycota as the phylum in another class.

sckott commented on June 15, 2024

use cases for the taxa classes:

binomen

Right now, binomen defines taxonomic classes AND has functions for manipulating those classes (combining, separating, sorting, etc.). With classes being defined in this pkg, we can remove taxonomic classes from binomen, and it will only have the functions to manipulate taxonomic classes

taxize

All get_*() functions that get IDs for one or more taxa currently give back a simple S3 class that's just the ID with some attributes. Instead, we could coerce to a taxa class and give that back. Then that class is a known thing that we can coerce to other things, like a taxmap class.

sckott commented on June 15, 2024

@zachary-foster okay, pushed up some changes to the taxa classes - reinstall and see the examples.

I have more notes I wrote down on paper for use cases and classes, will put those here

also, more to do:

  • handle > 1 of each of the classes. have plural versions of each class?
  • when there's IDs for every rank and name pair, not just the 1 target name, how to handle that

zachary-foster commented on June 15, 2024

@sckott Nice, I will look at your changes now. In the meantime, here are my thoughts on your comments above.

> We can't reasonably account for all these synonyms ourselves, so I guess we should assume the user or database has to supply the info.

Yes, I think it would be sufficient to have the name field accept multiple values and have any comparison functions take that into account. This could be useful for people in metagenomics who might start out with arbitrary names and identify them during an analysis (e.g. a taxon can be both "OTU1" and "bacillus"). It also could be used when combining different taxonomies, although automating that might not be possible.
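To make the multi-name comparison concrete, here is a small Python sketch (illustrative only, not the R package's API). The Taxon class, the of() constructor, and same_taxon() are hypothetical names, and the rule "two taxa match if they share any name" is an assumption:

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Taxon:
    """A taxon known by one or more names (hypothetical class)."""
    names: FrozenSet[str]

    @classmethod
    def of(cls, *names: str) -> "Taxon":
        # Normalize case so "bacillus" and "Bacillus" compare equal.
        return cls(frozenset(n.strip().lower() for n in names))

    def same_taxon(self, other: "Taxon") -> bool:
        # Assumed rule: two taxa match if they share at least one name.
        return not self.names.isdisjoint(other.names)


otu = Taxon.of("OTU1", "bacillus")
animals = Taxon.of("Animalia", "Metazoa")

print(Taxon.of("Bacillus").same_taxon(otu))     # True
print(Taxon.of("Metazoa").same_taxon(animals))  # True
print(otu.same_taxon(animals))                  # False
```

Note that matching on any shared name is not transitive and would also match homonyms, which is one reason automating the merging of taxonomies might not be possible.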

> Right, and each database can have a somewhat different hierarchy, and a taxon can have a different ID in each. Perhaps a class for a single database reference and its taxonomic names, then another class that combines data from two or more databases

Hmm, I think that adding another class specifically for multiple databases might complicate things more than it is worth since we might end up making a whole new set of manipulation functions for it. How about having the database_id be its own simple class and be able to add multiple database_id to taxon? If we add the database_id at the taxon level instead of the hierarchy level we can avoid the differing hierarchy problem. A hierarchy's ID can be just the database_id list of the tip taxon. If we do this, it might be good to have an analogous database_name class.
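Roughly, in Python for illustration (not the package's R code): DatabaseId, Taxon, and Hierarchy are hypothetical names, and the database labels and ID values are made-up placeholders rather than real identifiers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DatabaseId:
    """One ID in one database (hypothetical class)."""
    database: str  # which database the ID belongs to
    id: str


@dataclass
class Taxon:
    name: str
    # A taxon may carry IDs from several databases at once.
    ids: List[DatabaseId] = field(default_factory=list)


@dataclass
class Hierarchy:
    taxa: List[Taxon]  # ordered from broadest to most specific

    def ids(self) -> List[DatabaseId]:
        # The hierarchy stores no ID of its own; it reports the tip taxon's IDs.
        return self.taxa[-1].ids


# Placeholder databases and IDs, purely for illustration.
asco = Taxon("Ascomycota", [DatabaseId("db_a", "123"), DatabaseId("db_b", "xyz")])
h = Hierarchy([Taxon("Fungi", [DatabaseId("db_a", "45")]), asco])

print([d.database for d in h.ids()])  # ['db_a', 'db_b']
```

Because the IDs hang off each taxon, two databases that disagree about the ranks above Ascomycota can still both contribute an ID for it.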

> I think I was thinking of just taxonomy data in the taxon classes

That sounds fine.

> Right, they are different, but if the user doesn't supply the information, then it'd be hard for us to automatically pull that out....

How about having a database_id with a value of NA for unknown taxa?

> have plural versions of each class?

Hmm, not sure. A hierarchy is pretty much an ordered plural of taxon, but that is different than a list of taxon. Can you think of any information that would apply to a list of taxon or hierarchy but not a single object? A simple list of objects might be sufficient. But then again, we would probably want a custom print method for a list of taxon or hierarchy, which would require a plural class. We could have taxa and taxonomy for the plurals of taxon and hierarchy.

> when there's IDs for every rank and name pair, not just the 1 target name, how to handle that

Have the IDs associated with the individual taxa and not the hierarchy as a whole?

sckott commented on June 15, 2024

> Hmm, I think that adding another class specifically for multiple databases might complicate things more than it is worth since we might end up making a whole new set of manipulation functions for it. How about having the database_id be its own simple class and be able to add multiple database_id to taxon? If we add the database_id at the taxon level instead of the hierarchy level we can avoid the differing hierarchy problem. A hierarchy's ID can be just the database_id list of the tip taxon. If we do this, it might be good to have an analogous database_name class.

Sounds good to have a simple class for database_id, with the ability to add several to a taxon. I don't think a hierarchy itself needs an ID, since all the taxa within it will have IDs, but a hierarchy does need metadata about which database it came from (the database_name class/string).

> How about having a database_id with a value of NA for unknown taxa?

sounds good

> Hmm, not sure. A hierarchy is pretty much an ordered plural of taxon, but that is different than a list of taxon. Can you think of any information that would apply to a list of taxon or hierarchy but not a single object? A simple list of objects might be sufficient. But then again, we would probably want a custom print method for a list of taxon or hierarchy, which would require a plural class. We could have taxa and taxonomy for the plurals of taxon and hierarchy.

Right, multiple of any class could simply be a list. But as you said, we could attach an S3 class to the list of many taxon objects, or whatever the class is, so it's easy to know what to do downstream (whereas if it's just a list, we have to do checks to make sure it's what we expect it to be).
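Python has no S3, so the closest analogue is a small list subclass. This is only a sketch of the idea, with hypothetical names (Taxon, Taxa); it checks the contents once at construction and gets a custom print method for free:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Taxon:
    name: str


class Taxa(list):
    """A list of Taxon objects, validated at construction (hypothetical class).

    Later mutations such as append() are not re-checked in this sketch.
    """

    def __init__(self, items: Iterable[Taxon] = ()):
        items = list(items)
        bad = [x for x in items if not isinstance(x, Taxon)]
        if bad:
            raise TypeError(f"Taxa can only hold Taxon objects, got {bad!r}")
        super().__init__(items)

    def __repr__(self) -> str:
        # Custom print method for the plural class.
        names = ", ".join(t.name for t in self)
        return f"<Taxa: {len(self)} taxa> {names}"


taxa = Taxa([Taxon("Fungi"), Taxon("Animalia")])
print(taxa)  # <Taxa: 2 taxa> Fungi, Animalia

try:
    Taxa(["Fungi"])  # a bare string is rejected up front
except TypeError as err:
    print("rejected:", err)
```

Downstream code can then test for the plural class once instead of re-checking every element of a plain list.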

> Have the IDs associated with the individual taxa and not the hierarchy as a whole?

sounds good

zachary-foster commented on June 15, 2024

I looked through the code you put up recently and it seems to be a good fit for classical species-based taxonomic data, the type you would find in ecological studies/surveys. The only concern I have is that it assumes that the user has, or is mostly interested in, species-level information (the name class). In my work (metagenomics), we often don't have species information, although you have more experience than I do in what people generally want.

Also, taxonomic names are present in both the grouping and name classes, which confused me at first. If there was a function that returned a supertaxon/subtaxon from a taxon object I would expect the output to be another taxon object; it seems like the output would be a character from the grouping of that taxon object in this implementation?
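A sketch of the behavior I would expect, in Python for illustration only. The names Taxon, Hierarchy, and supertaxon are hypothetical, and it assumes the lookup happens in the context of a hierarchy:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Taxon:
    name: str
    rank: Optional[str] = None


@dataclass(frozen=True)
class Hierarchy:
    taxa: Tuple[Taxon, ...]  # ordered from broadest to most specific

    def supertaxon(self, taxon: Taxon) -> Optional[Taxon]:
        """Return the parent of `taxon` as a Taxon, or None at the root."""
        i = self.taxa.index(taxon)  # raises ValueError if taxon is absent
        return self.taxa[i - 1] if i > 0 else None


fungi = Taxon("Fungi", "kingdom")
asco = Taxon("Ascomycota", "phylum")
h = Hierarchy((fungi, asco))

parent = h.supertaxon(asco)
print(isinstance(parent, Taxon), parent.name)  # True Fungi
print(h.supertaxon(fungi))                     # None
```

The result keeps its rank (and could keep IDs), which a bare character value would lose.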

sckott commented on June 15, 2024

> In my work (metagenomics), we often don't have species information

I assume you mean you could just have an ID, and no name at all, right?

sckott commented on June 15, 2024

> If there was a function that returned a supertaxon/subtaxon from a taxon object I would expect the output to be another taxon object

right, makes sense

> it seems like the output would be a character from the grouping of that taxon object in this implementation?

right

zachary-foster commented on June 15, 2024

> I assume you mean you could just have an ID, and no name at all,

Partially, yes, that often happens. But what I meant was that sometimes a sequence can only be assigned to a coarse taxonomic rank (e.g., family or phylum) and the species or genus cannot be determined.

sckott commented on June 15, 2024

Ah right. I see what you mean. Do the changes in https://github.com/ropenscilabs/taxa/tree/taxa-class-rework account for this now?

zachary-foster commented on June 15, 2024

Yes, just being able to define hierarchies without species information does the trick.

I went through taxa-class-rework and it looks good. I like the print methods.

zachary-foster commented on June 15, 2024

We can probably close this too?

sckott commented on June 15, 2024

Sounds good.
