
combat-tb / combattbmodel


COMBAT-TB model is a Chado inspired graph model for genome annotation.

License: GNU General Public License v3.0

Python 100.00%
graph-model combat-tb-model chado neo4j genome-annotation py2neo

combattbmodel's People

Contributors

pvanheus, thobalose, zipho

combattbmodel's Issues

comments on graph-based genome annotation model

Some brief comments on:

https://github.com/SANBI-SA/combat_tb_model/blob/master/docs/genome_annotation_model.md

Very clearly documented, thank you @thobalose and @pvanheus. The overall strategy makes lots of sense. Chado was designed as a graph database, but layered on relational technology. As a result there are maybe a few design decisions that could be revisited.

Dbxrefs

There is less need for a primary dbxref node. For SciGraph/Monarch, we use a property ID, and require that this is a CURIE.

For secondary dbxrefs, sometimes we just treat these as properties decorated on the node, in other cases as nodes in their own right. In the latter case, we don't really think of the type as being dbxref: if it's a UniProt dbxref then it's a protein object. The Chado modeling of dbxrefs somewhat reflects the original MOD use case and the split between 'in-house' entities and 'the others'. When making a database for more integrative use cases this split is less useful.
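As a rough illustration of the "ID property that must be a CURIE" convention, the Python sketch below keeps the CURIE as a plain property on the node and stores secondary xrefs as a list property. The prefix map, the `expand_curie` and `make_node` helpers, and the example identifiers are all assumptions made for illustration; this is not SciGraph or Monarch code.

```python
import re

# Hypothetical prefix map; a real deployment would load a curated one.
PREFIXES = {
    "UniProtKB": "http://purl.uniprot.org/uniprot/",
    "SO": "http://purl.obolibrary.org/obo/SO_",
}

# prefix ":" local part, with no whitespace in the local part
CURIE_RE = re.compile(r"^([A-Za-z][\w.-]*):(\S+)$")


def expand_curie(curie):
    """Validate a CURIE against the prefix map and expand it to an IRI."""
    match = CURIE_RE.match(curie)
    if match is None:
        raise ValueError(f"not a CURIE: {curie!r}")
    prefix, local = match.groups()
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix] + local


def make_node(curie, labels, xrefs=()):
    """Build a node as a plain dict: the CURIE is the id property and
    secondary xrefs are just a list property decorated on the node."""
    expand_curie(curie)  # reject ids that are not resolvable CURIEs
    return {"id": curie, "labels": list(labels), "xrefs": list(xrefs)}


# Placeholder identifiers, for illustration only.
protein = make_node("UniProtKB:P12345", ["Protein"], xrefs=["SO:0000704"])
```

With this shape there is no separate primary dbxref node to join through; a secondary xref only becomes a node of its own when something needs to be said about it.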

The use of dbxrefs in chado can also lead to a kind of 'fake' referential integrity checking. Some rough thoughts in this doc:

https://docs.google.com/document/d/1fmXtC1oAk_5T5IB6tgilYnVgcV1wCpfi8vj9J8Ht6fU/edit

As we merge from multiple sources, we're interested in interpreting xrefs as stricter relationships that allow us to merge equivalence cliques. @jnguyenx will fill this out soon: https://github.com/SciGraph/SciGraph/wiki/Post-processors#clique-merge
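To make the idea of merging equivalence cliques concrete, here is a minimal union-find sketch in Python. It is not the SciGraph clique-merge post-processor: the `merge_cliques` function, the rule of picking a leader by prefix preference, and the identifiers are assumptions for illustration.

```python
from collections import defaultdict


def merge_cliques(equivalences, preference=()):
    """Group identifiers connected by equivalence xrefs.

    equivalences: iterable of (curie_a, curie_b) pairs
    preference:   sequence of prefixes, most preferred first
    Returns a dict mapping each clique's leader to the set of its members.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union the two endpoints of every equivalence edge.
    for a, b in equivalences:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    cliques = defaultdict(set)
    for node in list(parent):
        cliques[find(node)].add(node)

    def rank(curie):
        # Preferred prefixes sort first; ties fall back to the CURIE itself.
        prefix = curie.split(":", 1)[0]
        order = preference.index(prefix) if prefix in preference else len(preference)
        return (order, curie)

    return {min(members, key=rank): members for members in cliques.values()}


# Placeholder identifiers: A:1 = B:1 = C:1 form one clique, A:2 = B:2 another.
result = merge_cliques(
    [("A:1", "B:1"), ("B:1", "C:1"), ("A:2", "B:2")],
    preference=("B", "A"),
)
```

Treating every xref as an equivalence edge is exactly the assumption that needs care: a loose "see also" xref would wrongly collapse distinct entities into one clique.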

Feature Locations

One limitation of Chado (and GFF3 and subsumed models) is that the start and end of a feature must be on the same reference. This was something of a compromise between query tractability and normalization. In Monarch we use the FALDO model. It's designed as an RDF schema, so it works perfectly well in Neo4j.

Bolleman, J. T., Mungall, C. J., Strozzi, F., Baran, J., Dumontier, M., Bonnal, R. J. P., … Cock, P. J. A. (2016). FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation. http://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-016-0067-z
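The practical difference from a Chado-style featureloc can be sketched in a few lines of Python: in a FALDO-like model each end of a region is its own position object carrying its own reference, so begin and end are free to sit on different sequences. The class names below loosely follow FALDO terms, but the code is an illustrative assumption (including the 1-based inclusive coordinates and the contig names), not a FALDO implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Position:
    reference: str                # identifier of the sequence this coordinate is on
    position: int                 # coordinate, assumed 1-based here
    strand: Optional[str] = None  # "+", "-", or None when unknown


@dataclass(frozen=True)
class Region:
    begin: Position
    end: Position

    def same_reference(self):
        """True when both ends lie on the same sequence (the Chado/GFF3 case)."""
        return self.begin.reference == self.end.reference

    def length(self):
        """Inclusive length; only defined when both ends share a reference."""
        if not self.same_reference():
            raise ValueError("length is undefined across references")
        return abs(self.end.position - self.begin.position) + 1


# Hypothetical contig names, for illustration only.
simple = Region(Position("contig_A", 10, "+"), Position("contig_A", 19, "+"))
spanning = Region(Position("contig_A", 100, "+"), Position("contig_B", 5, "+"))
```

In a property graph each `Position` would naturally become a node linked to its reference sequence, which is the extra hop that Chado's single-reference featureloc avoids.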

And for variant modeling, many groups like GA4GH are taking the approach of graph models with nucleotides as nodes. If everything you have can be mapped to a linear reference this buys you less; I'm not sure about your use case here. Of course, both models can live in the same instance so long as there is a well-defined mapping.

Ontologies

Not much to add here, you have this down correctly. For mapping to more expressive formalisms like OWL there are subtleties, but I suggest you take advantage of existing mappings. For example: https://github.com/SciGraph/SciGraph/wiki/Neo4jMapping

The proposed obographs JSON exchange for ontologies and ontology fragments may be of use. You might want to target this format for loading.

APIs

Thanks for your useful notes, will check py2neo out (useful for us @kshefchek?). It seems GMOD is very heterogeneous in APIs, but in general anything targeted to Chado should in theory be mechanically mappable to this Neo4j model. It may be useful to gather like-minded GMOD folks together to explore approaches.

@nathandunn is keen to do this for Apollo but bandwidth is low...

Constraints

Many of Chado's referential integrity checks are fakish. You can't have a dangling surrogate key, but you can always have stub objects at the end. The original idea was to use axioms in SO to constrain, but that was under some naive assumptions regarding the suitability of an expressive open-world formalism (OWL) to do closed-world constraint checking.

However, this topic is huge in some segments of the semweb community at the moment. There are promising developments like ShEx/SHACL. Crucially, while these are developed within an RDF framework, they can be made to work for Neo4j. In Monarch we do a lot of pre-processing and data munging in Turtle, and then just load the Turtle into Neo4j at the end. We're planning on targeting this upstream layer for constraint checking etc.
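To illustrate what closed-world checking means in practice, the Python sketch below validates a small property graph against hand-written shapes: a missing required property or a dangling relationship endpoint is reported as a violation, rather than being treated as merely unknown as it would be under OWL's open-world semantics. This is neither ShEx nor SHACL; the `SHAPES` table, the `validate` function, and the labels and relationship names are assumptions for illustration.

```python
# Hypothetical shapes: required properties per label, plus the label
# expected at the target end of each outgoing relationship type.
SHAPES = {
    "Gene": {"required": {"name"}, "rels": {"LOCATED_AT": "Location"}},
    "Location": {"required": {"start", "end"}, "rels": {}},
}


def validate(nodes, edges):
    """Return a list of violation messages for a small property graph.

    nodes: dict mapping node id -> {"label": str, "props": dict}
    edges: iterable of (source_id, rel_type, target_id)
    """
    violations = []
    for node_id, node in nodes.items():
        shape = SHAPES.get(node["label"])
        if shape is None:
            violations.append(f"{node_id}: no shape for label {node['label']}")
            continue
        for prop in sorted(shape["required"] - set(node["props"])):
            violations.append(f"{node_id}: missing property {prop}")
    for source, rel, target in edges:
        if source not in nodes or target not in nodes:
            violations.append(f"{source}-[{rel}]->{target}: dangling endpoint")
            continue
        expected = SHAPES.get(nodes[source]["label"], {}).get("rels", {}).get(rel)
        if expected != nodes[target]["label"]:
            violations.append(f"{source}-[{rel}]->{target}: unexpected target label")
    return violations


# Placeholder data: g2 lacks a name and points at a node that does not exist.
nodes = {
    "g1": {"label": "Gene", "props": {"name": "geneX"}},
    "g2": {"label": "Gene", "props": {}},
    "l1": {"label": "Location", "props": {"start": 1, "end": 5}},
}
edges = [("g1", "LOCATED_AT", "l1"), ("g2", "LOCATED_AT", "missing")]
report = validate(nodes, edges)
```

Running a check like this on the data before loading is one way to catch stub objects that a surrogate-key constraint would let through.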

If we could provide some use cases I can feed them to some of these groups.

Also worth mentioning is WormBase's Datomic schema (Datomic has schemas).
