
Bento Metamodel


Bento Meta DB

The metamodel database (MDB) records

  • node/relationship/property structure of models;
  • the official local vocabulary - terms that are employed in the backend data system;
  • synonyms for local vocabulary mapped from external standards; and
  • the value sets for properties with enumerated value domains, and data types for other properties.

The production instance of MDB will contain the "official" representation of a data model, in that it will record the curated external terminology mappings and official sets of valid terms for each relevant property. In this way, the MDB is an extension of the MDF for any model it contains.

As the central location for official mappings to external vocabularies, the MDB can (and should) be used by software modules that convert between the data physically stored in the production database and external standards. For example, the Simple Terminology Service (STS), an API that uses the MDB as its backend, handles simple queries about a given model and validates incoming data.

Programming APIs

APIs for working with the MDB are available in Python and Perl.

See the Python documentation.

Structure

The MDB is formulated as a graph model. This model contains more structure than will be exposed by the Simple Terminology Service (in order to keep it Simple). Other services can be built on the DB to perform translations, add terms and mappings, create visualizations, and other functions.

The metamodel is described in a Model Description File. Documents and tools for this format are at bento-mdf.

[Figure: metamodel graphic]

The MDB model contains the following nodes.

Node

A node with the label "node" in the MDB represents a model node: for example, a Diagnosis node.

Relationship

A Relationship node represents a model relationship; for example, a model may entail a relationship has_diagnosis from a Case node to a Diagnosis node. To represent this in the metamodel, a Relationship node is created with handle = has_diagnosis, along with a link has_src to the Case node, and a link has_dst to the Diagnosis node. A Neo4j relationship is also created between Node nodes, with a type = underscore+<relationship handle>. In the example, Case and Diagnosis nodes would be linked by a _has_diagnosis relationship.
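
The has_diagnosis example can be sketched as a small Python helper that assembles the corresponding Cypher. This is only an illustration: the function name `relationship_cypher` is hypothetical, the lowercase labels (`node`, `relationship`) follow the example queries in this README, and the handles are interpolated directly for readability (real code should validate them, since relationship types cannot be passed as Cypher parameters).

```python
def relationship_cypher(handle: str, src: str, dst: str, model: str) -> str:
    """Build Cypher for one model relationship (illustrative sketch only)."""
    # Shortcut relationship type = underscore + relationship handle.
    shortcut = f"_{handle}"
    return (
        f'MATCH (s:node {{handle:"{src}", model:"{model}"}}), '
        f'(d:node {{handle:"{dst}", model:"{model}"}}) '
        # The Relationship node itself, identified by its handle.
        f'CREATE (r:relationship {{handle:"{handle}", model:"{model}"}}) '
        # Links from the Relationship node to its source and destination.
        f'CREATE (r)-[:has_src]->(s) '
        f'CREATE (r)-[:has_dst]->(d) '
        # Direct Neo4j relationship between the two Node nodes.
        f'CREATE (s)-[:{shortcut}]->(d)'
    )


# Assumed handles "case" and "diagnosis" in an assumed model "ICDC".
stmt = relationship_cypher("has_diagnosis", "case", "diagnosis", "ICDC")
print(stmt)
```

The statement is only printed here; nothing is sent to a database.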

Property

A Property node in the Model DB represents a property of a model node: for example, the disease property of a Diagnosis node.

Concept

A Concept node represents an intellectual concept. It is abstract, in that it has no human readable name as such; it will however have a unique ID. The Concept node can be thought of as a connecting point for sets of Terms that are identical in meaning (are synonymous).

Predicate

A Predicate node is a means for semantically relating two Concept nodes. A Predicate itself can also link to a Concept that it represents. Semantic concepts behind a predicate could be, e.g., "contains", "is a child of", "is broader than".

Origin

An Origin node represents an entity (institution, internal project, defined standard, recognized body, public database) that defines and/or promulgates a terminology and represents it authoritatively.

Term > Value

A Term node is an instance of encoding (a "representation") of a concept. Each Term node is linked to at least one Origin node, which represents the entity that provides the term value/code and/or the term semantics.

The value property of a Term node is the string representation of the term. This is a token that, for example, may be physically stored in a database as a datum.

Value Set

A Value Set node aggregates (links to) a number of Term nodes that define the list of acceptable values for a property slot. The value set does not directly aggregate Concepts; it is meant to define the pragmatic set of valid representations of values for a property.

Concept Group

A Concept Group node aggregates (i.e., links to) Concept nodes. Concept Groups might be implicit. A Value Set node, for example, implicitly defines a Concept Group - the set of those Concept nodes that are linked to the Term nodes aggregated by the Value Set.
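
A minimal in-memory sketch of the implicit Concept Group idea, using plain dictionaries rather than bento_meta objects. All identifiers and sample values below (`vs_body_system`, `concept-1`, the term strings) are invented for illustration and are not taken from any real model.

```python
# Hypothetical stand-ins for MDB links (not bento_meta objects).
# Value set id -> values of the Term nodes it aggregates.
value_set_terms = {"vs_body_system": ["Cardiovascular", "Respiratory"]}

# Term value -> id of the Concept the Term represents.
term_concept = {
    "Cardiovascular": "concept-1",
    "Respiratory": "concept-2",
    "Cardiovascular System": "concept-1",  # synonym from another origin
}


def implied_concept_group(value_set_id):
    """Concepts reached through the Terms that a Value Set aggregates."""
    return {term_concept[t] for t in value_set_terms[value_set_id]}


def synonyms(term_value):
    """Other Terms linked to the same Concept as the given Term."""
    concept = term_concept[term_value]
    return sorted(
        t for t, c in term_concept.items() if c == concept and t != term_value
    )


print(implied_concept_group("vs_body_system"))
print(synonyms("Cardiovascular"))
```

The value set never references concepts directly; the group falls out of the Term-to-Concept links, which mirrors the description above.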

Tag

A Tag node represents a simple key/value pair. Any other type of node can be annotated with a Tag node.

Semantic Information in the MDB

Semantic structure (for example, hierarchical groupings of concepts, or other "facts" or "predicate" relationships), besides synonymy and value set grouping, can be recorded in the MDB using the predicate node to link subject and object concept nodes. In general, to the extent that semantic information exists, it is better to access it by external services via the relevant Origins (e.g., NCI Thesaurus). External model topologies need not be concordant with the model structure represented in the MDB. However, it can be useful to semantically annotate certain concepts within the MDB itself. In particular, parent-child, class-subclass, and more general SKOS annotations may be useful to have locally for terminology mapping applications.
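
As a rough illustration of a local parent-child annotation, the sketch below models Predicate links as (subject, predicate, object) triples and walks them upward. The concept ids and the predicate handle `is_child_of` are invented, and the code assumes each concept has at most one parent, which the MDB itself does not require.

```python
# (subject concept id, predicate handle, object concept id); invented data.
triples = [
    ("concept-lung", "is_child_of", "concept-respiratory-system"),
    ("concept-respiratory-system", "is_child_of", "concept-body-system"),
]


def ancestors(concept, predicate="is_child_of"):
    """Follow one predicate from subject to object until it runs out."""
    parents = {s: o for s, p, o in triples if p == predicate}
    chain = []
    seen = set()  # guards against cycles in the annotations
    while concept in parents and concept not in seen:
        seen.add(concept)
        concept = parents[concept]
        chain.append(concept)
    return chain


print(ancestors("concept-lung"))
```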

Reading and Writing to the MDB

An object model of the MDB in Python, bento-meta, is available and recommended.

Notes regarding loading the MDB with model description files, and creating external mappings, are here. See loaders for a number of initial loading scripts.

Example Queries

  • What are the nodes in the ICDC model?

      match (n:node {model:"ICDC"}) return n;
    
  • What are the nodes in the CTDC model?

      match (n:node {model:"CTDC"}) return n;
    
  • What are the acceptable values for the ICDC "body_system" property?

      match (p:property {handle:"body_system", model:"ICDC"})-->(:value_set)-->(t:term)
         return t.value;
    
  • Are there properties that have the same name ("handle") in both ICDC and CTDC?

      match (p:property {model:"ICDC"}), (q:property {model:"CTDC"}) where p.handle=q.handle
        return p.handle;
    
  • How many nodes with the same handle appear in both models?

      match (n:node) with count(n) as ct, n.handle as handle where (ct > 1) return handle;
    
  • Do those nodes refer to the same semantic concept, or different concepts?

      match (n:node) with count(n) as ct, n.handle as handle where ct>1
        match (n:node)-[:has_concept]->(c) where n.handle=handle return n,c;
    
  • What terms are synonymous with the ICDC term adverse_event_grade, and where do those terms come from?

      match (:term {value:"adverse_event_grade"})-->(c:concept) with c match (c)<--(t:term)-->(o:origin) return t.value,o.name;
    
  • What's the NCIT concept code mapped to the ICDC term ae_dose?

      match (:term {value:"ae_dose"})-->(c:concept)<--(t:term)-->(o:origin {name:"NCIT"}) return t.origin_id;
    
  • What BRIDG entities are mapped to the ICDC relationship member_of? What are the relevant BRIDG mapping paths?

      match (:relationship {handle:"member_of"})-->(c:concept)<--(t:term)-->(o:origin {name:"BRIDG"}) return t.value, t.mapping_path
    


bento-meta's Issues

Docs build breaking with mergespec() update

Discovered that the docs build is breaking. The issue has to do with being able to access the subclass specific (i.e., not the Entity class) attributes separately. Fixing objects.py to account for this.

This will lead to a very slight change in how one declares the attspec (basically, use attspec_, not attspec, in the subclass def). The fix will also update the docs to account for this.

`get_by_id` broken

object.nanoid is not yet implemented (it is implemented in branch feat/nanoidvsid).

object_map.get_by_id() is broken:

  • the Neo4j identifier id is not stable; lookups should use nanoid instead
  • the function does not work when called on either a class or an instance:

      bento_meta.object_map.get_by_id('wze6rN')
      n = bento_meta.object.Node()
      n.get_by_id('wze6rN')

missing and superfluous arguments in `object_map.py`

There appears to be an unused parameter albl that is not used in the query in object_map.py lines 449 - 453:

          return "MATCH (n:{lbl}){rel}(a) WHERE id(n)={neoid} DELETE r RETURN id(n),id(a)".format(
            lbl=self.cls.mapspec()["label"],
            albl=end_lbls[0],
            rel=rel,
            neoid=obj.neoid)

There is also a missing parameter aneoid (the {aneoid} placeholder has no matching argument) and an unused parameter albl in the query in object_map.py lines 475 - 480:

            qry = "MATCH (n:{lbl}){rel}(a) WHERE id(n)={neoid} AND id(a)={aneoid} AND ({cond}) DELETE r RETURN id(n),id(a)".format(
              lbl=self.cls.mapspec()["label"],
              albl=end_lbls[0],
              neoid=obj.neoid,
              cond=cond,
              rel=rel)
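
The two cases behave differently in Python, which a short standalone sketch can show. The template below is a shortened stand-in for the real query string, and the values are arbitrary. An unused keyword argument is silently ignored by `str.format`, while a placeholder without a matching argument raises `KeyError` at call time.

```python
template = "WHERE id(n)={neoid} AND id(a)={aneoid}"

# An extra keyword argument (albl) is silently ignored.
print(template.format(neoid=1, aneoid=2, albl="term"))

# A placeholder with no matching argument raises KeyError.
missing = None
try:
    template.format(neoid=1, albl="term")
except KeyError as err:
    missing = err.args[0]
    print(f"missing placeholder: {missing}")
```

So the unused albl is harmless noise, whereas the missing aneoid would make the second query fail whenever that code path is reached.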

Reloading data from MDB?

I cannot seem to 'reload' data from the MDB into Bento-meta.

Use case: using Bento-meta for modeling data in a web server. End-user Anna goes to "sts.com" and sees the data that is stored in the MDB for the value set "disease". A few minutes later, end-user Bob goes to "sts.com" and updates the disease value set by adding a term. When Anna revisits the value set a few minutes later, she should see the now-updated value set.

With Bento-meta, I cannot find a way to reload data. Any attempt to "reload" by calling $m->load_all_db_models($bolt_url); a second time results in an error. I need Bento-meta to be able to poll the MDB continually and very quickly to pick up any changes.

error:

Can't call method "handle" on an undefined value at /Users/bensonml/0_SRC/bento-meta/perl/lib/Bento/Meta/Model/Edge.pm line 30.

The problem is compounded by the fact that $m->load_all_db_models($bolt_url); takes about 100 seconds to run. If I have to reload the model for each query, then every time a person goes to a new page, or looks at a different value set or node, it will take 1:40 to load and display the page.

Reloading error example snippet:

    $bolt_url //= 'bolt://54.156.191.24:7687';
    print "Using Bento::Meta version=$Bento::Meta::VERSION\n";

    my $m = Bento::Meta->new;

    $m->load_all_db_models($bolt_url);
    my $icdc = $m->model('ICDC');
    print "---A---\n";

    $m = undef;
    $icdc = undef;
    $m = undef;
    $m = Bento::Meta->new;

    print "---B---\n";
    $m->load_all_db_models($bolt_url);
    print "---C---\n";
    $icdc = $m->model('ICDC');
    print "---D---\n";

    my @icdc__nodes = $icdc->nodes();
    #print Dumper (@icdc__nodes);
    foreach my $n (@icdc__nodes) {
        print "---E---\n";
        #my $n_ = $m->modelnode($n);
        #print Dumper ($n_);
        #print "\t\txxxxxxxx\n\n";
        my $n_handle_ = $n->handle();
        print "\t - node: $n_handle_\n";
        my @n_props = $n->props();
        foreach my $n_prop (@n_props){
            print "---F---\n";
            my $n_p_h = $n_prop->handle();
            print "\t\t-node-property: $n_p_h\n";
        }
        #my $name = $n->handle();
        #print "   found $name\n";
    }

    print "---G---\n";

yields the following error:

❯ perl simple.pl
Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.32), passed through in regex; marked by <-- HERE in m/^([:$?])|({ <-- HERE [^}]+}$)/ at /Users/bensonml/perl5/perlbrew/perls/perl-5.30.2/lib/site_perl/5.30.2/Neo4j/Cypher/Abstract/Peeler.pm line 542.
Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.32), passed through in regex; marked by <-- HERE in m/^([:$?])|({ <-- HERE [^}]+}$)/ at /Users/bensonml/perl5/perlbrew/perls/perl-5.30.2/lib/site_perl/5.30.2/Neo4j/Cypher/Abstract/Peeler.pm line 120.
---A---
---B---
Can't call method "handle" on an undefined value at /Users/bensonml/0_SRC/bento-meta/perl/lib/Bento/Meta/Model/Edge.pm line 30.

Cannot save a model containing a 'tag' - `dput()` returns AttributeError

I can use bento_meta to create a tag for a node.
However, I cannot save the tag to the database, nor can I save the model after a tag is created: trying to execute dput() raises an AttributeError:

Traceback (most recent call last):
  File "tag_issue.py", line 45, in <module>
    icdc_model.dput()
  File "/Users/bensonml/.pyenv/versions/pyenv3.8/lib/python3.8/site-packages/bento_meta/model.py", line 506, in dput
    do_(e)
  File "/Users/bensonml/.pyenv/versions/pyenv3.8/lib/python3.8/site-packages/bento_meta/model.py", line 481, in do_
    obj.dput()
  File "/Users/bensonml/.pyenv/versions/pyenv3.8/lib/python3.8/site-packages/bento_meta/entity.py", line 436, in dput
    return type(self).object_map.put(self)
  File "/Users/bensonml/.pyenv/versions/pyenv3.8/lib/python3.8/site-packages/bento_meta/object_map.py", line 229, in put
    for qry in ObjectMap(cls=type(val), drv=self.drv).put_q(val):
  File "/Users/bensonml/.pyenv/versions/pyenv3.8/lib/python3.8/site-packages/bento_meta/object_map.py", line 414, in put_q
    if getattr(obj, pr) is None:
  File "/Users/bensonml/.pyenv/versions/pyenv3.8/lib/python3.8/site-packages/bento_meta/entity.py", line 247, in __getattr__
    raise AttributeError(
AttributeError: get: attribute 'key' neither private nor declared for subclass Tag

Sample script:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# demonstrate issue of saving a tag to db w/ dput()
import os
from neo4j import GraphDatabase
from bento_meta.model import Model
from bento_meta.objects import (
    Node,
    Property,
    Edge,
    Term,
    ValueSet,
    Concept,
    Origin,
    Tag,
)


def get_neo4j_db():
    uri = "bolt://localhost:7687"
    user = os.environ.get("NEO4J_MDB_USER")
    password = os.environ.get("NEO4J_MDB_PASS")

    ndbr = GraphDatabase.driver(uri, auth=(user, password))

    return ndbr


drv = get_neo4j_db()
icdc_model = Model("ICDC", drv)

# get data
print("getting from db")
icdc_model.dget()

# get case node, for example
case_node = icdc_model.nodes["case"]

# show that the tags are empty, as expected
# {}
print("tags before assignment")
print(case_node.tags)

# save model without changes - no problems expected
print("saving without changes")
icdc_model.dput()

# create and assign "core" as tag to 'case' node
tag_string = "core"
case_node.tags[tag_string] = Tag({"value": tag_string})

# show that 'case' node has tag, as expected
# {'core': <bento_meta.objects.Tag object at 0x107c7ae50>}
print("tags after assignment")
print(case_node.tags)

# Attempt to save db - now that model includes a tag
# AttributeError: get: attribute 'key' neither private nor declared for ...
print("saving to db")
icdc_model.dput()

# Never reached
print('done saving')

set_with_node doesn't access Node mapspec dict

patt = type(self).mapspec()['property'][att]

If the attribute "nanoid" is added to class Node(Entity)

    attspec = {
        "handle": "simple",
        "model": "simple",
        "category": "simple",
        "nanoid": "simple",
        "concept": "object",
        "props": "collection",
    }
    mapspec_ = {
        "label": "node",
        "key": "handle",
        "property": {"handle": "handle", "model": "model", "category": "category", "nanoid": "nanoid"},
        "relationship": {
            "concept": {"rel": ":has_concept>", "end_cls": "Concept"},
            "props": {"rel": ":has_property>", "end_cls": "Property"},
        },
    }

Then a KeyError is raised:

>           patt = type(self).mapspec()["property"][att]
E           KeyError: 'nanoid'

.tox/py3/lib/python3.7/site-packages/bento_meta/entity.py:227: KeyError

The mapspec will instead hold the following (note that there is no nanoid):

{'key': '_id',
 'label': 'node',
 'property': {'_from': '_from',
              '_id': 'id',
              '_to': '_to',
              'category': 'category',
              'desc': 'desc',
              'handle': 'handle',
              'model': 'model'},

However, this problem does not appear to exist with other entities, such as class Property(Entity) or class Edge(Entity).

Proof of workaround is in branch pep8. See https://github.com/CBIIT/bento-meta/blob/pep8/python/bento_meta/entity.py

            # WARNING: the node.mapspec()["property"] is NOT reading
            #          what is defined in objects.py class Node(Entity)
            #          can add 'nanoid' to attspec and mapspec_ but it will
            #          NOT show up here, not yet read?
            # Ergo, code needs to check that the attribute att actually
            # exists in the mapspec()["property"] dictionary, or else it will
            # raise KeyError here
            if att in (type(self).mapspec()["property"]):
                patt = type(self).mapspec()["property"][att]
                if patt in init:
                    setattr(self, att, init[patt])
                else:
                    setattr(self, att, None)

            # NOTE: 'one-off' munge to allow 'nanoid' attspec for Node
            # P.S.: don't let mom see I coded this...
            # TODO fix this!
            if (att == "nanoid" and type(self) == "Node"):
                patt = "nanoid"
                if patt in init:
                    setattr(self, att, init[patt])
                else:
                    setattr(self, att, None)
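
One detail worth noting in the workaround: `type(self) == "Node"` compares a class object with a string, which is always False in Python, so that branch would never run as written. The standalone sketch below uses stand-in classes, not the real bento_meta ones, to show the difference and two comparisons that do match.

```python
class Entity:
    """Stand-in base class for illustration only."""


class Node(Entity):
    """Stand-in subclass for illustration only."""


n = Node()

# A class object never equals a string, so this is always False.
print(type(n) == "Node")

# Comparing the class name, or using isinstance, matches as intended.
print(type(n).__name__ == "Node")
print(isinstance(n, Node))
```

`isinstance` also matches subclasses of Node, while the `__name__` comparison matches only classes named exactly "Node"; which one fits depends on the intent of the check.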

Add 'Model' as new Entity

Add Model as a class in objects.py - enable loaders to create these automatically (as "orphan" nodes)
