biopragmatics / semra

🛣️ Semantic Mapping Reasoning Assembler (SeMRA): tooling for semantic mappings

Home Page: https://semra.readthedocs.io

License: MIT License


semra's Introduction

SeMRA

Semantic mapping reasoner and assembler

This software provides:

  1. An object model for semantic mappings (based on SSSOM)
  2. Functionality for assembling and reasoning over semantic mappings at scale
  3. A provenance model for automatically generated mappings
  4. A confidence model that is granular at the curator level, mapping set level, and community feedback level

We also provide an accompanying raw semantic mapping database on Zenodo at https://zenodo.org/records/11082039.

🚀 Installation

The most recent release can be installed from PyPI with:

pip install semra

The most recent code and data can be installed directly from GitHub with:

pip install git+https://github.com/biopragmatics/semra.git

👐 Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

👋 Attribution

⚖️ License

The code in this package is licensed under the MIT License.

🍪 Cookiecutter

This package was created with @audreyfeldroy's cookiecutter package using @cthoyt's cookiecutter-snekpack template.

🛠️ For Developers

See developer instructions

This final section of the README is for those who want to get involved by making a code contribution.

Development Installation

To install in development mode, use the following:

git clone https://github.com/biopragmatics/semra.git
cd semra
pip install -e .

🥼 Testing

After cloning the repository and installing tox with pip install tox, the unit tests in the tests/ folder can be run reproducibly with:

tox

Additionally, these tests are automatically re-run with each commit in a GitHub Action.

📖 Building the Documentation

The documentation can be built locally using the following:

git clone https://github.com/biopragmatics/semra.git
cd semra
tox -e docs
open docs/build/html/index.html

The documentation build automatically installs the package as well as the docs extra specified in setup.cfg. Sphinx plugins like texext can be added there. They also need to be added to the extensions list in docs/source/conf.py.

The documentation can be deployed to ReadTheDocs using this guide. The .readthedocs.yml YAML file contains all the configuration you'll need. You can also set up continuous integration on GitHub to check not only that Sphinx can build the documentation in an isolated environment (i.e., with tox -e docs-test) but also that ReadTheDocs can build it too.

📦 Making a Release

After installing the package in development mode and installing tox with pip install tox, the commands for making a new release are contained within the finish environment in tox.ini. Run the following from the shell:

tox -e finish

This script does the following:

  1. Uses Bump2Version to switch the version number in the setup.cfg, src/semra/version.py, and docs/source/conf.py to not have the -dev suffix
  2. Packages the code in both a tar archive and a wheel using build
  3. Uploads to PyPI using twine. Be sure to have a .pypirc file configured to avoid the need for manual input at this step
  4. Pushes to GitHub. You'll need to make a release that goes with the commit where the version was bumped.
  5. Bumps the version to the next patch. If you made big changes and want to bump the version by minor, you can use tox -e bumpversion -- minor afterwards.

semra's People

Contributors

bgyori, cthoyt, kkaris

Forkers

kkaris

semra's Issues

Automated evaluation of predicted mappings

A set of mappings can be stratified along three axes:

  1. mapping justification (semapv:ManualMappingCuration and maybe semapv:UnspecifiedMatching vs. others like semapv:LexicalMatching, semapv:BackgroundKnowledgeBasedMatching, semapv:MappingInversion, and semapv:MappingChaining)
  2. mapping set
  3. source-target prefix pair
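
As a rough sketch of how this stratification could work, the following groups mappings by all three axes at once. It assumes each mapping is a plain dictionary with the hypothetical keys subject, object, justification, and mapping_set. This is not SeMRA's object model, and the mapping set names and the justification on the third record are assigned purely for illustration.

```python
from collections import defaultdict


def prefix_of(curie):
    """Return the prefix of a CURIE, e.g. 'hp' for 'hp:0010788'."""
    return curie.split(":", 1)[0]


def stratify(mappings):
    """Group mappings by (justification, mapping set, source-target prefix pair)."""
    strata = defaultdict(list)
    for mapping in mappings:
        prefix_pair = (prefix_of(mapping["subject"]), prefix_of(mapping["object"]))
        key = (mapping["justification"], mapping["mapping_set"], prefix_pair)
        strata[key].append(mapping)
    return dict(strata)


# Toy records: mapping set names and the last justification are hypothetical.
mappings = [
    {"subject": "hp:0010788", "object": "doid:1984",
     "justification": "semapv:MappingChaining", "mapping_set": "inferred"},
    {"subject": "hp:0011750", "object": "doid:1984",
     "justification": "semapv:MappingChaining", "mapping_set": "inferred"},
    {"subject": "hp:0100743", "object": "ncit:C3262",
     "justification": "semapv:ManualMappingCuration", "mapping_set": "curated"},
]

strata = stratify(mappings)
for key, group in strata.items():
    print(key, len(group))
```

Each stratum can then be evaluated on its own, which makes it easier to see whether errors concentrate in one justification, mapping set, or prefix pair.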

Some ideas for making a comparison, in order of increasing complexity:

  1. Compare mapping overlap to get a simple precision, recall, and $F_1$
  2. Penalize incorrect one-to-many, many-to-one, and many-to-many mappings
  3. Incorporate the ontology hierarchy
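
The simplest of these ideas, the overlap comparison, could look something like the sketch below. It assumes mappings are reduced to (subject, predicate, object) tuples and that a curated reference set is available. The function name evaluate and the CURIEs are made up for illustration and are not part of SeMRA.

```python
def evaluate(predicted, reference):
    """Return (precision, recall, F1) for predicted vs. reference mapping triples."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy data with made-up CURIEs: 2 of 3 predictions are in the reference set.
reference = {
    ("a:1", "skos:exactMatch", "b:1"),
    ("a:2", "skos:exactMatch", "b:2"),
    ("a:3", "skos:exactMatch", "b:3"),
    ("a:4", "skos:exactMatch", "b:4"),
}
predicted = {
    ("a:1", "skos:exactMatch", "b:1"),
    ("a:2", "skos:exactMatch", "b:2"),
    ("a:3", "skos:exactMatch", "b:9"),
}

precision, recall, f1 = evaluate(predicted, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact triple matching treats every miss equally, which is why the later ideas (cardinality penalties and hierarchy awareness) would be needed to distinguish a near miss from a wrong mapping.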

Require that inference does not create paths that have the same prefix in them twice

We got burned by the HP mapping from its specific neoplasm terms (e.g., neoplasm of the rectum; https://hpo.jax.org/app/browse/term/HP:0100743) to NCIT's high-level neoplasm term (NCIT:C3262). Some paths go through one of HPO's specific neoplasm terms, then to NCIT, then back to some other HPO term, which means that basically any two neoplasms can end up being called equivalent.

Here's an example showing this:

hp:0010788      skos:exactMatch doid:1984       semapv:MappingChaining  doid,hp,umls                    1.0             doid:1984 umls:C0034885 mesh:D012004 hp:0100743 ncit:C3262 hp:0010788
hp:0011750      skos:exactMatch doid:1984       semapv:MappingChaining  doid,hp,umls                    1.0             doid:1984 umls:C0034885 mesh:D012004 hp:0100743 ncit:C3262 hp:0011750
hp:0011752      skos:exactMatch doid:1984       semapv:MappingChaining  doid,hp,umls                    1.0             hp:0011752 ncit:C3262 hp:0100743 umls:C0034885 ncit:C3350 doid:1984
hp:0012289      skos:exactMatch doid:1984       semapv:MappingChaining  doid,hp,umls                    1.0             hp:0012289 ncit:C3262 hp:0100743 umls:C0034885 ncit:C3350 doid:1984
hp:0012720      skos:exactMatch doid:1984       semapv:MappingChaining  doid,hp,umls                    1.0             doid:1984 umls:C0034885 mesh:D012004 hp:0100743 ncit:C3262 hp:0012720
hp:0012777      skos:exactMatch doid:1984       semapv:MappingChaining  doid,hp,umls                    1.0             hp:0012777 ncit:C3262 hp:0100743 umls:C0034885 ncit:C3350 doid:1984

  1. We need to make sure we don't incorporate many-to-many relations
    • Technically, the priority mapping only forces the subjects to be unique, but there should be some intermediate step where we say that, for each object, only one subject from each namespace can map to it.
  2. We need to filter out evidence paths that have two concepts from the same namespace
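
The second point could be checked with something like the following sketch. It assumes an evidence path is available as a list of CURIEs, as in the last column of the rows above. The helper names repeated_prefixes and filter_paths are hypothetical and not part of SeMRA.

```python
from collections import Counter


def repeated_prefixes(path):
    """Return the sorted prefixes that occur more than once in a path of CURIEs."""
    counts = Counter(curie.split(":", 1)[0] for curie in path)
    return sorted(prefix for prefix, count in counts.items() if count > 1)


def filter_paths(paths):
    """Keep only the paths in which every prefix appears at most once."""
    return [path for path in paths if not repeated_prefixes(path)]


paths = [
    # Two of the problematic paths from the example rows.
    "doid:1984 umls:C0034885 mesh:D012004 hp:0100743 ncit:C3262 hp:0010788".split(),
    "hp:0011752 ncit:C3262 hp:0100743 umls:C0034885 ncit:C3350 doid:1984".split(),
    # The first four steps of the first path, used here as a clean example.
    "doid:1984 umls:C0034885 mesh:D012004 hp:0100743".split(),
]

for path in paths:
    print(repeated_prefixes(path) or "ok", " ".join(path))

kept = filter_paths(paths)
```

Filtering after inference is the simplest option; rejecting a step during chaining as soon as it revisits a prefix would avoid building such paths in the first place.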
