dhimmel / obonet

OBO-formatted ontologies → networkx (Python 3)

Home Page: https://github.com/dhimmel/obonet/blob/main/examples/go-obonet.ipynb

License: Other

Languages: Python 100.00%
Topics: obo, ontology, parser, networkx, python, network, rephetio, obo-formatted-ontologies, obo-files

obonet's Introduction

obonet: load OBO-formatted ontologies into networkx

Read OBO-formatted ontologies in Python. obonet is

  • user friendly
  • succinct
  • pythonic
  • modern
  • simple and tested
  • lightweight
  • networkx leveraging

This Python package loads OBO-serialized ontologies into networks. The function obonet.read_obo() takes an .obo file and returns a networkx.MultiDiGraph representation of the ontology. The parser was designed for versions 1.2 and 1.4 of the OBO specification.

Usage

See pyproject.toml for the minimum Python version required and the dependencies. OBO files can be read from a path, URL, or open file handle. Compression is inferred from the path's extension. See example usage below:

import networkx
import obonet

# Read the taxrank ontology
url = 'https://github.com/dhimmel/obonet/raw/main/tests/data/taxrank.obo'
graph = obonet.read_obo(url)

# Or read the xz-compressed taxrank ontology
url = 'https://github.com/dhimmel/obonet/raw/main/tests/data/taxrank.obo.xz'
graph = obonet.read_obo(url)

# Number of nodes
len(graph)

# Number of edges
graph.number_of_edges()

# Check if the ontology is a DAG
networkx.is_directed_acyclic_graph(graph)

# Mapping from term ID to name
id_to_name = {id_: data.get('name') for id_, data in graph.nodes(data=True)}
id_to_name['TAXRANK:0000006']  # TAXRANK:0000006 is species

# Find all superterms of species. Note that networkx.descendants gets
# superterms, while networkx.ancestors returns subterms.
networkx.descendants(graph, 'TAXRANK:0000006')

For a more detailed tutorial, see the Gene Ontology example notebook.

Comparison

This package specializes in reading OBO files into a networkx.MultiDiGraph. A more general ontology-to-NetworkX reader is available in the Python nxontology package via the nxontology.imports.pronto_to_multidigraph function. This function takes a pronto.Ontology object, which can be loaded from an OBO file, OBO Graphs JSON file, or Ontology Web Language 2 RDF/XML file (OWL). Using pronto_to_multidigraph allows creating a MultiDiGraph similar to the one created by obonet, with some differences in the amount of metadata retained.

The primary focus of the nxontology package is to provide an NXOntology class for representing ontologies based around a networkx.DiGraph. NXOntology provides optimized implementations for computing node similarity and other intrinsic ontology metrics. There are two important differences between a DiGraph for NXOntology and the MultiDiGraph produced by obonet:

  1. NXOntology is based on a DiGraph that does not allow multiple edges between the same two nodes. Multiple edges between the same two nodes must therefore be collapsed. By default, it only considers is a / rdfs:subClassOf relationships, but using pronto_to_multidigraph to create the NXOntology allows for retaining additional relationship types, like part of in the case of the Gene Ontology.

  2. NXOntology reverses the direction of relationships so edges go from superterm to subterm. Traditionally in ontologies, the is a relationships go from subterm to superterm, but this is confusing. NXOntology reverses edges so functions such as ancestors refer to more general concepts and descendants refer to more specific concepts.

The nxontology.imports.multidigraph_to_digraph function converts from a MultiDiGraph, like the one produced by obonet, to a DiGraph by filtering to the desired relationship types, reversing edges, and collapsing parallel edges.
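
The three steps named above (filter, reverse, collapse) can be sketched with plain networkx. This is only an illustration of the idea, not the nxontology implementation; the function name to_superterm_digraph and the toy graph are hypothetical.

```python
import networkx


def to_superterm_digraph(multigraph, keep_types=("is_a",)):
    """Filter edges by key, reverse them, and collapse parallel edges."""
    digraph = networkx.DiGraph()
    digraph.add_nodes_from(multigraph.nodes(data=True))
    for subterm, superterm, key in multigraph.edges(keys=True):
        if key in keep_types:
            # Reversed: the edge now points from superterm to subterm.
            # DiGraph.add_edge merges repeated (u, v) pairs into one edge.
            digraph.add_edge(superterm, subterm)
    return digraph


# Toy graph in the obonet orientation: subterm -> superterm, key = relationship type
toy = networkx.MultiDiGraph()
toy.add_edge("species", "genus", key="is_a")
toy.add_edge("genus", "family", key="is_a")
toy.add_edge("species", "genus", key="part_of")  # parallel edge, filtered out here

collapsed = to_superterm_digraph(toy)
superterms = networkx.ancestors(collapsed, "species")
subterms = networkx.descendants(collapsed, "family")
```

After the reversal, networkx.ancestors returns the more general terms and networkx.descendants the more specific ones, which is the opposite of the behavior on the graph returned by obonet.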

Installation

The recommended approach is to install the latest release from PyPI using:

pip install obonet

However, if you'd like to install the most recent version from GitHub, use:

pip install git+https://github.com/dhimmel/obonet.git#egg=obonet

Contributing

We welcome feature suggestions and community contributions. Currently, only reading OBO files is supported.

Develop

Some development commands:

# create virtual environment
python3 -m venv ./env

# activate virtual environment
source env/bin/activate

# editable installation for development
pip install --editable ".[dev]"

# install pre-commit hooks
pre-commit install

# run all pre-commit checks
pre-commit run --all

# run tests
pytest

# generate changelog for release notes
git fetch --tags origin main
OLD_TAG=$(git describe --tags --abbrev=0)
git log --oneline --decorate=no --reverse $OLD_TAG..HEAD

Maintainers can make a new release at https://github.com/dhimmel/obonet/releases/new.

obonet's People

Contributors

arvkevi, bgyori, cmungall, cthoyt, dhimmel, torstees

obonet's Issues

pip install on Windows raises a UnicodeDecodeError

Platform: Windows 10 (64 bit)
Using python 3.6.1 during installation I get the following errors:

C:\Windows\system32>pip install obonet
Collecting obonet
  Using cached obonet-0.2.2.tar.gz
Exception:
Traceback (most recent call last):
  File "c:\program files\python36\lib\site-packages\pip\compat\__init__.py", line 73, in console_to_str
    return s.decode(sys.__stdout__.encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 13: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python36\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "c:\program files\python36\lib\site-packages\pip\commands\install.py", line 324, in run
    requirement_set.prepare_files(finder)
  File "c:\program files\python36\lib\site-packages\pip\req\req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "c:\program files\python36\lib\site-packages\pip\req\req_set.py", line 634, in _prepare_file
    abstract_dist.prep_for_dist()
  File "c:\program files\python36\lib\site-packages\pip\req\req_set.py", line 129, in prep_for_dist
    self.req_to_install.run_egg_info()
  File "c:\program files\python36\lib\site-packages\pip\req\req_install.py", line 439, in run_egg_info
    command_desc='python setup.py egg_info')
  File "c:\program files\python36\lib\site-packages\pip\utils\__init__.py", line 676, in call_subprocess
    line = console_to_str(proc.stdout.readline())
  File "c:\program files\python36\lib\site-packages\pip\compat\__init__.py", line 75, in console_to_str
    return s.decode('utf_8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 13: invalid continuation byte

other packages install normally

with python 2.7 everything installs without problems, but during runtime I get the following error:

Traceback (most recent call last):
  File "D:\Petrovskiy\Documents\Projects\Xpansa\is-anl\ANL-NERProcessor\src\ontologyExtraction\HP\test.py", line 2, in <module>
    import obonet
  File "C:\Program Files\Python27\lib\site-packages\obonet\__init__.py", line 1, in <module>
    from .read import read_obo
  File "C:\Program Files\Python27\lib\site-packages\obonet\read.py", line 6, in <module>
    from .io import open_read_file
  File "C:\Program Files\Python27\lib\site-packages\obonet\io.py", line 5, in <module>
    from urllib.request import urlopen
ImportError: No module named request

Parse failure when `name` key is used in ontology header

The latest version of OMRSE includes both the ontology and name key in its header. This breaks the following lines:

obonet/obonet/read.py

Lines 29 to 31 in 0ce7c81

graph = networkx.MultiDiGraph(
    name=header.get("ontology"), typedefs=typedefs, instances=instances, **header
)

Minimal example:

>>> import obonet
>>> url = 'https://github.com/ufbmi/OMRSE/raw/master/omrse-full.obo'
>>> graph = obonet.read_obo(url)
Traceback (most recent call last):
  File "/Users/cthoyt/dev/obonet/examples/ex.py", line 4, in <module>
    obonet.read_obo(url)
  File "/Users/cthoyt/dev/obonet/obonet/read.py", line 26, in read_obo
    graph = networkx.MultiDiGraph(
TypeError: networkx.classes.multidigraph.MultiDiGraph() got multiple values for keyword argument 'name'

Possible solutions:

  • preemptively throw away name key from headers
  • reassign name from headers to another key like in
    if 'name' in headers:
       headers['long_name'] = headers.pop('name')
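
The second option could look roughly like the sketch below. It assumes the header is a plain dict of strings; build_graph and the long_name key are hypothetical names chosen for illustration, not part of obonet.

```python
import networkx


def build_graph(header, typedefs=(), instances=()):
    """Create the graph without passing the `name` keyword twice."""
    attrs = dict(header)  # copy so the caller's header is not mutated
    if "name" in attrs:
        # Hypothetical replacement key; any key not used by OBO headers would do.
        attrs["long_name"] = attrs.pop("name")
    return networkx.MultiDiGraph(
        name=attrs.get("ontology"),  # None when the header has no ontology tag
        typedefs=list(typedefs),
        instances=list(instances),
        **attrs,
    )


# Toy header with both keys; the values are placeholders.
graph = build_graph(
    {"ontology": "omrse", "name": "Example long name", "format-version": "1.2"}
)
```

Copying the header first keeps the function free of side effects, and using attrs.get("ontology") also tolerates headers that lack the ontology tag.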

UnicodeDecodeError for obonet0.2.5

Hi, I am new to obonet and have a strict environment requirement of obonet 0.2.5 and Python 3.7.9.
When I try to install it, I get a UnicodeDecodeError, as shown below:

(test2) C:\Users\dell>pip install obonet==0.2.5
Collecting obonet==0.2.5
  Using cached obonet-0.2.5.tar.gz (5.5 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 36, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\dell\AppData\Local\Temp\pip-install-fdy53bp6\obonet_fcba39b79d6c476f949856c09ef9088c\setup.py", line 18, in <module>
          long_description = read_file.read()
      UnicodeDecodeError: 'gbk' codec can't decode byte 0xa5 in position 935: illegal multibyte sequence
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

The environment test2 shown above is a brand-new environment with Python 3.7.9, created with Anaconda. I have read the closed issues, but updating pip did not help. Please help me with this embarrassing bug (O-r-z). Thank you!

Extract replaced_by mappings for obsolete terms

Hi, I'd like to use obonet to read HPO terms (http://purl.obolibrary.org/obo/hp.obo)
I noticed that the obsolete terms are ignored when reading the file. I'd prefer to use their corresponding updated names.

Example:

[Term]
id: HP:0000547
name: obsolete Tapetoretinal degeneration
synonym: "Retinotapetal degeneration" EXACT []
is_obsolete: true
replaced_by: HP:0000510

Is there an easy way to extract and use that mapping?
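
One way to get at the mapping without relying on any particular obonet option is to scan the OBO text directly. The following is a minimal sketch, not a full OBO parser and not part of obonet: it assumes each [Term] stanza lists its id before replaced_by, and the function name replaced_by_mapping is hypothetical.

```python
def replaced_by_mapping(obo_text):
    """Map obsolete term IDs to the IDs listed in their replaced_by tags."""
    mapping = {}
    in_term = False
    term_id = None
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif in_term and term_id and line.startswith("replaced_by:"):
            # Drop any trailing "! comment" before storing the target ID.
            target = line[len("replaced_by:"):].split("!")[0].strip()
            mapping.setdefault(term_id, []).append(target)
    return mapping


example = """\
[Term]
id: HP:0000547
name: obsolete Tapetoretinal degeneration
synonym: "Retinotapetal degeneration" EXACT []
is_obsolete: true
replaced_by: HP:0000510
"""
mapping = replaced_by_mapping(example)
```

The values are lists because a term may carry more than one replaced_by tag; the resulting dict can then be used to translate obsolete IDs before looking them up in the graph.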

[documentation] Name not always supplied for nodes

In the example in README.md
id_to_name = {id_: data['name'] for id_, data in graph.nodes(data=True)}
throws a KeyError if any node lacks a name (as some do for e.g. deprecated nodes)

id_to_name = {id_: data['name'] if 'name' in data else "<MISSING>" for id_, data in graph.nodes(data=True)}

fixes this problem.

(or else None or whatever)

Installation on python2 without pandoc

Travis CI is unable to install obonet on python2. I don't have pandoc in my builds. Here's a link to my build log: https://travis-ci.org/pybel/pybel-tools/jobs/239611981

However, it is working on python3. Could this be because of the use of print function without importing from __future__ import print_function in setup.py?

Alternatively, it could be the ordering of directory = os.path.dirname(os.path.abspath(__file__)) on line 8 of setup.py instead of directory = os.path.abspath(os.path.dirname(__file__))

Edit: also, I apologize for even bringing up the idea of python2 support. I'd rather not give python2 support in my own software, but it's been easy enough until now by avoiding just a few things.

Use of OWL files

Hey, is there a way to load OWL files into the graph? If not, are you aware of any workaround that can help convert an OWL file into OBO? Certain ontologies in OBO Foundry link only to OWL files and not to corresponding OBO files, such as this one.

Save as an obo file

Hi,

I would like to know whether it is possible to save a graph as an OBO file using obonet.
I load an OBO file and change some part of the graph, and I want to save it as a new OBO file.

best
Mahboubeh

obonet says GO is not a DAG

Hi Daniel,

I just downloaded and used obonet (thanks by the way for making and posting it!) and executed the code:

graph = obonet.read_obo('../data/go.obo')
print(networkx.is_directed_acyclic_graph(graph))

And for some reason it comes back FALSE. Should I be concerned? Any idea why GO is not a DAG according to networkx?

Thanks for any advice you can give!
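
One way to investigate is to ask networkx for an actual cycle and to check whether the graph becomes acyclic when only is_a edges are kept. The sketch below uses a toy graph with made-up node and relationship names; whether non-is_a relationships are the cause for GO is an assumption that would need checking against the real file.

```python
import networkx

# Toy graph in the obonet orientation: subterm -> superterm, key = relationship type
toy = networkx.MultiDiGraph()
toy.add_edge("child", "parent", key="is_a")
toy.add_edge("parent", "root", key="is_a")
toy.add_edge("parent", "child", key="has_part")  # closes a cycle with the is_a edge

is_dag_all = networkx.is_directed_acyclic_graph(toy)

# find_cycle returns the edges of one cycle as (source, target, key) tuples
cycle = networkx.find_cycle(toy)

# Keep only is_a edges and test again
is_a_edges = [(u, v, k) for u, v, k in toy.edges(keys=True) if k == "is_a"]
is_a_graph = toy.edge_subgraph(is_a_edges)
is_dag_is_a = networkx.is_directed_acyclic_graph(is_a_graph)
```

Inspecting the edge keys in the returned cycle shows which relationship types are involved, which is usually enough to decide whether the cycle is a data problem or an expected consequence of mixing relationship types.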

Collaboration on obo parser and data model

@dhimmel

I took the liberty of seeing what it would take to implement obonet with my obo package:

import networkx
from obo import Ontology

def read_obo(path_or_file):
    #with open_read_file(path_or_file) as fp:#
    with open(path_or_file, 'r') as fp:
        ontology = Ontology.read(fp)

    graph = networkx.MultiDiGraph(
        name=ontology.ontology,
        typedefs=ontology.typedefs,
        instances=ontology.instances,
        **ontology.tags)

    edge_tuples = []
    for term in ontology.terms:
        if term.is_obsolete:
            continue

        graph.add_node(term)

        for target_term in term.is_a:
            edge_tuples.append((term, 'is_a', ontology.terms[target_term]))

        for relationship in term.relationships:
            edge_tuples.append((term, relationship.type, ontology.terms[relationship.target_term]))

    for term_a, type_, term_b in edge_tuples:
        graph.add_edge(term_a, term_b, key=type_)

    return ontology, graph

The lookup I wanted to do was checking whether X is_a "sequence_feature" in SO. Ignoring for a minute that there are other relationships than "is_a" (as @cmungall pointed out), the lookup would be something like:

ontology, graph = read_obo('tests/files/so-xp.obo')

exon = ontology.term_by_name('exon')
sequence_feature = ontology.term_by_name('sequence_feature')

print(networkx.descendants(graph, exon))  # {<Term SO:0000001 ! region>, <Term SO:0000704 ! gene>, ...}
print(sequence_feature in networkx.descendants(graph, exon))  # True

If it would be interesting to you to collaborate on the parser and data model, let me know. It would be good not to be the only maintainer of the parser and for there to be a package that makes use of it. On the other hand I also understand if you prefer to do your own thing.

Adding support for encoding in read_obo / UnicodeDecodeError for Obonet0.3.1 (Cell Ontology)

Similar to #25 , I am having a UnicodeDecodeError. The solution there is to upgrade to v0.3.0, but as far as I can tell, the setup.py still doesn't specify an encoding?

I have my OBO file locally (I downloaded the cl.obo file from https://obofoundry.org/ontology/cl.html)

import networkx
import obonet

path = "./data/ontology/cl.obo"
graph = obonet.read_obo(path)

obonet==0.3.1
networkx==3.0

Without running a local version of obonet with the encoding specified, how can I best resolve this error?

Any chance to add support for specifying an encoding in read_obo?
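
Since the README states that OBO files can be read from an open file handle, one possible workaround is to open the file yourself with an explicit encoding and pass the handle on. This is an untested sketch with respect to cl.obo; the helper open_obo and the toy file contents are hypothetical, and the obonet call is left as a comment so the example runs without obonet installed.

```python
import os
import tempfile


def open_obo(path, encoding="utf-8"):
    """Open an OBO file as text with an explicit encoding."""
    return open(path, "r", encoding=encoding)


# Demonstrate with a temporary file that contains a non-ASCII character.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy.obo")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("[Term]\nid: TOY:1\nname: Sjögren cell\n")
    with open_obo(path) as handle:
        # With obonet installed: graph = obonet.read_obo(handle)
        text = handle.read()
```

Passing the encoding explicitly avoids depending on the platform default, which on some Windows setups is not UTF-8.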

Bug: Can't load Brenda Tissue Ontology

>> bto = obonet.read_obo('http://data.bioontology.org/ontologies/BTO/submissions/33/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb')

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-b0207a3a5973> in <module>()
----> 1 bto = obonet.read_obo('http://data.bioontology.org/ontologies/BTO/submissions/33/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb')

/usr/local/lib/python3.5/dist-packages/obonet/read.py in read_obo(path_or_file)
     25     obo_file.close()
     26     graph = networkx.MultiDiGraph(
---> 27         name=header['ontology'],
     28         typedefs=typedefs,
     29         instances=instances,

KeyError: 'ontology'

This is the same if I download the file and use it locally

Upload the package to PyPI

It would be nice to have this library uploaded to pypi in order to make it really easy to install.
Keep up the good work.
