jupyrdf / ipyradiant

Jupyter widgets for working with RDF graphs.

License: BSD 3-Clause "New" or "Revised" License

Batchfile 0.18% Python 97.76% Shell 0.22% Jupyter Notebook 1.85%

ipyradiant's People

nrbgt lnijhawan zwelz3 rhythmsyed sanbales dfreeman06 pattersoniv

ipyradiant's Issues

Extend SPARQLQueryFramer to support query files

The current implementation of the SPARQLQueryFramer class requires the user to implement a sparql string attribute, which is used to perform the query.

This ticket scopes in the addition of a way to point the framer to a sparql file, instead of providing a sparql string.
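
One possible shape for this, sketched under the assumption that the framer keeps reading its query from a `sparql` string attribute. The helper `load_sparql` and the class `FileBackedQuery` are hypothetical stand-ins, not part of ipyradiant.

```python
from pathlib import Path


def load_sparql(path):
    """Return the text of a SPARQL query file (hypothetical helper)."""
    return Path(path).read_text(encoding="utf-8")


class FileBackedQuery:
    """Stand-in for a SPARQLQueryFramer subclass; only `sparql` matters here."""

    sparql = ""

    @classmethod
    def from_file(cls, path):
        # Build a new subclass whose `sparql` attribute holds the file contents.
        return type(cls.__name__, (cls,), {"sparql": load_sparql(path)})
```

With this shape the file is read once, when the class is built, so later edits to the file would need a fresh `from_file` call.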

node selection cosmetics, interactivity

Cytoscape doesn't appear to highlight which nodes are actually selected. This can probably be fixed with a selector in their style language. Ideally, we would hoist these to the Base, as e.g. selected_node_color.

Further, both support multiple selection modes: box select is probably the best default, as graphs are big, unless it messes with the pan/zoom interaction. This could also be hoisted as a box_select, or selection_mode, etc.

Checking if object of triple is a literal when creating edges

As of right now, the cytoscape visualization code checks that the object of a triple is not a literal before adding the triple to the group of edges. Do we want this behavior, or do we also want triples that have literals as their objects?

🚀 Release 0.1.0 (H_e) 📦

Master issue for the first release of ipyradiant.

TODO

Support JupyterLab 3

It's worth a look to see which (if any) of the upstreams haven't been released to support lab3. Going to 3 would remove node/npm entirely from this repo for everything except prettier, which we can vendor, and would improve both development time and potentially CI.

real dependencies

extension	status	link
@jupyter-widgets/jupyterlab-manager	ready	https://pypi.org/project/jupyterlab-widgets/1.0.0/
@pyviz/jupyterlab_pyviz	ready	https://pypi.org/project/pyviz-comms/2.0.1/
jupyter-cytoscape	ready	https://pypi.org/project/ipycytoscape/1.2.0/
qgrid2	dead	---

what to do about qgrid

alternatives
- ipyregulartable
  - support merged, probably releasing real soon now
- wxyz-datagrid
  - not started yet

dev/docs dependencies

extension	status	link
@deathbeds/[email protected]	not started	deathbeds/ipydrawio#11
@deathbeds/[email protected]	not started	deathbeds/ipydrawio#11
@deathbeds/[email protected]	not started	deathbeds/ipydrawio#11

Datashader Callbacks

As of right now we can hover over nodes and edges in datashader but are unable to send selected node/edge information to the backend like we are with ipycytoscape. Looking for a way that we can click on the datashader objects and get the callbacks to the backend so we can do something with the information.

Cytoscape visualizations that are additive vs replace data

Adding data to the Cytoscape widget graph object is an additive process (it does not clear the previous graph). This may be something we want in a future visualization, but the current desired effect is to replace the data. A method for this replacement implementation exists, but a widget flag to capture the additive behavior as an option may be worthwhile.

Prevent remote query examples from running during testing

Currently the remote query examples access URLs such as DBPedia.org. This can impact testing when services are unavailable.

Ticket scopes in the prevention of this code from running during testing/CI. Maybe an env variable?
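
A minimal sketch of the env variable idea. The variable name `IPYRADIANT_SKIP_REMOTE` and both function names are assumptions made for illustration; nothing here is an existing ipyradiant setting.

```python
import os


def remote_queries_enabled(environ=None):
    """Return False when the (hypothetical) IPYRADIANT_SKIP_REMOTE flag is set."""
    environ = os.environ if environ is None else environ
    flag = environ.get("IPYRADIANT_SKIP_REMOTE", "").strip().lower()
    return flag not in {"1", "true", "yes"}


def run_remote_example(query_fn, environ=None):
    """Call query_fn() unless remote access is disabled; return None when skipped."""
    if not remote_queries_enabled(environ):
        return None
    return query_fn()
```

CI would then export the flag once, and notebooks would wrap their remote calls in `run_remote_example`.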

Add some Roadmap Mockups

To help inform what our individual components should look like, let's use an open source tool to show some ideas.

inkscape is always good
jupyterlab-drawio would be a good one, as it can write to svg

Expand XSD2PY Map

#50 implements a mapping from XML Schema datatypes (XSD) to python for most of the SPARQL supported types.

This ticket expands the mapper to include missing types (e.g. XSD.byte and XSD.dateTime).
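
As a rough illustration of what the extra entries could look like, assuming the map is keyed by full datatype IRIs and maps to plain Python callables. The names `XSD2PY_EXTRA` and `to_python` are made up here, and the real map from #50 may be shaped differently.

```python
from datetime import datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical extension of an XSD-to-Python map; keys are full datatype IRIs.
XSD2PY_EXTRA = {
    XSD + "byte": int,
    XSD + "dateTime": datetime.fromisoformat,
}


def to_python(lexical, datatype, mapping=XSD2PY_EXTRA):
    """Convert a lexical value using the map; unknown datatypes pass through as str."""
    converter = mapping.get(datatype, str)
    return converter(lexical)
```

This sketch does not validate the xsd:byte range (-128 to 127), and `datetime.fromisoformat` only accepts a subset of valid xsd:dateTime lexical forms on older Python versions.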

Blocked by #50

Need tests for basic parsing of fixture files using rdflib.

rdflib has a number of parsers (see this link), but not all are installed in rdflib by default. The guess_format used for graph loading allows all the parsers to be referenced even if they are not installed.

This ticket scopes in a set of basic tests to ensure that fixtures exist for each format, and that the parsers are checked so we know exactly which file formats our load widget should accept.

Any missing fixture formats should also be added (e.g. n3).

Reorganize Examples

Examples folder is getting a bit cluttered.

Let's reorg to separate the tests from the examples, and create a "tests" sandbox folder.

Also, remove any examples we don't need anymore.

Hiveplots

It would be interesting to experiment with hiveplots. These have the advantage over (even deterministic) hairball diagrams in that they have a stable shape, and can be compared visually.

gratuitous poster

use case

as a new user of an ontology, i'd like to understand the class hierarchy and use of predicates

For example, with the schema.org examples, one might want a:

a0: an axis of types
a1: an axis of subtypes
a2: an axis of predicates

with:

a1 -> a0 for subclassOf
a1 -> a2 for range of predicate
a2 -> a0 for domain of predicate

implementation ideas

There is hive_networkx, and presumably one could reuse everything except the matplotlib and drive it off datashader (maybe a pain to get the axis lines in 🤷 ).

This would extend the existing stuff, and provide a query/graph/etc to determine the axes to which each thing belongs.

Improved Cytoscape Visualization

Ticket to improve the cytoscape visualization to go along with the new RDF2NX converter. Things to be included:

Ability to start with a single node and expand that node to show all connections
More customized formatting for nodes in cytoscape graphs
View JSON data present on a node

Investigate RDF -> LPG generalized collapsing process

Continuation of #42.

Previous efforts have identified a basic process for collapsing a set of predicates onto their subject nodes in order to transform an RDF graph into an LPG graph. This ticket expands upon that process and attempts to identify relevant patterns for higher-order collapsing (e.g. more predicates).

Scope (completion conditions):

Improve QueryWidget Grid and Layout

The preliminary QueryWidget was satisfactory for early release, but should be improved to support common adoption.

This ticket scopes in:

flexible layout (auto-update the number of rows)
improvements on the ipywidgets.Output for the query results (QueryWidget.grid).

We removed the auto/flex width in favor of something with a more predictable behavior.
The Textarea height adjustment (in the UI) cannot be disabled, but clicking it breaks our resize capability. This is unavoidable until the css is available to disable it in ipywidgets. See this issue for tracking.

We decided that the graph and query text namespaces should be kept separate. Users will note that they can use namespaces that are already bound within the widget graph, or they can define the prefixes in the query body. A future update may include an additional namespace argument to aid in automation, but for now these two methods are deemed sufficient.

Closed by #102

Allow multiple graphs to be selected via the FileUpload

Need the capability to allow for multiple graphs to be selected and then added together into one graph

Make different visualization providers optional

Looks like scipy vs windows vs conda-forge is not happy. We probably need to:

change the dependencies to use extras
make multiple outputs, e.g.
- ipyradiant-with-cytoscape
- ipyradiant-with-datashader
don't hoist the imports all the way to the top, requiring import ipyradiant.vis.cytoscape, ipyradiant.vis.datashader
bump to 0.2.0
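
The "don't hoist the imports" item could be approached with a lazy import helper like the sketch below. `load_provider` and the extra names passed to it are hypothetical; only `importlib.import_module` is a real API here.

```python
import importlib


def load_provider(module_name, extra):
    """Import a visualization provider lazily, with a hint about the missing extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is unavailable; try installing the '{extra}' extra"
        ) from err
```

A caller would use something like `load_provider("ipyradiant.vis.cytoscape", "cytoscape")`, so that a broken scipy install only surfaces when the datashader provider is actually requested.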

Address inconsistency in namespace for RDF2NX process

Why does overwriting the sparql attribute of a query class cause the query to fail if a PREFIX isn't specified? The namespace may be in initNs or graph.namespaces, but the process doesn't recognize them.

Example:

# this doesn't work without PREFIX
RDF2NX.node_properties.sparql = """
PREFIX voc: <https://swapi.co/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?iri ?predicate ?value
{
  ?iri ?predicate ?value.
  
  VALUES (?predicate){
      (rdfs:label)
      (voc:starship)
  }
}
"""

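Until the cause is understood, one workaround is to generate the PREFIX header from the same namespace mapping before assigning the query. This is only a sketch of that workaround, not an explanation of the failure; `with_prefixes` is a hypothetical helper.

```python
def with_prefixes(query, namespaces):
    """Prepend a PREFIX line for each prefix not already declared in the query."""
    lines = [
        f"PREFIX {prefix}: <{iri}>"
        for prefix, iri in namespaces.items()
        if f"PREFIX {prefix}:" not in query
    ]
    return "\n".join(lines + [query])
```

The duplicate check is a plain substring match, so it would miss declarations written in lowercase or with extra whitespace.
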
Generalize Graph provider

It will likely be useful to abstract out references to rdflib.Graph which appear throughout and provide a more-or-less abstract GraphStore widget. The fallback implementation would probably still be an in-memory RdflibGraphStore (probably backed by ConjunctiveGraph) but perhaps to determine what is needed, a VirtuosoGraphStore virtuoso-opensource would be a reasonable first step.

It's also probably worth thinking about heavy operations like query and load as an async operation... i'm pretty sure rdflib.Graph isn't threadsafe, so it would have to use a ThreadPoolExecutor(1) with some kind of debouncing/throttling. Virtuoso on the other hand, can handle many simultaneous queries, so either wrapping it or finding an async client would work.
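
A minimal sketch of the ThreadPoolExecutor(1) idea using only the standard library. `SerializedRunner` is a hypothetical name, the "queries" are stand-in callables, and no debouncing or throttling is included.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class SerializedRunner:
    """Run blocking callables one at a time on a single worker thread."""

    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=1)

    async def run(self, fn, *args):
        loop = asyncio.get_running_loop()
        # A single worker guarantees calls never overlap.
        return await loop.run_in_executor(self._executor, fn, *args)

    def close(self):
        self._executor.shutdown(wait=True)


async def demo():
    runner = SerializedRunner()
    try:
        # Both "queries" are submitted together but executed sequentially.
        return await asyncio.gather(runner.run(sum, [1, 2]), runner.run(len, "abc"))
    finally:
        runner.close()
```

The event loop stays responsive while the worker thread runs, and because there is only one worker, a non-threadsafe graph object would never see two calls at once.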

Since these might carry multiple flavors of potentially tightly-pinned dependencies, we'd either want them to be individual packages, or add an extras section to setup.py.

Metaclass for SPARQLQueryFramer VALUES block

VALUES is a common pattern for specifying multiple bindings (values) to a single variable. The SPARQLQueryFramer class works off of a sparql string attribute, which makes it challenging to define VALUES in a pythonic way.

This ticket scopes in the inclusion of a metaclass capability to capture a dictionary of values that are converted into a formatted sparql query at runtime. Due to its relative complexity, this ticket should also include sufficient documentation and examples of how to use the metaclasses to perform queries with VALUES.
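
To make the idea concrete, here is a small sketch of a formatting helper plus a metaclass that calls it. Both `format_values` and `ValuesMeta` are hypothetical, the `{values}` placeholder is an assumed convention, and the terms are assumed to already be valid SPARQL terms (prefixed names or bracketed IRIs). This version fills the placeholder when the class is created rather than at query time.

```python
def format_values(bindings):
    """Render {variable: [terms]} as SPARQL VALUES blocks, one per variable."""
    blocks = []
    for variable, terms in bindings.items():
        rows = "\n".join(f"    ({term})" for term in terms)
        blocks.append(f"VALUES (?{variable}) {{\n{rows}\n}}")
    return "\n".join(blocks)


class ValuesMeta(type):
    """Hypothetical metaclass: fills a `{values}` placeholder in `sparql`."""

    def __new__(mcls, name, bases, attrs):
        template = attrs.get("sparql")
        if template and "values" in attrs:
            attrs["sparql"] = template.replace(
                "{values}", format_values(attrs["values"])
            )
        return super().__new__(mcls, name, bases, attrs)
```

A query class would then declare `sparql` with a `{values}` marker and a `values` dictionary such as `{"predicate": ["rdfs:label", "voc:starship"]}`.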

🌞 Release 0.1.1 (J_e_v) 📦

Blockers:

Follow-on:

bump version
resolve envs

Linked Multi Select Widget

A linked multi select widget that can be used to down-select predicates that a user wishes to collapse.

🌟 Screenshots

Long-running issue to capture ipyradiant screenshots/screencasts.

RDF2NX Custom Processing

Develop an example and tests for the RDF2NX custom processing.

Investigate SPARQL Queries for collapsing nodes during RDF to networkx transformation

We are interested in using networkx to perform complex graph algorithms, and already use it to build visualizations. As RDF represents all data as nodes, we will want to collapse down DataProperties onto the nodes that are created in the networkx graph.

This is the first ticket towards that capability that investigates the SPARQL queries needed to perform the collapsing process.

Handle 1..N queries for converter.

Monolithic queries are difficult to define, challenging to maintain, and in some cases less performant than multiple separate queries.

This ticket investigates a way to allow for N number of queries of each type for the ipyradiant.rdf2nx.RDF2NX converter. This would allow users to specify multiple queries that satisfy a stage of the process (e.g. NodeIRIs). The process should use nx.compose to aggregate results and pass them through the normal RDF2NX process.

Blocked by #50/#58

Capability to pass in dictionary OR rdflib.namespace.NamespaceManager to selection widget

We want to be able to collapse the representation of a URIRef into a truncated form based on either a dictionary passed by a user or a NamespaceManager object. We need to develop examples for both as well. Blocked by PR#46
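
For the dictionary case, the collapsing could look roughly like the following. `collapse_uri` is a hypothetical helper that works on plain strings, picks the longest matching namespace, and leaves unknown URIs untouched.

```python
def collapse_uri(uri, namespaces):
    """Return 'prefix:local' using the longest matching namespace, else the URI unchanged."""
    uri = str(uri)
    best = None
    for prefix, base in namespaces.items():
        base = str(base)
        if uri.startswith(base) and (best is None or len(base) > len(best[1])):
            best = (prefix, base)
    if best is None:
        return uri
    return f"{best[0]}:{uri[len(best[1]):]}"
```

Choosing the longest match matters when one namespace is a prefix of another, which a first-match loop would get wrong depending on dictionary order.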

Need fixture data for examples and future tests

Need a basic example ttl file that we can use in the notebook examples and future tests. This can be replaced with a better process/example later, but we need something in the short-term.

Add unit testing for SPARQLQueryFramer

Need a more comprehensive set of unit tests for the SPARQLQueryFramer class.

This should include a set of tests for basic metaclass designs that work with the framer.

Investigating jsonld for URI collapsing

When looking to truncate the display of a URIRef but still gain access to the data associated with said term... need to investigate jsonld as an implementation of this feature.

Also, investigate use of CSS to truncate the URIRef names. For example, class RDFCSSLabeler(HTML) that is attached to a style sheet and can then automatically represent RDF URIRefs as their truncated form.

Bring in ipyelk as a dependency

We will need ipyelk for visualizations.

Once jupyterlab releases version 3.0 and ipyelk cuts a new release we should bring it into ipyradiant.

Proper way to process namespace information

This ticket is interested in a common process for handling namespaces.

Currently, namespaces must be defined like so:

initNs = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "res": "https://swapi.co/resource/",
    "voc": "https://swapi.co/vocabulary/",
    "base": "https://swapi.co/resource/",
}

We should be able to support the values being URIRef and Namespace as well. The entire initNs should also support being a NamespaceManager.

We need a common way to process the namespace (maybe a util) so that queries and other methods can use the namespace object without worry.
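
One way such a util could look, assuming the goal is a plain prefix-to-string dictionary. `normalize_namespaces` is a hypothetical name; the only rdflib behavior relied on is that `NamespaceManager.namespaces()` yields (prefix, namespace) pairs and that URIRef and Namespace are str subclasses.

```python
def normalize_namespaces(init_ns):
    """Return a plain {prefix: iri-string} dict from a mapping or a manager-like object."""
    if hasattr(init_ns, "namespaces"):
        # rdflib's NamespaceManager exposes namespaces() yielding (prefix, namespace) pairs.
        pairs = init_ns.namespaces()
    else:
        pairs = init_ns.items()
    # URIRef and Namespace are str subclasses in rdflib, so str() covers them too.
    return {str(prefix): str(iri) for prefix, iri in pairs}
```

The check is duck-typed, so the sketch itself does not import rdflib; anything with a `namespaces()` method is treated as a manager.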

Add screencast to the README

Let's save ourselves some data complexity in the repo and hosting headaches and use screencast for README examples.

The downside is that they will not work offline, but I think that is okay.

I'm including an initial version, which does not include any of the ipycytoscape visuals yet.

Update Deprecated API in starwars.ttl from examples/

The starwars data source API https://swapi.co/ is deprecated and needs to be updated to https://swapi.dev.

The API is also invoked differently:

For a resource like this: https://swapi.co/resource/human/11
Now has to be called like this: https://swapi.dev/api/people/11
This is referencing Anakin Skywalker
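
If the rewrite is scripted, it could follow the pattern below. Only the human to people mapping comes from the example above; the table would need one entry per resource kind, and both `RESOURCE_TO_API` and `to_swapi_dev` are hypothetical names.

```python
from urllib.parse import urlparse

# Only human -> people comes from the example above; other entries would need checking.
RESOURCE_TO_API = {"human": "people"}


def to_swapi_dev(old_url, mapping=RESOURCE_TO_API):
    """Rewrite https://swapi.co/resource/<kind>/<id> to https://swapi.dev/api/<kind>/<id>."""
    parts = urlparse(old_url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "resource":
        raise ValueError(f"unexpected resource URL: {old_url}")
    _, kind, ident = parts
    return f"https://swapi.dev/api/{mapping.get(kind, kind)}/{ident}"
```

Kinds missing from the table pass through unchanged, which may or may not match what the new API expects.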

The file is here: https://github.com/jupyrdf/ipyradiant/blob/master/examples/data/starwars.ttl

Proper logging for file loading and query execution

The current tool does not capture error information to logging in a consistent fashion (via Output).

Migrate tests

The current tests are located under examples/tests, which is not great for indexing (e.g. pytest).

This ticket should move them into a standard location and update any links (pytest, doit, etc).

Strip out more metadata, whitespace in notebooks

Let's configure nblint to only allow certain metadata so that review isn't more painful than it needs to be.

There are also still some whitespace issues; not sure why black/prettier aren't handling them.

Generalize Query Building UI

A number of UI could be brought to bear on the direct or indirect construction and visualization of SPARQL queries. As long as the interface is...

Changing QueryBuilder.sparql will eventually update QueryBuilder.results

...pretty much any of the below concepts would work. Most of them would require bespoke labextension development, or at least significantly more dependencies and potential install complexity than what we have now.

text based

Editor from wxyz
- still has some rough edges, but at least getting syntax highlighting would help
graphql-to-sparql
- this could leverage things like graphiql or graphql-playground (https://github.com/prisma-labs/graphql-playground)
sparqlblocks (demo)
- this would allow a friendlier approach for those that enjoy the visual programming metaphor
sparql-language-server and jupyterlab-lsp
- while this would eventually be useful, there is presently no Lab widget adapter for anything other than Editor and Notebook, and the API is (partially intentionally) not particularly amenable to extension in this way

data-based

SPARQL.js
- moving from a somewhat quirky standard to a more broadly-distributed data format
- if we could take a (large number) of sparql queries and generate a JSON schema, we could then more or less automatically generate/validate it with rjsf (by way of wxyz.JsonSchemaForm) or equivalent tools

visual programming

vsb
- looks solid, but kinda old, might be able to be resurrected
draw.io
- while a rather daunting API, some good strides have been made to do basic embedding of this into Lab. Defining a "proper" visual query language is really hard, but it has a lot of potential, and looks sharp

query-by-example

graph-pattern-learner
- also a bit long in the tooth (may well not work with the most recent rdflib), this would offer the user two tables of gozintas and gozoutas, burn a little coal, and then show the "answer" without much intervention, with the intermediate SPARQL as a byproduct, which could then be visualized

InteractiveViewer Bug Fixes

Placeholder ticket for tracking issues with the InteractiveViewer widget.

directed edges may be incorrect when inferred from expanded nodes
edges between nodes should be added when the graph is populated initially
new nodes should be passed to RDF2NX
investigate flashing changes to layout
investigate issues with front-end updates not triggering (forcing children to be re-specified or node create/delete).

Allow querying of remote SPARQL endpoints

The RDFLib project provides a package, SPARQLWrapper, for querying remote SPARQL endpoints. This would be a nice capability for the tool, as an alternative to loading a file or passing in a graph object.

https://github.com/RDFLib/sparqlwrapper

Integrate initial graph explorer capabilities

With #61 introducing the GraphExploreNodeSelection, and #68 introducing the InteractiveViewer, this ticket scopes in the integration of these capabilities into a unified GraphExplorer widget.

The widget should support the selection of nodes, their population in the viewer, and the successful execution of basic workflows for graph exploration. Selected nodes should have their data displayed via ipywidgets.Output using IPython.display.JSON.

Blocked by #61 #68

Not having to manually set children again in `InteractiveViewer`

In the InteractiveViewer class, we currently reset children manually so that the changes propagate to the front-end. It would be nice to have this done automatically via traitlets.

Add index page for notebook examples

For now, will use normal notebook hyperlinks to develop an index landing page that categorizes examples.
In the future, may want to move to something more like Jupyter{book}.

Improve testing for the InteractiveViewer class.

At the moment, we use some 'magic numbers' for this class. In the future this needs to be expanded to be less hard-coded and more dynamic in case the Star Wars API itself changes in the future.

Replace drawio submodule with published packages

The npm packages for @deathbeds/jupyterlab-drawio are up:

the good

jupyter labextension install \
  @deathbeds/[email protected] \
  @deathbeds/[email protected]

Installing these directly would replace the use of the enormous, nested submodules, and make things snappier and more predictable, but crucially less complex.

the bad

The pdf is still not really shippable (random npm/browser installs), and would require a pip dependency, so may not be worth it for now. I've also found that some complex shapes (e.g. the cloud icons) don't embed properly.

conda install -yc conda-forge requests_cache pypdf2
pip install [email protected]
jupyter labextension install @deathbeds/[email protected]

the ugly

Notebooks work, but are separated out:

jupyter labextension install @deathbeds/[email protected]

This is mostly because I have a lot of stuff planned there. Also, the notebook metadata tag changed for .dio.ipynb to add the @deathbeds namespace, but is otherwise compatible.

Make `RDF2NX` converter support parallelization

The general process of the RDF2NX converter is designed to support parallelization. This ticket investigates how to improve the RDF2NX class so that functions can be executed in parallel.

This likely requires the internal queries (e.g. NodeTypes) to be pulled out of the higher-level methods (e.g. transform_nodes).

Blocked by #50 and #58

Initial Stats data and widgets

When processing/loading RDF data, it is often informative to have basic stats on the data itself (e.g. # of triples). This ticket scopes in the development of a basic statistics class where stats are driven/exposed by queries and stored in basic tables. Basic stats may include:

stat	#
triples	123
subjects	34
predicates	12
objects	45
prefixes	3

Stats would be pluggable, e.g. we may have an OWLStats class that understands how to query for e.g. ObjectProperty Axioms.
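
A toy version of the pluggable design, using plain (s, p, o) tuples in place of an rdflib graph and Python counting in place of the SPARQL queries the ticket has in mind. The class names `BasicStats` and `TypedStats` are illustrative only.

```python
class BasicStats:
    """Count distinct subjects, predicates and objects in (s, p, o) tuples."""

    def compute(self, triples):
        triples = list(triples)
        return {
            "triples": len(triples),
            "subjects": len({s for s, _, _ in triples}),
            "predicates": len({p for _, p, _ in triples}),
            "objects": len({o for _, _, o in triples}),
        }


class TypedStats(BasicStats):
    """Pluggable extension: also counts rdf:type statements (illustrative only)."""

    RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

    def compute(self, triples):
        triples = list(triples)
        stats = super().compute(triples)
        stats["typed"] = sum(1 for _, p, _ in triples if p == self.RDF_TYPE)
        return stats
```

A subclass such as an OWL-aware one would extend `compute` in the same way, adding its own rows to the table.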

Make networkx version pin more accurate, layout loading more lenient

We unconditionally do a hard import of a random selection of networkx layouts that happened to be in networkx 2.4. These evolve over time, such as planar, added in 2.3 and multipartite, added in 2.5.
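
A lenient lookup could replace the hard import, roughly as sketched here. The helper `available_layouts` is hypothetical, and a `SimpleNamespace` stands in for an older networkx so the sketch does not depend on any installed version.

```python
from types import SimpleNamespace

CANDIDATE_LAYOUTS = ("circular_layout", "planar_layout", "multipartite_layout")


def available_layouts(module, names=CANDIDATE_LAYOUTS):
    """Return {name: function} for the layouts the given module actually provides."""
    found = {}
    for name in names:
        layout = getattr(module, name, None)
        if callable(layout):
            found[name] = layout
    return found


# Stand-in for an older networkx that lacks multipartite_layout.
fake_nx = SimpleNamespace(circular_layout=lambda g: {}, planar_layout=lambda g: {})
```

Passing the real `networkx` module instead of `fake_nx` would yield whichever of the candidate layouts that installed version defines.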

We should probably:

🌞 Release 0.1.2 J_e 📦

Incremental improvements over 0.1.1