jupyrdf / ipyradiant

Jupyter widgets for working with RDF graphs.

License: BSD 3-Clause "New" or "Revised" License

Batchfile 0.18% Python 97.76% Shell 0.22% Jupyter Notebook 1.85%

ipyradiant's People

Contributors: dfreeman06, lnijhawan, nrbgt, rhythmsyed, zwelz3

ipyradiant's Issues

Extend SPARQLQueryFramer to support query files

The current implementation of the SPARQLQueryFramer class requires the user to implement a sparql string attribute, which is used to perform the query.

This ticket scopes in the addition of a way to point the framer to a sparql file, instead of providing a sparql string.
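A minimal sketch of how a framer could accept either form, assuming a hypothetical `sparql_file` class attribute that takes precedence over the inline `sparql` string. `FileQueryFramer` is an illustrative stand-in, not ipyradiant's actual SPARQLQueryFramer API.

```python
from pathlib import Path
from typing import Optional


class FileQueryFramer:
    """Hypothetical framer: takes its query from `sparql` or `sparql_file`."""

    sparql: str = ""
    sparql_file: Optional[str] = None  # hypothetical: path to a .rq/.sparql file

    @classmethod
    def get_sparql(cls) -> str:
        # A file, when given, takes precedence over the inline string.
        if cls.sparql_file is not None:
            return Path(cls.sparql_file).read_text(encoding="utf-8")
        if not cls.sparql:
            raise ValueError(f"{cls.__name__} defines neither sparql nor sparql_file")
        return cls.sparql
```

Subclasses would then set only one of the two attributes, and the query method would call `get_sparql()` instead of reading `sparql` directly.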

node selection cosmetics, interactivity

cytoscape doesn't appear to highlight which nodes are actually selected. This can probably be fixed with a selector in its style language. Ideally, we would hoist these to the Base, e.g. as selected_node_color.

Further, both support multiple selection modes: box select is probably the best default, as graphs are big, unless it interferes with the pan/zoom interaction. This could also be hoisted as box_select, selection_mode, etc.
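A sketch of the selector fix, assuming the style list is handed to ipycytoscape's `CytoscapeWidget.set_style`. Cytoscape.js supports a `:selected` state selector, and a later rule overrides an earlier one for the same property. The function name and default colours are illustrative assumptions; `selected_node_color` mirrors the trait name proposed above.

```python
from typing import Dict, List


def selection_style(selected_node_color: str = "#f5a623",
                    node_color: str = "#888888") -> List[Dict]:
    """Build a Cytoscape.js style list that highlights selected nodes."""
    return [
        {"selector": "node", "style": {"background-color": node_color}},
        # ':selected' matches only selected elements; listing it after the base
        # rule lets it override the base colour.
        {"selector": "node:selected",
         "style": {"background-color": selected_node_color,
                   "border-width": 3,
                   "border-color": selected_node_color}},
    ]
```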

Checking if object of triple is a literal when creating edges

Currently, the cytoscape visualization code checks that the object of each triple is not a literal before adding it to the group of edges. Do we want this behavior, or do we also want triples that have literals as their objects?
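One option is to make the behavior a flag. The sketch below assumes triples arrive as `(s, p, o)` tuples and that the caller supplies the literal test. With rdflib that test could be `lambda o: isinstance(o, rdflib.Literal)`. The function and `include_literals` flag are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[object, object, object]


def edges_from_triples(triples: Iterable[Triple],
                       is_literal: Callable[[object], bool],
                       include_literals: bool = False) -> List[Triple]:
    """Return the triples to draw as edges.

    By default, triples whose object is a literal are skipped (the current
    behavior); include_literals=True keeps them, so literals become leaf nodes.
    """
    return [
        (s, p, o)
        for s, p, o in triples
        if include_literals or not is_literal(o)
    ]
```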

Support JupyterLab 3

It's worth a look to see which (if any) of the upstreams haven't been released to support lab3. Going to 3 would remove node/npm entirely from this repo for everything except prettier, which we can vendor, and would improve both development time and potentially CI.

real dependencies

  • @jupyter-widgets/jupyterlab-manager: ready (https://pypi.org/project/jupyterlab-widgets/1.0.0/)
  • @pyviz/jupyterlab_pyviz: ready (https://pypi.org/project/pyviz-comms/2.0.1/)
  • jupyter-cytoscape: ready (https://pypi.org/project/ipycytoscape/1.2.0/)
  • qgrid2: dead

what to do about qgrid

dev/docs dependencies

  • @deathbeds/[email protected]: not started (deathbeds/ipydrawio#11)
  • @deathbeds/[email protected]: not started (deathbeds/ipydrawio#11)
  • @deathbeds/[email protected]: not started (deathbeds/ipydrawio#11)

Datashader Callbacks

As of right now, we can hover over nodes and edges in datashader, but we cannot send selected node/edge information to the backend as we can with ipycytoscape. We are looking for a way to click on datashader objects and get callbacks to the backend, so we can do something with the information.

Cytoscape visualizations that are additive vs replace data

Adding data to the Cytoscape widget graph object is an additive process (it does not clear the previous graph). This may be something we want in a future visualization, but the current desired effect is to replace the data. A method for this replacement implementation exists, but a widget flag to capture the additive behavior as an option may be worthwhile.
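A minimal sketch of the proposed flag, using a plain dataclass as a stand-in for the widget's data. `GraphData`, `additive`, and `add_graph` are hypothetical names. Replace is the default, matching the current desired effect.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class GraphData:
    """Hypothetical stand-in for the data held by a visualization widget."""
    additive: bool = False  # proposed flag: keep old elements when adding
    nodes: Dict[str, dict] = field(default_factory=dict)
    edges: Set[Tuple[str, str]] = field(default_factory=set)

    def add_graph(self, nodes: Dict[str, dict], edges: Set[Tuple[str, str]]) -> None:
        if not self.additive:
            # Replace mode: clear everything before loading the new data.
            self.nodes.clear()
            self.edges.clear()
        self.nodes.update(nodes)
        self.edges |= set(edges)
```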

Expand XSD2PY Map

#50 implements a mapping from XML Schema Definition (XSD) datatypes to Python for most of the SPARQL-supported types.

This ticket expands the mapper to include missing types (e.g. XSD.byte and XSD.dateTime).

Blocked by #50
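For illustration, a sketch of a lexical-form-to-Python map that includes the missing `xsd:byte` and `xsd:dateTime`, with range checks for the bounded integer types. This map is a hypothetical stand-in, not the one in #50. rdflib's `Literal.toPython()` already does a similar conversion. Before Python 3.11, `datetime.fromisoformat` does not accept a trailing `Z`.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Callable, Dict

XSD = "http://www.w3.org/2001/XMLSchema#"


def _bounded_int(lo: int, hi: int) -> Callable[[str], int]:
    """Parse an integer and check it against the XSD value space."""
    def convert(text: str) -> int:
        value = int(text)
        if not lo <= value <= hi:
            raise ValueError(f"{value} outside [{lo}, {hi}]")
        return value
    return convert


XSD2PY: Dict[str, Callable[[str], Any]] = {
    XSD + "string": str,
    XSD + "boolean": lambda t: t.strip() in ("true", "1"),
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "double": float,
    XSD + "float": float,
    XSD + "byte": _bounded_int(-128, 127),
    XSD + "short": _bounded_int(-32768, 32767),
    XSD + "date": date.fromisoformat,
    XSD + "dateTime": datetime.fromisoformat,
}


def to_python(datatype: str, lexical: str) -> Any:
    # Unknown datatypes fall back to the raw lexical form.
    return XSD2PY.get(datatype, str)(lexical)
```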

Need tests for basic parsing of fixture files using rdflib.

rdflib has a number of parsers (see this link), but not all are installed in rdflib by default. The guess_format function used for graph loading allows all the parsers to be referenced, even if they are not installed.

This ticket scopes in a set of basic tests to ensure that fixtures exist for each format, and that the parsers are checked, so we know exactly which file formats our load widget should accept.

Any missing fixture formats should also be added (e.g. n3).
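The "a fixture exists for each format" half of these tests could look like the sketch below. The parser-name-to-extension mapping is an assumption that covers common rdflib parser names. A full test would also pass each present fixture to `rdflib.Graph().parse(path, format=...)`, which is omitted here.

```python
from pathlib import Path
from typing import Dict, List

# Assumed mapping from rdflib parser names to fixture file extensions.
FORMAT_EXTENSIONS: Dict[str, str] = {
    "turtle": ".ttl",
    "xml": ".rdf",
    "n3": ".n3",
    "nt": ".nt",
    "nquads": ".nq",
    "trig": ".trig",
    "trix": ".trix",
    "json-ld": ".jsonld",
}


def missing_fixture_formats(fixture_dir: Path) -> List[str]:
    """Return parser names that have no fixture file in fixture_dir."""
    present = {p.suffix.lower() for p in fixture_dir.iterdir() if p.is_file()}
    return sorted(fmt for fmt, ext in FORMAT_EXTENSIONS.items() if ext not in present)
```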

Reorganize Examples

Examples folder is getting a bit cluttered.

Let's reorganize to separate the tests from the examples by creating a "tests" sandbox folder.

Also, remove any examples we don't need anymore.

Hiveplots

It would be interesting to experiment with hiveplots. These have the advantage over (even deterministic) hairball diagrams in that they have a stable shape, and can be compared visually.

gratuitous poster

use case

As a new user of an ontology, I'd like to understand the class hierarchy and the use of predicates.

For example, with the schema.org examples, one might want a:

  • a0: an axis of types
  • a1: an axis of subtypes
  • a2: an axis of predicates

with:

  • a1 -> a0 for subclassOf
  • a1 -> a2 for range of predicate
  • a2 -> a0 for domain of predicate

implementation ideas

There is hive_networkx, and presumably one could reuse everything except the matplotlib and drive it off datashader (maybe a pain to get the axis lines in 🤷).

This would extend the existing stuff, and provide a query/graph/etc to determine the axes to which each thing belongs.
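The core of a hive plot layout is small: each axis gets a fixed angle, and each node gets a radius along its axis. The sketch below assumes the axis assignment is already known, for example from a query. It orders nodes by their string form for determinism, whereas real hive plots often order by degree. Rendering (datashader or otherwise) is left out, and `hive_positions` is a hypothetical name.

```python
import math
from typing import Dict, Hashable, List, Tuple


def hive_positions(axis_of: Dict[Hashable, str], axes: List[str],
                   inner: float = 1.0, outer: float = 10.0
                   ) -> Dict[Hashable, Tuple[float, float]]:
    """Place each node on its axis; axes are spread evenly around a circle.

    Nodes are ordered along an axis by their string form, so the layout is
    deterministic and two graphs with the same axis contents look alike.
    """
    pos = {}
    for i, axis in enumerate(axes):
        angle = 2 * math.pi * i / len(axes)
        members = sorted((n for n, a in axis_of.items() if a == axis), key=str)
        for rank, node in enumerate(members):
            # Spread members evenly between the inner and outer radius.
            frac = rank / (len(members) - 1) if len(members) > 1 else 0.0
            r = inner + frac * (outer - inner)
            pos[node] = (r * math.cos(angle), r * math.sin(angle))
    return pos
```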

Improved Cytoscape Visualization

Ticket to improve the cytoscape visualization to go along with the new RDF2NX converter. Things to be included:

  • Ability to start with a single node and expand that node to show all connections
  • More customized formatting for nodes in cytoscape graphs
  • View JSON data present on a node
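The "start with a single node and expand it" interaction reduces to a neighbourhood lookup. This sketch assumes edges are `(subject, predicate, object)` tuples and follows connections in both directions. `expand_node` is a hypothetical helper, not part of the RDF2NX converter.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def expand_node(node: str, edges: Iterable[Edge], visible: Set[str]) -> Set[str]:
    """Return the new visible-node set after expanding `node`.

    Both outgoing and incoming connections are followed, so a node can be
    expanded regardless of edge direction.
    """
    neighbours = set()
    for s, _, o in edges:
        if s == node:
            neighbours.add(o)
        elif o == node:
            neighbours.add(s)
    return visible | {node} | neighbours
```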

Investigate RDF -> LPG generalized collapsing process

Continuation of #42.

Previous efforts have identified a basic process for collapsing a set of predicates onto their subject nodes in order to transform an RDF graph into an LPG graph. This ticket expands upon that process and attempts to identify relevant patterns for higher-order collapsing (e.g. more predicates).

Scope (completion conditions):

  • Survey of open-source RDF->LPG capabilities
  • Initial implementation in ipyradiant
    • Basic query steps
      • NodeIRIs
      • NodeTypes
      • NodeProperties
      • RelationTypes
      • ReifiedRelations
      • RelationProperties
      • Singletons (skipping for now)
    • Examples of common patterns for each query (skipping for now)
    • Literal converters
  • Better understanding of common collapsing operations (multi-edges, disparate data, etc.)
  • Considerations on ways to visualize what is happening in the collapsed graph
  • Documentation and notebook clean-up
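For reference, the basic collapsing step (roughly the NodeProperties and RelationTypes stages) can be sketched in plain Python. The sketch assumes triples are already materialised and the caller decides what counts as a literal. It does not handle reified relations or relation properties. Values are kept as lists because RDF allows several values per predicate, and `collapse` is a hypothetical name.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, object]


def collapse(triples: Iterable[Triple], is_literal: Callable[[object], bool]
             ) -> Tuple[Dict[str, Dict[str, List[object]]], List[Triple]]:
    """Split RDF triples into LPG node properties and relationships.

    Literal-valued triples become properties on their subject node; all other
    triples become edges between nodes.
    """
    props: Dict[str, Dict[str, List[object]]] = defaultdict(lambda: defaultdict(list))
    relations: List[Triple] = []
    for s, p, o in triples:
        if is_literal(o):
            props[s][p].append(o)
        else:
            relations.append((s, p, o))
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {n: dict(ps) for n, ps in props.items()}, relations
```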

Improve QueryWidget Grid and Layout

The preliminary QueryWidget was satisfactory for early release, but should be improved to support wider adoption.

This ticket scopes in:

  • flexible layout (auto-update the number of rows)
  • improvements to the ipywidgets.Output for the query results (QueryWidget.grid).

We removed the auto/flex width in favor of something with a more predictable behavior.
The Textarea height adjustment (in the UI) cannot be disabled, but clicking it breaks our resize capability. This is unavoidable until the css is available to disable it in ipywidgets. See this issue for tracking.

We decided that the graph and query text namespaces should be kept separate. Users will note that they can use namespaces that are already bound within the widget graph, or they can define the prefixes in the query body. A future update may include an additional namespace argument to aid in automation, but for now these two methods are deemed sufficient.

Closed by #102

Make different visualization providers optional

Looks like scipy vs windows vs conda-forge is not happy. We probably need to:

  • change the dependencies to use extras
  • make multiple outputs, e.g.
    • ipyradiant-with-cytoscape
    • ipyradiant-with-datashader
  • don't hoist the imports all the way to the top, requiring import ipyradiant.vis.cytoscape, ipyradiant.vis.datashader
  • bump to 0.2.0
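On the import side, "don't hoist the imports" could look like the sketch below. Each provider's package is checked with `importlib.util.find_spec` and imported only when requested. The provider-to-package mapping and the `ipyradiant[...]` extras named in the error message are assumptions, not published extras.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Dict

# Assumed mapping from visualization provider to the package it needs.
PROVIDERS: Dict[str, str] = {
    "cytoscape": "ipycytoscape",
    "datashader": "datashader",
}


def available_providers() -> Dict[str, bool]:
    """Report which optional visualization backends are importable."""
    return {name: importlib.util.find_spec(pkg) is not None
            for name, pkg in PROVIDERS.items()}


def load_provider(name: str) -> ModuleType:
    """Import a provider's backing package on demand, with a helpful error."""
    pkg = PROVIDERS[name]
    try:
        return importlib.import_module(pkg)
    except ImportError as err:
        raise ImportError(
            f"the {name!r} provider needs {pkg!r}; install it separately "
            f"(e.g. via a hypothetical extra such as ipyradiant[{name}])"
        ) from err
```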

Address inconsistency in namespace for RDF2NX process

Why does overwriting the sparql attribute of a query class cause the query to fail if a PREFIX isn't specified? The namespace may be in initNs or graph.namespaces, but the process doesn't recognize them.

Example:

# this doesn't work without PREFIX
RDF2NX.node_properties.sparql = """
PREFIX voc: <https://swapi.co/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?iri ?predicate ?value
{
  ?iri ?predicate ?value.
  
  VALUES (?predicate){
      (rdfs:label)
      (voc:starship)
  }
}
"""

Generalize Graph provider

It will likely be useful to abstract out the references to rdflib.Graph that appear throughout and provide a more-or-less abstract GraphStore widget. The fallback implementation would probably still be an in-memory RdflibGraphStore (probably backed by ConjunctiveGraph), but to determine what is needed, a VirtuosoGraphStore (backed by virtuoso-opensource) would be a reasonable first step.

It's also probably worth treating heavy operations like query and load as async operations... I'm pretty sure rdflib.Graph isn't thread-safe, so it would have to use a ThreadPoolExecutor(1) with some kind of debouncing/throttling. Virtuoso, on the other hand, can handle many simultaneous queries, so either wrapping it or finding an async client would work.

Since these might carry multiple flavors of potentially tightly-pinned dependencies, we'd either want them to be individual packages, or add an extras section to setup.py.
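A sketch of the ThreadPoolExecutor(1) idea, assuming a hypothetical minimal `GraphStore` interface with a synchronous `query` method. The wrapper moves queries off the event loop while ensuring that only one query touches the store at a time. Debouncing/throttling is not included.

```python
import asyncio
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List


class GraphStore(ABC):
    """Hypothetical minimal interface a widget could depend on instead of rdflib.Graph."""

    @abstractmethod
    def query(self, sparql: str) -> List[Any]: ...


class SerializedGraphStore:
    """Run a (possibly non-thread-safe) store's queries off the event loop.

    A single worker thread means at most one query touches the store at a time.
    """

    def __init__(self, store: GraphStore):
        self._store = store
        self._executor = ThreadPoolExecutor(max_workers=1)

    async def query(self, sparql: str) -> List[Any]:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._executor, self._store.query, sparql)

    def close(self) -> None:
        self._executor.shutdown(wait=True)
```

A store that handles concurrent queries well, such as Virtuoso as described above, could use a larger pool or a native async client instead.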

Metaclass for SPARQLQueryFramer VALUES block

VALUES is a common pattern for specifying multiple bindings (values) to a single variable. The SPARQLQueryFramer class works off of a sparql string attribute, which makes it challenging to define VALUES in a pythonic way.

This ticket scopes in the inclusion of a metaclass capability to capture a dictionary of values that are converted into a formatted sparql query at runtime. Due to its relative complexity, this ticket should also include sufficient documentation and examples of how to use the metaclasses to perform queries with VALUES.
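As a starting point, here is a sketch of the metaclass idea. A hypothetical `ValuesFramerMeta` renders a `values` dict into single-variable VALUES blocks and substitutes them for a `{values}` placeholder in a `sparql_template` when each class is created. It uses `str.replace` rather than `str.format`, because SPARQL's own braces would confuse `format()`. All names here are illustrative, not ipyradiant API.

```python
from typing import Dict, Iterable, Optional


def values_block(values: Dict[str, Iterable[str]]) -> str:
    """Render {"var": [terms, ...]} as one single-variable VALUES block per key."""
    blocks = []
    for var, terms in values.items():
        rows = " ".join(terms)
        blocks.append(f"VALUES ?{var} {{ {rows} }}")
    return "\n".join(blocks)


class ValuesFramerMeta(type):
    """Hypothetical metaclass: fills a {values} placeholder from a `values` dict."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # getattr lets a subclass override only `values` and reuse the template.
        template = getattr(cls, "sparql_template", None)
        if template is not None:
            rendered = values_block(getattr(cls, "values", {}))
            cls.sparql = template.replace("{values}", rendered)
        return cls


class ValuesFramer(metaclass=ValuesFramerMeta):
    sparql_template: Optional[str] = None
    values: Dict[str, Iterable[str]] = {}
```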

Linked Multi Select Widget

A linked multi select widget that can be used to down-select predicates that a user wishes to collapse.

Investigate SPARQL Queries for collapsing nodes during RDF to networkx transformation

We are interested in using networkx to perform complex graph algorithms, and already use it to build visualizations. As RDF represents all data as nodes, we will want to collapse down DataProperties onto the nodes that are created in the networkx graph.

This is the first ticket towards that capability that investigates the SPARQL queries needed to perform the collapsing process.

Handle 1..N queries for converter.

Monolithic queries are difficult to define, challenging to maintain, and in some cases less performant than multiple separate queries.

This ticket investigates a way to allow for N number of queries of each type for the ipyradiant.rdf2nx.RDF2NX converter. This would allow users to specify multiple queries that satisfy a stage of the process (e.g. NodeIRIs). The process should use nx.compose to aggregate results and pass them through the normal RDF2NX process.

Blocked by #50/#58
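The aggregation step is a fold over the per-query results. networkx provides `nx.compose` and `nx.compose_all` for this. The stdlib stand-in below mimics `nx.compose`'s rule that attributes from the second graph take precedence, so it can be tested without networkx. The `(node attrs, edges)` tuple representation is an assumption made for the sketch.

```python
from functools import reduce
from typing import Dict, Iterable, Set, Tuple

Graph = Tuple[Dict[str, dict], Set[Tuple[str, str]]]  # (node attrs, edges)


def compose(g: Graph, h: Graph) -> Graph:
    """Union of two graphs; on conflicting node attributes, h wins (as in nx.compose)."""
    g_nodes, g_edges = g
    h_nodes, h_edges = h
    # Copy attribute dicts so the inputs are never mutated.
    nodes = {n: dict(attrs) for n, attrs in g_nodes.items()}
    for n, attrs in h_nodes.items():
        nodes.setdefault(n, {}).update(attrs)
    return nodes, g_edges | h_edges


def compose_all(results: Iterable[Graph]) -> Graph:
    """Fold the results of N queries for one stage into a single graph."""
    return reduce(compose, results, ({}, set()))
```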

Need fixture data for examples and future tests

Need a basic example ttl file that we can use in the notebook examples and future tests. This can be replaced with a better process/example later, but we need something in the short-term.

Add unit testing for SPARQLQueryFramer

Need a more comprehensive set of unit tests for the SPARQLQueryFramer class.

This should include a set of tests for basic metaclass designs that work with the framer.

Investigating jsonld for URI collapsing

When truncating the display of a URIRef, we still want access to the data associated with that term. We need to investigate jsonld as a way to implement this feature.

Also, investigate use of CSS to truncate the URIRef names. For example, class RDFCSSLabeler(HTML) that is attached to a style sheet and can then automatically represent RDF URIRefs as their truncated form.
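Whatever the display mechanism, the truncation itself is a namespace lookup. rdflib exposes one through `Graph.namespace_manager.qname()`. The stdlib sketch below picks the longest matching namespace and falls back to the full IRI. It produces display labels only, not guaranteed-valid SPARQL prefixed names. `compact_iri` is a hypothetical helper.

```python
from typing import Mapping, Optional, Tuple


def compact_iri(iri: str, namespaces: Mapping[str, str]) -> str:
    """Shorten an IRI to prefix:local using the longest matching namespace.

    Falls back to the full IRI when no namespace matches.
    """
    best: Optional[Tuple[str, str]] = None
    for prefix, ns in namespaces.items():
        if iri.startswith(ns) and (best is None or len(ns) > len(best[1])):
            best = (prefix, ns)
    if best is None:
        return iri
    prefix, ns = best
    return f"{prefix}:{iri[len(ns):]}"
```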

Bring in ipyelk as a dependency

We will need ipyelk for visualizations.

Once jupyterlab releases version 3.0 and ipyelk cuts a new release we should bring it into ipyradiant.

Proper way to process namespace information

This ticket is interested in a common process for handling namespaces.

Currently, namespaces must be defined like so:

initNs = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "res": "https://swapi.co/resource/",
    "voc": "https://swapi.co/vocabulary/",
    "base": "https://swapi.co/resource/",
}

We should be able to support the values being URIRef and Namespace as well. The entire initNs should also support being a NamespaceManager.

We need a common way to process the namespace (maybe a util) so that queries and other methods can use the namespace object without worry.
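A possible shape for such a util, sketched under the assumption that a plain `{prefix: uri}` dict of strings is the common target. rdflib's `URIRef` and `Namespace` are both `str` subclasses, so `str()` covers them. `NamespaceManager` and `Graph` both have a `namespaces()` method that yields `(prefix, uri)` pairs, which the function uses by duck typing. `normalize_namespaces` is a hypothetical name.

```python
from typing import Any, Dict


def normalize_namespaces(ns: Any) -> Dict[str, str]:
    """Turn several namespace representations into a plain {prefix: uri} dict.

    Accepts None, a mapping whose values are str/URIRef/Namespace, or any object
    with a namespaces() method yielding (prefix, uri) pairs.
    """
    if ns is None:
        return {}
    if hasattr(ns, "namespaces"):
        pairs = ns.namespaces()
    else:
        pairs = ns.items()
    return {str(prefix): str(uri) for prefix, uri in pairs}
```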

Add screencast to the README

Let's save ourselves some data complexity in the repo and hosting headaches, and use screencasts for README examples.

The downside is that they will not work offline, but I think that is okay.

I'm including an initial version, which does not include any of the ipycytoscape visuals yet.
1hBRZEmg50

Migrate tests

The current tests are located under examples/tests, which is not great for indexing (e.g. pytest).

This ticket should move them into a standard location and update any links (pytest, doit, etc).

Strip out more metadata, whitespace in notebooks

Let's configure nblint to only allow certain metadata so that review isn't more painful than it needs to be.

There are also still some whitespace issues; not sure why black/prettier aren't handling them.
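Independent of nblint's own configuration, which is not shown here, the allow-list idea can be sketched with the stdlib. The sketch keeps only allowed notebook and cell metadata keys, strips trailing whitespace from cell sources, and writes JSON with `indent=1` and sorted keys as nbformat does. The allow-lists themselves are assumptions.

```python
import json
from typing import Iterable

# Assumed allow-lists; the issue does not say which keys to keep.
NOTEBOOK_KEYS = {"kernelspec", "language_info"}
CELL_KEYS = {"tags"}


def _keep(meta: dict, allowed: Iterable[str]) -> dict:
    return {k: v for k, v in meta.items() if k in allowed}


def clean_notebook(text: str) -> str:
    """Drop non-allowed metadata and trailing whitespace from an .ipynb JSON string."""
    nb = json.loads(text)
    nb["metadata"] = _keep(nb.get("metadata", {}), NOTEBOOK_KEYS)
    for cell in nb.get("cells", []):
        cell["metadata"] = _keep(cell.get("metadata", {}), CELL_KEYS)
        src = cell.get("source", [])
        joined = "".join(src) if isinstance(src, list) else src
        # Strip trailing spaces on each line and trailing blank lines.
        cleaned = "\n".join(line.rstrip() for line in joined.rstrip().splitlines())
        # Store the source as a list of lines, the form Jupyter itself writes.
        cell["source"] = cleaned.splitlines(keepends=True)
    return json.dumps(nb, indent=1, sort_keys=True) + "\n"
```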

Generalize Query Building UI

A number of UI could be brought to bear on the direct or indirect construction and visualization of SPARQL queries. As long as the interface is...

Changing QueryBuilder.sparql will eventually update QueryBuilder.results

...pretty much any of the below concepts would work. Most of them would require bespoke labextension development, or at least significantly more dependencies and potential install complexity than what we have now.
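The interface contract above fits in a few lines. In an ipywidgets implementation this would likely be a traitlets `observe` handler. The stdlib sketch below uses a property setter and a caller-supplied query function, and it updates immediately rather than "eventually" (no debouncing or async). `QueryBuilder` here is a hypothetical stand-in.

```python
from typing import Callable, List


class QueryBuilder:
    """Hypothetical sketch of the interface: setting .sparql refreshes .results."""

    def __init__(self, run_query: Callable[[str], List]):
        self._run_query = run_query  # e.g. lambda q: list(graph.query(q))
        self._sparql = ""
        self.results: List = []

    @property
    def sparql(self) -> str:
        return self._sparql

    @sparql.setter
    def sparql(self, text: str) -> None:
        self._sparql = text
        # Immediate here; a real widget might debounce or run this asynchronously.
        self.results = self._run_query(text)
```

Any of the UI concepts below would then only need to write to `sparql`.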

text based

data-based

  • SPARQL.js
    • moving from a somewhat quirky standard to a more broadly distributed data format
    • if we could take a (large number) of sparql queries and generate a JSON schema, we could then more or less automatically generate/validate it with rjsf (by way of wxyz.JsonSchemaForm) or equivalent tools

visual programming

  • vsb
    • looks solid, but kinda old, might be able to be resurrected
  • draw.io
    • while a rather daunting API, some good strides have been made to do basic embedding of this into Lab. Defining a "proper" visual query language is really hard, but it has a lot of potential, and looks sharp

query-by-example

  • graph-pattern-learner
    • also a bit long in the tooth (may well not work with the most recent rdflib), this would offer the user two tables of gozintas and gozoutas, burn a little coal, and then show the "answer" without much intervention, with the intermediate SPARQL as a byproduct, which could then be visualized

InteractiveViewer Bug Fixes

Placeholder ticket for tracking issues with the InteractiveViewer widget.

  • directed edges may be incorrect when inferred from expanded nodes
  • edges between nodes should be added when the graph is populated initially
  • new nodes should be passed to RDF2NX
  • investigate flashing changes to layout
  • investigate issues with front-end updates not triggering (forcing children to be re-specified or node create/delete).
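For the first two items, one approach is to keep edges as the original `(subject, predicate, object)` triples and, whenever the visible node set changes, add every edge whose endpoints are both visible. This preserves direction instead of inferring it from expansion order. A minimal sketch with a hypothetical helper name:

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object); direction is s -> o


def edges_among(visible: Set[str], edges: Iterable[Edge]) -> List[Edge]:
    """Return every edge whose two endpoints are both visible.

    Keeping edges as (subject, predicate, object) triples preserves their
    original direction instead of inferring it from the order of expansion.
    """
    return [(s, p, o) for s, p, o in edges if s in visible and o in visible]
```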

Integrate initial graph explorer capabilities

With #61 introducing the GraphExploreNodeSelection, and #68 introducing the InteractiveViewer, this ticket scopes in the integration of these capabilities into a unified GraphExplorer widget.

The widget should support the selection of nodes, their population in the viewer, and the successful execution of basic workflows for graph exploration. Selected nodes should have their data displayed via ipywidgets.Output using IPython.display.JSON.

Blocked by #61 #68

Improve testing for the InteractiveViewer class.

At the moment, we use some 'magic numbers' for this class. These tests should become less hard-coded and more dynamic, in case the Star Wars API itself changes.

Replace drawio submodule with published packages

The npm packages for @deathbeds/jupyterlab-drawio are up:

the good

jupyter labextension install \
  @deathbeds/[email protected] \
  @deathbeds/[email protected]

Installing these directly would replace the use of the enormous, nested submodules and make things snappier, more predictable, and, crucially, less complex.

the bad

The pdf is still not really shippable (random npm/browser installs), and would require a pip dependency, so may not be worth it for now. I've also found that some complex shapes (e.g. the cloud icons) don't embed properly.

conda install -yc conda-forge requests_cache pypdf2
pip install [email protected]
jupyter labextension install @deathbeds/[email protected]

the ugly

Notebooks work, but are separated out:

jupyter labextension install @deathbeds/[email protected]

This is mostly because I have a lot of stuff planned there. Also, the notebook metadata tag changed for .dio.ipynb to add the @deathbeds namespace, but is otherwise compatible.

Make `RDF2NX` converter support parallelization

The general process of the RDF2NX converter is designed to support parallelization. This ticket investigates how to improve the RDF2NX class so that functions can be executed in parallel.

This likely requires the internal queries (e.g. NodeTypes) to be pulled out of the higher-level methods (e.g. transform_nodes).

Blocked by #50 and #58

Initial Stats data and widgets

When processing/loading RDF data, it is often informative to have basic stats on the data itself (e.g. # of triples). This ticket scopes in the development of a basic statistics class where stats are driven/exposed by queries and stored in basic tables. Basic stats may include:

stat         #
triples      123
subjects     34
predicates   12
objects      45
prefixes     3

Stats would be pluggable, e.g. we may have an OWLStats class that understands how to query for e.g. ObjectProperty Axioms.
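A sketch of the pluggable part, assuming stats are functions registered in a dict and evaluated over the same triples. The issue proposes query-driven stats. A query-backed version would run something like `SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s ?p ?o }` per stat instead of counting in Python. The prefix count is omitted because it needs the graph's namespace bindings, not the triples. All names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
StatFn = Callable[[List[Triple]], int]

# Hypothetical registry; an OWLStats-style plugin would add its own entries.
BASIC_STATS: Dict[str, StatFn] = {
    "triples": len,
    "subjects": lambda ts: len({s for s, _, _ in ts}),
    "predicates": lambda ts: len({p for _, p, _ in ts}),
    "objects": lambda ts: len({o for _, _, o in ts}),
}


def compute_stats(triples: Iterable[Triple],
                  stats: Dict[str, StatFn] = BASIC_STATS) -> Dict[str, int]:
    """Evaluate each registered stat over the same list of triples."""
    ts = list(triples)
    return {name: fn(ts) for name, fn in stats.items()}
```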

Make networkx version pin more accurate, layout loading more lenient

We unconditionally do a hard import of a random selection of networkx layouts that happened to be in networkx 2.4. These evolve over time: for example, planar was added in 2.3 and multipartite in 2.5.

We should probably:

  • pick a sane bottom pin, e.g. 2?
    • update dependencies in various places, e.g. setup.cfg
    • ensure new deps propagate to conda-forge after release
  • determine a sane default (e.g. what was the first implemented layout?)
    • just pulling the first one, alphabetically
  • in the core widget, don't try to validate against a known set
    • it's not validating, and will best-effort try unknown names (by key)
  • wait to lazily attempt resolving name, fall back to default
  • in the layout picker tool, do discovery of networkx layouts
    • it was simpler to do it in the core, they all get loaded at once anyway
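The "discover, resolve lazily, fall back to the first alphabetically" behavior can be sketched against any module-like object. With networkx, `module` would be `networkx` itself, whose layouts are exported as `<name>_layout` functions. `rescale_layout` is skipped because it is a helper, not a layout. The function names are hypothetical, and some layouts (e.g. bipartite) need extra arguments that this sketch does not handle.

```python
from types import ModuleType, SimpleNamespace
from typing import Callable, Dict, Union


def discover_layouts(module: Union[ModuleType, SimpleNamespace]) -> Dict[str, Callable]:
    """Find callables named `<name>_layout` (skipping rescale helpers)."""
    found = {}
    for attr in dir(module):
        if attr.endswith("_layout") and not attr.startswith("rescale"):
            fn = getattr(module, attr)
            if callable(fn):
                found[attr[: -len("_layout")]] = fn
    return dict(sorted(found.items()))


def resolve_layout(module, name: str) -> Callable:
    """Look a layout up lazily by name; fall back to the first one alphabetically."""
    layouts = discover_layouts(module)
    if not layouts:
        raise LookupError("no layouts found")
    return layouts.get(name, next(iter(layouts.values())))
```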

🌞 Release 0.1.2 J_e 📦

Incremental improvements over 0.1.1

Blockers:

  • #50
  • #55
  • #61
  • #68
  • #74
  • cleanup PR
  • Binder
    • Cytoscape not showing in Binder?
  • Docs
    • Review changelog
    • Review README
      • Update screencasts in README?
  • Release
    • tag, upload CI-built assets with checksum
    • pypi
  • Post-mortem
    • bump version to 0.1.3
    • conda-forge
    • Update release procedure
