jupyrdf / ipyradiant
Jupyter widgets for working with RDF graphs.
License: BSD 3-Clause "New" or "Revised" License
The current implementation of the SPARQLQueryFramer class requires the user to implement a sparql string attribute, which is used to perform the query.
This ticket scopes in the addition of a way to point the framer at a SPARQL file instead of providing a sparql string.
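One way this could look (a minimal sketch; `framer_from_file` and the stand-in class are assumptions, not existing ipyradiant API): read the file contents and attach them as the `sparql` attribute of a dynamically-built framer subclass.

```python
from pathlib import Path


class SPARQLQueryFramer:
    """Stand-in for ipyradiant's SPARQLQueryFramer (illustrative only)."""

    sparql: str = ""


def framer_from_file(path) -> type:
    """Hypothetical helper: build a framer subclass whose ``sparql``
    attribute is read from a .sparql/.rq file on disk."""
    text = Path(path).read_text(encoding="utf-8")
    return type("FileQuery", (SPARQLQueryFramer,), {"sparql": text})
```

This keeps the existing string-attribute contract intact; the file is simply the source of the string.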
Cytoscape doesn't appear to highlight which nodes are actually selected. This can probably be fixed with a selector in its style language. Ideally, we would hoist these to the Base class as, e.g., selected_node_color.
Further, both support multiple selection modes: box select is probably the best default, since graphs are big, unless it interferes with the pan/zoom interaction. This could also be hoisted as a box_select, or selection_mode, etc.
As of right now, the code implementing the cytoscape visualization checks that the object of a triple is not a literal before adding it to the group of edges. Do we want this behavior, or do we also want triples whose objects are literals?
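The decision could be captured as a flag rather than hard-coded. A sketch with plain Python stand-ins for rdflib terms (the `Literal` class and `edge_triples` helper here are illustrative, not the real implementation):

```python
class Literal(str):
    """Stand-in for rdflib.term.Literal."""


def edge_triples(triples, include_literals=False):
    """Return the triples that should become edges.

    By default, triples whose object is a Literal are skipped
    (the current cytoscape behavior); pass include_literals=True
    to keep literal-object triples as edges too.
    """
    return [
        (s, p, o)
        for s, p, o in triples
        if include_literals or not isinstance(o, Literal)
    ]
```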
It's worth a look to see which (if any) of the upstreams haven't been released to support JupyterLab 3. Going to 3 would remove node/npm entirely from this repo for everything except prettier, which we can vendor, and would improve both development time and potentially CI.
| extension | status | link |
|---|---|---|
| @jupyter-widgets/jupyterlab-manager | ready | https://pypi.org/project/jupyterlab-widgets/1.0.0/ |
| @pyviz/jupyterlab_pyviz | ready | https://pypi.org/project/pyviz-comms/2.0.1/ |
| jupyter-cytoscape | ready | https://pypi.org/project/ipycytoscape/1.2.0/ |
| qgrid2 | dead | --- |
| extension | status | link |
|---|---|---|
| @deathbeds/[email protected] | not started | deathbeds/ipydrawio#11 |
| @deathbeds/[email protected] | not started | deathbeds/ipydrawio#11 |
| @deathbeds/[email protected] | not started | deathbeds/ipydrawio#11 |
As of right now we can hover over nodes and edges in datashader, but we are unable to send selected node/edge information to the backend as we can with ipycytoscape. We are looking for a way to click on datashader objects and get callbacks to the backend, so we can do something with the information.
Adding data to the Cytoscape widget graph object is an additive process (it does not clear the previous graph). This may be something we want in a future visualization, but the current desired effect is to replace the data. A method for this replacement implementation exists, but a widget flag to capture the additive behavior as an option may be worthwhile.
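That flag could look something like the following toy sketch (class and method names are hypothetical, standing in for the Cytoscape widget's graph container):

```python
class GraphHolder:
    """Toy stand-in for the widget's graph container, sketching an
    ``additive`` flag on the data-loading method."""

    def __init__(self):
        self.elements = []

    def add_graph(self, elements, additive=False):
        # The current widget behavior is always additive; the desired
        # default here is to replace the previous graph unless the
        # caller explicitly opts in to additive behavior.
        if not additive:
            self.elements = []
        self.elements.extend(elements)
```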
Currently the remote query examples access URLs such as DBpedia.org, which can impact testing when those services are unavailable.
This ticket scopes in preventing this code from running during testing/CI. Maybe an environment variable?
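A minimal sketch of the environment-variable approach (the variable name `IPYRADIANT_SKIP_REMOTE` is an assumption; any agreed-upon name would do):

```python
import os

# Hypothetical variable name: set it to "1"/"true"/"yes" in CI to
# short-circuit any example code that hits remote endpoints.
SKIP_REMOTE = "IPYRADIANT_SKIP_REMOTE"


def remote_queries_enabled() -> bool:
    """Return False when tests/CI have asked remote queries to be skipped."""
    return os.environ.get(SKIP_REMOTE, "").lower() not in ("1", "true", "yes")
```

Example notebooks would then guard their remote calls with `if remote_queries_enabled(): ...`, and the CI config would export the variable.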
To help inform what our individual components should look like, let's use an open source tool to show some ideas.
rdflib has a number of parsers (see this link), but not all of them are installed with rdflib by default. The guess_format function used for graph loading allows all the parsers to be referenced even if they are not installed.
This ticket scopes in a set of basic tests to ensure that fixtures exist for each format, and that the parsers are checked so that we know exactly which file formats our load widget should accept.
Any missing fixture formats should also be added (e.g. n3).
The examples folder is getting a bit cluttered.
Let's reorganize to separate the tests from the examples by creating a "tests" sandbox folder.
Also, remove any examples we don't need anymore.
It would be interesting to experiment with hiveplots. These have the advantage over (even deterministic) hairball diagrams in that they have a stable shape, and can be compared visually.
As a new user of an ontology, I'd like to understand the class hierarchy and the use of predicates. For example, with the schema.org examples, one might want a hive plot with axes such as subclassOf.
There is hive_networkx, and presumably one could reuse everything except the matplotlib parts and drive it off datashader (though it may be a pain to get the axis lines in).
This would extend the existing stuff, and provide a query/graph/etc. to determine the axes to which each thing belongs.
Ticket to improve the cytoscape visualization to go along with the new RDF2NX converter. Things to be included:
Continuation of #42.
Previous efforts have identified a basic process for collapsing a set of predicates onto their subject nodes in order to transform an RDF graph into an LPG graph. This ticket expands upon that process and attempts to identify relevant patterns for higher-order collapsing (e.g. more predicates).
Scope (completion conditions):
The preliminary QueryWidget was satisfactory for early release, but should be improved to support common adoption.
This ticket scopes in an ipywidgets.Output for the query results (QueryWidget.grid). We removed the auto/flex width in favor of something with more predictable behavior.
The Textarea height adjustment (in the UI) cannot be disabled, but clicking it breaks our resize capability. This is unavoidable until the css is available to disable it in ipywidgets. See this issue for tracking.
We decided that the graph and query text namespaces should be kept separate. Users will note that they can use namespaces that are already bound within the widget graph, or they can define the prefixes in the query body. A future update may include an additional namespace argument to aid in automation, but for now these two methods are deemed sufficient.
Closed by #102
Looks like scipy vs windows vs conda-forge is not happy. We probably need to:
- ipyradiant-with-cytoscape
- ipyradiant-with-datashader
- import ipyradiant.vis.cytoscape, ipyradiant.vis.datashader
- 0.2.0
Why does overwriting the sparql attribute of a query class cause the query to fail if a PREFIX isn't specified? The namespace may be in initNs or graph.namespaces, but the process doesn't recognize them.
Example:
# this doesn't work without PREFIX
RDF2NX.node_properties.sparql = """
PREFIX voc: <https://swapi.co/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?iri ?predicate ?value
{
?iri ?predicate ?value.
VALUES (?predicate){
(rdfs:label)
(voc:starship)
}
}
"""
It will likely be useful to abstract out the references to rdflib.Graph which appear throughout, and provide a more-or-less abstract GraphStore widget. The fallback implementation would probably still be an in-memory RdflibGraphStore (probably backed by ConjunctiveGraph), but perhaps to determine what is needed, a VirtuosoGraphStore (virtuoso-opensource) would be a reasonable first step.
It's also probably worth thinking about heavy operations like query and load as async operations... I'm pretty sure rdflib.Graph isn't threadsafe, so it would have to use a ThreadPoolExecutor(1) with some kind of debouncing/throttling. Virtuoso, on the other hand, can handle many simultaneous queries, so either wrapping it or finding an async client would work.
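A minimal sketch of the single-worker idea (all names here are assumptions, not existing API): funnel every graph operation through a one-thread executor so the non-threadsafe graph is only ever touched from one thread, while callers get Futures back.

```python
from concurrent.futures import ThreadPoolExecutor


class SerializedStore:
    """Sketch of a store that serializes access to a non-threadsafe graph
    by routing all operations through a single worker thread."""

    def __init__(self, graph):
        self.graph = graph
        self._pool = ThreadPoolExecutor(max_workers=1)

    def query(self, sparql, **kwargs):
        # Returns a Future; callers can attach callbacks or await the
        # result without ever touching the graph concurrently.
        return self._pool.submit(self._do_query, sparql, **kwargs)

    def _do_query(self, sparql, **kwargs):
        return self.graph.query(sparql, **kwargs)
```

Debouncing/throttling would sit in front of `submit`; a Virtuoso-backed store could instead dispatch many queries concurrently.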
Since these might carry multiple flavors of potentially tightly-pinned dependencies, we'd either want them to be individual packages, or add an extras section to setup.py.
VALUES is a common pattern for specifying multiple bindings (values) for a single variable. The SPARQLQueryFramer class works off of a sparql string attribute, which makes it challenging to define VALUES in a pythonic way.
This ticket scopes in a metaclass capability to capture a dictionary of values that is converted into a formatted SPARQL query at runtime. Due to its relative complexity, this ticket should also include sufficient documentation and examples of how to use the metaclasses to perform queries with VALUES.
Blockers:
Follow-on:
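The runtime-formatting step might look something like this minimal sketch (the `format_values` helper is hypothetical, standing in for what the metaclass would emit):

```python
def format_values(var: str, values: list) -> str:
    """Render a SPARQL VALUES block binding IRIs to a single variable,
    e.g. the ?predicate variable in the RDF2NX queries."""
    rows = "\n".join(f"    (<{v}>)" for v in values)
    return f"VALUES (?{var}) {{\n{rows}\n}}"
```

A metaclass could then splice `format_values(...)` into a query template whenever the class-level values dictionary changes, instead of users hand-editing the sparql string.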
A linked multi select widget that can be used to down-select predicates that a user wishes to collapse.
Long-running issue to capture ipyradiant screenshots/screencasts.
Develop an example and tests for the RDF2NX custom processing.
We are interested in using networkx to perform complex graph algorithms, and already use it to build visualizations. As RDF represents all data as nodes, we will want to collapse down DataProperties onto the nodes that are created in the networkx graph.
This is the first ticket towards that capability that investigates the SPARQL queries needed to perform the collapsing process.
Monolithic queries are difficult to define, challenging to maintain, and in some cases less performant than multiple separate queries.
This ticket investigates a way to allow N queries of each type for the ipyradiant.rdf2nx.RDF2NX converter. This would allow users to specify multiple queries that satisfy a stage of the process (e.g. NodeIRIs). The process should use nx.compose to aggregate results and pass them through the normal RDF2NX process.
We want to be able to collapse the representation of a URIRef into a truncated form based on either a dictionary passed by the user or a NamespaceManager object. Examples for both need to be developed as well. Blocked by PR#46
Need a basic example ttl file that we can use in the notebook examples and future tests. This can be replaced with a better process/example later, but we need something in the short-term.
Need a more comprehensive set of unit tests for the SPARQLQueryFramer class.
This should include a set of tests for basic metaclass designs that work with the framer.
When looking to truncate the display of a URIRef while still retaining access to the data associated with the term, we need to investigate JSON-LD as an implementation of this feature.
Also, investigate the use of CSS to truncate the URIRef names, e.g. a class RDFCSSLabeler(HTML) that is attached to a style sheet and can then automatically represent RDF URIRefs in their truncated form.
We will need ipyelk for visualizations.
Once jupyterlab releases version 3.0 and ipyelk cuts a new release we should bring it into ipyradiant.
This ticket is interested in a common process for handling namespaces.
Currently, namespaces must be defined like so:
initNs = {
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"res": "https://swapi.co/resource/",
"voc": "https://swapi.co/vocabulary/",
"base": "https://swapi.co/resource/",
}
We should be able to support the values being URIRef and Namespace as well. The entire initNs should also support being a NamespaceManager.
We need a common way to process the namespace (maybe a util) so that queries and other methods can use the namespace object without worry.
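Such a util could be very small. A sketch (the name `normalize_ns` is an assumption) that coerces the supported inputs into a plain `{prefix: uri}` dict of strings:

```python
def normalize_ns(ns) -> dict:
    """Coerce an initNs-like input into {str(prefix): str(uri)}.

    Accepts a mapping whose values may be plain strings, URIRef, or
    Namespace objects (anything str()-able), or an iterable of
    (prefix, uri) pairs such as NamespaceManager.namespaces() yields.
    """
    items = ns.items() if hasattr(ns, "items") else ns
    return {str(prefix): str(uri) for prefix, uri in items}
```

Queries and other methods would then call `normalize_ns` once at their boundary and work with plain strings internally.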
The Star Wars data source API https://swapi.co/ is deprecated; references need to be updated to https://swapi.dev.
The API is also invoked differently: a resource like https://swapi.co/resource/human/11 now has to be called as https://swapi.dev/api/people/11 (this references Anakin Skywalker).
The file is here: https://github.com/jupyrdf/ipyradiant/blob/master/examples/data/starwars.ttl
The current tool does not capture error information to logging in a consistent fashion (via Output).
The current tests are located under examples/tests
, which is not great for indexing (e.g. pytest).
This ticket should move them into a standard location and update any links (pytest, doit, etc).
Let's configure nblint to only allow certain metadata so that review isn't more painful than it needs to be.
There are also still some whitespace issues; not sure why black/prettier aren't handling them.
A number of UIs could be brought to bear on the direct or indirect construction and visualization of SPARQL queries. As long as the interface is...
Changing QueryBuilder.sparql will eventually update QueryBuilder.results
...pretty much any of the below concepts would work. Most of them would require bespoke labextension development, or at least significantly more dependencies and potential install complexity than what we have now.
- Editor from wxyz
- Editor and Notebook, though the API is (partially intentionally) not particularly amenable to extension in this way

Placeholder ticket for tracking issues with the InteractiveViewer widget.
rdflib includes a package for querying remote SPARQL endpoints. This would be a nice capability for the tool instead of loading a file, or passing in a graph object.
With #61 introducing the GraphExploreNodeSelection, and #68 introducing the InteractiveViewer, this ticket scopes in the integration of these capabilities into a unified GraphExplorer widget.
The widget should support the selection of nodes, their population in the viewer, and the successful execution of basic workflows for graph exploration. Selected nodes should have their data displayed via ipywidgets.Output using IPython.display.JSON.
In the InteractiveViewer class, we currently reset the children manually so that changes propagate to the front-end. It would be nice to have this done automatically via traitlets.
For now, will use normal notebook hyperlinks to develop an index landing page that categorizes examples.
In the future, may want to move to something more like Jupyter{book}.
At the moment, we use some 'magic numbers' for this class. In the future this needs to be expanded to be less hard-coded and more dynamic in case the Star Wars API itself changes in the future.
The npm packages for @deathbeds/jupyterlab-drawio are up:
jupyter labextension install \
@deathbeds/[email protected] \
@deathbeds/[email protected]
Installing these directly would replace the use of the enormous, nested submodules, and make things snappier and more predictable, but crucially less complex.
The PDF is still not really shippable (random npm/browser installs), and would require a pip dependency, so it may not be worth it for now. I've also found that some complex shapes (e.g. the cloud icons) don't embed properly.
conda install -yc conda-forge requests_cache pypdf2
pip install [email protected]
jupyter labextension install @deathbeds/[email protected]
Notebooks work, but are separated out:
jupyter labextension install @deathbeds/[email protected]
This is mostly because I have a lot of stuff planned there. Also, the notebook metadata tag for .dio.ipynb changed to add the @deathbeds namespace, but is otherwise compatible.
The general process of the RDF2NX converter is designed to support parallelization. This ticket investigates how to improve the RDF2NX class so that functions can be executed in parallel.
This likely requires the internal queries (e.g. NodeTypes) to be pulled out of the higher-level methods (e.g. transform_nodes).
When processing/loading RDF data, it is often informative to have basic stats on the data itself (e.g. # of triples). This ticket scopes in the development of a basic statistics class where stats are driven/exposed by queries and stored in basic tables. Basic stats may include:
| stat | # |
|---|---|
| triples | 123 |
| subjects | 34 |
| predicates | 12 |
| objects | 45 |
| prefixes | 3 |
Stats would be pluggable; e.g. we may have an OWLStats class that understands how to query for, e.g., ObjectProperty axioms.
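A minimal sketch of the core counting logic (in ipyradiant these numbers would be driven by SPARQL queries rather than Python iteration; `basic_stats` is a hypothetical name):

```python
def basic_stats(triples) -> dict:
    """Compute simple counts from an iterable of (s, p, o) triples,
    matching the rows of the stats table above (minus prefixes,
    which come from the graph's namespace bindings)."""
    triples = list(triples)
    return {
        "triples": len(triples),
        "subjects": len({s for s, _, _ in triples}),
        "predicates": len({p for _, p, _ in triples}),
        "objects": len({o for _, _, o in triples}),
    }
```

A pluggable stats class would expose each of these as a query-backed entry in a table, so subclasses like OWLStats can add rows.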
We unconditionally do a hard import of a random selection of networkx layouts that happened to be in networkx 2.4. These evolve over time, e.g. planar was added in 2.3 and multipartite in 2.5.
We should probably:
- 2?
- setup.cfg
- conda-forge after release

Incremental improvements over 0.1.1
Blockers: