rdflib / pyshacl

A Python validator for SHACL

License: Apache License 2.0

Python 98.87% JavaScript 0.32% Makefile 0.34% Shell 0.31% Dockerfile 0.16%
shacl validator owl rdf constraints

pyshacl's Introduction

pySHACL

A Python validator for SHACL.


This is a pure Python module which allows for the validation of RDF graphs against Shapes Constraint Language (SHACL) graphs. This module uses the rdflib Python library for working with RDF and is dependent on the OWL-RL Python module for OWL2 RL Profile based expansion of data graphs.

This module is developed to adhere to the SHACL Recommendation:

Holger Knublauch; Dimitris Kontokostas. Shapes Constraint Language (SHACL). 20 July 2017. W3C Recommendation. URL: https://www.w3.org/TR/shacl/ ED: https://w3c.github.io/data-shapes/shacl/

Community for Help and Support

The SHACL community has a Discord server for discussion of topics around SHACL and the SHACL specification.

Use this invitation link to join the server: https://discord.gg/RTbGfJqdKB

There is a #pyshacl channel for discussion of this Python library, and you can ask for general SHACL help there too.

Installation

Install with pip (using the Python 3 installer, pip3):

$ pip3 install pyshacl

Or in a Python virtualenv (these example command-line instructions are for a Linux/Unix-based OS):

$ python3 -m virtualenv --python=python3 --no-site-packages .venv
$ source ./.venv/bin/activate
$ pip3 install pyshacl

To exit the virtual environment:

$ deactivate

Command Line Use

For command-line use (these example command-line instructions are for a Linux/Unix-based OS):

$ pyshacl -s /path/to/shapesGraph.ttl -m -i rdfs -a -j -f human /path/to/dataGraph.ttl

Where:

  • -s is an (optional) path to the shapes graph to use
  • -e is an (optional) path to an extra ontology graph to import
  • -i is the pre-inferencing option
  • -f is the ValidationReport output format (human = human-readable validation report)
  • -m enables the Meta-SHACL feature
  • -a enables SHACL Advanced Features
  • -j enables SHACL-JS features (if pyshacl[js] is installed)

System exit codes are:

  • 0 = DataGraph is Conformant
  • 1 = DataGraph is Non-Conformant
  • 2 = The validator encountered a RuntimeError (check stderr output for details)
  • 3 = Not-Implemented; the validator encountered a SHACL feature that is not yet implemented
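
For example, here is a minimal Python sketch (the pyshacl CLI is assumed to be on your PATH; the file paths are placeholders) that runs the CLI and maps these exit codes to messages:

import subprocess

# Placeholder paths for this sketch.
result = subprocess.run(
    ["pyshacl", "-s", "shapesGraph.ttl", "-f", "human", "dataGraph.ttl"],
    capture_output=True, text=True,
)
exit_code_meanings = {
    0: "DataGraph is Conformant",
    1: "DataGraph is Non-Conformant",
    2: "Validator encountered a RuntimeError (check stderr)",
    3: "Not-Implemented SHACL feature encountered",
}
print(exit_code_meanings.get(result.returncode, "Unknown exit code"))
print(result.stdout)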

Full CLI Usage options:

$ pyshacl -h
$ python3 -m pyshacl -h
usage: pyshacl [-h] [-s [SHACL]] [-e [ONT]] [-i {none,rdfs,owlrl,both}] [-m]
               [-im] [-a] [-j] [-it] [--abort] [--allow-info] [-w]
               [--max-depth [MAX_DEPTH]] [-d]
               [-f {human,table,turtle,xml,json-ld,nt,n3}]
               [-df {auto,turtle,xml,json-ld,nt,n3}]
               [-sf {auto,turtle,xml,json-ld,nt,n3}]
               [-ef {auto,turtle,xml,json-ld,nt,n3}] [-V] [-o [OUTPUT]]
               [--server]
               DataGraph

PySHACL 0.26.0 command line tool.

positional arguments:
  DataGraph             The file containing the Target Data Graph.

optional arguments:
  -h, --help            show this help message and exit
  -s [SHACL], --shacl [SHACL]
                        A file containing the SHACL Shapes Graph.
  -e [ONT], --ont-graph [ONT]
                        A file path or URL to a document containing extra
                        ontological information. RDFS and OWL definitions from this 
                        are used to inoculate the DataGraph.
  -i {none,rdfs,owlrl,both}, --inference {none,rdfs,owlrl,both}
                        Choose a type of inferencing to run against the Data
                        Graph before validating.
  -m, --metashacl       Validate the SHACL Shapes graph against the shacl-
                        shacl Shapes Graph before validating the Data Graph.
  -im, --imports        Allow import of sub-graphs defined in statements with
                        owl:imports.
  -a, --advanced        Enable features from the SHACL Advanced Features
                        specification.
  -j, --js              Enable features from the SHACL-JS Specification.
  -it, --iterate-rules  Run Shape's SHACL Rules iteratively until the
                        data_graph reaches a steady state.
  --abort               Abort on first invalid data.
  --allow-info, --allow-infos
                        Shapes marked with severity of Info will not cause
                        result to be invalid.
  -w, --allow-warning, --allow-warnings
                        Shapes marked with severity of Warning or Info will
                        not cause result to be invalid.
  --max-depth [MAX_DEPTH]
                        The maximum number of SHACL shapes "deep" that the
                        validator can go before reaching an "endpoint"
                        constraint.
  -d, --debug           Output additional runtime messages.
  -f {human,table,turtle,xml,json-ld,nt,n3}, --format {human,table,turtle,xml,json-ld,nt,n3}
                        Choose an output format. Default is "human".
  -df {auto,turtle,xml,json-ld,nt,n3}, --data-file-format {auto,turtle,xml,json-ld,nt,n3}
                        Explicitly state the RDF File format of the input
                        DataGraph file. Default="auto".
  -sf {auto,turtle,xml,json-ld,nt,n3}, --shacl-file-format {auto,turtle,xml,json-ld,nt,n3}
                        Explicitly state the RDF File format of the input
                        SHACL file. Default="auto".
  -ef {auto,turtle,xml,json-ld,nt,n3}, --ont-file-format {auto,turtle,xml,json-ld,nt,n3}
                        Explicitly state the RDF File format of the extra
                        ontology file. Default="auto".
  -V, --version         Show PySHACL version and exit.
  -o [OUTPUT], --output [OUTPUT]
                        Send output to a file (defaults to stdout).
  --server              Ignore all the rest of the options, start the HTTP
                        Server. Same as `pyshacl_server`.

Python Module Use

For basic use of this module, you can just call the validate function of the pyshacl module like this:

from pyshacl import validate
r = validate(data_graph,
      shacl_graph=sg,
      ont_graph=og,
      inference='rdfs',
      abort_on_first=False,
      allow_infos=False,
      allow_warnings=False,
      meta_shacl=False,
      advanced=False,
      js=False,
      debug=False)
conforms, results_graph, results_text = r

Where:

  • data_graph is an rdflib Graph object or file path of the graph to be validated
  • shacl_graph is an rdflib Graph object or file path or Web URL of the graph containing the SHACL shapes to validate with, or None if the SHACL shapes are included in the data_graph.
  • ont_graph is an rdflib Graph object or file path or Web URL of a graph containing extra ontological information, or None if not required. RDFS and OWL definitions from this are used to inoculate the DataGraph.
  • inference is a Python string value to indicate whether or not to perform OWL inferencing expansion of the data_graph before validation. Options are 'rdfs', 'owlrl', 'both', or 'none'. The default is 'none'.
  • abort_on_first (optional) bool value to indicate whether or not the program should abort after encountering the first validation failure or to continue. Default is to continue.
  • allow_infos (optional) bool value, Shapes marked with severity of Info will not cause result to be invalid.
  • allow_warnings (optional) bool value, Shapes marked with severity of Warning or Info will not cause result to be invalid.
  • meta_shacl (optional) bool value to indicate whether or not the program should enable the Meta-SHACL feature. Default is False.
  • advanced: (optional) bool value to enable SHACL Advanced Features
  • js: (optional) bool value to enable SHACL-JS Features (if pyshacl[js] is installed)
  • debug (optional) bool value to indicate whether or not the program should emit debugging output text, including violations that didn't lead to non-conformance overall. So when debug is True, don't judge conformance by the absence of violation messages. Default is False.

Some other optional keyword variables available on the validate function:

  • data_graph_format: Override the format detection for the given data graph source file.
  • shacl_graph_format: Override the format detection for the given shacl graph source file.
  • ont_graph_format: Override the format detection for the given extra ontology graph source file.
  • iterate_rules: Iterate SHACL Rules until steady state is found (only works with advanced mode).
  • do_owl_imports: Enable the feature to allow the import of subgraphs using owl:imports for the shapes graph and the ontology graph. Note, you explicitly cannot use this on the target data graph.
  • serialize_report_graph: Convert the report results_graph into a serialised representation (for example, 'turtle')
  • check_dash_result: Check the validation result against the given expected DASH test suite result.

Return value:

  • a three-component tuple containing:
    • conforms: a bool, indicating whether or not the data_graph conforms to the shacl_graph
    • results_graph: a Graph object built according to the SHACL specification's Validation Report structure
    • results_text: python string representing a verbose textual representation of the Validation Report
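
As an illustrative sketch (file names are placeholders), the return value and the serialize_report_graph option described above can be combined like this:

from pyshacl import validate

# data.ttl and shapes.ttl are placeholder file paths for this sketch.
conforms, results_graph, results_text = validate(
    "data.ttl",
    shacl_graph="shapes.ttl",
    inference="rdfs",
    serialize_report_graph="turtle",  # results_graph is returned as serialised Turtle
)
if not conforms:
    print(results_text)   # verbose textual Validation Report
    print(results_graph)  # Turtle serialisation of the report graph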

Python Module Call

You can get an equivalent of the Command Line Tool using the Python3 executable by doing:

$ python3 -m pyshacl

Integrated OpenAPI-3.0-compatible HTTP REST Service

PySHACL now has a built-in validation service, exposed via an OpenAPI3.0-compatible REST API.

Due to the additional dependencies required to run, this feature is an optional extra.

You must first install PySHACL with the http extra option enabled:

$ pip3 install -U pyshacl[http]

When that is installed, you can start the service by executing the CLI entrypoint:

$ pyshacl --server
# or
$ pyshacl_server
# or
$ python3 -m pyshacl server
# or
$ docker run --rm -e PYSHACL_SERVER=TRUE -i -t docker.io/ashleysommer/pyshacl:latest

By default, this will run the service on localhost address 127.0.0.1 on port 8099.

To view the SwaggerUI documentation for the service, navigate to http://127.0.0.1:8099/docs/swagger and for the ReDoc version, go to http://127.0.0.1:8099/docs/redoc.

To view the OpenAPI3 schema see http://127.0.0.1:8099/docs/openapi.json

Configuring the HTTP REST Service

  • You can force PySHACL CLI to start up in HTTP Server mode by passing environment variable PYSHACL_SERVER=TRUE. This is useful in a containerised service, where you will only be running PySHACL in this mode.
  • PYSHACL_SERVER_LISTEN=1.2.3.4 to listen on a different IP address or hostname
  • PYSHACL_SERVER_PORT=8080 to listen on a different TCP port
  • PYSHACL_SERVER_HOSTNAME=example.org when you are hosting the server behind a reverse proxy or in a containerised environment; use this so the PySHACL server knows what your externally facing hostname is
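
As a sketch (the address, port, and hostname values are placeholders; only the environment variable names come from the list above), the server can be launched from Python with these variables set:

import os
import subprocess

env = dict(os.environ)
env["PYSHACL_SERVER"] = "TRUE"
env["PYSHACL_SERVER_LISTEN"] = "0.0.0.0"        # placeholder listen address
env["PYSHACL_SERVER_PORT"] = "8080"             # placeholder port
env["PYSHACL_SERVER_HOSTNAME"] = "example.org"  # placeholder external hostname

# Assumes the pyshacl[http] extra is installed so the pyshacl_server entrypoint exists.
subprocess.run(["pyshacl_server"], env=env)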

Errors

Under certain circumstances pySHACL can produce a Validation Failure. This is a formal error defined by the SHACL specification and is required to be produced as a result of specific conditions within the SHACL graph. If the validator produces a Validation Failure, the results_graph variable returned by the validate() function will be an instance of ValidationFailure. See the message attribute on that instance to get more information about the validation failure.
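
A minimal sketch of handling that case (the pyshacl.errors import path is an assumption; file names are placeholders):

from pyshacl import validate
from pyshacl.errors import ValidationFailure  # assumed import path

conforms, results_graph, results_text = validate("data.ttl", shacl_graph="shapes.ttl")
if isinstance(results_graph, ValidationFailure):
    # A formal SHACL Validation Failure was produced instead of a report graph.
    print("Validation Failure:", results_graph.message)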

Other errors the validator can generate:

  • ShapeLoadError: This error is thrown when a SHACL Shape in the SHACL graph is in an invalid state and cannot be loaded into the validation engine.
  • ConstraintLoadError: This error is thrown when a SHACL Constraint Component is in an invalid state and cannot be loaded into the validation engine.
  • ReportableRuntimeError: An error occurred for a different reason, and the reason should be communicated back to the user of the validator.
  • RuntimeError: The validator encountered a situation that caused it to throw an error, but the reason does not concern the user.

Unlike ValidationFailure, these errors are not passed back as a result by the validate() function, but thrown as exceptions by the validation engine and must be caught in a try ... except block. In the case of ShapeLoadError and ConstraintLoadError, see the str() string representation of the exception instance for the error message along with a link to the relevant section in the SHACL spec document.
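
A sketch of catching these exceptions (again assuming they are importable from pyshacl.errors; file names are placeholders):

from pyshacl import validate
from pyshacl.errors import ConstraintLoadError, ReportableRuntimeError, ShapeLoadError  # assumed path

try:
    conforms, results_graph, results_text = validate("data.ttl", shacl_graph="shapes.ttl")
except (ShapeLoadError, ConstraintLoadError) as e:
    # str(e) carries the error message and a link to the relevant SHACL spec section.
    print("Problem loading shapes:", str(e))
except ReportableRuntimeError as e:
    print("Reportable runtime error:", str(e))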

Windows CLI

Pyinstaller can be used to create an executable for Windows that has the same characteristics as the Linux/Mac CLI program. The necessary .spec file is already included in pyshacl/pyshacl-cli.spec. The pyshacl-cli.spec PyInstaller spec file creates a .exe for the pySHACL Command Line utility. See above for the pySHACL command line util usage instructions.

See the PyInstaller installation guide for info on how to install PyInstaller for Windows.

Once you have pyinstaller, use pyinstaller to generate the pyshacl.exe CLI file like so:

$ cd src/pyshacl
$ pyinstaller pyshacl-cli.spec

This will output pyshacl.exe in the dist directory in src/pyshacl.

You can now run the pySHACL Command Line utility via pyshacl.exe. See above for the pySHACL command line util usage instructions.

Docker

Pull the official docker image from Dockerhub: docker pull docker.io/ashleysommer/pyshacl:latest

Or build the image yourself, from the PySHACL repository with docker build . -t pyshacl.

You can now run PySHACL inside a container, but you need to mount the data you want to validate. For example, to validate graph.ttl against shacl.ttl, run:

docker run --rm -i -t --mount type=bind,src=`pwd`,dst=/data pyshacl -s /data/shacl.ttl /data/graph.ttl

Compatibility

PySHACL is a Python 3 library. For best compatibility, use Python v3.8 or greater. Python v3.7 and below are not supported, and this library does not work on Python v2.7.x or below.

PySHACL is a PEP 518 & PEP 517 project; it uses pyproject.toml and Poetry to manage dependencies, builds, and installs.

For best compatibility when installing from PyPI with pip, upgrade to pip v20.0.2 or above.

  • If you're on Ubuntu 18.04 or older, you will need to run sudo pip3 install --upgrade pip to get the newer version.

Features

A features matrix is kept in the FEATURES file.

Changelog

A comprehensive changelog is kept in the CHANGELOG file.

Benchmarks

This project includes a script to measure the difference in performance of validating the same source graph that has been inferenced using each of the four different inferencing options. Run it on your computer to see how fast the validator operates for you.

License

This repository is licensed under Apache License, Version 2.0. See the LICENSE deed for details.

Contributors

See the CONTRIBUTORS file.

Citation

DOI: 10.5281/zenodo.4750840 (For all versions/latest version)

Contacts

Project Lead: Nicholas Car, Senior Experimental Scientist, CSIRO Land & Water, Environmental Informatics Group, Brisbane, Qld, Australia. [email protected] http://orcid.org/0000-0002-8742-7730

Lead Developer: Ashley Sommer, Informatics Software Engineer, CSIRO Land & Water, Environmental Informatics Group, Brisbane, Qld, Australia. [email protected] https://orcid.org/0000-0003-0590-0131

pyshacl's People

Contributors

ajnelson-nist, ashleysommer, aucampia, bollwyvl, dependabot[bot], gtfierro, jameshowison, jamiefeiss, johannesloetzsch, jyucsiro, konradhoeffner, martijn-y-ai, mfsy, mgberg, mpolitze, nicholascar, nicholsn, panaetius, piyush69, rinkehoekstra, tcmitchell, wcrd, westurner


pyshacl's Issues

RDFClosure dependency in the code breaks at least pyshacl CLI completely

It seems that RDFClosure has been renamed to OWL-RL and has been removed from the online repositories. Thus it is not installed as a dependency for pyshacl, and pyshacl throws a lot of ModuleNotFoundErrors. E.g.:

### pyshacl --help
Traceback (most recent call last):
  File "/usr/local/bin/pyshacl", line 17, in <module>
    from pyshacl import validate
  File "/usr/local/lib/python3.7/site-packages/pyshacl/__init__.py", line 3, in <module>
    from pyshacl.validate import validate
  File "/usr/local/lib/python3.7/site-packages/pyshacl/validate.py", line 12, in <module>
    from pyshacl.inference import CustomRDFSSemantics, CustomRDFSOWLRLSemantics
  File "/usr/local/lib/python3.7/site-packages/pyshacl/inference/__init__.py", line 2, in <module>
    from .custom_rdfs_closure import CustomRDFSSemantics, CustomRDFSOWLRLSemantics
  File "/usr/local/lib/python3.7/site-packages/pyshacl/inference/custom_rdfs_closure.py", line 2, in <module>
    from RDFClosure.RDFSClosure import RDFS_Semantics as OrigRDFSSemantics
ModuleNotFoundError: No module named 'RDFClosure'

I'm using version 0.9.7 of pyshacl from PyPI, and the issue happens for every pyshacl command I have tested so far.

SPARQL targets giving wrong inference

When I try to do SPARQL-based SHACL validation, I am getting the wrong results. I am trying to filter out processes (Testsparql:Process) where Testsparql:Cranecapacity is less than Testsparql:Moduleweight. I get the desired output when my data and shapes are in a single RDF file, but when I split them into two RDF files I do not get the correct inference.

2 file case:

from pyshacl import validate
shapes_file = '''
@prefix Testsparql: <http://semanticprocess.x10host.com/Ontology/Testsparql#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

Testsparql:PrefixDeclaration
  rdf:type sh:PrefixDeclaration ;
  sh:namespace "http://semanticprocess.x10host.com/Ontology/Testsparql#"^^xsd:anyURI ;
  sh:prefix "Testsparql" ;
.

Testsparql:Processshape
  rdf:type rdfs:Class ;
  rdf:type sh:NodeShape ;
  rdfs:subClassOf owl:Class ;
  sh:sparql [
      sh:message "Invalid process" ;
      sh:prefixes <http://semanticprocess.x10host.com/Ontology/Testsparql> ;
      sh:select """SELECT $this 
        WHERE {
			 $this  rdf:type Testsparql:Process.
			$this Testsparql:hasResource ?crane.
			$this Testsparql:hasAssociation ?module.
			?crane Testsparql:Cranecapacity ?cc.
			?module Testsparql:Moduleweight ?mw.
					FILTER (?cc <= ?mw).

     }""" ;
    ] ;
.

'''
shapes_file_format = 'turtle'

data_file = '''
@prefix Testsparql: <http://semanticprocess.x10host.com/Ontology/Testsparql#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://semanticprocess.x10host.com/Ontology/Testsparql>
  rdf:type owl:Ontology ;
  owl:imports <http://datashapes.org/dash> ;
  owl:versionInfo "Created with TopBraid Composer" ;
  sh:declare Testsparql:PrefixDeclaration ;
.
Testsparql:Crane
  rdf:type rdfs:Class ;
  rdfs:subClassOf owl:Class ;
.
Testsparql:Crane_1
  rdf:type Testsparql:Crane ;
  Testsparql:Cranecapacity "500"^^xsd:decimal ;
.
Testsparql:Crane_2
  rdf:type Testsparql:Crane ;
  Testsparql:Cranecapacity "5000"^^xsd:decimal ;
.
Testsparql:Cranecapacity
  rdf:type owl:DatatypeProperty ;
  rdfs:domain Testsparql:Crane ;
  rdfs:range xsd:decimal ;
  rdfs:subPropertyOf owl:topDataProperty ;
.
Testsparql:Module
  rdf:type rdfs:Class ;
  rdfs:subClassOf owl:Class ;
.
Testsparql:Module_1
  rdf:type Testsparql:Module ;
  Testsparql:Moduleweight "800"^^xsd:decimal ;
.
Testsparql:Moduleweight
  rdf:type owl:DatatypeProperty ;
  rdfs:domain Testsparql:Module ;
  rdfs:range xsd:decimal ;
  rdfs:subPropertyOf owl:topDataProperty ;

.
Testsparql:Process
  rdf:type rdfs:Class ;
  
  rdfs:subClassOf owl:Class ;
  .
Testsparql:ProcessID
  rdf:type owl:DatatypeProperty ;
  rdfs:domain Testsparql:Process ;
  rdfs:range xsd:string ;
  rdfs:subPropertyOf owl:topDataProperty ;
.
Testsparql:Process_1
  rdf:type Testsparql:Process ;
  Testsparql:ProcessID "P1" ;
  Testsparql:hasAssociation Testsparql:Module_1 ;
  Testsparql:hasResource Testsparql:Crane_1 ;
.
Testsparql:Process_2
  rdf:type Testsparql:Process ;
  Testsparql:ProcessID "P2" ;
  Testsparql:hasAssociation Testsparql:Module_1 ;
  Testsparql:hasResource Testsparql:Crane_2 ;
.
Testsparql:hasAssociation
  rdf:type owl:ObjectProperty ;
  rdfs:domain Testsparql:Process ;
  rdfs:range Testsparql:Module ;
  rdfs:subPropertyOf owl:topObjectProperty ;
.
Testsparql:hasResource
  rdf:type owl:ObjectProperty ;
  rdfs:domain Testsparql:Process ;
  rdfs:range Testsparql:Crane ;
  rdfs:subPropertyOf owl:topObjectProperty ;
.

'''
data_file_format = 'turtle'

conforms, v_graph, v_text = validate(data_file, shacl_graph=shapes_file,
                                     target_graph_format=data_file_format,
                                     shacl_graph_format=shapes_file_format,
                                     inference='rdfs', debug=True,
                                     serialize_report_graph=True)
print(conforms)
print(v_graph)
print(v_text)

Result is :

True
b'@prefix Testsparql: <http://semanticprocess.x10host.com/Ontology/Testsparql#> .\n@prefix owl: <http://www.w3.org/2002/07/owl#> .\n@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n@prefix sh: <http://www.w3.org/ns/shacl#> .\n@prefix xml: <http://www.w3.org/XML/1998/namespace> .\n@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n[] a sh:ValidationReport ;\n    sh:conforms true .\n\n'
Validation Report
Conforms: True 

However, if the same data is given in a single file

from pyshacl import validate
data_file = '''
# baseURI: http://semanticprocess.x10host.com/Ontology/Testsparql
# imports: http://datashapes.org/dash
# prefix: Testsparql

@prefix Testsparql: <http://semanticprocess.x10host.com/Ontology/Testsparql#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://semanticprocess.x10host.com/Ontology/Testsparql>
  rdf:type owl:Ontology ;
  owl:imports <http://datashapes.org/dash> ;
  owl:versionInfo "Created with TopBraid Composer" ;
  sh:declare Testsparql:PrefixDeclaration ;
.
Testsparql:Crane
  rdf:type rdfs:Class ;
  rdfs:subClassOf owl:Class ;
.
Testsparql:Crane_1
  rdf:type Testsparql:Crane ;
  Testsparql:Cranecapacity "500"^^xsd:decimal ;
.
Testsparql:Crane_2
  rdf:type Testsparql:Crane ;
  Testsparql:Cranecapacity "5000"^^xsd:decimal ;
.
Testsparql:Cranecapacity
  rdf:type owl:DatatypeProperty ;
  rdfs:domain Testsparql:Crane ;
  rdfs:range xsd:decimal ;
  rdfs:subPropertyOf owl:topDataProperty ;
.
Testsparql:Module
  rdf:type rdfs:Class ;
  rdfs:subClassOf owl:Class ;
.
Testsparql:Module_1
  rdf:type Testsparql:Module ;
  Testsparql:Moduleweight "800"^^xsd:decimal ;
.
Testsparql:Moduleweight
  rdf:type owl:DatatypeProperty ;
  rdfs:domain Testsparql:Module ;
  rdfs:range xsd:decimal ;
  rdfs:subPropertyOf owl:topDataProperty ;
.
Testsparql:PrefixDeclaration
  rdf:type sh:PrefixDeclaration ;
  sh:namespace "http://semanticprocess.x10host.com/Ontology/Testsparql#"^^xsd:anyURI ;
  sh:prefix "Testsparql" ;
.
Testsparql:Process
  rdf:type rdfs:Class ;
  rdf:type sh:NodeShape ;
  rdfs:subClassOf owl:Class ;
  sh:sparql [
      sh:message "Invalid process" ;
      sh:prefixes <http://semanticprocess.x10host.com/Ontology/Testsparql> ;
      sh:select """SELECT $this 
        WHERE {
			 $this  rdf:type Testsparql:Process.
			$this Testsparql:hasResource ?crane.
			$this Testsparql:hasAssociation ?module.
			?crane Testsparql:Cranecapacity ?cc.
			?module Testsparql:Moduleweight ?mw.
					FILTER (?cc <= ?mw).

     }""" ;
    ] ;
.
Testsparql:ProcessID
  rdf:type owl:DatatypeProperty ;
  rdfs:domain Testsparql:Process ;
  rdfs:range xsd:string ;
  rdfs:subPropertyOf owl:topDataProperty ;
.
Testsparql:Process_1
  rdf:type Testsparql:Process ;
  Testsparql:ProcessID "P1" ;
  Testsparql:hasAssociation Testsparql:Module_1 ;
  Testsparql:hasResource Testsparql:Crane_1 ;
.
Testsparql:Process_2
  rdf:type Testsparql:Process ;
  Testsparql:ProcessID "P2" ;
  Testsparql:hasAssociation Testsparql:Module_1 ;
  Testsparql:hasResource Testsparql:Crane_2 ;
.
Testsparql:hasAssociation
  rdf:type owl:ObjectProperty ;
  rdfs:domain Testsparql:Process ;
  rdfs:range Testsparql:Module ;
  rdfs:subPropertyOf owl:topObjectProperty ;
.
Testsparql:hasResource
  rdf:type owl:ObjectProperty ;
  rdfs:domain Testsparql:Process ;
  rdfs:range Testsparql:Crane ;
  rdfs:subPropertyOf owl:topObjectProperty ;
.
'''
data_file_format = 'turtle'

conforms, v_graph, v_text = validate(data_file, shacl_graph=None,
                                     target_graph_format=data_file_format,
                                     shacl_graph_format=shapes_file_format,
                                     inference='rdfs', debug=True,
                                     serialize_report_graph=True)
print(conforms)
print(v_graph)
print(v_text)

It gives the correct inference.

False
b'@prefix Testsparql: <http://semanticprocess.x10host.com/Ontology/Testsparql#> .\n@prefix owl: <http://www.w3.org/2002/07/owl#> .\n@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n@prefix sh: <http://www.w3.org/ns/shacl#> .\n@prefix xml: <http://www.w3.org/XML/1998/namespace> .\n@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n[] a sh:ValidationReport ;\n    sh:conforms false ;\n    sh:result [ a sh:ValidationResult ;\n            sh:focusNode Testsparql:Process_1 ;\n            sh:resultMessage "Invalid process" ;\n            sh:resultSeverity sh:Violation ;\n            sh:sourceConstraint [ sh:message "Invalid process" ;\n                    sh:prefixes <http://semanticprocess.x10host.com/Ontology/Testsparql> ;\n                    sh:select """SELECT $this \n        WHERE {\n\t\t\t $this  rdf:type Testsparql:Process.\n\t\t\t$this Testsparql:hasResource ?crane.\n\t\t\t$this Testsparql:hasAssociation ?module.\n\t\t\t?crane Testsparql:Cranecapacity ?cc.\n\t\t\t?module Testsparql:Moduleweight ?mw.\n\t\t\t\t\tFILTER (?cc <= ?mw).\n\n     }""" ] ;\n            sh:sourceConstraintComponent sh:SPARQLConstraintComponent ;\n            sh:sourceShape Testsparql:Process ;\n            sh:value Testsparql:Process_1 ] .\n\n'
Validation Report
Conforms: False
Results (1):
Constraint Violation in SPARQLConstraintComponent (http://www.w3.org/ns/shacl#SPARQLConstraintComponent):
	Severity: sh:Violation
	Source Shape: Testsparql:Process
	Focus Node: Testsparql:Process_1
	Value Node: Testsparql:Process_1
	Source Constraint: [ sh:message Literal("Invalid process") ; sh:prefixes <http://semanticprocess.x10host.com/Ontology/Testsparql> ; sh:select Literal("SELECT $this 
        WHERE {
			 $this  rdf:type Testsparql:Process.
			$this Testsparql:hasResource ?crane.
			$this Testsparql:hasAssociation ?module.
			?crane Testsparql:Cranecapacity ?cc.
			?module Testsparql:Moduleweight ?mw.
					FILTER (?cc <= ?mw).

     }") ]
	Message: Invalid process

Can you help me understand why this is giving the wrong inference?

Validating files containing multiple named graphs

The load.py script currently loads all passed files into an rdflib Graph object. For JSON-LD files that contain multiple named graphs, this means that the resulting graph object g in the referenced line [1] will be empty, and the validation will succeed without warning.

I know that pySHACL currently does not support TriG or NQuads files, but if you allow for JSON-LD, you should allow for these as well, as the big difference is the support for named graphs.

There are three ways around this:

  • Show a warning that files containing named graphs cannot currently be validated (not ideal, as this should be easy to fix).
  • A simple quick fix is to load files into a ConjunctiveGraph. This disregards all named graph information, but makes all triples available for validation. This is the behaviour of the SHACL playground implementation. This is not ideal either, as we would like to validate individual graphs (as per the SHACL spec).
  • Better is to load the file into a Dataset and then iterate over each contained graph for validation purposes. This is a more involved fix that requires the loader to always load into a Dataset, and then the validator should iterate over each graph contained in the dataset.

[1]

if g is None:

SPARQLFunction support

I tried using the SHACL found at http://datashapes.org/schema.ttl to validate some data and received the following error:

NotImplementedError: SHACL Advanced Feature SPARQLFunction is not yet supported.

I'll add my vote to getting this feature implemented.

Validation does not work for classes that are also node shapes

If I run the following code:

shapes = rdf.Graph()
shapes.parse(data="""
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix ex: <http://example.org/ns#> .

    ex:Person
          a owl:Class ;
          a sh:NodeShape ;
          sh:property ex:NameConstraint ;
    .

    ex:NameConstraint
          a sh:PropertyShape ;
          sh:path ex:name ;
          sh:minCount 1 ;
        .
""",format="ttl")

data = rdf.Graph()
data.parse(data="""
    @prefix ex: <http://example.org/ns#> .

    ex:Bob
          a ex:Person ;
    .
""",format="ttl")

r = sh.validate(data_graph=data,shacl_graph=shapes,inference='rdfs')
print(r[2])

no validation errors are reported. In order to force the error to be recognized, I have to explicitly declare ex:Person sh:targetClass ex:Person in the shapes graph, which shouldn't be necessary.

This is how TopQuadrant products represent classes and node shapes by default, so it would be great if pyshacl could support this.

Validating using a shacl graph

[Python 3.7.0, rdflib 4.2.2, pyshacl 0.9.8.post1]

I am using a graph as shacl_graph shown below.

conforms, v_graph, v_text = validate(g, shacl_graph=g2,
                                     data_graph_format='turtle',
                                     shacl_graph_format='turtle',
                                     inference='rdfs', debug=True,
                                     serialize_report_graph=True)

Validation Report
Conforms: True

The g2 graph I use is the following:

@prefix hei: <http://hei.org/customer/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

hei:HeiAddressShape a sh:NodeShape ;
    sh:property [ rdfs:comment "Street constraint" ;
            sh:datatype xsd:string ;
            sh:minlength 30 ;
            sh:path hei:Ship_to_street ] ;
    sh:targetClass hei:Hei_customer .

Data validated is:

hei:hei_cust_1281 a hei:Sfg_customer ;
    rdfs:label "XYZHorecagroothandel" ;
    hei:Klant_nummer 1281 ;
    hei:Ship_to_City "Middenmeer" ;
    hei:Ship_to_postcode "1799 AB" ;
    hei:Ship_to_street "Industrieweg" 

The issue is that when I pass a graph object, no validation is done; passing the g2 validation graph as a string works fine. I expected both options to work.

Does pySHACL support SHACL-JS?

Hello,

I noticed that pySHACL supports SHACL Advanced Features
(SPARQL).

I wonder if you also plan to support the SHACL JavaScript Extensions (SHACL-JS)?

Best Regards,

Angelo

owl imports of other shape graphs

So, very new to all this... but have a question.

Is it possible to define an owl:imports in a shape file to pull in previously defined shapes from another file? Ref https://github.com/ESIPFed/science-on-schema.org/blob/master/tools/sospy/shapegraphs/reqrec.ttl

I'm looking at https://book.validatingrdf.com/bookHtml011.html section 5.4 for inspiration.

Note: Maybe I'm being too cute trying to import from a github raw URL?

I get a proper violation from the recomended.ttl file, but I cannot import it and use it. I don't know if this is not possible or (more likely) I'm doing it wrong.

Thanks

pyshacl -s ./shapegraphs/recomendShape.ttl  -m  -f human -df json-ld ./datagraphs/dataset-minimal.json-ld
Validation Report
Conforms: False
Results (1):
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
        Severity: sh:Violation
        Source Shape: [ sh:maxCount Literal("1", datatype=xsd:integer) ; sh:minCount Literal("1", datatype=xsd:integer) ; sh:path <http://schema.org/citation> ]
        Focus Node: [ ]
        Result Path: <http://schema.org/citation>

pyshacl -s ./shapegraphs/reqrec.ttl  -m  -f human -df json-ld ./datagraphs/dataset-minimal.json-ld
Validation Report
Conforms: True

Windows binary for pySHACL cli

It would be good to wrap pySHACL as a Windows EXE so windows users can execute the CLI without necessarily having to install python

improve error message when using sh:ignoredProperties without sh:closed

This is clearly a low priority, minor issue, but considering the amount of time I spent trying to figure out what was wrong with my rather large SHACL data, I thought it was worth considering and suggesting a change.

A minimal example to illustrate the issue is:

import rdflib
from pyshacl import validate

data = """
@prefix asdf: <http://example.org/asdf/> .
@prefix ex: <http://example.org/> .

asdf:e2e a ex:termA ;
    ex:child asdf:23e .

asdf:23e a ex:termB .
"""

shaclData = """
@prefix ex: <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

ex:termShape a sh:NodeShape ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:targetClass ex:termB .
"""

dataGraph = rdflib.Graph().parse( data = data, format = 'ttl' )
shaclGraph = rdflib.Graph().parse( data = shaclData, format = 'ttl' )
report = validate( dataGraph, shacl_graph = shaclGraph, abort_on_error = False, meta_shacl = False, debug = False, advanced = True, do_owl_imports = True )

This generates what I found to be a confusing error message:

ConstraintLoadError: ClosedConstraintComponent must have at least one sh:closed predicate.
https://www.w3.org/TR/shacl/#ClosedConstraintComponent

The issue is that when using sh:ignoredProperties, sh:closed is expected.

pySHACL is reporting that one is using something related to closed shapes without having a closed shape.

If possible, I would love to see some improvement to the clarity of the error.

error using python module

I am trying to validate a data graph with its corresponding shapes graph. When I use the command-line method it works fine. However, I get an error when using the Python module.

I am doing this:

r = validate(data_graph, shacl_graph='./validation/ActivityShape.ttl', ont_graph=None, advanced=True, inference='rdfs', abort_on_error=False)
conforms, results_graph, results_text = r

I get the following error:

Traceback (most recent call last):
  File "validation/test1.py", line 64, in <module>
    r = validate(data_graph=data_graph, shacl_graph='./validation/ActivityShape.ttl', ont_graph=None, advanced=True, inference='rdfs', abort_on_error=False)
  File "/Users/sanuann/validation-env/lib/python3.6/site-packages/pyshacl/validate.py", line 253, in validate
    do_owl_imports=False)  # no imports on data_graph
  File "/Users/sanuann/validation-env/lib/python3.6/site-packages/pyshacl/rdfutil/load.py", line 110, in load_from_source
    first_char = source[0]
IndexError: string index out of range

What am I doing wrong?

how do i validate a property which is an object?

I have a jsonld data file which looks something like this:

   { "@type": "ex:Activity",
      "schema:description": "example schema",
      "ui": { "order": [
                  "john",
                  "mark",
                  "lisy" ],
           "shuffle": false }

How do I write a constraint for the property ui since it is an object?

Resource of http://www.w3.org/ns/shacl#value is empty in validation report

In the report the resource found in a http://www.w3.org/ns/shacl#value is empty, see the json below.

[
  {
    ...
    "@type": [
      "http://www.w3.org/ns/shacl#ValidationResult"
    ],
    "http://www.w3.org/ns/shacl#focusNode": [
      {
        "@id": "http://vangoghmuseum.nl/data/artwork/d0005V1962"
      }
    ],
    ...
    "http://www.w3.org/ns/shacl#value": [
      {
        "@id": "_:N6087b61f1f1d44e08519420c185ba3f2"
      }
    ]
  },
  {
    "@id": "_:N6087b61f1f1d44e08519420c185ba3f2"
  },

This report is the result of a propertyShape with a sh:node constraint. The first validation result in the example contains the information of the shape containing the sh:node. This is fine. The value (N6087b61f1f1d44e08519420c185ba3f2) should contain the information of the result for the sh:node. I confirmed this in TopBraid.

Debian/Ubuntu package for pySHACL

PySHACL is maturing and becoming an increasingly powerful and relevant tool for validating SHACL. I believe it is the go-to tool for SHACL validation on the commandline, and should be easily accessible for as many users as possible.

I want to get pySHACL packaged as a debian package and available from the official debian repositories, and in turn into Ubuntu repositories.

PySHACL has two dependencies, RDFLib and OWL-RL. RDFLib is already packaged and available in the debian repositories, so I need to get owlrl in too before I can package and publish a pySHACL debian package.

I've already submitted an ITP (Intent to Package) for both owlrl and pySHACL to the Debian WNPP list.
I've created an Uploader account on the Debian Mentors site, so that I can request a sponsor to sponsor the package (to authorize it on my behalf) once the package is uploaded to the Mentors site staging area.

No module named 'pyldapi'

Hi,
This validator arrived just at the right time to enable more adoption of SHACL. Thank you for this effort.
I'm using a Jupyter notebook running on Python 3.6 and I installed the pyshacl module with:

!pip install git+https://github.com/RDFLib/[email protected]#egg=pyshacl

As suggested in the 'Use' section of the README file, I tried a basic validation by running:

from pyldapi import validate
validate(target_graph, shacl_graph, inference='rdfs', abort_on_error=False)

But I got a ModuleNotFoundError: No module named 'pyldapi'

I guess it's because the validate function seems to be part of the pyshacl module. I got it right by running:

from pyshacl import validate 
validate(target_graph, shacl_graph, inference='rdfs', abort_on_error=False)

Thanks.

Enable sh:pattern on IRIs

It is quite a common requirement to test an IRI to check if it is in a specific namespace, or contains a path element which is a specific character string or pattern. While the SHACL spec appears to restrict application of sh:pattern to string literals, it would be helpful to allow a 'relaxed' mode where it can also apply to IRIs (which are, after all, just a sequence of characters).

Note that the TopBraid SHACL engine (maintained by the SHACL editor @HolgerKnublauch ) does operate in this mode - see https://groups.google.com/forum/?utm_source=digest&utm_medium=email#!topic/topbraid-users/BUoROZt0BhM

Measurement of prevalence in the shacl-report

Hello,

Is there any way to measure the prevalence of executed SHACL tests, like getting the total number of instances of the sh:targetClass, or a percentage such as "0.95 of the instances of the given sh:targetClass fulfil the restrictions"? If not, I think it would be nice to have.

Best Regards

[Discussion] PySHACL Alternate Modes

PySHACL was originally built to be a basic (but fully standards compliant) SHACL validator. That is, it uses SHACL shapes to check conformance of a data graph, and gives you the result (True/False, plus a ValidationReport).
PySHACL does that job quite well. It can be called from python or from the command line, and it delivers the results users expect.

Over the last 12 months, I've been slowly implementing more of the SHACL Advanced Features spec, and pySHACL is now almost AF-complete.

The Advanced features add capability to SHACL which extends beyond that of just validating. Eg, the SHACL Rules allow you to run SHACL-based entailment on your data graph. SHACL Functions allow you to execute parameterised custom SPARQL Functions over the data graph. Custom Targets allow you to bypass the standard SHACL node-targeting mechanism and use SPARQL to select targets.

These features can be useful for executing validation in a more customisable way, but their major benefit is in general use outside of just validating a data graph against constraints.

With these new features I see the possibility of PySHACL operating in additional alternative modes, besides that of just validating. Eg, expansion mode could run SHACL-AF Functions and Rules on the data graph, then return the expanded data graph (without validating).

Related to #20

SPARQL Target Select not working as expected

Hi,

we are trying to use SPARQL-based targets in our SHACL-Tests.
Our Test should use all non-anonymous instances of owl:Class as Focus Nodes, but it seems it's not working:

The Test:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .


<#LODE-class-comment-violation>
    a sh:Shape ;
    sh:target [
        a sh:SPARQLTarget ;
        sh:select """
        SELECT ?this WHERE {
            ?this <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
            FILTER ( !isBlank(?this) )
        }
        """;
    ];
    sh:severity sh:Violation;
    sh:path rdfs:comment;
    sh:nodeKind sh:Literal;
    sh:minCount 1;
    sh:name "comment not correctly specified"@en;
    sh:message "rdfs:comment is missing or is no Literal"@en .

The result conforms as true with this ontology as the data graph in pyshacl (with advanced=True in the validate function), but does not conform if we try the same in the SHACL Play service.
Is this a bug or did we miss something?

Thanks in advance,
Denis

validation with superclass constraints

pyshacl seems to ignore constraints defined on a superclass when validating an instance of a subclass. E.g., given the SHACL:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.com/ex#> .

ex: a owl:Ontology ;
    rdfs:label "Example"@en ;
    rdfs:comment "Example"@en ;
    owl:versionInfo "" ;
    sh:declare [ sh:namespace "http://example.com/ex#" ;
            sh:prefix "ex" ] .

ex:Parent a rdfs:Class ;
    rdfs:isDefinedBy ex: ;
    rdfs:comment "The parent class"@en ;
    rdfs:subClassOf owl:Thing .

ex:ParentShape a sh:NodeShape ;
    rdfs:isDefinedBy ex: ;
    sh:property [
        sh:datatype xsd:string ;
        sh:path ex:name ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
    ] ;
    sh:targetClass ex:Parent .

ex:Child a rdfs:Class ;
    rdfs:isDefinedBy ex: ;
    rdfs:comment "The child class"@en ;
    rdfs:subClassOf ex:Parent .

ex:ChildShape a sh:NodeShape ;
    rdfs:isDefinedBy ex: ;
    rdfs:subClassOf ex:ParentShape ;
    sh:property [
        sh:datatype xsd:integer ;
        sh:path ex:age ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
    ] ;
    sh:targetClass ex:Child .

Validating a json-ld instance of Child that is missing the name property from Parent against the above SHACL

{
    "@context": {
        "@vocab": "http://example.com/ex#"
    },
    "@type": "Child",
    "age": 3
}

does not find a violation. I had expected that validating a subclass instance would also include constraints from the superclass.

Is this a misunderstanding of SHACL on my part or an issue with pyshacl?

Validator runtime error

The validator says there is a runtime error; turning on debug gives no additional details:

$ pyshacl -s 03-Network.ttl -e 03-Network.ttl sample-network.ttl
Validator encountered a Runtime Error.

Info:

$ python3.6
Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyshacl
INFO:rdflib:RDFLib Version: 4.2.2
>>> print(pyshacl.__version__)
0.11.3.post1

Turtle files attached as text files.
03-Network.ttl.txt
sample-network.ttl.txt

Access to inferred triples

According to 8.4 General Execution Instructions for SHACL Rules, implementations modify the data graph if triples get inferred, and/or may "construct a logical data graph that has the original data as one subgraph and a dedicated inferences graph as another subgraph, and where the inferred triples get added to the inferences graph only."

I've been following the Classification With SHACL Rules article and I would like to extract the graph of inferred triples, which would include <http://bakery.com/ns#AppleTartC> a <http://bakery.com/ns#NonGlutenFreeBakedGood>, <http://bakery.com/ns#VeganBakedGood> . merged into the data graph or as an inferences graph.

Is this feature available?

Unexpected violation when using sh:qualifiedMinCount and sh:qualifiedValueShape

pyshacl is giving an unexpected violation, one that I'm not seeing on the JavaScript playground at https://shacl.org/playground/ (and pyshacl is also not showing the sh:message of the only property).

Data:

@prefix ex: <http://example.org/ns#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:Document
    a schema:Document ;
    schema:isTargetOf  [ a schema:HasAuthor ;
                         schema:isPresent true ] ;
    schema:isTargetOf  [ a schema:otherClass ;
                         schema:isPresent true ] ;
.

SHACL constraints:

@prefix dash: <http://datashapes.org/dash#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

schema:DocumentShape
    a sh:NodeShape ;
    sh:targetClass schema:Document ;
    sh:property [
        sh:message "At least one Author" ;
        sh:path schema:isTargetOf ;
        sh:qualifiedMinCount 1 ;
        sh:qualifiedValueShape [
            sh:class schema:HasAuthor ;
        ]
    ] ;
.

Python code used:

import rdflib
from pyshacl import validate
data_filename = "data/shacl/example_data_value.ttl"
data_graph = rdflib.Graph()
data_graph.parse(data_filename, format='n3')

constraints_filename = "data/shacl/shacl_constraints_value.ttl"
constraints_graph = rdflib.Graph()
constraints_graph.parse(constraints_filename, format='n3')

r = validate(data_graph,
            shacl_graph=constraints_graph,
            # ont_graph=og,
             inference='rdfs', abort_on_error=False,
             meta_shacl=False, debug=True, advanced=True)
conforms, results_graph, results_text = r
conforms

What I'm seeing in the terminal (note the absence of the sh:message)

$ python3 data/shacl/validate_transition.py
Constraint Violation in ClassConstraintComponent (http://www.w3.org/ns/shacl#ClassConstraintComponent):
        Severity: sh:Violation
        Source Shape: [ sh:class schema:HasAuthor ]
        Focus Node: [ rdf:type rdfs:Resource, schema:otherClass ; schema:isPresent Literal("true" = True, datatype=xsd:boolean) ]
        Value Node: [ rdf:type rdfs:Resource, schema:otherClass ; schema:isPresent Literal("true" = True, datatype=xsd:boolean) ]

Guidance request

I've been happily using pyshacl (installed via pip3) to work on shacl rules, and in the process have broken my rules file in a manner I can't seem to correct, so was hoping you may have some tips for a newcomer ...

The error I get is simply

Validator encountered a Runtime Error:
Shape pointed to by sh:property does not exist or is not a well-formed SHACL PropertyShape.If you believe this is a bug in pyshacl, open an Issue on the pyshacl github page.

I don't believe it is a bug in pyshacl; the same two files in the shacl playground produce only VALIDATION FAILURE: Missing subject -- when I load the shacl into RDFlib and query, I can't find any sh:property triple with an unbound subject, but that may be a very naive approach.

A run through meta-SHACL shows nothing terribly helpful, only unrelated things I know work in some engines, such as using rdf:list items instead of spelling out a first/rest list.

In the -d debug trace, do the last few Constraint Reports/Violations give clues to the bad rule?
How might I get more information about what I've messed up in the SHACL?

ConstraintLoadError: sh:namespace value must be an RDF Literal with type xsd:anyURI.

This may be related to the changes made for #59

Using the script below and the SHACL from http://datashapes.org/schema.ttl, I get the following error:

ConstraintLoadError: sh:namespace value must be an RDF Literal with type xsd:anyURI.
https://www.w3.org/TR/shacl/#sparql-prefixes

However, running pyshacl from the command line appears to work correctly.

pyshacl -s ./schema_org_validation.ttl ./test_data.ttl

Validation Report
Conforms: False
Results (1):
Constraint Violation in ClassConstraintComponent (http://www.w3.org/ns/shacl#ClassConstraintComponent):
	Severity: sh:Violation
	Source Shape: schema:CommunicateAction-about
	Focus Node: ex:asdgjkj
	Value Node: [ rdf:type sch:GameServer ; sch:playersOnline Literal("42", datatype=xsd:integer) ]
	Result Path: schema:about
	Message: Value does not have class schema:Thing

(I am not including the schema.org schema, hence the validation error.)

Python script:

Archive.zip

import rdflib
from pyshacl import validate

data = """
@prefix ex: <http://example.org/> .
@prefix sch: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:asdgjkj a sch:CommunicateAction ;
    sch:about [ a sch:GameServer ;
            sch:playersOnline "42"^^xsd:integer ] .
"""

dataGraph = rdflib.Graph().parse( data = data, format = 'ttl' )
print( dataGraph.serialize( format='ttl' ).decode( 'utf8' ) )

shaclData = open( "./schema_org_validation.ttl", "r" ).read()
shaclGraph = rdflib.Graph().parse( data = shaclData, format = 'ttl' )

report = validate( dataGraph, shacl_graph = shaclGraph, abort_on_error = False, meta_shacl = False, debug = False, advanced = True, do_owl_imports = True )

print( report[2] )

FocusNode and ValueNode in ReportGraph should be able to point to same BNode

Related to #55
In building the report graph, when a valueNode and focusNode are the same Blank Node, the validation result's valueNode and focusNode will never have the same ID in the Report Graph. In the current implementation they are copied over separately (thus they become two new blank nodes). I could add a simple check: if they are the same node, copy it into the report graph only once and use it for both valueNode and focusNode, so they will have the same ID.

Feature Request: Variables in validation reports of SPARQLConstraintComponent

I want to request a feature for validation reports of SPARQLConstraint(Component) as described in the SHACL Recommendation.
I'd really appreciate the possibility of using variables from SELECT queries in the sh:message, e.g.:

:VerifyPowerAdapterSupplyShape
  a sh:NodeShape ;
  sh:targetClass ex:Computer ;
  sh:sparql [
    a sh:SPARQLConstraint ;
    sh:message "The power adapter ({?availablePower} W) must provide more power than the parts of the computer consume ({?requiredPower} W)." ;
    sh:prefixes ex: ;
    sh:select """
      SELECT $this ?availablePower ?requiredPower
      WHERE {
        $this ex:hasPowerAdapter ?powerAdapter .
        ?powerAdapter ex:hasPowerSupply ?availablePower .
        {
          SELECT (SUM(?power) as ?requiredPower)
          WHERE { 
	    $this ex:hasPart ?device .
	    ?device ex:hasRequiredPower ?power .
          }
        }
        FILTER(?availablePower < ?requiredPower) .
      }
    """ ;
  ] .

That would help me a lot to debug ontologies that rely on complex SHACL-SPARQL constraints :)

validation with sh:closed

Given the Shapes Graph:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.com/ex#> .

ex:Parent a rdfs:Class ;
    rdfs:isDefinedBy ex: ;
    rdfs:comment "The parent class"@en ;
    rdfs:subClassOf owl:Thing .

ex:ParentShape a sh:NodeShape ;
    rdfs:isDefinedBy ex: ;
    sh:property [
        sh:datatype xsd:string ;
        sh:path ex:name ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
    ] ;
    sh:closed true ; 
    sh:ignoredProperties ( rdf:type ) ;    
    sh:targetClass ex:Parent .

and the Data Graph:

{
    "@context": {
        "@vocab": "http://example.com/ex#"
    },
    "@type": "Parent",
    "name": "Father",
    "dummy": "Dummy value"
}

I expect to see a sh:ClosedConstraintComponent validation failure because of (ex:ParentShape, sh:closed, true) in the Shapes Graph and the presence of the property "dummy": "Dummy value" in the Data Graph.

However, using the pyshacl (0.9.5) validate function no such validation failure is generated. Instead the text result is:

Validation Report
Conforms: True

In http://shacl.org/playground/ the expected validation failure is produced.

validation showing true in spite of errors in data shape

If the data graph is this:

{

"@context": { "@vocab": "http://schema.org/" },
"@id": "http://example.org/ns#Bob",
"@type": "Person",
"givenName": "Robert",
"familyName": "Junior",
"birthDate": "1971-07-07",
"deathDate": "1968-09-10",
"address": {
    "@id": "http://example.org/ns#BobsAddress",
    "streetAddress": "1600 Amphitheatre Pkway",
    "postalCode": 9404
}

}

and the shapes graph is this:

@prefix dash: <http://datashapes.org/dash#> .
@prefix rdf: <https://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <https://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sh: <https://www.w3.org/ns/shacl#> .
@prefix xsd: <https://www.w3.org/2001/XMLSchema#> .

schema:PersonShape
a sh:NodeShape ;
sh:targetClass schema:Person ;
sh:property [
sh:path schema:givenName ;
sh:datatype xsd:string ;
sh:name "given name" ;
] ;
sh:property [
sh:path schema:birthDate ;
sh:lessThan schema:deathDate ;
sh:maxCount 1 ;
] ;
sh:property [
sh:path schema:gender ;
sh:in ( "female" "male" ) ;
] ;
sh:property [
sh:path schema:address ;
sh:node schema:AddressShape ;
] .

schema:AddressShape
a sh:NodeShape ;
sh:closed true ;
sh:property [
sh:path schema:streetAddress ;
sh:datatype xsd:string ;
] ;
sh:property [
sh:path schema:postalCode ;
sh:or ( [ sh:datatype xsd:string ] [ sh:datatype xsd:integer ] ) ;
sh:minInclusive 10000 ;
sh:maxInclusive 99999 ;
] .

When I do this:
pyshacl -s /path/to/shapesGraph.ttl -m -i rdfs -a -f human /path/to/dataGraph.json-ld -df json-ld

why doesn't it show validation errors? (as we can clearly see there is error in address and birthDate in the data graph)

Support for recursion

According to the SHACL spec:

The validation with recursive shapes is not defined in SHACL and is left to SHACL processor implementations.

I was wondering whether pySHACL already supports recursive shapes, or has any plans to support them?

Add new option for passing in an ontology specification document

It can sometimes (often) be the case that the combination of the SHACL Shape file and the Data File together do not give the pySHACL validation engine enough information to generate a correct validation result, even if inferencing is run across the input data file.

For example:

  1. I have a shape file which asserts that for all instances of the class Human, if they have a property called hasPet, the target object of that property must be an instance of the class Animal.

  2. I have a data file containing statements:

  • Person1 Instance of Human named "Amy", she has a property hasPet with the target Pet1.
  • Pet1 Instance of Lizard named "Sebastian"

If I run the validator across those inputs, it will return a validation result indicating failure, because the pet is not of type Animal. Even if inferencing is run on the data file, there is no way for the validator to know that Lizard is a subclass of Animal, so the validation still fails.

In order for this validation to work, there needs to be a (Lizard, rdfs:subClassOf, Animal) statement included in the data file before submitting it to the validator, and basic RDFS inferencing must be run on the data graph before validating, to ensure the (Pet1, rdf:type, Animal) triple is created in the data graph.

This is a very simple example but hopefully highlights the problem faced, where any extra ontological information required for inferencing needs to be added into the data file before passing it to the validator. This is inconvenient because in most practical applications of pySHACL, the data file is an isolated data snippet, without any accompanying ontological information.

It is sometimes the case that extra ontological information is added into the SHACL Shape file, or indeed that the SHACL Shapes are included as part of an ontology document itself. This does not help in this situation, because the file passed into the validator and parsed into the SHACL Shapes graph does not get mixed into the data graph, so those extra ontological statements do not take effect in the inferencing step on the data graph (and inferencing is never applied to the SHACL graph).

I propose an extra feature for pySHACL where you can optionally specify the location to an extra static ontology document, which gets ingested and mixed into the data graph prior to the inferencing step.

This will be a new feature in the python module, and exposed as an option on the command line tool, and as an optional field on the web tool.
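
For illustration, a sketch of how the proposed option could be used from the Python API; the file names here are hypothetical, and the ont_graph keyword mirrors the name this feature is exposed under in later pyshacl releases:

from pyshacl import validate

conforms, report_graph, report_text = validate(
    "pets_data.ttl",                 # isolated data snippet (Person1, Pet1, ...)
    shacl_graph="pets_shapes.ttl",   # the Human / hasPet / Animal shape
    ont_graph="pets_ontology.ttl",   # contains ex:Lizard rdfs:subClassOf ex:Animal
    inference="rdfs",                # expand the mixed data + ontology graph first
)
print(conforms)
print(report_text)

With the ontology mixed in and RDFS inferencing applied, Pet1 gains the rdf:type Animal triple and the shape validates as intended.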

Monolithic file generates report, split .ttl files do not.

I have a gist with all of the relevant files at: https://gist.github.com/James-Hudson3010/2588d9b17dd33e15922122b8b5cf1bd7

If I execute:

$ pyshacl -a -f human employees.ttl

I get the following, correct validation report...

Validation Report
Conforms: False
Results (3):
Constraint Violation in MaxInclusiveConstraintComponent (http://www.w3.org/ns/shacl#MaxInclusiveConstraintComponent):
	Severity: sh:Violation
	Source Shape: hr:jobGradeShape
	Focus Node: d:e4
	Value Node: Literal("8", datatype=xsd:integer)
	Result Path: hr:jobGrade
Constraint Violation in DatatypeConstraintComponent (http://www.w3.org/ns/shacl#DatatypeConstraintComponent):
	Severity: sh:Violation
	Source Shape: hr:jobGradeShape
	Focus Node: d:e3
	Value Node: Literal("3.14", datatype=xsd:decimal)
	Result Path: hr:jobGrade
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
	Severity: sh:Violation
	Source Shape: hr:jobGradeShape
	Focus Node: d:e2
	Result Path: hr:jobGrade

However, if I split employees.ttl into three files containing the schema, shape, and instance data and run:

pyshacl -s shape.ttl -e schema.ttl -a -f human instance.ttl

the result is:

Validation Report
Conforms: True

I assume I am calling pyshacl correctly.

super(type, obj): obj must be an instance or subtype of type

Updated the library to pull in some recent changes, and ran into this error: super(type, obj): obj must be an instance or subtype of type. The function below was running fine before the update. Any idea what could be causing this?

from pyshacl import validate

# `places`, `places_shape`, `data_file_format` and `shapes_file_format`
# are defined earlier in the calling code.
try:
    conforms, v_graph, v_text = validate(places, shacl_graph=places_shape,
                                         data_graph_format=data_file_format,
                                         shacl_graph_format=shapes_file_format,
                                         inference='rdfs', debug=True,
                                         serialize_report_graph=True)
    print(conforms)

except Exception as e:
    print(e)

Command-line use does not work in Windows

The path is not getting interpreted correctly:
file://c:\my\full\path\test.ttl/ does not look like a valid URI, trying to serialize this will break.

I've tried with forward slashes, backslashes, no slashes (all files in current directory), full filespec (with and without c:), etc. Couldn't get any of them to work.

Windows 10.

Enforcing minimum number of instances doesn't work

Following the trick mentioned in https://www.w3.org/wiki/SHACL/Examples, I wanted to write a shape to validate the existence of a node.

The shape

{
    "@context": {
       "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
       "sh": "http://www.w3.org/ns/shacl#",
       "schema": "http://schema.org/"
    },
    "@graph": [
        {
            "@id": "_:forceDatasetShape",
            "@type": "sh:NodeShape",
            "sh:targetNode": "schema:DigitalDocument",
            "sh:property": [
                {
                    "sh:path": [
                        {
                            "sh:inversePath": [{
                                "@id": "rdf:type",
                                "@type": "@id"
                             }]
                        }
                    ],
                    "sh:minCount": 1
                }
            ]
        }
    ]
}

with the graph

{}

throws a validation error in the SHACL playground https://shacl.org/playground/

But pyshacl says that it's conforming. Does this inversePath trick not work with pySHACL?

Command I'm using is: pyshacl -a -m -s shape.json graph.json -sf json-ld -df json-ld
With pyshacl version 0.11.3


On a side note, SHACL playground validates successfully with

{
    "@context": { "schema": "http://schema.org/", "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#" },

    "@id": "http://example.org/ns#Bob",
    "rdf:type": "http://schema.org/DigitalDocument"
}

but not with

{
    "@context": { "schema": "http://schema.org/", "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#" },

    "@id": "http://example.org/ns#Bob",
    "@type": "http://schema.org/DigitalDocument"
}

or

{
    "@context": { "schema": "http://schema.org/", "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#" },

    "@id": "http://example.org/ns#Bob",
    "rdf:type": "schema:DigitalDocument"
}

which is weird. I was under the impression that @type is an alias for rdf:type.

shacl advanced features

Hi,
I'm just trying to clarify whether pySHACL supports advanced features such as sh:target, sh:filterShapeNode, etc.? It looks like it doesn't support those properties currently.

Thanks,
Yi

pySHACL for yaml?

hey RDFlib! I'm working on validation functions for schema.org content; specifically, we have YAML (and front matter of HTML) definitions of specifications that load nicely into JSON. I'm wondering if there would be some logical way to use pySHACL to validate these inputs? See our discussion here --> schemaorg/schemaorg#2069 (comment), and here is an example input with YAML as front matter (which can of course be loaded as JSON). I'm also wondering if there is development space to be able to define tests / criteria in YAML, since this is the current language of many continuous integration services like Travis, Circle, etc. I started some thinking about this, but before implementing something new I wanted to check what standards are used in the community. Generally, the criteria I am looking for are:

  • Python based (for easy use by the scientific community)
  • For the same reason, yaml or json-ld (but probably not rdf natively)
  • simple in that it doesn't have extra dependencies beyond what is already used

Thanks for your feedback! Please join in on the first issue listed above if you have thoughts! I'm very happy to contribute something here (with guidance) or to create a simplified version that goes from a yaml criteria to a validated specification.
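
One possible bridge today, rather than a new YAML feature: if the YAML front matter is shaped like JSON-LD (i.e. it carries an @context), it can be loaded with PyYAML, dumped to JSON and parsed into an rdflib graph for pySHACL. This is only a sketch; the file names and the assumption that the YAML is valid JSON-LD are mine:

import json
import rdflib
import yaml                      # PyYAML, an extra dependency
from pyshacl import validate

with open("doc.yaml") as fh:     # hypothetical YAML / front-matter input
    doc = yaml.safe_load(fh)     # YAML -> plain Python dict

# Round-trip through JSON so rdflib can parse it as JSON-LD
# (older rdflib versions need the rdflib-jsonld plugin for this).
data_graph = rdflib.Graph().parse(data=json.dumps(doc), format="json-ld")

conforms, report_graph, report_text = validate(
    data_graph, shacl_graph="schemaorg_shapes.ttl"   # hypothetical shapes file
)
print(report_text)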

PySHACL considers non-conforming datagraph to be conforming

The attached files illustrate the behavior I am seeing. The W3C validator (https://shacl.org/playground/) does flag this as non-conforming, so I am trusting this is not operator error on my part.

The shacl graph includes a property shape defined as follows:

ex:Func a owl:Class , sh:NodeShape ;
    rdfs:label "Func" ;
    rdfs:subClassOf ex:Function ;
    sh:property [ a sh:PropertyShape ;
                  sh:class ex:FuncParam_Func_a ;
                  sh:path ex:hasParameter ;
                  sh:minCount 1 ;
                  sh:name "Func_a"
                ] .

and the graph being validated includes

test:FuncNode a ex:Func ;
    ex:hasParameter test:FuncParam_b .

test:FuncParam_a a ex:FuncParam_Func_a .
test:FuncParam_b a ex:FuncParam_Func_b .

simpleOnto.zip

Regular expression in sh:pattern not processed correctly

I have the following:

import rdflib
from pyshacl import validate

graph_data = """
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sch:  <http://schema.org/> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

ex:JohnDoe a ex:XXXX .
ex:JohnDoe ex:name "hello.txt" .
"""

shape_data = """
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sch:  <http://schema.org/> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix ex:   <http://example.org/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

ex:PersonShape
  a sh:NodeShape ;
  sh:targetClass ex:XXXX ;
  sh:property ex:PersonShape-name .

ex:PersonShape-name
  a sh:PropertyShape ;
  sh:path ex:name ;
  sh:minCount 1 ;
  sh:pattern  ".*.txt" .
"""
        
data  = rdflib.Graph().parse( data = graph_data, format = 'ttl' )
shape = rdflib.Graph().parse( data = shape_data, format = 'ttl' )

print( f"{data.serialize( format = 'ttl' ).decode( 'utf8' )}" )

report = validate( data, shacl_graph=shape, abort_on_error = False, meta_shacl = False, debug = True, advanced = True )

print( report[2] )

The sh:pattern should be ".*\.txt", but when I do that, the following errors are generated:

... notation3.py", line 1591, in strconst  "bad escape")
... notation3.py", line 1615, in BadSyntax  raise BadSyntax(self._thisDoc, self.lines, argstr, i, msg)

  File "<string>", line unknown
BadSyntax

At least according to http://www.datypic.com/books/xquery/chapter19.html, I am using the escape correctly.
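
The "bad escape" comes from the Turtle parser: in a Turtle string literal, \ starts an escape sequence and \. is not a valid one, so a literal backslash must be written as \\. Because the Turtle source is itself embedded in a plain Python string, the backslash also has to survive Python's own escaping. A sketch of the layers involved (assuming this is indeed the cause; the thread does not confirm it):

# Target regex:            .*\.txt
# Turtle string literal:   ".*\\.txt"     (Turtle escapes the backslash)
# Plain Python source:     ".*\\\\.txt"   (Python escapes each backslash again)
#
# A raw string avoids the extra Python layer (fragment, for illustration):
shape_data = r"""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .

ex:PersonShape-name
  a sh:PropertyShape ;
  sh:path ex:name ;
  sh:minCount 1 ;
  sh:pattern ".*\\.txt" .
"""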

Could not install pyshacl using pip

I am getting the following error when I try to install pyshacl:

Could not find a version that satisfies the requirement RDFClosure (from pyshacl) (from versions: )
No matching distribution found for RDFClosure (from pyshacl)

CLI: -m option produces an exception

When using pyshacl with the -m option, pyshacl reports a traceback ending in a ValueError: read of closed file.

### pyshacl -m -s shape.ttl data.ttl
Traceback (most recent call last):
  File "/usr/local/bin/pyshacl", line 71, in <module>
    is_conform, v_graph, v_text = validate(args.data, **validator_kwargs)
  File "/usr/local/lib/python3.7/site-packages/pyshacl/validate.py", line 194, in validate
    rdf_format=shacl_graph_format)
  File "/usr/local/lib/python3.7/site-packages/pyshacl/util.py", line 176, in load_into_graph
    data = target.read()
ValueError: read of closed file

shape.ttl and data.ttl are valid files with valid shapes and RDF data.
When using pyshacl -s shape.ttl data.ttl (so without -m), pyshacl works as expected.

cannot import name 'convert_graph' from 'owlrl'

Hey out there.
I get a really confusing error while trying to set up pySHACL.

I just try to run from pyshacl import validate in a Python script and get the following error.

Traceback (most recent call last):
  File ".\shaclCheck.py", line 1, in <module>
    from pyshacl import validate
  File "C:\Python37x64\lib\site-packages\pyshacl\__init__.py", line 3, in <module>
    from pyshacl.validate import validate, Validator
  File "C:\Python37x64\lib\site-packages\pyshacl\validate.py", line 5, in <module>
    import owlrl
  File "C:\Python37x64\Scripts\owlrl.py", line 4, in <module>
    from owlrl import convert_graph, RDFXML, TURTLE, JSON, AUTO, RDFA
ImportError: cannot import name 'convert_graph' from 'owlrl' (C:\Python37x64\Scripts\owlrl.py)

I think I did set up all PATH variables related to the packages, but I can't get this error fixed.

My system is Windows 10. I don't know whether this causes the problem.
Can somebody help me here?

Best regards

Output with anonymous focus nodes, could it print whole node?

I'm doing some validation in data with anonymous nodes. The output of validate (e.g., results_text) shows:

Focus Node: [ ]
Value Node: [ ]

That makes it pretty hard to know which node has the issue. Any thoughts on how to identify them? I'm wondering if there could be an option to print either the datafile:line_number (hard, I know) or perhaps the whole anonymous node (ours are small; I know they could be very big, but even a few lines would probably help locate them).
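
One interim workaround, sketched under the assumption that in-memory rdflib graphs are passed in (so the blank node identifiers in the report match those in the data graph): pull each sh:focusNode out of the results graph and print its concise bounded description with rdflib's Graph.cbd(). The data_graph and shapes_graph names are placeholders.

import rdflib
from pyshacl import validate

SH = rdflib.Namespace("http://www.w3.org/ns/shacl#")

# data_graph and shapes_graph are the rdflib Graphs being validated.
conforms, results_graph, results_text = validate(data_graph, shacl_graph=shapes_graph)

for result in results_graph.subjects(rdflib.RDF.type, SH.ValidationResult):
    for focus in results_graph.objects(result, SH.focusNode):
        # cbd() returns the concise bounded description of the node, i.e. the
        # node's own properties (and those of any nested blank nodes).
        print(data_graph.cbd(focus).serialize(format="turtle"))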
