Validating IPTC Sport Schema files with SHACL
We have constructed a SHACL Shapes file that describes how IPTC Sport Schema files should be constructed and warns users if they have created an invalid RDF graph. It can be used to validate that a set of triples is valid IPTC Sport Schema data.
Why SHACL?
RDF Schema, the technology we have used for the ontology file, describes how the classes and properties relate to each other, but by definition RDF Schema does not specify whether a triple is “valid” or not - for example, misspelling a property name is not an error under RDF Schema. In this way, RDF Schema is different from XML Schema or JSON Schema. It doesn’t tell users whether a given set of instance data is “right” or “wrong” agains the schema.
Using OWL instead of (or as well as) RDF Schema doesn’t solve this problem, as OWL works the same way (but with a more expressive ontology definition language).
The W3C’s recommended way of validating sets of triples is https://www.w3.org/TR/shacl/[SHACL - the Shapes Constraint Language]. SHACL is designed to function as a validator for RDF graphs.
The SHACL Shape for IPTC Sport Schema
Our SHACL file (or “shape”) is located in this repository at ontologies/iptc-sport-shacl.ttl
.
Currently it contains rules for most classes and properties, but not everything.
How to validate triples against SHACL on the command line
The Jena command-line tools can be used to validate triple sets against the SHACL Shapes file.
For example:
% shacl validate --shapes ontologies/iptc-sport-shacl.ttl --data samples/ttl/soccer-match-01.ttl
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[ rdf:type sh:ValidationReport ;
sh:conforms true
] .
Currently, our SHACL Shape constrains several properties to use only terms from our IPTC NewsCodes controlled vocabularies, through the use of regular expressions.
This means that the following Turtle would trigger a validation error:
<http://example.com/Participation/WC-2017-e.958051-T280>
a sport:TeamParticipation ;
sport:eventOutcome "draw" . # invalid
Only a term from the controlled vocabulary would validate:
<http://example.com/Participation/WC-2017-e.958051-T280>
a sport:TeamParticipation ;
sport:eventOutcome <http://cv.iptc.org/newscodes/speventoutcome/tie> . # correct
To run the SHACL validator over all sample files in the repository, we have written a small shell script that helps:
tools/shacl-validate.sh