[rdfweb-dev] Schemarama revisited
danbri at w3.org
Mon Aug 11 23:25:56 UTC 2003
* Victor Lindesay <victor at vicsoft.co.uk> [2003-08-11 22:57+0100]
> You seem to be painting a picture of a RDF as a way to represent data
> and exchange data that's so loose that any form of validation is a waste
> of time. In the real world that means unreliable and unusable. No wonder
> RDF has slow take up.
> I thought that RDF had RDF schemas.
RDF schemas and XML schemas work differently.
XML schemas are all about the rules for whether some chunk of wellformed
XML counts as being a particular kind of XML document (ie. document
typing). They're all about those things in the world that are XML documents.
RDF schemas are about everything else.
When you see 'ShippingOrder', 'Address', 'Person' etc. in an XML schema,
the schema isn't telling you about shipping orders, addresses, or
persons; it's telling you about a particular XML data format for
describing such things. And it gives you rules for figuring out when a
chunk of XML is so borken (eg. wrong tag structure, or missing info) it
couldn't possibly be a sensible description.
RDF schemas make claims couched in terms of a (cartoonified)
representation of the world we're describing in our XML documents. They
say things like "all People are Mammals"; "'livesNear' is a relationship
that holds between people and places"; "if something is the rss1:title of
something it is also the dc:title", and so on. They describe patterns of
constraints about true descriptions of the world, rather than mandating
particular tagging structures within XML documents that describe the world.
RDFS alone is pretty weak: you can't even contradict yourself, and hence
can't do much to check RDF data for obvious screwups. So we've boosted
the expressive power of RDF by creating OWL, the "Web Ontology
Language". OWL has all sorts of machinery for making more sophisticated
claims about the world. In OWL, you can say things like: "Nothing can be both a
Person and a Document, as those classes are mutually disjoint";
"foaf:depiction and foaf:depicts are inverses, if you see ?x and ?y
related by the one, you can infer the inverse"; "Something is a W3CTeamPerson
if it is a Person AND it has a foaf:workplaceHomepage of http://www.w3.org"...
I could go on with OWL examples, since there's a lot else it can do, but
the point is that it describes constraints about the world, not about
XML documents. OWL, unlike RDFS alone, gives you plenty of ways to
contradict yourself, and hence to produce machine-checkably bad data.
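To make that last point concrete, here is a minimal Python sketch of how a disjointness claim lets a machine spot bad data. It is a toy, not real OWL tooling: the triples, the ex: names and the find_disjointness_clashes helper are all invented for this note, and it handles only rdfs:subClassOf and owl:disjointWith.

```python
# Toy triples: (subject, predicate, object). All names are illustrative.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DISJOINT_WITH = "owl:disjointWith"

schema = {
    ("foaf:Person", SUBCLASS_OF, "ex:Mammal"),
    ("foaf:Person", DISJOINT_WITH, "foaf:Document"),
}

data = {
    ("ex:danbri", RDF_TYPE, "foaf:Person"),
    ("ex:danbri", RDF_TYPE, "foaf:Document"),  # the screwup we want to catch
    ("ex:libby", RDF_TYPE, "foaf:Person"),
}


def superclasses(cls, schema):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen = {cls}
    todo = [cls]
    while todo:
        current = todo.pop()
        for s, p, o in schema:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def find_disjointness_clashes(schema, data):
    """List (thing, class_a, class_b) where thing is typed into two disjoint classes."""
    types = {}
    for s, p, o in data:
        if p == RDF_TYPE:
            types.setdefault(s, set()).update(superclasses(o, schema))
    clashes = []
    for a, p, b in schema:
        if p != DISJOINT_WITH:
            continue
        for thing, classes in sorted(types.items()):
            if a in classes and b in classes:
                clashes.append((thing, a, b))
    return clashes


print(find_disjointness_clashes(schema, data))
```

With only the subclass claim in the schema nothing could ever clash; it is the disjointness claim that makes the second description of ex:danbri checkably wrong.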
If you're wondering whether there's a gap where a proper explanation of the
relationship between RDF and XML schema languages should be, you're right.
My 'FOAF contradictions' writeup at
http://rdfweb.org/mt/foaflog/archives/000040.html is possible fodder
towards this end, as is this brief note.
They should be complementary pieces of the puzzle. I believe there are
compelling arguments for working at the RDF level, but we still lack
certain things at the RDF level that we enjoy when working at the XML
level. (Similarly, I sometimes work with XML content using 'grep' level tools).
In particular, we lack a way for RDF people to talk about the
expected information payload of a particular RDF/XML document, or class
of documents. It's all very well RDF/RDFS/OWL allowing us to say "Human beings
have two parents", but what if we want to describe a document format
which demands that each person-description include the full name of
both parents of each person mentioned? That's to my mind where we lack
both machine-friendly and human-friendly conventions for describing our
expectations about what a particular class of documents will tell us.
Libby and I did a little work in this area a couple of years ago,
inspired by the XPath-based Schematron system by Rick Jelliffe and a
big XML-DEV thread on schema language pluralism (nicely written up
by Leigh Dodds at http://www.xml.com/lpt/a/2001/02/07/schemarama.html).
Our experiment was named 'Schemarama' after the schema pluralism
debate and in tribute to Rick's elegant system. The basic idea was to
find a way of re-using RDF query technology to express constraints on
the expected content of certain kinds of documents. The way we actually
did this was conceptually the same as Schematron's, but implemented
(in a rough'n'ready manner) on top of Squish (an RDF query language)
instead of on XPath.
http://ilrt.org/discovery/2001/02/schemarama/ points to the demos, which
(Libby, I never doubted for a second ;) still seem to work.
The Jobs Schemarama example (linked from there) may be of interest, since that
usage scenario has cropped up again recently in RSS/Atom discussions.
What we try to show there is an RDF-based way of asserting that, in
our particular document format (or workflow context(*)) we want to be
able to find a match for each of:
(job::advertises ?item ?job)
(job::title ?job ?title)
(job::salary ?job ?salary)
(job::currency ?job ?currency)
(job::orgHomepage ?job ?orghp)
...wherever the graph has a thing of type rss::item.
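For readers who want something concrete, here is a minimal Python sketch of that kind of check. It is not the Squish-based Schemarama code: the sample graph, the prefixed names as written here and the check_items helper are invented for illustration, and the matching is far cruder than a real RDF query engine.

```python
RDF_TYPE = "rdf:type"

# Properties every advertised job is expected to carry (from the patterns above).
REQUIRED_JOB_PROPERTIES = ["job:title", "job:salary", "job:currency", "job:orgHomepage"]

# Toy graph of (subject, predicate, object) triples; the data is made up.
graph = {
    ("ex:item1", RDF_TYPE, "rss:item"),
    ("ex:item1", "job:advertises", "ex:job1"),
    ("ex:job1", "job:title", "RDF hacker"),
    ("ex:job1", "job:salary", "30000"),
    ("ex:job1", "job:currency", "GBP"),
    ("ex:job1", "job:orgHomepage", "http://example.org/"),
    ("ex:item2", RDF_TYPE, "rss:item"),
    ("ex:item2", "job:advertises", "ex:job2"),
    ("ex:job2", "job:title", "Schema wrangler"),
}


def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def check_items(graph):
    """Map each rss:item to the list of expected patterns it fails to match."""
    report = {}
    items = sorted(s for s, p, o in graph if p == RDF_TYPE and o == "rss:item")
    for item in items:
        jobs = objects(graph, item, "job:advertises")
        if not jobs:
            report[item] = ["job:advertises"]
            continue
        # An item passes if at least one advertised job matches every pattern;
        # otherwise report the shortest list of missing properties.
        best = None
        for job in jobs:
            missing = [p for p in REQUIRED_JOB_PROPERTIES if not objects(graph, job, p)]
            if best is None or len(missing) < len(best):
                best = missing
        report[item] = best
    return report


print(check_items(graph))
```

An empty list means the item met every expectation; anything else names what the document failed to tell us, without saying anything about what job:salary means in general.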
One other thing worth mentioning here, and one that I admire very much about
Rick's Schematron system, is that it de-couples data checking from
vocabulary definition. With Schematron (and Schemarama, our version) you
get to express rules about how you want your XML documents to be. But
you don't mix that up with the task of defining the concepts and terms
which your XML documents use to describe the world. This makes a lot of
sense to me since it allows those terms to be used freely across various
classes of XML document, without there being only one set of rules for
their use. I should also mention we did the Schemarama work before OWL
and DAML+OIL were really on the scene, and it probably needs re-thinking
in the light of the facilities offered by OWL.
Yes, there's a need to characterise for machines and humans our
expectations about the information payload of XML documents, and to do
machine checking of that data. But... this doesn't mean RDF is inherently
too permissive to achieve this. With a combination of contradiction detection
(OWL) and Schematron-like testing of document contents (eg. Schemarama), we
have a fair toolset to
play with. On top of that, it is possible to write XML schemas (W3C XML
Schema or RELAX-NG etc) which use a more traditional approach to saying
what a doc will contain (in terms of element/attribute trees), yet whose
instance data also fits the rules of RDF/XML syntax. There's plenty in
the tool cupboard, we just have our work cut out figuring a story for
hooking it all together...