[foaf-dev] document-based property qualification - adding detail to RDF

Dan Brickley danbri at danbri.org
Thu Jan 20 09:21:34 CET 2011


This is an elaboration of a sketch I made in #swig IRC yesterday.
Damian challenged me to compress it to 140 chars rather than make it
longer, so I'll try that first before the horribly long detailed
version.

IRC log: http://chatlogs.planetrdf.com/swig/2011-01-19.html#T14-21-38

Short version: Many RDF properties are vague. Rather than create lots
of detailed sub-properties, let's catalogue some meaningful contexts
in which they can appear. (150 chars)


Longer: thinking through when we should define new RDF vocabulary,
versus work at the level of provenance, claims and document types.

I recently wrote that RDFa's syntax makes it easier to mention several
common properties that connect a couple of things
(http://danbri.org/words/2010/11/02/572). For example, you might say
of someone that they have a foaf:interest in <http://www.w3.org/XML/>,
and also that they are xyz:availableAsConsultant regarding
<http://www.w3.org/XML/>. Or regarding
<http://en.wikipedia.org/wiki/Croatian_language> I might want to
express both a foaf:interest, and that I'm (fictional property)
foaf:activelyLearning it.

(Aside: let's not get distracted here by the rathole of discussing
indirect identification via docs and skos versus 'the things
themselves')

RDF comes with the notion of rdfs:subPropertyOf which allows us to
document patterns of meaning amongst properties. We could say that
xyz:availableAsConsultant and the fictional foaf:activelyLearning
properties are both rdfs:subPropertyOf foaf:interest. And the RDFa
syntax mentioned above allows us to use both at the same time, which
is friendly to consumers who don't know newer or more obscure (but
precise) terms.

That is a direction that sets us on the path towards having a family
tree of related properties, linking vague but widely used ones with
more precise and specific niche usage. This is probably good, but it
can also be quite fragmenting, particularly as many RDF query systems
don't yet 'understand' sub-property hierarchies by default.
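
To make the fragmentation concern concrete, here is a minimal sketch in plain Python (no RDF library), with triples as tuples. The data and helper names are invented for illustration, and foaf:activelyLearning is still the fictional property from above. A consumer that only asks for foaf:interest misses anything stated with a more specific property, unless it walks the rdfs:subPropertyOf links itself.

```python
# Triples as plain (subject, predicate, object) tuples; all data is illustrative.
SUBPROP = "rdfs:subPropertyOf"
XML = "http://www.w3.org/XML/"
CROATIAN = "http://en.wikipedia.org/wiki/Croatian_language"

triples = {
    ("foaf:activelyLearning", SUBPROP, "foaf:interest"),
    ("xyz:availableAsConsultant", SUBPROP, "foaf:interest"),
    ("#me", "foaf:activelyLearning", CROATIAN),
    ("#me", "foaf:interest", XML),
}

def sub_properties(graph, prop):
    """Return prop plus every property declared, transitively, as its sub-property."""
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p == SUBPROP and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found

def objects(graph, subject, props):
    """Return the objects of all triples with this subject and any of these properties."""
    return {o for s, p, o in graph if s == subject and p in props}

# A consumer that ignores the hierarchy only sees the directly stated interest.
naive = objects(triples, "#me", {"foaf:interest"})

# A consumer that expands the hierarchy also sees the more specific claim.
expanded = objects(triples, "#me", sub_properties(triples, "foaf:interest"))
```

Stating both the general and the specific property in the markup, as the RDFa syntax allows, sidesteps this for consumers that do no expansion.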

I am trying to think through an alternative deployment pattern.
Instead of endlessly qualifying our descriptive terms, could it make
sense to create names for the document contexts in which these
general-purpose terms are used? Might that sometimes be more natural
and flexible?

So, imagine I have my main FOAF self-description (we call these
'personal profile documents', and FOAF even has a class named
foaf:PersonalProfileDocument). That might mention a few of my
interests using foaf:interest. But it doesn't always make sense to put
everything about me in a single document. So alongside that we could
have a books-i-own.rdf page which describes a collection of books that I
own. It would have some markup about the books, maybe some kind of
ownership link to me, and for each of them it might assert a
'foaf:interest' link between me and the book. And we might also have a
'things-that-pages-I've-bookmarked-or-shared-are-about.rdf' (or .html /
rdfa) document. In that, we could have a huge list of URLs for
documents I've put on delicious or LIKEd on Facebook, plus information
from entity-extractors that list the things associated with those
pages. Now a machine can't really guess how interested I am in those
things, but we could still use foaf:interest to link me to them. Or a
third page, things-I-am-actively-studying (.rdf / .html), which lists
some current learning interests. Any of those scenarios could be made
more precise: not books that I own, but those I'm reading. Not links I
shared, but links that I "liked".

The sketch here is that we can imagine a rich diverse set of
subclasses of 'Document' capturing scenarios like this. And those
scenarios wouldn't each require yet another new subproperty of the
broad, general purpose foaf:interest property. And it wouldn't require
that other common RDF pattern, which is taking a simple triple-based
claim and re-structuring it as an n-ary relationship so it can be
qualified with extra fragments of info. (We can call that lowercase-r
reification, since it can be attempted either with the old RDF
reification vocabulary or with custom vocabulary).

This alternate approach comes with costs. It forces consuming apps
either to miss out on the subtle detail (and just load up the relevant
RDF into a single flat set of triples), or else to operate at a new
level of abstraction where we deal more with quads and hypertext and
notions of authority.
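
A rough sketch of that trade-off, again in plain Python with made-up data (the book URL is a placeholder, not a real resource). Each claim is held as a quad whose fourth element is the document it came from. Flattening gives the simple view and throws the context away; grouping by document keeps it, at the price of a more complicated structure.

```python
# Quads as (subject, predicate, object, source_document); all data is illustrative.
XML = "http://www.w3.org/XML/"
BOOK = "http://example.org/some-book"  # hypothetical placeholder URL

quads = [
    ("#me", "foaf:interest", XML, "expertise.html"),
    ("#me", "foaf:interest", BOOK, "books-i-own.rdf"),
]

def flatten(quads):
    """Merge everything into one set of triples, discarding the source document."""
    return {(s, p, o) for s, p, o, _doc in quads}

def by_document(quads):
    """Group triples by the document that asserted them."""
    grouped = {}
    for s, p, o, doc in quads:
        grouped.setdefault(doc, set()).add((s, p, o))
    return grouped

flat = flatten(quads)         # two foaf:interest triples, indistinguishable in kind
grouped = by_document(quads)  # the same triples, still attached to their documents
```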

Example:
Let's take foaf:expertise as a fictional example. Imagine it is a
subproperty of foaf:interest, and is defined the same way, so it
allows expertise (and therefore interests) to be defined indirectly,
by citing a document on that topic. The idea is that, instead of (or as
well as) creating such a property, we give a name to a type of Document
in which all the 'foaf:interest' claims are also expressing expertise.
That could be (but need not be) made machine-understandable; or it could
just be expressed in natural language. This gives us two patterns:

pattern (A):
Classic RDF property qualification. In my homepage I could write (in a
paragraph about me),
<div about="#me" typeof="foaf:Person">
 I <a rel="foaf:interest foaf:expertise"
href="http://www.w3.org/XML/">have expertise in XML</a>
</div>

pattern (B):
Document-based qualification. In my homepage, I link to a second document

homepage.html:
<div about="#me" typeof="foaf:Person">
  Blah blah blah. See my <a rel="rdfs:seeAlso"
href="expertise.html">stuff I know</a> page for more details.
</div>

expertise.html:
<div about="expertise.html" typeof="foafwiki:ExpertiseDoc"/>
<div about="homepage.html#me" typeof="foaf:Person">
 I <a rel="foaf:interest" href="http://www.w3.org/XML/">have expertise
in XML</a>
</div>


The granularity *is* awkward, because RDF's basic structures are
triple-centric, and the only way currently to easily group a set of
triples into a named graph is by putting them in separate documents.
But there are proposals like http://buzzword.org.uk/2009/rdfa4/spec
and likely upcoming work at W3C which might make this easier.

Does it have any advantages? I'm not yet sure.

The appeal to me is that it takes some pressure off of simple
sub-property refinement, and encourages us to pay attention not only
to who asserted some claim, but the context in which they did so. RDF
generally encourages decontextualisation, and that is valuable but it
also loses a lot of information. We are left always asking ourselves
questions like "so, am I *really* interested in something just because
I bookmarked it?", "how do I express the difference between a lifelong
fascination with a topic, and something I just stumbled across?". RDF's
existing patterns for making those distinctions lead us towards
ever-more complex dictionaries of terms. By putting some information
at the document/graph typing level instead, I wonder whether this
might help keep the vocabulary landscape uncluttered and therefore
easier to learn?

As I said in IRC, I'm not completely convincing myself, but wanted to
get this written down. How would it look in practice? Taking (A) and
(B) above:

For (A), to find which interests someone claimed expertise in, you
would need to know about the relevant property (foaf:expertise), you
would need to load up a set of self-asserted triples about that
person, and then you'd query for those that expressed foaf:expertise
links.

For (B), to find which interests someone had expertise in, you would
need to know about the document class foafwiki:ExpertiseDoc, and find
one that they claimed to have written. You'd then query that document
for the foaf:interest links it expresses.
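
The two lookups can be sketched side by side in plain Python, under some loud assumptions: claims are held as quads tagged with their source document, foaf:expertise and foafwiki:ExpertiseDoc are the fictional terms used above, and authorship is expressed with a foaf:made link, which the markup examples above do not actually include.

```python
# Quads as (subject, predicate, object, source_document); all data is illustrative.
ME = "homepage.html#me"
XML = "http://www.w3.org/XML/"

# Pattern (A): the precise property sits alongside the general one.
pattern_a = [
    (ME, "foaf:interest", XML, "homepage.html"),
    (ME, "foaf:expertise", XML, "homepage.html"),
]

# Pattern (B): only the general property, but inside a typed document.
pattern_b = [
    (ME, "foaf:made", "expertise.html", "homepage.html"),  # assumed authorship link
    ("expertise.html", "rdf:type", "foafwiki:ExpertiseDoc", "expertise.html"),
    (ME, "foaf:interest", XML, "expertise.html"),
]

def expertise_a(quads, person):
    """(A): know the property, then select the links that use it."""
    return {o for s, p, o, _doc in quads if s == person and p == "foaf:expertise"}

def expertise_b(quads, person):
    """(B): know the document class, find such a document the person claims
    to have made, then read the foaf:interest links asserted inside it."""
    made = {o for s, p, o, _doc in quads if s == person and p == "foaf:made"}
    typed = {s for s, p, o, _doc in quads
             if p == "rdf:type" and o == "foafwiki:ExpertiseDoc"}
    docs = made & typed
    return {o for s, p, o, doc in quads
            if doc in docs and s == person and p == "foaf:interest"}
```

In this toy form (B) takes three passes where (A) takes one, and it only works if the consumer kept track of which document each claim came from.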


(A)'s definition of foaf:expertise would be something like "Something
that indicates an area of expertise of some agent (subproperty of
foaf:interest)".

(B)'s definition of foafwiki:ExpertiseDoc might be, "A document
about a particular agent, which lists some of their areas of
expertise, indicated via foaf:interest."

Note that B forces a change in perspective; we're not just thinking
about the terms used to make some claim, but about named
situations/contexts in which claims are made. This extends very
naturally to identifying the parties that make those claims; perhaps
that is the reason why it appeals to me.

This design pushes the complexity out of the basic RDF vocabulary
terms and into a hypertext / quads structure. Whether that is a good
home for it, I don't know...

cheers,

Dan

