[rdfweb-dev] ID, nodeID and foaf:made

Dan Brickley danbri at w3.org
Sat Aug 2 09:56:14 UTC 2003

Fair questions. Let me try to answer briefly.

rdf:ID dates from the original RDF spec, first drafted in 1997 when XML itself 
was in flux, XML namespaces barely existed, and DTDs were state of the art.

One constraint we had on rdf:ID was that it acted like an XML ID attribute,
even though formally we couldn't say it was one since RDF didn't require 
DTD processing. In particular, you can only have one attribute with any 
given rdf:ID value within your RDF/XML document. The idea was that these 
were used for linking _to_ things from elsewhere in the Web. An RDF parser,
given some document (that has a base URI) will generate full URIs for a 
node whose XML element is decorated with an rdf:ID, so for example:
<rdf:Description rdf:ID="me">
 <foaf:name>Dan Brickley</foaf:name>
</rdf:Description>
...if parsed with a base URI of http://example.com/foaf/test1.rdf
...will generate a single triple:

http://example.com/foaf/test1.rdf#me http://xmlns.com/foaf/0.1/name "Dan Brickley"
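
As a rough illustration of that expansion (a sketch using Python's standard
library, not code from any particular RDF parser; the function name is made
up for this example), an rdf:ID value is resolved as a '#' fragment against
the document's base URI:

```python
from urllib.parse import urljoin

def expand_rdf_id(base_uri: str, rdf_id: str) -> str:
    """Turn an rdf:ID value into a full URI: base URI + '#' + the ID."""
    return urljoin(base_uri, "#" + rdf_id)

subject = expand_rdf_id("http://example.com/foaf/test1.rdf", "me")
print(subject)  # http://example.com/foaf/test1.rdf#me
```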

Simple enough. So where did this rdf:nodeID thing come from?

Well the basic problem was that 'original RDF', ie. the thing begun in 1997
and made into a W3C RECommendation in Feb 1999, was a bit rough around the 
edges. A few things weren't clear, for example the notion of so-called 
"anonymous resources", which we now refer to as "blank nodes" (or bNodes)
in the graph. These correspond to RDF descriptions of things where the 
thing (a person, document, company, whatever...) is _mentioned_ yet not 
_named_ by specifying a full (or even partial) URI. Aside: we stopped 
using the term "anonymous resource" in acknowledgment that the thing being
described isn't intrinsically anonymous; it may well have a widely known
URI, it is just that some RDF files can mention it 'in passing' without 
giving that URI (ie. rdf:about or rdf:resource might not be used).

So... if you have an RDF graph, and it mentions a bunch of things, and 
several of those things don't have URIs attributed to them in the graph, 
ie they are blank nodes in the graph, then you can get a problem. It isn't 
always possible to write down in '99-era RDF/XML the markup that 
represents (serializes) that data structure losslessly. You end up inventing
URIs for things so you can fit them into the constraints of the RDF syntax.

For example, say you have three people. And assume the world is still
squabbling away about angels on pinheads and whether people have URIs, so there
is no consensus about whether people 'have' URIs. But you still want to 
describe them in RDF. So you make an RDF graph with bNodes for the people.

Let's label our 3 people 'a','b' and 'c', noting that these are private, 
local, transitory etc IDs we're using just so we can talk about them. They're 
not web-wide IDs that we expect to be widely known (such as URIs).

a worksFor b
a marriedTo c
b fatherOf c

This mini-web of relations can't be serialized in '99-era RDF/XML without
inventing URIs for these 3 things, ie. the mere syntax of RDF used to force 
us to screw around with our data, and change (albeit only slightly) what
the data was telling us.

  <Person>
    <name>person a</name>
    <worksFor><Person>
        <name>person b</name>
        <fatherOf><Person><name>person c</name></Person></fatherOf>
    </Person></worksFor>
    <marriedTo ... -- what do we write here to link to person c! />
  </Person>

So this is just the classic problem of mapping between two data structures,
ie. directed labeled graphs (RDF) to trees (XML).
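
One way to see the mismatch is to count incoming edges. The sketch below is
only an illustration (the function name is invented here); it takes the three
statements above and reports which nodes have more than one incoming edge and
so cannot simply be nested inside a single parent element:

```python
from collections import Counter

# The three statements from the text; a, b and c are local labels only.
TRIPLES = [
    ("a", "worksFor", "b"),
    ("a", "marriedTo", "c"),
    ("b", "fatherOf", "c"),
]

def nodes_needing_cross_reference(triples):
    """Return nodes with more than one incoming edge.

    In a tree every element has a single parent, so such nodes can be
    nested under only one of their incoming edges; the others have to
    point at them by some kind of identifier.
    """
    incoming = Counter(obj for _subj, _pred, obj in triples)
    return sorted(node for node, count in incoming.items() if count > 1)

print(nodes_needing_cross_reference(TRIPLES))  # ['c']
```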

We need a way of crossreferencing the XML element that stands for a to 
the one that stands for c, and labelling that relationship with 'marriedTo'.

So first let me show you the way that RDF'99 would have us do it. Note the 
asymmetry: we do something different at each end of the link.

  <Person>
    <name>person a</name>
    <worksFor><Person>
        <name>person b</name>
        <fatherOf><Person rdf:ID="c">
            <name>person c</name>
        </Person></fatherOf>
    </Person></worksFor>
    <marriedTo rdf:resource="#c"/>
  </Person>

Hopefully you can see an analogy with the old style of HTML linking:
the link goes 'from' the marriedTo element, 'to' the Person element.
This is like an 'a href' pointer to an 'a name' anchor target in HTML.

Since rdf:ID expands to a full URI, and rdf:resource also expands relative 
URIs to full URIs, what we have here is just shorthand for this:

  <Person>
    <name>person a</name>
    <worksFor><Person>
        <name>person b</name>
        <fatherOf><Person rdf:about="http://example.com/whateverthisdociscalled#c">
            <name>person c</name>
        </Person></fatherOf>
    </Person></worksFor>
    <marriedTo rdf:resource="http://example.com/whateverthisdociscalled#c"/>
  </Person>

...and the RDF triples you get back from an RDF parser reflect this. The 
descriptions of 'a' and 'b' will generate blank nodes in the graph, ie. 
we get RDF which says, in effect, 

	"there is a thing and we don't know its URI but 
	anyway that thing has a worksFor relationship to another thing that 
	we don't know the URI for either but that thing has a fatherOf 
	relationship to yet another thing which is named by a URI,
	and the first thing has a marriedTo relationship to the thing 
	whose URI is http://example.com/whateverthisdociscalled#c".
	(I've omitted the 'foaf:name' properties here for brevity).

So, where does that leave us?

Well, firstly we have the problem of the RDF syntax forcing us to 
invent URIs just so we can use RDF's XML syntax. Worse, the URIs that 
get invented / assigned confuse the thing being identified with 
(part of) an RDF description that happens to mention that thing. Person 'c'
from the silly story above would probably be surprised to discover that 
the Web community were treating http://example.com/whateverthisdociscalled#c
as if it were a well known identifier for him/her.

The RDF Core Working Group created rdf:nodeID as a cleanup for this situation, 
so that now almost all RDF graphs can be round-tripped through RDF parsers/APIs
and back into RDF/XML syntax without that process having to make up silly 
URIs for things in this way.

  <Person>
    <name>person a</name>
    <worksFor><Person>
        <name>person b</name>
        <fatherOf><Person rdf:nodeID="c">
            <name>person c</name>
        </Person></fatherOf>
    </Person></worksFor>
    <marriedTo rdf:nodeID="c"/>
  </Person>

...is the new look version. It is symmetrical, since the HTML-derived linking
metaphor didn't really work in RDF. Neither XML element is really 'linking
to' the other in the sense familiar from hypertext. It is more that they 
are _about_ the same _resource_. But RDF's syntax already uses the 
attribute names 'about' and 'resource', so we made up another somewhat
technical attribute name, 'nodeID', whose purpose is to take 
local-to-this-document identifiers for the things described by our XML 
elements. Unlike the rdf:about and rdf:resource design, we don't distinguish
between the cases where the element stands for a node in the graph (ie. 
rdf:about) versus stands for an edge in the graph (ie. rdf:resource). This is 
another belated lesson: the original RDF syntax could have been much 
simpler, by just replacing both rdf:about and rdf:resource with a 
single attribute called 'rdf:URI' or somesuch.
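
To make the symmetry concrete, here is a toy sketch (nowhere near a real
RDF/XML parser: it understands only nested elements, literals and rdf:nodeID,
and it skips the rdf:type statements). The document is the nodeID example
above with its enclosing elements written out in full; the default namespace
and the 'genid' labels are made up for the illustration. The same attribute
is read the same way on a node element and on a property element, and no URI
gets invented for person c:

```python
import itertools
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NODE_ID = f"{{{RDF_NS}}}nodeID"

# Hypothetical vocabulary namespace, used only so the XML is well-formed.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="http://example.com/vocab#">
<Person>
  <name>person a</name>
  <worksFor><Person>
      <name>person b</name>
      <fatherOf><Person rdf:nodeID="c">
          <name>person c</name>
      </Person></fatherOf>
  </Person></worksFor>
  <marriedTo rdf:nodeID="c"/>
</Person>
</rdf:RDF>"""

def toy_triples(xml_text):
    """Walk the node/property 'stripes' and collect (s, p, o) tuples."""
    fresh = itertools.count(1)
    out = []

    def node(elem):
        # Node element: use its rdf:nodeID label, else a fresh blank node.
        label = elem.get(NODE_ID)
        subject = f"_:{label}" if label else f"_:genid{next(fresh)}"
        for prop in elem:
            pred = prop.tag.split("}")[-1]
            ref = prop.get(NODE_ID)
            if ref is not None:            # property pointing at a blank node
                out.append((subject, pred, f"_:{ref}"))
            elif len(prop):                # property with a nested node element
                out.append((subject, pred, node(prop[0])))
            else:                          # plain literal value
                out.append((subject, pred, prop.text))
        return subject

    for top in ET.fromstring(xml_text):
        node(top)
    return out

for triple in toy_triples(DOC):
    print(triple)
```

Both the fatherOf and the marriedTo statements end up pointing at the same
blank node, _:c.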

But that's water under the bridge. RDF Core fixed the nodeID situation
because it was affecting people's ability to write sensible RDF without
having XML encoding artifacts interfere with what they were saying. We 
could have done a bunch more to beautify the syntax, but at cost of 
greater disruption.

Which brings me verbosely to your maturity question. Yes, rdf:nodeID is 
relatively new, but it is being supported by more and more RDF parsers, is
relatively easy to add to an RDF parser, and is imho quite unlikely to go 
away. RDF Core are now at the stage where we're about to request W3C 
advance the specs to 'proposed recommendation' stage, so now is a good time 
to start making use of the new specs. If you are using an RDF parser which
doesn't support rdf:nodeID, let them know that it's time to upgrade, and 
perhaps ask here or on www-rdf-interest to see if someone might help with
the necessary fixes. 

Julian, you mention that your parser-of-choice doesn't yet support rdf:nodeID. 
(ah, re-reading I realise I was mistaken, you were quoting someone else)
What are you using? The only parsers I can imagine being really hard to 
upgrade are Mozilla's (spaghetti code), which is also behind on various 
other things I fear, and the ageing and obsolete SiRPAC parser in Java.

There is a new version of http://www.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/
out today, and that boasts rdf:nodeID support, amongst other things.

Oh, just to confirm:

> <foaf:Document rdf:about="">
>   <foaf:topic rdf:nodeID="n_1"/>
>   <foaf:maker rdf:nodeID="n_1"/>
> </foaf:Document>
> <foaf:Person rdf:nodeID="n_1">

...is correct. It says "the thing we're locally calling n_1 is the topic and 
the maker of this current document".

Hope this helps...



(a few more comments interspersed below)

* Julian Bond <julian_bond at voidstar.com> [2003-08-02 09:35+0100]
> I'm confused about the differences between ID and nodeID all prompted by
> trying to correctly code up foaf:maker
> All my Persons had an ID and I added foaf:maker. Someone pointed out
> that
> <foaf:Document rdf:about="">
>   <foaf:topic rdf:nodeID="n_1"/>
>   <foaf:maker rdf:nodeID="n_1"/>
> </foaf:Document>
> <foaf:Person rdf:ID="n_1">

Yup: the rdf:ID expands to a URI which the topic and maker properties
don't reference, so the graph is fragmented.
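
A quick way to spot this kind of fragmentation is sketched below. It is an
illustration only: it handles flat documents like the quoted one (top-level
node elements with property children), the Person element is written as
self-closing because the quoted markup is cut short, and the function name
is invented here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NODE_ID = f"{{{RDF_NS}}}nodeID"

FRAGMENTED = f"""<rdf:RDF xmlns:rdf="{RDF_NS}"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Document rdf:about="">
    <foaf:topic rdf:nodeID="n_1"/>
    <foaf:maker rdf:nodeID="n_1"/>
  </foaf:Document>
  <foaf:Person rdf:ID="n_1"/>
</rdf:RDF>"""

def undescribed_node_ids(xml_text):
    """nodeIDs used on properties that no top-level node element carries."""
    root = ET.fromstring(xml_text)
    described = {node.get(NODE_ID) for node in root}
    referenced = {prop.get(NODE_ID) for node in root for prop in node}
    return sorted(referenced - described - {None})

print(undescribed_node_ids(FRAGMENTED))  # ['n_1']

# Using rdf:nodeID on the Person as well joins the graph back up.
FIXED = FRAGMENTED.replace('rdf:ID="n_1"', 'rdf:nodeID="n_1"')
print(undescribed_node_ids(FIXED))  # []
```

A nodeID that is referenced but never described is still legal RDF; the
check just flags places where an rdf:ID was probably meant to match.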

> ...
> Was incorrect so I changed it to
> <foaf:Document rdf:about="">
>   <foaf:topic rdf:nodeID="n_1"/>
>   <foaf:maker rdf:nodeID="n_1"/>
> </foaf:Document>
> <foaf:Person rdf:nodeID="n_1">
> ...
> Then I got this.
> >Just one thing. Isn't
> >rdf:nodeID from the Working Draft of Jan 2003. Should we be using this

(I believe it was in the Nov 2002 draft too, though I'd have to check.)

> >already. The Working Draft document does carry the usual disclaimer:
> >
> >It is a draft document and may be updated, replaced, or obsoleted by
> >other documents at any time. It is inappropriate to use W3C Working
> >Drafts as reference material or to cite as other than "work in
> >progress".

That's true, and in the absence of other information one should be 
careful about baking things into products which are not yet W3C 
recommendations. As a member of the working group, and totally informally,
I can tell you my personal expectation, which is that rdf:nodeID is 
here to stay. I've been wrong before, blah blah blah, but there are now 
an increasing number of parsers (eg. raptor/redland, arp/jena, rap, 
rdflib, rubyrdf, cwm, ...) which can happily deal with it. 

> >
> >>From a personal (and selfish) point of view, rdf:nodeID is not
> >>suppported by the RDF parser I am using. I have the source so I could
> >>hack in support but would feel uneasy about this in case of changes
> >>when the new spec becomes a Recommendation.

Let's try to get this parser updated. Who is using what?

> But changing everything to ID doesn't work either.
>   <foaf:topic rdf:ID="n_1"/>
>  throws
> Error: {W105} Redefinition of ID: n_1[Line = 12, Column = 29]

I've been wondering about working on a cross-parser 'help page' that
expands on the cryptic references parsers spit out. What this means 
is that rdf:ID works like XML ID, in that you can only have any given
rdf:ID value once per document.
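
A generic XML parser won't catch this by itself, since rdf:ID isn't declared
as an XML ID in any DTD. A small check along the following lines (a sketch
with a made-up function name, not how any of the parsers mentioned here
implement it) finds the repeated values:

```python
from collections import Counter
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ID = f"{{{RDF_NS}}}ID"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Document rdf:about="">
    <foaf:topic rdf:ID="n_1"/>
  </foaf:Document>
  <foaf:Person rdf:ID="n_1"/>
</rdf:RDF>"""

def duplicate_rdf_ids(xml_text):
    """Return rdf:ID values that occur more than once in the document."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        elem.get(RDF_ID) for elem in root.iter() if RDF_ID in elem.attrib
    )
    return sorted(value for value, n in counts.items() if n > 1)

print(duplicate_rdf_ids(DOC))  # ['n_1']
```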

> So how about
> <foaf:Document rdf:about="">
>   <foaf:maker rdf:resource="n_1"/>
> </foaf:Document>
> <foaf:Person rdf:ID="n_1">
> It validates but is the meaning correct?

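It creates a similar graph, provided the reference is written 
rdf:resource="#n_1" (with the leading '#', so that it resolves to the same 
URI as the rdf:ID), but (per discussion above) creates a URI for 
the person purely for RDF/XML syntax purposes.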
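
One detail worth checking in markup like this is the leading '#'. The sketch
below uses Python's standard URI resolution and an assumed base URI (the
quoted document's real address isn't given) to compare what rdf:ID="n_1"
names with what rdf:resource resolves to, with and without the '#':

```python
from urllib.parse import urljoin

BASE = "http://example.com/foaf/test1.rdf"  # assumed document URI

id_uri = urljoin(BASE, "#" + "n_1")   # what rdf:ID="n_1" names
bare_ref = urljoin(BASE, "n_1")       # rdf:resource="n_1"
hash_ref = urljoin(BASE, "#n_1")      # rdf:resource="#n_1"

print(id_uri)              # http://example.com/foaf/test1.rdf#n_1
print(bare_ref)            # http://example.com/foaf/n_1
print(hash_ref == id_uri)  # True
```

Without the '#', the reference is an ordinary relative URI and points at a
sibling document rather than at the node declared with rdf:ID.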

> BTW. I am also getting some validation errors when I added
>  <foaf:made rdf:about="" />
> to the main Person.

This again shows the symmetric design of rdf:nodeID is nice. You have 
here the common error of writing rdf:about where you mean rdf:resource
(or vice-versa). The rdf:about attribute only ever appears on an XML 
element that stands for a particular node, ie a particular thing. The 
rdf:resource attribute only ever appears on an XML element that encodes 
a relationship (such as foaf:made). The former elements often are 
named with capital letters (such as Person, Cat, Car, Company) since
they can carry a type, and RDF follows Java-ish conventions, with most
vocabs using lowercase for properties/relations, and upper case for 
classes. So if you see rdf:about on an element that begins with a lowercase
letter, or rdf:resource on an element which begins uppercase, you may
have an example of this error. (Note that RSS 1.0 is all lowercase so 
breaks this handy naming pattern.)
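
That rule of thumb is easy to turn into a small lint. The following is a
heuristic sketch only (the function name is invented, and it will misfire on
all-lowercase vocabularies such as RSS 1.0); it flags rdf:about on
lowercase-named elements and rdf:resource on uppercase-named ones:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF_NS}}}about"
RESOURCE = f"{{{RDF_NS}}}resource"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:nodeID="n_1">
    <foaf:made rdf:about=""/>
  </foaf:Person>
</rdf:RDF>"""

def suspicious_attributes(xml_text):
    """List (element name, attribute) pairs that break the naming pattern."""
    findings = []
    for elem in ET.fromstring(xml_text).iter():
        local = elem.tag.split("}")[-1]   # strip the {namespace} prefix
        if local[:1].islower() and ABOUT in elem.attrib:
            findings.append((local, "rdf:about"))
        if local[:1].isupper() and RESOURCE in elem.attrib:
            findings.append((local, "rdf:resource"))
    return findings

print(suspicious_attributes(DOC))  # [('made', 'rdf:about')]
```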

> Error: {E201} Syntax error when processing attribute rdf:about. Cannot
> have attribute rdf:about in this context.[Line = 11, Column = 31]
> Aaarg! It must fairly obvious I don't know what I'm doing or why ;-)

Once you get the basic feel for the so-called 'striped' syntax, and 
hence for which elements are encodings of nodes versus edges, some of 
this stuff becomes a bit more intuitive. But it is a learning curve.
http://www.w3.org/2001/10/stripes/ might help...

(I think I feel about XSLT how you must feel about RDF syntax, fwiw!)
