[rdfweb-dev] tangle of issues (URIs and global IDs, information retrieval, rdfs:seeAlso, WOT...)

David Madore david.madore at ens.fr
Sun Aug 17 22:21:13 UTC 2003


Let me try to examine the issues brought forth in the last couple of
posts (from the "advocating use of..." thread).  Many thanks to Dan
Brickley for his enlightening reply.

This post is rather long, but I hope that the formatting will make it
easy to read only partially, if people are interested only in some
issues and not in others.

Please let me know of any comments and disagreements on what follows.


*** Concerning URI uniqueness

Let's make sure that the following terminology is agreed upon: a
property <p> is functional (<p> rdf:type owl:FunctionalProperty) when
a given resource can have only one value of that property (that is,
<a> <p> <o1> and <a> <p> <o2> imply <o1> owl:sameAs <o2>), and it is
inverse-functional (<p> rdf:type owl:InverseFunctionalProperty) when
two given resources having the same value of the property are actually
equal.

For example, the "date of birth" property of an individual is
functional: a given person can have only one birth date; it is not
inverse-functional, because there exist many people with the same date
of birth.  The "personal mailbox" property, on the other hand, is
inverse-functional - only one person reads a personal mailbox,
otherwise it is not personal - but not functional because a person
might have several mailboxes, or might change mailbox at some point.
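
To make the two definitions concrete, here is a minimal Python sketch
(not part of any RDF toolkit) that works on plain (subject, property,
value) tuples.  The ex:birthDate and ex:mbox property names, the blank
node labels and the mail addresses are all made up for illustration.

```python
from collections import defaultdict

def implied_equal_values(triples, prop):
    """If prop is declared functional, all values carried by one
    subject must denote the same thing.  Return those value groups."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return [objs for objs in values.values() if len(objs) > 1]

def implied_equal_subjects(triples, prop):
    """If prop is declared inverse-functional, all subjects sharing one
    value must denote the same thing.  Return those subject groups."""
    subjects = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects[o].add(s)
    return [subs for subs in subjects.values() if len(subs) > 1]

# Hypothetical data: two people born the same day, one person with
# two mailboxes, and a second description sharing one of the mailboxes.
triples = [
    ("_:alice", "ex:birthDate", "1970-01-01"),
    ("_:bob", "ex:birthDate", "1970-01-01"),
    ("_:alice", "ex:mbox", "mailto:alice@example.org"),
    ("_:alice", "ex:mbox", "mailto:alice@example.net"),
    ("_:al", "ex:mbox", "mailto:alice@example.org"),
]

# Mailbox as inverse-functional: _:alice and _:al get identified.
merged_by_mbox = implied_equal_subjects(triples, "ex:mbox")
# Birth date as functional: nothing to conclude, one value per subject.
merged_birth_values = implied_equal_values(triples, "ex:birthDate")
```

Treating ex:birthDate as inverse-functional in this sketch would wrongly
identify _:alice with _:bob, and treating ex:mbox as functional would
wrongly equate the two mailboxes, which is exactly the asymmetry
described above.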

Now what about URIs used to refer to resources?  They are not
properties in the RDF sense (somehow RDF seems to have left out the
obvious property which takes a literal value that represents a URI,
and equates its subject with the resource identified by that URI), but
no matter.  Certainly URIs are inverse-functional: two resources
identified by the same URI are, as far as RDF is concerned, the exact
same resource.  (In real life this may be questionable in the presence
of, say, HTTP content negotiation, but this is not about real life.)
The question is now: are URIs functional?

In other words: it is certain that two different resources cannot have
the same URI, but can the same resource have two different URIs?  Now
Dan Brickley tells us - if I understand correctly - that the W3C has
not resolved this question ("how many angels can stand on a pinhead?")
- and certainly it has been very discreet on the question - and that
FOAF prefers to avoid altogether the use of URI to refer to people for
that very reason.

I hope this summary is accurate.  Now I'll raise the following points
on this matter (the following are my opinion, and I do not claim that
they reflect anyone else's):

  * I read the present specifications and working drafts, as they stand,
  to imply that URIs are _not_ a functional property: the same RDF
  resource _can_ have several URIs that refer to it.  Nowhere does the
  RDF Semantics specification, <URL: http://www.w3.org/TR/rdf-mt/ >,
  demand the functional property constraint, which would be stated
  mathematically by requiring that the interpretation mapping IS
  from the vocabulary V to the set IR of resources be one-to-one.  Also
  note that the owl:sameAs property would be rather pointless if a
  resource could not have more than one URI referring to it.

  * There are deep reasons, at once theoretical, practical and
  philosophical, why it would be unreasonable to demand that referring
  URIs be a functional property of the resource.  Namely:

    + Theoretically, this is because many interesting domains simply
    cannot be recursively enumerated using distinct names because equality
    in them might be undecidable.  So unless we are willing to altogether
    exclude such domains from consideration, we will have to admit the
    possibility for several different names to refer to the same rose.  I
    also doubt, as a mathematician and part-time logician, that anyone can
    come up with reasonable semantics for RDF that would bar the
    possibility of different URIs referring to the same resource (remember
    that semantics deal with *interpretation* of RDF, and the
    URI-to-resource map is part of that interpretation).

    + The practical reasons are the ones we currently find ourselves
    faced with: it is hardly possible to keep URIs stable forever (even
    for URNs), and hardly reasonable to forbid any redirection mechanism.

    + Lastly, philosophically, there is something very wrong (cabalistic,
    shall we say?) about demanding things to have only one True Name
    (URI).  This stirs some very problematic questions about what equality
    means (for example in connection with modal logic: Tolkien may be the
    author of *The Lord of the Rings*, but it is not the same thing to
    wish to be Tolkien as to wish to be the author of *The Lord of the
    Rings*; and George Bush may be the president of the United States, but
    from the fact that Lee Oswald assassinated the president of the United
    States it does not follow that Lee Oswald assassinated George Bush).
    With the Leibnizian definition of equality (viz., two things are equal
    if and only if they have the same properties), we would be demanding
    that two resources identified by different URIs always differ by some
    property, which is a mess.

  -> For all these reasons, I think the W3C will have no other choice
  but to acknowledge clearly the fact that it is possible (though
  perhaps not desirable!) for a pin head (resource) to hold several
  angels (URIs referring to it).

  * And anyway, the FOAF field experiment clearly points in this
  direction: if Jim Ley is at once <URL:
  http://xml.mfd-consult.dk/foaf/morten.rdf#jim > and <URL:
  http://jibbering.com/foaf.rdf#xpointer(whatever) >, no matter how hard
  we try to avoid the fact, then isn't it better to recognize the
  possibility and use/introduce means to deal with it (such as using the
  owl:sameAs property) than to do away with URIs altogether?  If we are
  willing to deal with the fact that mbox_sha1sum might be simply an
  inverse-functional, and not functional, property, why not accept the
  same thing of URI references?

  -> Moral: if FOAF's use of RDF consists of giving up URI references
  to refer to people, because they cause too many problems, it is the
  sign of an immense failure of RDF, because that is what RDF (and the
  Web in general) is all about, isn't it?
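
As a rough illustration of what "dealing with it" could look like, here
is a minimal Python sketch that groups URIs into equivalence classes
from a list of owl:sameAs pairs, using a small union-find.  The two Jim
Ley URIs are the ones quoted above; the third URI and the function name
are invented for the example, and nothing here is meant as the way any
existing FOAF tool works.

```python
def same_as_classes(pairs):
    """Group names into classes, given (a, b) pairs meaning a owl:sameAs b."""
    parent = {}

    def find(x):
        # Register unseen names as their own representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for name in list(parent):
        classes.setdefault(find(name), set()).add(name)
    return list(classes.values())

JIM_1 = "http://xml.mfd-consult.dk/foaf/morten.rdf#jim"
JIM_2 = "http://jibbering.com/foaf.rdf#xpointer(whatever)"
JIM_3 = "http://example.org/people#jimley"  # hypothetical third name

classes = same_as_classes([(JIM_1, JIM_2), (JIM_3, JIM_2)])
```

With these two pairs, all three URIs end up in a single class, so any
statement made about one of them can be read as a statement about the
others.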

** Consequences in FOAF

I trust I have made my position amply clear by now, and everyone can
make up their own mind.  My recommendations would be the following:

  - When writing a FOAF file, always introduce an rdf:ID attribute on
  every person which this file describes "authoritatively".  Make sure
  the file itself has a fixed base URI (at which it can be fetched), or
  use xml:base to specify such a fixed URI.  Do not introduce rdf:ID
  attributes on people for which the RDF file is not the authoritative
  description: rather, find the authoritative description (if it
  exists), and if it has an rdf:ID attribute, then in your own
  description use rdf:about to the base URI anchored by that ID; or if
  it has an rdf:about itself, just copy that rdf:about attribute.

  - If you discover some other file that refers to you with an rdf:ID
  attribute, use the owl:sameAs and daml:equivalentTo properties to
  state that the two resources are the same.  Possibly ask the file's
  authors to replace rdf:ID by the rdf:about URI that refers to you.

After that, people are free to accept or refuse to be referred to by a
URI.


*** Concerning the Web of Trust

Just bringing the subject up to mention that it seems to be quite a mess.
Certainly we shouldn't trust all of Joe's statements about Jane: Joe's
foaf.rdf file might say that Jane is an idiot, but we shouldn't
conclude that Jane is an idiot, only that Joe thinks that Jane is an
idiot.  It isn't clear where the trust should come from, either,
because any properties such as "downloaded from Joe's Web site" or
"PGP signed with Joe's secret key" only pertain to the matter insofar
as we can trust the meaning of "Joe's Web site" or "Joe's secret key"
in the first place, so there is a well-known chicken-and-egg problem
on bootstrapping trust (see e.g. Ken Thompson's famous Turing award
speech, *Reflections on Trusting Trust*), which must be solved by
putting trust somewhere to start with.  But even if Joe's FOAF file is
ascertained to be his, so that we should perhaps trust what he says about
himself, it does not follow that we should also trust what he says
about Jane.

The interaction between these "Web of Trust" issues and the
inverse-functional properties mentioned above is rather nasty.  If Joe
Smith's FOAF file contains

<rdf:Description rdf:about="http://www.doefamily.tld/~jane/foaf.rdf#janedoe">
  <rdf:type rdf:resource="http://www.iqratings.tld/RSS/#Idiot" />
</rdf:Description>

then Jane Doe (who is not an iq:Idiot) can feel offended because Joe
Smith is saying she is an idiot - and possibly she can sue him for
slander, or whatever.  We can clearly say that Joe Smith has been
lying, or at least mistaken, about his judgment concerning Jane Doe.
Simple.  But now assume that Joe Smith writes instead:

<foaf:Person>
  <foaf:name>Jane Doe</foaf:name>
  <foaf:mbox_sha1sum>88fd6b198ff3aa9de7186c7f18ae36a8d831109e</foaf:mbox_sha1sum>
  <foaf:homepage rdf:resource="http://www.doefamily.tld/~jane/" />
  <rdf:type rdf:resource="http://www.iqratings.tld/RSS/#Idiot" />
</foaf:Person>

and assume again that Jane Doe is not an iq:Idiot.  But is Joe Smith
saying that _she_ is?  No, he is merely stating that someone called
Jane Doe, whose mbox_sha1sum is
88fd6b198ff3aa9de7186c7f18ae36a8d831109e and whose homepage is
http://www.doefamily.tld/~jane/ is an idiot.  Of course, since these
are inverse-functional properties, you might say: "but that means that
Jane Doe is an idiot!" - but it only does provided you accept Joe
Smith's word as the truth, and if you don't, you can't conclude that
the foaf:Person he's talking about is indeed Jane Doe.  Joe Smith is
asserting the existence of a resource about which he makes five
statements, concerning the values of the properties rdf:type (twice),
foaf:name, foaf:mbox_sha1sum and foaf:homepage.  Now if we know that
Jane Doe is not an iq:Idiot, we know that one or more of these five
statements must be wrong, but we cannot tell which (maybe Joe Smith
was misled by someone impersonating Jane Doe, who gave him Jane's real
email address and homepage address).
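
The dependency on trust can be made explicit with a small Python
sketch.  It assumes, purely for illustration, that each statement is
tagged with the name of the file it came from, that Jane's own file
gives her the same mbox_sha1sum and homepage, and that Joe's anonymous
foaf:Person is the blank node "_:x".  The function name and the source
labels are hypothetical, and the merge is a single pass rather than a
full closure.

```python
JANE = "http://www.doefamily.tld/~jane/foaf.rdf#janedoe"
SHA1 = "88fd6b198ff3aa9de7186c7f18ae36a8d831109e"
HOME = "http://www.doefamily.tld/~jane/"

# Properties treated as inverse-functional in this sketch.
IFPS = {"foaf:mbox_sha1sum", "foaf:homepage"}

# (source, subject, property, value)
statements = [
    ("jane", JANE, "foaf:mbox_sha1sum", SHA1),
    ("jane", JANE, "foaf:homepage", HOME),
    ("joe", "_:x", "rdf:type", "foaf:Person"),
    ("joe", "_:x", "rdf:type", "iq:Idiot"),
    ("joe", "_:x", "foaf:name", "Jane Doe"),
    ("joe", "_:x", "foaf:mbox_sha1sum", SHA1),
    ("joe", "_:x", "foaf:homepage", HOME),
]

def types_of(resource, statements, trusted):
    """rdf:type values derivable for resource, using only statements
    from trusted sources and merging on inverse-functional properties."""
    accepted = [(s, p, o) for src, s, p, o in statements if src in trusted]
    # Inverse-functional (property, value) pairs known for the resource.
    keys = {(p, o) for s, p, o in accepted if s == resource and p in IFPS}
    # Everything sharing one of those pairs is taken to be the resource.
    same = {resource} | {s for s, p, o in accepted if (p, o) in keys}
    return {o for s, p, o in accepted if s in same and p == "rdf:type"}

only_jane = types_of(JANE, statements, {"jane"})
jane_and_joe = types_of(JANE, statements, {"jane", "joe"})
```

When only Jane's file is trusted, nothing links "_:x" to her and no
type is derived.  The iq:Idiot type reaches her only when Joe's
statements are accepted as well, and the very statements that identify
"_:x" with Jane come from the same untrusted source as the insult.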

I'm not sure whether this remark counts as an argument in favor of the
use of URIs to refer to people (it might even count against it).  I'm
not claiming this.  But I wish to raise the issue to point out that
things are subtle and delicate: already the RDF semantics are not
obvious, but when combined in a web of trust, which means heavily
modal logic, they become yet far more complicated.


*** Concerning the use of rdfs:seeAlso

I'll certainly take the RDFSchema Editor's word on what rdfs:seeAlso
was meant for (I mean the contents of <URL:
http://esw.w3.org/topic/UsingSeeAlso >).  But, with due respect, I
then think that the specification is extremely unclear.  I would also
like to know why it has not been deemed opportune to introduce two
different properties, one that would just vaguely state that the
subject resource has some relation with the object resource, and one
that would actually imply that the object resource is an RDF file that
is downloadable at least in certain circumstances and machine-parsable
and possibly contains information pertaining to the subject resource.
Possibly even a third relation, meaning that the information found in
the target file should be trusted (just as much as the relation itself
is trusted, that is) would be useful.  The difference between these
might not always be important, but the only reasonable policy seems to
be, "when in doubt, use separate properties", and certainly we are in
doubt.

Apparently I am not the only person confused: the authors of the
Dublin Core's RDF schemata have likewise been led to use rdfs:seeAlso
in a way that doesn't seem to match the restricted sense.  For
example, see <URL: http://purl.org/dc/terms/ >: nearly every
rdfs:seeAlso property used in that document points to a human-readable
html (or plain text) file.

I am told that the proposed way to state expressly that the
rdfs:seeAlso value is an RDF resource is to use something like

<rdfs:seeAlso>
  <rdf:Description rdf:about="http://foo.tld/bar.rdf">
    <dc:format>application/rdf+xml</dc:format>
  </rdf:Description>
</rdfs:seeAlso>

But I have three problems with this.  The first is that this is not
machine-parsable, because the Dublin Core does not explicitly state
that dc:format is a MIME type, it only records it as a "best
practice", which doesn't make it very useful.  Maybe the property
http://www.w3.org/1999/xx/http#ContentType would be more appropriate,
but I'm not sure it means what I think it might mean (for example,
what if the RDF document is obtained by FTP rather than HTTP?).
Second problem: I might not know whether the document has MIME type
application/rdf+xml (which, AFAIK, is still not registered with IANA)
or text/xml or something of the sort; it is not obvious that all hell
will not break loose when the types differ even slightly.  Third
problem: if a well-meaning person downloads http://foo.tld/bar.rdf and
notices that the interesting part is #corge say, and adds "#corge" at
the end of the rdf:about attribute, then suddenly we are saying that
this #corge (perhaps a foaf:Person) has dc:format
"application/rdf+xml", which is probably wrong.  Whereas there was
nothing wrong with saying <rdfs:seeAlso
rdf:resource="http://foo.tld/bar.rdf#corge">, it would seem.  Anyway,
it is not simply a matter of MIME type, but also, for example, of
whether the link should be followed automatically or only on demand.

Surely any kind of doubt should favor the prudent interpretation.
And the prudent interpretation would be to introduce a foaf:foafFile
property (with foaf:foafFile rdfs:subPropertyOf rdfs:seeAlso) on a
foaf:Person, supposedly pointing to an RDF file that describes the
person in question, and recommend using that property instead of
rdfs:seeAlso.  It might not be *necessary*, but it is hardly costly,
and it seems the best course if only to put spirits at rest.  Is there
any reason *not* to do so?  The transitional course would be:
(1) introduce the foaf:foafFile property in the spec, (2) start patching
FOAF parsers to make use of it, (3) recommend using it alongside
rdfs:seeAlso, and (4) when foaf:foafFile becomes widely enough used,
remove automatic following of rdfs:seeAlso in FOAF parsers.  Is there
any reason *not* to do this?  Anything that would be *wrong* with
introducing a new property?  Otherwise, as I have said, any shadow of
a doubt should benefit the safer course (and the Dublin Core's example
I just gave certainly shows that there is more than a shadow of a
doubt).
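
On the parser side, the transitional course might be sketched as
follows in Python.  This is only an illustration of the policy, not a
patch for any existing FOAF parser: foaf:foafFile is the property
proposed above and does not exist in the FOAF spec, and the function
name, the follow_see_also flag and the example URLs are made up.

```python
def links_to_follow(triples, follow_see_also=True):
    """Return, in document order and without duplicates, the targets a
    crawler would fetch automatically.

    follow_see_also=True  corresponds to steps (2) and (3): both
                          properties are followed during the transition.
    follow_see_also=False corresponds to step (4): only the dedicated
                          property is followed.
    """
    followed = {"foaf:foafFile"}
    if follow_see_also:
        followed.add("rdfs:seeAlso")
    targets = []
    for _subject, prop, target in triples:
        if prop in followed and target not in targets:
            targets.append(target)
    return targets

# Hypothetical description mixing both properties.
triples = [
    ("_:me", "foaf:foafFile", "http://foo.tld/bar.rdf"),
    ("_:me", "rdfs:seeAlso", "http://foo.tld/bar.rdf"),
    ("_:me", "rdfs:seeAlso", "http://foo.tld/about.html"),
]

during_transition = links_to_follow(triples)
after_transition = links_to_follow(triples, follow_see_also=False)
```

During the transition the human-readable page is still fetched because
it is reachable through rdfs:seeAlso; once step (4) is reached, only the
file named by the dedicated property is.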

In any case, some kind of foaf:foafFileTrusted property will probably
be needed in the future.


Cheers,


-- 
     David A. Madore
<URL: http://www.eleves.ens.fr:8080/home/madore/meta.rdf#dmadore >


