[foaf-dev] beyond foaf:mbox_sha1sum

Norman Gray norman at astro.gla.ac.uk
Tue Dec 22 15:26:34 CET 2009


Steve and Mischa, hello.

On 2009 Dec 22, at 13:13, Steve Harris wrote:

> On 22 Dec 2009, at 12:55, Norman Gray wrote:
>> Warning. The use of domain URI without a trailing slash, the convention as per http examples states that : http://www.google.com is not sociable and should have a trailing slash
>> 
>> I've never heard of this convention, or of the idea that a URI is or is not 'sociable' because of the presence or absence of the redundant and not-required trailing slash.  Can you elaborate?  The link in the warning message is to <http://www.w3.org/Addressing/URL/4_Ex_HTTP.html>, which only lists a number of example URIs (that's a _very_ old page, by the way).

> Without some convention in this area we make FOAF processors' jobs a lot harder.

I don't think there's any need for a convention.  The URI http://foo is functionally equivalent to the URI http://foo/, by virtue of the HTTP spec.  Although the two don't compare equal as strings, they are explicitly noted as equivalent in section 6.2.3 of RFC 3986 (and that includes equivalent in the owl:sameAs sense, though I doubt it would be either necessary or useful to state this explicitly).
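
To illustrate the equivalence noted in section 6.2.3, here is a minimal Python sketch.  The helper name normalise_http_uri is made up for this example, and it covers only the empty-path case for http and https URIs, not the rest of RFC 3986 normalisation (case, default ports, percent-encoding).

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_http_uri(uri: str) -> str:
    """Hypothetical helper: rewrite an empty http(s) path as "/".

    Covers only the empty-path equivalence; everything else is left alone.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        # Other schemes are outside the scope of this sketch.
        return uri
    if parts.path == "":
        # For http URIs an empty path is equivalent to "/".
        parts = parts._replace(path="/")
    return urlunsplit(parts)


# The two forms differ as strings but agree once normalised.
a = "http://foo"
b = "http://foo/"
same_as_strings = (a == b)
same_after_normalising = (normalise_http_uri(a) == normalise_http_uri(b))
```

A client that wanted to treat the two forms as one resource could compare the normalised strings rather than the raw ones.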

> The alternative would be to resolve "http://google.com" and see what it forwards to, and using what 30x directives, but that opens another can of worms.
> 
> N.B.
> 
> GET  HTTP/1.0
> on google.com:80 returns a 404 error, so at least in that case it's an incorrect URI. It's probably an invalid HTTP request also, but I can't be bothered to look to be honest :)

It is indeed an invalid HTTP request (which says nothing about it being an incorrect URI), because the Request-URI component is missing.

Section 5.1.2 of RFC 2616 says:

> Note that the absolute path
>    cannot be empty; if none is present in the original URI, it MUST be
>    given as "/" (the server root).

That is, if you type http://foo or http://foo/ into a browser or any other HTTP client, the HTTP request will be identical.  (This isn't precisely true at the bytestream level, since the HTTP client is free to put the whole URI here rather than just the path part, and curl at least does that, but the point still remains.)
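
As a rough sketch of the rule quoted from section 5.1.2, the following Python builds the request line a client would send in the path-only form.  The function name request_line and its default arguments are assumptions made for illustration; this is not how any particular client is implemented.

```python
from urllib.parse import urlsplit


def request_line(uri: str, method: str = "GET", version: str = "HTTP/1.0") -> str:
    """Hypothetical helper: build the Request-Line for a URI (path-only form)."""
    parts = urlsplit(uri)
    # RFC 2616 section 5.1.2: the absolute path cannot be empty;
    # if none is present in the original URI it MUST be given as "/".
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"{method} {path} {version}"


# Both spellings of the URI yield the same request line.
without_slash = request_line("http://foo")
with_slash = request_line("http://foo/")
```

Under these assumptions both without_slash and with_slash come out as "GET / HTTP/1.0", which is the sense in which the two requests are identical.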

> GET / HTTP/1.0
> from the US returns a 200, which would make http://google.com/ a valid google URI
> from the UK I get a 302 pointing to http://www.google.co.uk/

But that's true of http://www.google.com/ with or without the slash.

That is, "to resolve http://google.com" is precisely the same as "to resolve http://google.com/", so whatever the former replies, the latter would reply also.

> So, nowhere do I see a justification for identifying Google's homepage with "http://www.google.com".


The justification is that the URIs are equivalent, with or without the slash, even to the extent of this point being explicitly noted in both the URI spec and the HTTP spec.  I can see that this isn't particularly convenient for RDF client software, which would prefer string equivalence to be the only equivalence for URIs, but it is nonetheless the case.

[I dimly remember the issue of URI normalisation in RDF coming up in the past, but can't remember the resolution, beyond "I wish we didn't have this problem"]

All the best,

Norman


-- 
Norman Gray  :  http://nxg.me.uk




