[foaf-dev] beyond foaf:mbox_sha1sum

Gregory Williams greg at evilfunhouse.com
Sat Dec 19 19:05:40 CET 2009

On Dec 19, 2009, at 12:50 PM, Richard Cyganiak wrote:

> On 19 Dec 2009, at 14:42, Dan Brickley wrote:
>> Time to gently retire it?
>> http://ebiquity.umbc.edu/blogger/2009/12/17/foafmbox_sha1sum-considered-harmful/
>> etc
>> Thoughts on ways forward?
>> 1. mark foaf:mbox_sha1sum as archaic
> No. It's in wide use, and it has valid uses, although it's perhaps  
> overused ATM.
>> 2. rewrite http://xmlns.com/foaf/spec/#term_mbox_sha1sum to more
>> clearly emphasise the risks, and that decision to publish shouldn't be
>> made for others
> It's certainly good to emphasize the risks. There could be something  
> like: “A service that promises to keep users' email addresses private  
> should not publish the sha1-obfuscated form either.”
> I still think that the property is useful for translating mailing list  
> archives to RDF, for example. The text should not be so alarmist that  
> it discourages such uses.

I agree with Richard here. It's in wide use and it has legitimate uses, so I wouldn't like to see it marked as archaic.

>> 3. perhaps remove the owl:InverseFunctionalProperty typing (this will
>> help with OWL DL compatibility too)
> +1. In practice, doing IFP smushing on this property according to the  
> OWL spec is a recipe for disaster anyway [1].

I'm not convinced about this "recipe for disaster" stuff. The Pedantic Web page you link to suggests to me that people should just be careful when using this term, not that they shouldn't use it. The fact that some sites fail to guard against exporting the hash of an empty string (where an email address should have been) doesn't strike me as a reason to deny the benefits of its current use to the sites that do use it properly.
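
A minimal sketch of such a guard, assuming Python and assuming the hash is taken over the full mailto: URI as a UTF-8 string. The function name and the crude "@" check are illustrative only, not taken from any FOAF tool:

```python
import hashlib
from typing import Optional

# SHA1 of the empty string: the value a careless exporter emits
# when the email field was never filled in.
EMPTY_SHA1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709"


def mbox_sha1sum(email: Optional[str]) -> Optional[str]:
    """Return the hex SHA1 of the mailto: URI, or None if there is no usable address.

    Hypothetical helper: returning None tells the caller to omit the
    foaf:mbox_sha1sum triple instead of publishing a bogus hash.
    """
    if email is None:
        return None
    address = email.strip()
    # Accept input with or without the mailto: scheme.
    if address.lower().startswith("mailto:"):
        address = address[len("mailto:"):]
    # Crude sanity check (an assumption, not full address validation).
    if not address or "@" not in address:
        return None
    return hashlib.sha1(("mailto:" + address).encode("utf-8")).hexdigest()
```

An exporter would call this once per account and skip the property whenever it returns None; a consumer could additionally ignore EMPTY_SHA1 before smushing.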

As for the issue of it not being a true IFP since SHA1 can collide, I'm not terribly convinced by this either (modulo the issue of sites improperly exporting empty email fields as hashes). Has anyone ever bothered to analyze the potential for collision on mailto: IRIs? It's got to be much smaller than the general collision probability, since email addresses have syntax restrictions. And I wouldn't think this is a case where we're worried about the ease of generating collisions, since I could simply bypass the generation stage and assert bad data by claiming to have the same mbox_sha1sum as somebody else.
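
For a sense of scale, here is a rough sketch of the general birthday bound for accidental collisions. It assumes SHA1 output behaves like a uniformly random 160-bit value, an idealization that ignores both mailto: syntax restrictions and deliberate collision attacks. The figure of 10**10 addresses is invented purely for illustration:

```python
from fractions import Fraction


def birthday_bound(n: int, bits: int = 160) -> Fraction:
    """Upper bound n*(n-1)/2 / 2**bits on the chance that any two of
    n uniformly random `bits`-bit values coincide (a union bound)."""
    return Fraction(n * (n - 1), 2 * 2**bits)


# Hypothetical population: ten billion distinct mailto: IRIs.
p = birthday_bound(10**10)
print(float(p))  # on the order of 1e-29
```

Under these assumptions, accidental collisions are negligible next to the bad-data cases discussed above.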

>> 4. encourage data publishers to assign URIs to account holders
>> directly, to indicate openID URIs and other identifying properties as
>> users permit


> Tangent: I find mbox_sha1sum useful for adding former email addresses  
> that I no longer use to my FOAF file. The hashes can still be used for  
> smushing, but no one would mistake the old email addresses as being  
> current. That's something I could not do with foaf:mbox alone. Is  
> there a case for a new property or some sort of new idiom here?

Yeah, this is how I use it as well.

