[foaf-dev] mbox_sha1sum describes only email

Dan Brickley danbri at danbri.org
Tue Sep 16 12:26:55 CEST 2008

Steve Harris wrote:
> On 14 Sep 2008, at 13:55, Story Henry wrote:
>> That is exactly what the xxx:uriHash relation I am
>> proposing would be.
>> [] a foaf:Person
>>    foaf:mbox [ xxx:uriHash "hjsdfsjhfskjhdfskj" ] .
>> This says that there is a person who has a foaf:mbox that is unknown,
>> and that the URI of the mbox has the given hash sum
>> "hjsdfsjhfskjhdfskj". To make this clearer, here is the same
>> expressed with bnodes:
>> _:p1 a foaf:Person;
>>    foaf:mbox _:b .
>> _:b xxx:uriHash "hjsdfsjhfskjhdfskj" .
>> This seems simpler than what you are proposing above.
> Plus, I think many of us have learnt to stay away from syntactic
> reification after the pain of RDF reification.
> Given the recent work on reversing SHA1 sums of email addresses using
> rainbow tables, I'm hesitant about using that technique anymore to mask
> important information. I've been putting some thought into how to
> defend checksums against attacks like that, but so far I've come up
> with nothing practical.

I tend to agree. The history of foaf:mbox_sha1sum is as a replacement
for publishing foaf:mbox directly, rather than as a substitute for true
privacy. I think of it a bit like 'net curtains' (hmm, are these
cultural universals? Here's a Wikipedia link in case not:
http://en.wikipedia.org/wiki/Curtain#Light_control_and_insulation ).

When we started this project, FOAF was initially deployed amongst the
kinds of people who had homepages and who shared their full email
address in public, so the mbox_sha1sum construct offered a little bit
of hiding around that. What we see today are many thousands of hashed
mailbox IDs being published on behalf of users who don't really
understand even the basics of what SHA1 is, nor the associated risks.
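To make that risk concrete, here is a rough Python sketch. It assumes,
as I read the FOAF spec, that foaf:mbox_sha1sum is the hex SHA1 of the
whole mailbox URI including the "mailto:" prefix; the addresses and
helper names are made up for illustration. Reversing a published value
needs nothing more than a list of plausible addresses; rainbow tables
are a precomputed, more compact way of running the same search.

```python
import hashlib
from typing import List, Optional


def mbox_sha1sum(email: str) -> str:
    # Hex SHA1 of the mailbox URI, assumed to include the "mailto:" prefix.
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()


def reverse_lookup(target: str, candidates: List[str]) -> Optional[str]:
    # Dictionary attack: hash each guessed address and compare. With no
    # salt, the same guess always yields the same hash, in every file.
    for email in candidates:
        if mbox_sha1sum(email) == target:
            return email
    return None


if __name__ == "__main__":
    # Hypothetical published value and guess list, for illustration only.
    published = mbox_sha1sum("alice@example.org")
    guesses = ["bob@example.org", "alice@example.org", "carol@example.org"]
    print(reverse_lookup(published, guesses))  # alice@example.org
```

Because the hash is unsalted, one guess list works against every
published FOAF file at once.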

While I think the idea of including an optional salt in
foaf:mbox_sha1sum is worth exploring, I reckon that, in general, we
should probably be exploring data hiding / control techniques that are
"wrapped around FOAF" (and all other vocabs) rather than somehow put
inside the RDF data.
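For what the salt idea might look like, here is a sketch under loud
assumptions: there is no salted property in FOAF today, and the function
names, the salt-then-URI concatenation order and the fixed example salts
are all hypothetical. It also shows the cost Steve mentions: with
different salts, the same mailbox no longer yields a common value to
search on.

```python
import hashlib


def salted_mbox_hash(email: str, salt: str) -> str:
    # Hypothetical salted variant: SHA1 over salt + "mailto:" mailbox URI.
    # With salt == "" this reduces to the plain unsalted hash.
    return hashlib.sha1((salt + "mailto:" + email).encode("utf-8")).hexdigest()


def matches(email: str, salt: str, published: str) -> bool:
    # Checking a *known* address still works if the salt is published
    # alongside the hash; what is lost is matching on the value alone.
    return salted_mbox_hash(email, salt) == published


if __name__ == "__main__":
    email = "alice@example.org"  # hypothetical address
    # Fixed example salts; a real scheme would pick them at random.
    h_a = salted_mbox_hash(email, "publisher-a-salt")
    h_b = salted_mbox_hash(email, "publisher-b-salt")
    print(h_a == h_b)                               # False
    print(matches(email, "publisher-a-salt", h_a))  # True
```

A per-publisher salt defeats precomputed tables, but anyone holding the
salt can still run a guess-list attack against that one file.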

Another trick here is to explore Bloom filters. Henry's written a bit
on this, and I've made a few crude experiments.

Basically, this gives you a chunk of data that you can 'ask questions
of' without enumerating all of its contents (at the price of occasional
false positives). So one might convert an address book into a Bloom
filter, and be able to ask it whether certain values are 'in there',
without easily listing its contents. It could of course be
systematically probed, and there are probably other attacks. An early
exploration of this idea was LOAF; see
http://www.perl.com/pub/a/2004/04/08/bloom_filters.html?page=1 and the
following pages.
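Here is a minimal Python Bloom filter, just to show the shape of the
idea. It is not LOAF's actual format; the filter size m, the number of
hash positions k, and the SHA-256-with-index-prefix hashing are my own
assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    # m bits, k hash positions per item. Lookups may give false
    # positives but never false negatives, and the bit array does not
    # directly list what was added.

    def __init__(self, m: int = 1024, k: int = 4) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by hashing the item with a per-index prefix.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


if __name__ == "__main__":
    # Hypothetical address book, turned into a filter that can be queried.
    book = BloomFilter()
    for addr in ["mailto:alice@example.org", "mailto:bob@example.org"]:
        book.add(addr)
    print("mailto:alice@example.org" in book)  # True
    print("mailto:mallory@example.org" in book)
```

The second query will almost always print False, but a "yes" from a
Bloom filter only ever means "probably"; and anyone who can query it can
still probe it with a guess list, which is the systematic probing
mentioned above.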

> Ideally there would be a salt in with the checksum, but if you do that  
> there's no easy way to search for the checksums by value.