[foaf-dev] mbox_sha1sum describes only email

Steve Harris steve.harris at garlik.com
Tue Sep 16 16:15:31 CEST 2008

On 16 Sep 2008, at 13:30, Dan Brickley wrote:
>>> basically this gives you a chunk of data that you can 'ask
>>> questions of' without enumerating all of its contents. So one
>>> might convert an addressbook into a bloom filter, and be able to
>>> ask it whether certain values are 'in there', without easily
>>> listing its contents. It could of course be systematically
>>> probed, and there are probably other attacks. An early
>>> exploration of this idea was LOAF, http://loaf.cantbedone.org/about.htm
>>> http://www.perl.com/pub/a/2004/04/08/bloom_filters.html?page=1 etc
>> Neat trick, but that still has the effect of making it hard to
>> match a FOAF file that you find in the wild to another.
> Yup. One scenario here is to address the 'Scoble problem' with  
> Facebook etc exports. A facebook exporter could generate a 'contacts  
> list' bloom filter for a given user. It could have values based on  
> email address or other identifying properties, plus also some  
> watermarking secrets hidden inside. The bloom might have associated  
> privacy expectations, such as 'please don't export this into the  
> public Web'.

I'm not sure how it really addresses that problem (I'm being thick I  
suspect), but I can see it addresses other ones.

> For example, I might run an exporter on my contacts list, which  
> probably includes you. The exporter could iterate through each  
> contact, and check their privacy preferences (through another  
> service; handwaving here for now). If the person wants their email,  
> homepage, openid etc public, the data could be public. If the person  
> is very private, the exporter skips them completely. And if they  
> choose for medium flavour privacy, perhaps a hash of their various  
> emails, plus openid, homepages, other accounts, goes into a single  
> bloom filter. The service also writes into the filter something  
> equivalent to "this was exported for user=danbri.org". A bit more
> handwaving here as I'm not sure how to make sure the watermark is  
> non-removable.

You can't remove things from plain (one bit per cell) bloom filters
without the risk of destroying data you want. However, the filter will
have to be fairly large to have a reasonable chance of being a reliable
source of confirmation, which increases the chance that you can remove
things without it being obvious.
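
To make the removal point concrete, here is a minimal sketch of a
one-bit-per-cell bloom filter. The class name, the salted SHA-1 scheme
for deriving bit positions, the sizes, and the example values are all
assumptions made for illustration; none of them come from LOAF or any
real exporter. Clearing the bits of one entry (the "watermark") may
also knock out unrelated entries that share a bit with it.

```python
import hashlib


class BloomFilter:
    """Toy one-bit-per-cell bloom filter (illustrative only)."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = [False] * m_bits

    def _positions(self, value):
        # Assumption: derive k positions by salting SHA-1 with the index.
        for i in range(self.k):
            digest = hashlib.sha1(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        for p in self._positions(value):
            self.bits[p] = True

    def __contains__(self, value):
        return all(self.bits[p] for p in self._positions(value))

    def clear_bits_of(self, value):
        # Naive "removal": unset every bit this value maps to.
        for p in self._positions(value):
            self.bits[p] = False


if __name__ == "__main__":
    # Hypothetical contacts and watermark string.
    contacts = [f"mailto:user{i}@example.org" for i in range(50)]
    watermark = "exported-for:danbri.org"

    bf = BloomFilter(m_bits=256, k_hashes=4)
    for c in contacts:
        bf.add(c)
    bf.add(watermark)

    bf.clear_bits_of(watermark)
    damaged = [c for c in contacts if c not in bf]
    print("watermark still present:", watermark in bf)
    print("contacts lost as collateral:", len(damaged))
```

With a much larger filter the same removal touches fewer shared bits,
which is the trade-off described above: more reliable confirmation,
but removal that is harder to notice.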

Also, in a large-scale FOAF repository you will get a large number of
false positives from checksums for known individuals that happen to
hash the same way in your filter. This would work on a bounded network
of, say, a few hundred individuals though.
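
A rough way to put numbers on that, using the standard approximation
for a bloom filter's false positive rate, p = (1 - e^(-k*n/m))^k, for
m bits, n stored items and k hash functions. The function names and
the sizes below are assumptions picked for illustration, not figures
from any real FOAF repository.

```python
import math


def false_positive_rate(m_bits, n_items, k_hashes):
    """Standard approximation: p = (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def expected_false_matches(m_bits, n_items, k_hashes, probes):
    """Expected number of non-members the filter wrongly accepts."""
    return false_positive_rate(m_bits, n_items, k_hashes) * probes


if __name__ == "__main__":
    # Hypothetical filter: 8192 bits, 300 contacts, 4 hash functions.
    m, n, k = 8192, 300, 4
    print("false positive rate:", false_positive_rate(m, n, k))
    # Probing with a few hundred known people vs. a million of them.
    for probes in (300, 1_000_000):
        print(probes, "probes ->", expected_false_matches(m, n, k, probes))
```

Under those assumed sizes a few hundred probes give well under one
expected false match, while a million probes give on the order of a
few hundred, which is the bounded-network versus large-repository
difference.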

>> I guess for that purpose we just have to rely on foaf:homepage et
>> al, which most people don't consider to be confidential.
> It is tempting to say that a homepage is by definition non- 
> confidential. Per http://xmlns.com/foaf/0.1/#term_homepage
> [[
> A 'homepage' in this sense is a public Web document, typically but  
> not necessarily available in HTML format. The page has as a  
> foaf:topic the thing whose homepage it is. The homepage is usually  
> controlled, edited or published by the thing whose homepage it is;  
> as such one might look to a homepage for information on its owner  
> from its owner.
> ]]

I read it that way, yes.

> However we should also consider the possibility that the homepage is
> publicly only associated with one of the owner/creator's
> potentially many personas/roles, and might not itself contain enough  
> identifying information to link them up to other info.

Sure, but that would be intentional, and desirable I think.

> With OpenID, there are other considerations. It is not really  
> legitimate to assume that someone's openid is fair game for making  
> public, without explicitly saying so during the login process. I'm  
> guilty of this on the FOAF wiki, as are many other sites. So nice  
> openid consumers allow a site-specific alias for the user, eg. I  
> could log in with 'danbri.org' but show up within the site as  
> user=bandri.

Yeah, that's a tricky one though. It's a reasonable expectation to be
able to ID the person from their foaf:openid value, and I'm not sure
of the ramifications of doing that against a hash. Hashed values are
also open to rainbow table attacks, and a colliding entry need not be
hard to find: I think I could find a valid OpenID URL I control that
satisfies some known bloom filter with only a few hours' effort.
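
As a sketch of what that search could look like: enumerate candidate
URLs under a domain you control and keep the first one the filter
accepts. Everything here is hypothetical (the attacker.example domain,
the salted SHA-1 position scheme, the deliberately tiny filter), and it
says nothing about how long the search takes against a realistically
sized filter.

```python
import hashlib


def positions(value, m_bits, k_hashes):
    # Assumption: k bit positions derived from salted SHA-1 digests.
    return [
        int.from_bytes(
            hashlib.sha1(f"{i}:{value}".encode("utf-8")).digest()[:8], "big"
        ) % m_bits
        for i in range(k_hashes)
    ]


def build_filter(values, m_bits, k_hashes):
    """Return the set of bit positions that are switched on."""
    bits = set()
    for v in values:
        bits.update(positions(v, m_bits, k_hashes))
    return bits


def accepts(bits, value, m_bits, k_hashes):
    return all(p in bits for p in positions(value, m_bits, k_hashes))


def find_accepted_url(bits, m_bits, k_hashes, max_tries=100000):
    """Brute-force a URL the filter accepts; (None, max_tries) if none found."""
    for n in range(max_tries):
        candidate = f"http://attacker.example/openid/{n}"
        if accepts(bits, candidate, m_bits, k_hashes):
            return candidate, n + 1
    return None, max_tries


if __name__ == "__main__":
    # Hypothetical genuine OpenIDs stored in a small, heavily loaded filter.
    members = [f"http://user{i}.example.org/" for i in range(100)]
    bits = build_filter(members, m_bits=512, k_hashes=3)
    url, tries = find_accepted_url(bits, 512, 3)
    print("accepted URL:", url, "after", tries, "tries")
```

The expected number of tries is roughly one over the filter's false
positive rate, so a sparser filter makes this slower but does not rule
it out.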

> There's only so far we can get here without having materials in the  
> 'terms of service' and privacy docs of the large FOAF-exporting  
> sites. And explaining to users what exactly these things mean is  
> going to be a big, tricky, task...

Yes, it is.

- Steve
