[foaf-protocols] P2P FOAF search

Luke Maurits luke at maurits.id.au
Thu Jun 11 12:34:22 CEST 2009


Okay!  Time to make some of the ideas that have been floating around a
bit more concrete!

Here's a sort of "meta-protocol", a very high level description of the
overall system which could be implemented in countless ways.  We can
work our way up from small scale implementations up to betters ones,
some of which could eventually become the actual standard protocols:

----------
P2P FOAF search Meta Protocol 0.1

The NETWORK is composed of a number of NODES.

Each NODE is a software agent which has:

1) A STORE of RDF triples relevant to a set of foaf:People
2) A LIST of other NODES which it has the ability to send MESSAGES to.

Each MESSAGE has:

1) A TYPE
2) A finite number of PARAMETERS (key/value pairs)
3) A BODY

(think of these as being analagous to the method, headers and entity of
a HTTP request)

We define two TYPES of MESSAGE:

1) SEARCH messages which contain:

  i) An "origin" PARAMETER whose value is some set of information
sufficient to enable sending a MESSAGE to a NODE ii) A "ttl" PARAMETER
whose value is a non-negative integer iii) A "searchid" PARAMETER whose
value is a 128 bit string. iv) A BODY which is some representation of
(perhaps but not necessarily a literal string) a SPARQL query

2) RESPONSE messages which contain:

  i) A "searchid" PARAMETER whose value is a 128 bit string.
  ii) A BODY which is some representation of a set of RDF triples
relevant to a foaf:Person

The behaviour of the NODES is as follows:

SEARCHING:

To search for information about a foaf:Person, the NODE constructs a
SEARCH MESSAGE with:

1) An "origin" PARAMETER specifying its own contact details
2) A "ttl" PARAMETER of some fixed positive integer
3) A random "searchid" PARAMETER
3) A BODY which is a representation of a SPARQL query for retrieving
the desired information

and sends this MESSAGE to some subset of the NODES in its LIST.

The NODE then listens for RESPONSE MESSAGE from other NODES which have
a "searchid" PARAMETER matching that of the SEARCH MESSAGE it sent.
For each such RESPONSE MESSAGE, the information contained in the BODY
is added to the set of results for this search.  The listening ends
after a fixed period of time has passed since the receipt of the last
RESPONSE (or since the sending of the SEARCH if no RESPONSES are
received).

RESPONDING:

Upon receipt of a SEARCH MESSAGE from another NODE, a NODE may
optionally (at its own discretion) convert the BODY of the SEARCH to a
SPARQL query and apply that query to its STORE.  If the result set is
non-empty, then the NODE may optionally (at its own discretion)
construct a RESPONSE for each foaf:Person in the result set with:

1) A "searchid" PARAMETER whose value is the same value as the
"searchid" PARAMETER of the SEARCH MESSAGE.
2) A BODY which is some representation of the the RDF triples
pertaining to that foaf:Person

and send that RESPONSE to the NODE specified by the SEARCH MESSAGE's
"origin" PARAMETER.

If the "ttl" PARAMETER of the SEARCH is non-zero, the NODE may then
optionally (at its own discretion) send to some subset of the NODES in
its LIST a SEARCH message which is identical to the one received with
the exception that the value of the "ttl" PARAMETER is decremented by 1.
----------

Things to note:

 * The details of message transport are completely open, meaning the
protocol could be implemented in many different ways (I'll sketch two
below later).  Some client software could support multiple ways,
effectively bridging what are technically distinct NETWORKS
 * Responses to SEARCH messages are optional at every stage, so client
software could filter them based on:
     1) The value of the "origin" PARAMETER
     2) Any property of the SPARQL query
     3) Any property of the result set it computers for the query
 * A NODE's STORE is not required to be static - it may be changed
after receipt of a SEARCH and before the execution of a query or
sending of RESPONSES, to enable the sort of logic shown in Toby
Inkster's email.

To sketch a concrete but lightweight implementation of this
meta-protocol:

----------
Handling transport by XMPP:

* Each NODE could be associated with a XMPP account.
* Each NODE's LIST could be the friend list associated with its XMPP
account.
* MESSAGES could be inserted in XMPP messages.
* The "origin" PARAMETER values could be XMPP account ids (user at domain)

Super-simple search representation:

* The representation of a SPARQL query could be a comma separated list
of "search strings" for foaf:Name, foaf:mbox and foaf:mbox_sha1sum,
where search strings support wildcard characters.  So: "Steve*,*,*" is
converted to a SPARQL query requesting that foaf:Name beings with Steve
and nothing else.

Super-simple result representation:

* A set of RDF triples about a foaf:Person could be represented by a
URI which dereferences to a foaf:PersonalProfileDocument
----------

This feels pretty achievable.  XMPP does all the hard transport work,
and libraries are available in just about any programming language.
The querying abilities are very limited but a complete cinch to
implement.  Proof of concept would only require 3 running NODES with
specially crafted STORES and LISTS such that:
  1) Node A wants information about foaf:Person X
  2) Node C knows information about foaf:Person X
  3) Node B is on Node A's LIST but Node C is not
  4) Node C is on Node B's LIST
Two or three people should be able to do this demonstration.  If there
is interest I may take a stab at a Python implementation of this
implmenetation of the meta-protocol in the near future.  Other people
are of course welcome to try implementations in other languages - XMPP
and the super-simple search representation should make it very easy to
get them working together.

Downsides:

1) You need an XMPP account just to search for other FOAFers, which is
far too restrictive to be a general search solution.
2) You need to know somebody else on the network just to search for
other FOAFers, which is again too restrictive.
3) Searches are not very anonymous (many NODES will know (i) what you
searched for and (ii) Your XMPP account details).  It's easy to
envisage applications of FOAF where search anonymity is desirable.
4) Others?

To sketch a heavier implementation of the protocol which wouldn't have
downsides 1-3 above but would need (1) an extension of the protocol and
(2) a lot more coding work to implement in software:

----------
Handling transport by TCP/IP:

* It should be easy to serialise MESSAGES in a format *very* similar to
HTTP, which can just be sent around as per usual over TCP connections.
* The "origin" PARAMETER values could also be (IP address, port number)
pairs
* Each NODE's LIST could be a list of (IP address, port number) pairs

Each NODE's LIST would have to be populated in some way.  This could be
done by defining extra MESSAGE TYPES for discovery/exploration.  I know
there *are* ways to do this, but I'm not very clear on them at the
moment.  More reading required!

Search representation:

* Full blown SPARQL?
----------

This seems like the right direction for a "proper" implementation but
it's too much work to aim for right off the bat.

Alright!  Sorry for the high level of detail in this email, but what do
people think?


Luke


More information about the foaf-protocols mailing list