So you want to develop a foafnaut?

Jim Ley jim at j...
Sat Nov 2 11:48:31 UTC 2002


As it's too wet to skate, which was the plan for this morning, I thought
I'd write some notes on foafnaut, and how it's developed.

Things you need:
1. a foaf crawler.
2. a triple store.
3. a smusher.
4. the ability to query the triples, over http.
5. the svg interface.
6. a friendly designer.

My crawler isn't available for download yet, but I'll make the source
available some time when I'm back near a machine. It works basically by
getting seeded by , then it fetches the RDF to a local store. Raptor
(shelled out to from the JavaScript) parses the RDF into NTriples, and a
simple NTriples parser turns those into triples for insertion straight
into the MySQL triple store, which is currently a very simplistic table:
the triples are stored simply as text, not encoded to INTs as in
Libby's. My reason for this was simply to make querying easy for me
whilst testing - substring searches being the deciding factor. It does
mean queries aren't too fast though. Anyway, I'm sure the basic crawler
is much like the others.
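The NTriples-to-triples step can be sketched like so (a sketch only, not
the actual crawler code - real NTriples has more cases, such as language
tags and typed literals, but this covers the simple forms):

```javascript
// Minimal NTriples line parser: turns one line into a {subj, pred, obj}
// object ready for insertion into the text-based triple table.
function parseNTripleLine(line) {
  // subject is a URI (<...>) or blank node (_:name); predicate is a URI;
  // the object may also be a quoted literal.
  var m = line.match(
    /^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*")\s*\.\s*$/
  );
  if (!m) return null; // skip comments, blank lines, anything malformed
  return { subj: m[1], pred: m[2], obj: m[3] };
}

var t = parseNTripleLine(
  '<http://example.org/jim> <http://xmlns.com/foaf/0.1/mbox> <mailto:jim@example.org> .'
);
// t.pred is "<http://xmlns.com/foaf/0.1/mbox>"
```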

The smusher is relatively simple, but very, very slow. First it does:

SELECT subj FROM rdf WHERE pred="foaf:mbox" GROUP BY obj

to collect all the subjects, and adds them to a list; then it simply
normalises all use of that subj in subjects and objects to be the same
(actually it's slightly more complicated, as it picks a subject that
isn't a genid if one is available - if two are available it discards
one, which is a "bad thing"). This is pretty simple to write, but takes
a long time. I then do the same but with mbox_sha1sum (which are all
first added for each mbox). This seems to work, but the smush takes ~20
minutes on 20,000 triples on a dual PIII 500 server - efficiencies are
definitely needed. Subsequent smushes after adding triples are faster
though.
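The same idea over an in-memory array looks roughly like this (a sketch,
assuming triples as {subj, pred, obj} objects with "genid"-prefixed blank
node names - the real smusher does this in SQL against the store):

```javascript
// Smush: for every foaf:mbox value, pick one canonical subject
// (preferring one that isn't a genid) and rewrite every other subject
// sharing that mbox, wherever it appears as subj or obj.
function smush(triples) {
  var byMbox = {}; // mbox value -> list of subjects claiming it
  triples.forEach(function (t) {
    if (t.pred === 'foaf:mbox') {
      (byMbox[t.obj] = byMbox[t.obj] || []).push(t.subj);
    }
  });
  var canon = {}; // old subject -> canonical subject
  Object.keys(byMbox).forEach(function (mbox) {
    var subjects = byMbox[mbox];
    // prefer a non-genid subject as the canonical name
    var winner = subjects.filter(function (s) {
      return s.indexOf('genid') !== 0;
    })[0] || subjects[0];
    subjects.forEach(function (s) {
      if (s !== winner) canon[s] = winner;
    });
  });
  triples.forEach(function (t) {
    if (canon[t.subj]) t.subj = canon[t.subj];
    if (canon[t.obj]) t.obj = canon[t.obj];
  });
  return triples;
}
```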

The querying is just a series of SQL. An old version of the querier is
available at ; the new one at does much more and, importantly, caches
the queries it makes, doing a 302 redirect to them if one is available.
This was needed because a query on a person with lots of info (such as
Libby or Dan) takes 4 seconds or so; the redirect + gzipped content is
delivered in ~0.3 seconds. We could of course optimise further by not
having the redirect.
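The caching idea is roughly this (a sketch with made-up names - the
cache-key scheme and answerQuery shape are my illustration here, not the
real querier's API):

```javascript
// Decide how to answer a query: if a cached copy of the result exists,
// answer with a 302 redirect to the cached (static, gzippable) file;
// otherwise run the slow SQL path, save the result, and serve it.
function answerQuery(person, cache, runSlowQuery) {
  var key = 'cache/' + encodeURIComponent(person) + '.xml';
  if (cache[key]) {
    // the cached file is then served as static gzipped content (~0.3s)
    return { status: 302, location: key };
  }
  var body = runSlowQuery(person); // the ~4 second SQL path
  cache[key] = body;               // next request hits the redirect
  return { status: 200, body: body };
}
```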

As you can see from the output, I have the img:naughty predicate, which
isn't quite right, but that was for efficiency in the SVG. It may not be
sensible to use RDF for the interchange between the server and client
here anyway - the SVG side does not have an RDF parser and just treats
it as a simple XML document. This is again for speed: my client-side RDF
parser isn't that fast. Otherwise, though, the RDF should be valid and
correct, so I think it would be nice to create an HTML version of
foafnaut that lets you explore the data in an alternative way.

The SVG script is of course completely stolen from Dean Jackson, with a
few changes of my own, nothing particularly complicated. Using parseXML
so much rather than the DOM methods may not be liked by the purists (but
I've never understood the absence of a parseXML-type method in the DOM
interface; viewers already have a high-performance XML parser available,
so it seems strange to then require a slow scripting language to parse a
document itself in script). The script isn't very understandable and
contains many hacks, as it's been built in an organic way; hopefully you
can understand what is going on.

The friendly designer is of course "ephidrina", whom Dan cunningly
introduced into #foaf, knowing I would've otherwise got bored back at
if someone with interface knowledge hadn't come along.

That'll do for now, hopefully this will do as a write up...



More information about the foaf-dev mailing list