[rdfweb-dev] Time's running out?

Julian Bond julian_bond at voidstar.com
Fri Aug 8 20:15:58 UTC 2003


Danny Ayers <danny666 at virgilio.it> wrote:
>Again, this depends on what you want to do with the data. I would think it
>likely that most FOAF apps will support the key classes and properties such
>as foaf:Person and foaf:knows. Once parsed, the structure described is the
>same whatever the source looks like. I've just had a look at the graph
>(using the W3C validator) of Libby's file at
>http://swordfish.rdfweb.org/people/libby/rdfweb/libby-foaf.rdf
>and this core stuff looks as it should.

Jim Ley <jim at jibbering.com> wrote:
>Could you please explain how it's unusable to you

OK. I'm using PHP with the RDF-API (RAP), taking Ian's very useful
guide http://www.semanticplanet.com/2003/05/parsingFOAFWithPHP.html as a
basis. I'm looking specifically for seeAlso, Person and knows, with the
sub-task of grabbing or creating an mbox_sha1sum for any Person found,
where the data is available. It's your basic scutter, intended to build a
database of people and the links between them.
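For the mbox_sha1sum sub-task: the FOAF spec defines foaf:mbox_sha1sum as the SHA-1 hex digest of the mailto: URI, so one can be created from a foaf:mbox when no checksum is published. A minimal Python sketch (the post itself uses PHP and RAP; the helper name is hypothetical):

```python
import hashlib


def mbox_sha1sum(mbox: str) -> str:
    """Hypothetical helper: derive a foaf:mbox_sha1sum from a foaf:mbox value.

    The checksum is taken over the full mailto: URI, so a bare address
    is given the 'mailto:' prefix before hashing.
    """
    uri = mbox if mbox.startswith("mailto:") else "mailto:" + mbox
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


print(mbox_sha1sum("mailto:alice@example.org"))
```

Hashing the full URI rather than the bare address matters: two scutters only agree on a person's identity if they hash exactly the same string.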

The basic approach is:
1. Define a resource for rdfs:seeAlso and search for all instances.
2. Define a resource for foaf:Person and search for all instances.
   3. For each Person, search for foaf:mbox or foaf:mbox_sha1sum.
   4. For each Person, search for foaf:knows.
     5. For each Person reached via foaf:knows, search for foaf:mbox and foaf:mbox_sha1sum.
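The five steps can be sketched over an in-memory list of (subject, predicate, object) triples. This is a minimal Python sketch, not the RAP code from the post: parsing RDF/XML into triples is assumed to have happened already, and the helper names and sample data are hypothetical. Note that step 2 depends entirely on an explicit rdf:type foaf:Person triple.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
FOAF = "http://xmlns.com/foaf/0.1/"


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def identifiers(triples: List[Triple], node: str) -> List[str]:
    """foaf:mbox and foaf:mbox_sha1sum values attached to a node."""
    return objects(triples, node, FOAF + "mbox") + objects(
        triples, node, FOAF + "mbox_sha1sum"
    )


def scutter_pass(triples: List[Triple]) -> Tuple[List[str], Dict[str, dict]]:
    # Step 1: every rdfs:seeAlso target is a file to fetch later.
    see_also = [o for _, p, o in triples if p == RDFS_SEE_ALSO]
    # Step 2: every node explicitly typed foaf:Person.
    people = [s for s, p, o in triples if p == RDF_TYPE and o == FOAF + "Person"]
    found: Dict[str, dict] = {}
    for person in people:
        found[person] = {
            # Step 3: the Person's own mbox / mbox_sha1sum.
            "ids": identifiers(triples, person),
            # Steps 4-5: each foaf:knows target and its identifiers.
            "knows": [identifiers(triples, friend)
                      for friend in objects(triples, person, FOAF + "knows")],
        }
    return see_also, found


# Hypothetical sample graph; "placeholder-sha1" stands in for a real digest.
sample = [
    ("_:a", RDF_TYPE, FOAF + "Person"),
    ("_:a", FOAF + "mbox", "mailto:alice@example.org"),
    ("_:a", FOAF + "knows", "_:b"),
    ("_:b", RDF_TYPE, FOAF + "Person"),
    ("_:b", FOAF + "mbox_sha1sum", "placeholder-sha1"),
    ("_:b", RDFS_SEE_ALSO, "http://example.org/bob.rdf"),
]
see_also, found = scutter_pass(sample)
```

Drop the rdf:type triples from the sample and step 1 still works while steps 2 to 5 return nothing, which mirrors what happened with Libby's file.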

Step 1 works fine. Steps 2, 3, 4 and 5 all find nothing, because there's
not a single foaf:Person in the file. And yet the data *is* actually in
the file. To find it, I've got to follow the indirection from
http://www.w3.org/1999/02/22-rdf-syntax-ns# to FOAF, something that I
don't think is possible with the available machine-readable namespace
definitions.

So here's a problem: RAP doesn't actually do anything with the
namespaces except de-reference the subscripts. This means that I, as
a developer, have to know in advance the namespaces I'm interested in,
and every version of them likely to be of interest. Where two or more
versions have data in the wild, I'll have to iterate over them. And even
though I'm working with triples, there's just enough strangeness in the
way people construct them that the problem above is not an isolated
instance.
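Iterating over namespace versions can be sketched by treating a fixed list of namespace URIs as equivalent and matching on the term that follows them. A minimal Python sketch under that assumption: only the first URI below is the real FOAF namespace; the second is a hypothetical stand-in for another version found in the wild, and the helper names are illustrative.

```python
from typing import Iterable, List, Optional, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Namespace URIs treated as "FOAF". The second entry is hypothetical.
FOAF_NAMESPACES = (
    "http://xmlns.com/foaf/0.1/",
    "http://example.org/hypothetical-foaf-other-version/",
)


def local_term(uri: str, namespaces: Iterable[str] = FOAF_NAMESPACES) -> Optional[str]:
    """Strip whichever known namespace prefixes the URI; None if none match."""
    for ns in namespaces:
        if uri.startswith(ns):
            return uri[len(ns):]
    return None


def find_typed(triples: List[Tuple[str, str, str]], term: str) -> List[str]:
    """Subjects typed with `term` in any of the known FOAF namespaces."""
    return [s for s, p, o in triples if p == RDF_TYPE and local_term(o) == term]
```

The list has to be maintained by hand, which is exactly the burden described above: nothing in the namespace documents tells the parser that two URIs mean the same thing.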

Now take a look at this: http://www.semanticplanet.com/sources/16/
"Things Described. Nothing of interest." It seems that Ian is suffering
from the same problems. In fact, it seems to me that this is endemic to
the process and not particularly the fault of RAP. I could do regex
searches on the triples, but then I'd get caught out when a feed used
namespaces to differentiate between two overlapping property names.
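The regex pitfall can be shown in a few lines: matching on a property's local name conflates two properties that only the namespace tells apart. A minimal Python sketch; the second vocabulary and its URI are hypothetical.

```python
import re

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
# Hypothetical second vocabulary that also defines a "knows" property.
OTHER_KNOWS = "http://example.org/hypothetical-vocab#knows"

triples = [
    ("_:a", FOAF_KNOWS, "_:b"),
    ("_:a", OTHER_KNOWS, "_:c"),
]

# Regex on the local name: both properties match, so they get conflated.
by_regex = [o for _, p, o in triples if re.search(r"[#/]knows$", p)]

# Full-URI comparison: only the FOAF property matches.
by_uri = [o for _, p, o in triples if p == FOAF_KNOWS]
```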

Now the exercise above is just about the simplest thing I could do. I 
haven't even begun to deal with additional metadata about the people or 
alternatives to foaf:knows that add properties for the links.

Finally, now that I've got all those seeAlsos stored, I'll go off and fetch
them. There's a significant number of 404s, HTML web pages, RSS files,
invalid XML, very large files, and so on. RDF-API, being memory-based and
largely interpreted, gets exceedingly slow on large files. So I've got a
significant amount of error checking to do, and some files I simply have
to discard to avoid everything grinding to a halt.
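The error checking on fetched seeAlso targets can be sketched as a triage step run before a document reaches the memory-hungry RDF parser. This is a minimal Python sketch, not the author's code: fetching is assumed to have happened already, and the size cap and category names are hypothetical. Checking the size first keeps very large files away from any parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MAX_BYTES = 512 * 1024  # hypothetical cap; tune for the parser in use


def triage(status: int, content_type: str, body: bytes,
           max_bytes: int = MAX_BYTES) -> str:
    """Classify a fetched seeAlso target before handing it to an RDF parser."""
    if status != 200:
        return "http-error"
    if len(body) > max_bytes:
        return "too-large"  # checked before parsing, so big files are never loaded
    if "html" in content_type.lower():
        return "html"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "invalid-xml"
    if root.tag == "rss":
        return "rss"
    if root.tag == f"{{{RDF_NS}}}RDF":
        return "rdf"
    return "other-xml"
```

RSS 1.0 feeds use an rdf:RDF root and would be classified as "rdf" here; only feeds whose root element is rss end up in the "rss" bucket.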

My previous post was deliberately provocative but wasn't meant to cause
offence. I hope nobody took it personally. But I still have the
frustration. I'm using an RDF parser, the only more or less complete
parser for my platform, and I'm having to work round its limitations.
I'm working with triples, not XML. And the first and simplest task I
try throws up issues that look like fundamental flaws, or at least
enough inconsistency to make the task unnecessarily hard.

Now it turns out that there is a large and growing body of FOAF data
that is machine-produced (as opposed to written in a text editor) and is
consistent with the FOAF spec (being careful not to say "valid"). The
scutter described above copes well with most of it. So I'll just discard
Libby's file as empty and move on to second-order problems, like trying
to relate a person and some of their data to the file that they actually
produced. There are probably only three ways of doing this, so that
problem shouldn't take too much code...

-- 
Julian Bond Email&MSM: julian.bond at voidstar.com
Webmaster:              http://www.ecademy.com/
Personal WebLog:       http://www.voidstar.com/
M: +44 (0)77 5907 2173   T: +44 (0)192 0412 433



More information about the foaf-dev mailing list