[rdfweb-dev] Time's running out?
Julian Bond
julian_bond at voidstar.com
Fri Aug 8 20:15:58 UTC 2003
Danny Ayers <danny666 at virgilio.it> wrote:
>Again, this depends on what you want to do with the data. I would think it
>likely that most FOAF apps will support the key classes and properties such
>as foaf:Person and foaf:knows. Once parsed, the structure described is the
>same whatever the source looks like. I've just had a look at the graph
>(using the W3C validator) of Libby's file at
>http://swordfish.rdfweb.org/people/libby/rdfweb/libby-foaf.rdf
>and this core stuff looks as it should.
Jim Ley <jim at jibbering.com> wrote:
>Could you please explain how it's unusable to you?
OK. I'm using PHP with the RDF-API (RAP), taking Ian's very useful
guide http://www.semanticplanet.com/2003/05/parsingFOAFWithPHP.html as a
basis. I'm looking specifically for seeAlso, Person and knows, with the
sub-task of grabbing the mbox_sha1sum where it's available, or creating
one from the mbox, for any Person found. Your basic scutter, with the
intent of building a database of people and the links between them.
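For the mbox_sha1sum sub-task, here is a minimal sketch. It is in Python rather than PHP and is only an illustration. It assumes the FOAF convention that foaf:mbox_sha1sum is the hex SHA1 of the full mailto: URI. The names `mbox_sha1sum` and `person_id` are hypothetical and are not part of RAP.

```python
import hashlib


def mbox_sha1sum(mbox):
    """Hex SHA1 of a mailbox's mailto: URI, in the style of foaf:mbox_sha1sum."""
    uri = mbox.strip()
    # The hash covers the full URI, so add the scheme if only a bare address was given
    if not uri.lower().startswith("mailto:"):
        uri = "mailto:" + uri
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def person_id(existing_sum=None, mbox=None):
    """Hypothetical helper: prefer a published mbox_sha1sum, else derive one from mbox."""
    if existing_sum:
        return existing_sum.strip().lower()
    if mbox:
        return mbox_sha1sum(mbox)
    return None  # no usable identifier for this Person
```

For example, `person_id(mbox="mailto:someone@example.com")` gives the same 40-character hash that a file publishing only the sha1sum would carry.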
The basic approach is
1. Define a resource for rdfs:seeAlso. Search for all instances.
2. Define a resource for foaf:Person. Search for all instances.
3. For each Person, search for foaf:mbox or foaf:mbox_sha1sum.
4. For each Person, search for foaf:knows.
5. For each knows.Person, search for foaf:mbox and foaf:mbox_sha1sum.
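The five steps above might be sketched as follows, again in Python. The sketch assumes the file has already been parsed into (subject, predicate, object) tuples with full URIs. `scutter_pass` is a hypothetical name. Treating every foaf:knows object as the "knows.Person" is also an assumption.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
FOAF = "http://xmlns.com/foaf/0.1/"


def scutter_pass(triples):
    """Run steps 1-5 over (subject, predicate, object) tuples with full URIs."""
    # Index every property value by subject, then by predicate
    props = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        props[s][p].append(o)

    def mailbox_ids(node):
        # Steps 3 and 5: any foaf:mbox or foaf:mbox_sha1sum on the node
        return props[node][FOAF + "mbox"] + props[node][FOAF + "mbox_sha1sum"]

    # Step 1: every rdfs:seeAlso target, to be fetched later
    see_also = [o for s, p, o in triples if p == RDFS_SEEALSO]
    # Step 2: every subject explicitly typed foaf:Person
    persons = {s for s, p, o in triples if p == RDF_TYPE and o == FOAF + "Person"}

    people = {}
    for person in persons:
        # Step 4: follow foaf:knows to the other node, then apply step 5 to it
        knows = [{"node": k, "ids": mailbox_ids(k)} for k in props[person][FOAF + "knows"]]
        people[person] = {"ids": mailbox_ids(person), "knows": knows}
    return see_also, people
```

On a file with no rdf:type foaf:Person triples, this returns the seeAlso links and an empty people table.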
Step 1 works fine. Steps 2, 3, 4 and 5 all find nothing, because there's
not a single foaf:Person in the file. And yet the data *is* actually in
the file. To find it I have to follow the indirection from
http://www.w3.org/1999/02/22-rdf-syntax-ns# to FOAF, which I don't think
is possible with the available machine-readable namespace definitions.
So here's a problem: RAP doesn't actually do anything with the
namespaces except to de-reference the subscripts. This means that I, as
a developer, have to know the namespaces I'm interested in and all the
versions of them likely to be of interest. Where two or more versions
have data in the wild, I'll have to iterate over all of them. And even
though I'm working with triples, there's just enough strangeness in the
way people construct them that the problem above is not an isolated
instance.
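One way to handle several namespace versions is to expand each term into every URI it might appear under, and then match against all of them. This is a minimal sketch. It assumes the first entry is the current FOAF namespace. The second entry is a made-up placeholder standing in for any older or variant namespace; it is not a real FOAF version. The function names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Known namespace versions to try
FOAF_NAMESPACES = (
    "http://xmlns.com/foaf/0.1/",
    "http://example.org/legacy-foaf#",  # hypothetical placeholder, not a real FOAF namespace
)


def expand(local_name, namespaces=FOAF_NAMESPACES):
    """Every full URI a term could appear under, one per namespace version."""
    return {ns + local_name for ns in namespaces}


def find_property(triples, local_name, namespaces=FOAF_NAMESPACES):
    """Triples whose predicate is local_name in any of the known namespaces."""
    wanted = expand(local_name, namespaces)
    return [(s, p, o) for s, p, o in triples if p in wanted]


def find_instances(triples, class_name, namespaces=FOAF_NAMESPACES):
    """Subjects typed as class_name in any of the known namespaces."""
    wanted = expand(class_name, namespaces)
    return {s for s, p, o in triples if p == RDF_TYPE and o in wanted}
```

The developer still has to maintain the namespace list by hand, which is the burden described above. The matching just stays exact per namespace.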
Now, take a look at http://www.semanticplanet.com/sources/16/ :
"Things Described. Nothing of interest." It seems that Ian is suffering
from the same problems. In fact, it seems to me that this is endemic to
the process and not particularly a fault of RAP. I could do regex
searches on the triples, but then I'd get caught out when a feed used
namespaces to differentiate between two overlapping property names.
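The collision risk with pattern matching is easy to show. This sketch uses illustrative data with the real Dublin Core and FOAF namespace URIs. A regex on the local name `title` picks up both dc:title and foaf:title, while comparing full URIs keeps them apart.

```python
import re

DC_TITLE = "http://purl.org/dc/elements/1.1/title"  # dc:title, a document's title
FOAF_TITLE = "http://xmlns.com/foaf/0.1/title"      # foaf:title, e.g. Mr, Mrs, Dr

triples = [
    ("http://example.com/index.html", DC_TITLE, "My home page"),
    ("_:me", FOAF_TITLE, "Mr"),
]

# Naive: any predicate whose local name is "title", whatever its namespace
naive = [o for s, p, o in triples if re.search(r"[#/]title$", p)]

# Namespace-aware: compare the full predicate URI
exact = [o for s, p, o in triples if p == FOAF_TITLE]

print(naive)  # ['My home page', 'Mr']
print(exact)  # ['Mr']
```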
Now the exercise above is just about the simplest thing I could do. I
haven't even begun to deal with additional metadata about the people or
alternatives to foaf:knows that add properties for the links.
Finally, now that I've got all those seeAlsos stored, I'll go off and
fetch them. There's a significant number of 404s, HTML web pages, RSS
files, invalid XML, very large files, and so on. Because RDF-API is
memory-based and largely interpreted, it gets exceedingly slow on large
files. So I've got a significant amount of error checking to do, and
some files I just have to discard to avoid everything grinding to a halt.
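The error checking on fetched seeAlso targets could start out as something like the sketch below. It is a pure `classify` function plus a thin `fetch` wrapper, using only Python's standard library. The category names, the 512 KB size cut-off and the root-element heuristics are all assumptions for illustration. An RSS 1.0 feed also has an rdf:RDF root, so it lands in the "rdf" bucket and still needs checking downstream.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"
MAX_BYTES = 512 * 1024  # assumed cut-off for a "very large" file


def classify(status, content_type, body, max_bytes=MAX_BYTES):
    """Sort a fetched seeAlso target into a rough bucket before parsing it as RDF."""
    if status != 200:
        return "http-error"      # 404s and friends
    if len(body) > max_bytes:
        return "too-large"       # skip rather than let the parser grind
    if "html" in (content_type or "").lower():
        return "html"            # an ordinary web page
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "invalid-xml"
    if root.tag == RDF_ROOT:
        return "rdf"             # worth handing to the RDF parser
    if root.tag == "rss":
        return "rss"             # RSS 0.91/0.92/2.0 feed, not FOAF
    return "other-xml"


def fetch(url, max_bytes=MAX_BYTES, timeout=30):
    """Fetch url, reading at most max_bytes + 1 bytes, and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(max_bytes + 1)
            kind = classify(resp.status, resp.headers.get("Content-Type", ""), body, max_bytes)
            return kind, body
    except urllib.error.HTTPError as err:
        return classify(err.code, "", b"", max_bytes), b""
    except OSError:
        return "unreachable", b""  # DNS failures, refused connections, timeouts
```

Reading at most `max_bytes + 1` bytes means an oversized file is detected without ever holding all of it in memory. Only the "rdf" bucket then goes on to the slower parser.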
My previous post was deliberately provocative but wasn't meant to cause
offence. I hope nobody took it personally. But I still have the
frustration. I'm using an RDF parser, the only more or less complete
parser for my platform, and I'm having to work round its limitations.
I'm working with triples, not XML. And the first and simplest task I
try throws up issues that look like fundamental flaws, or at least
enough inconsistency to make the task unnecessarily hard.
Now it turns out that there is a large and growing body of FOAF data
that is machine-produced (as opposed to hand-written in a text editor)
and is consistent with the FOAF spec (being careful not to say "valid").
The scutter described above copes well with most of it. So I'll just
discard Libby's file as empty and move on to second-order problems, like
trying to relate a person and some of their data to the file that they
actually produced. There are probably only 3 ways of doing this, so that
problem shouldn't take too much code...
--
Julian Bond Email&MSM: julian.bond at voidstar.com
Webmaster: http://www.ecademy.com/
Personal WebLog: http://www.voidstar.com/
M: +44 (0)77 5907 2173 T: +44 (0)192 0412 433
More information about the foaf-dev mailing list