[rdfweb-dev] Time's running out?

Danny Ayers danny666 at virgilio.it
Fri Aug 8 21:03:37 UTC 2003


Hi Julian,

Thanks for this. I only hope someone else has used RAP...(Ian he-elp!!)

> OK. I'm using PHP with the RDF-API (RAP). I'm using Ian's very useful
> guide http://www.semanticplanet.com/2003/05/parsingFOAFWithPHP.html as a
> basis. I'm looking specifically for seeAlso, Person and knows with
> sub-tasks of grabbing or creating an mbox_sha1sum where it's available
> for any Person found. Your basic scutter with the intent of building a
> database of people and the links between them.
>
> The basic approach is
> 1 define a resource for rdfs:seeAlso. Search for all instances
> 2 define a resource for foaf:Person. Search for all instances
>    3 For each Person search for foaf:mbox or foaf:mbox_sha1sum
>    4 For each Person search for foaf:knows
>      5 For each knows.Person search for foaf:mbox and foaf:mbox_sha1sum
>
> 1. Works fine. 2,3,4,5 all find nothing. Because there's not a single
> foaf:Person in the file. And yet the data *is* actually in the file. In
> order to find it I've got to follow the indirection from
> http://www.w3.org/1999/02/22-rdf-syntax-ns# to FOAF. Something that I
> don't think is possible with the available machine-readable namespace
> definitions.
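
For the mbox_sha1sum sub-task mentioned above, here is a minimal sketch in Python rather than PHP. It assumes the FOAF convention that the hash is the SHA1 of the full mailbox URI, "mailto:" prefix included. The function name and the example address are hypothetical.

```python
import hashlib

def mbox_sha1sum(mbox):
    """Return the hex SHA1 of a mailbox URI, adding 'mailto:' if it is missing."""
    uri = mbox if mbox.startswith("mailto:") else "mailto:" + mbox
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()

# Hypothetical address, used only for illustration.
digest = mbox_sha1sum("someone@example.org")
```

Storing the hash for every Person, whether it came from the file or was computed from foaf:mbox, gives the scutter one key to match people on across files.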

I'm not sure about "define a resource for foaf:Person" - with the Jena
(Java) kit I'd probably load the data in, then look for all resources whose
rdf:type is foaf:Person, then get all triples with foaf:knows, and
pull out their objects (which should also be foaf:Persons). At that point
it'd be necessary to retrieve the seeAlso'd files and load them in. (Doesn't
look like Libby knows anyone though...aw...).
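
A rough sketch of that query pattern follows, in plain Python over (subject, predicate, object) tuples. It uses neither the Jena nor the RAP API. The sample triples and helper names are made up for illustration; only the namespace URIs are real.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical sample graph: two people, one knows the other.
triples = [
    ("_:a", RDF_TYPE, FOAF + "Person"),
    ("_:a", FOAF + "knows", "_:b"),
    ("_:b", RDF_TYPE, FOAF + "Person"),
    ("_:b", RDFS_SEEALSO, "http://example.org/b.rdf"),
]

def subjects_of_type(triples, type_uri):
    """All resources whose rdf:type is type_uri."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == type_uri}

def objects_of(triples, subject, predicate):
    """All objects of (subject, predicate, ?) triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]

people = subjects_of_type(triples, FOAF + "Person")
known = {o for person in people
         for o in objects_of(triples, person, FOAF + "knows")}
# seeAlso documents to fetch next, for everyone found so far.
to_fetch = {url for node in people | known
            for url in objects_of(triples, node, RDFS_SEEALSO)}
```

The point is that the query is "which resources have rdf:type foaf:Person", a match on a triple, not a search for an element called foaf:Person in the XML.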

I don't understand what you're saying about the indirection; all it should
do is stop any names getting tangled.
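
To illustrate what I mean by untangling, here is a small Python sketch of prefix expansion. The rdf, rdfs and foaf URIs are the real namespaces. The "ex" prefix and the expand() helper are hypothetical.

```python
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/terms#",  # hypothetical second vocabulary
}

def expand(qname, namespaces=NAMESPACES):
    """Turn a prefixed name like 'foaf:Person' into its full URI."""
    prefix, _, local = qname.partition(":")
    return namespaces[prefix] + local

person_uri = expand("foaf:Person")
# Same local name, different vocabularies: the full URIs stay distinct.
names_clash = expand("foaf:name") == expand("ex:name")
```

Once everything is compared as full URIs, the prefix a particular file happens to use stops mattering.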

> So here's a problem that RAP doesn't actually do anything with the
> namespaces except to de-reference the subscripts. Which means that I as
> a developer have to have knowledge of the namespaces I'm interested in
> and all possible versions of them likely to be of interest.

The first part of this isn't too much to expect - if you're interested in
FOAF then you probably want some knowledge about it. The extent of your
knowledge only needs to be as far as what you're interested in though - you
can ignore triples involving foaf:molarCount unless you're into dentistry
;-)

> Where there
> are two or more versions with data in the wild, I'll have to iterate
> over them. And even though I'm working with triples there's just enough
> strangeness about the way people construct them that the problem above
> is not an isolated instance.

If there are two or more versions of a person's FOAF file out there, you
can either choose to ignore the older one, or stay within the RDF model and
accept all the (possibly obsolete) statements as well. Strangeness certainly
is a problem; it might be useful to have a page on a Wiki for strategies for
dealing with this.
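
The two options can be sketched in Python like this. The dates, URIs and function names are invented for illustration. The sketch also assumes every subject is a URI, since blank nodes from separate files would need renaming apart before a union.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://example.org/people#me"  # hypothetical subject URI

# Hypothetical versions of one file: (ISO fetch date, set of triples).
versions = [
    ("2003-06-01", {(ME, FOAF + "nick", "old-nick"),
                    (ME, FOAF + "name", "A. Person")}),
    ("2003-08-01", {(ME, FOAF + "nick", "new-nick"),
                    (ME, FOAF + "name", "A. Person")}),
]

def latest_only(versions):
    """Option 1: keep only the most recently fetched version."""
    return set(max(versions, key=lambda v: v[0])[1])

def merge_all(versions):
    """Option 2: stay in the RDF model and accept every statement."""
    merged = set()
    for _, triples in versions:
        merged |= triples
    return merged

latest = latest_only(versions)
merged = merge_all(versions)
```

With the union you end up with both nicks, which is legal RDF but may include the obsolete one. With the latest-only option you lose anything the newer file dropped.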

> Now. Take a look at this http://www.semanticplanet.com/sources/16/
> "Things Described. Nothing of interest." It seems that Ian is suffering
> from the same problems. In fact it seems to me that this is endemic in
> the process and not particularly any fault of RAP. I could do regex
> searches on the triples but then I'd get caught when a feed used
> namespaces to differentiate between two overlapping property names.

If you try the RDF Validator (Parse URI, display triples and graph) on
Libby's feed you get a good graph. The ARP parser has been able to extract
the info.

http://www.w3.org/RDF/Validator/

Better still, try it with Morten's FOAF Explorer (not sure what he used) :

http://xml.mfd-consult.dk/foaf/explorer/?foaf=http%3A%2F%2Fswordfish.rdfweb.org%2Fpeople%2Flibby%2Frdfweb%2Fwebwho.xrdf

I can't be sure, not being familiar with RAP, but I suspect your app and
Ian's are asking questions for which there are no answers.

> Now the exercise above is just about the simplest thing I could do. I
> haven't even begun to deal with additional metadata about the people or
> alternatives to foaf:knows that add properties for the links.
>
> Finally, now I've got all those seeAlsos stored, I'll go off and fetch
> them. There's a significant number of 404s, html web pages, RSS files,
> invalid XML, V large files, and so on. RDF-API being memory based and
> largely interpreted gets exceedingly slow on large files. So I've got a
> significant amount of error checking to do and some files I just have to
> discard to avoid everything grinding to a halt.

Yep, the joys of spidering!
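
A minimal sketch of that kind of triage, in Python, run on each fetched seeAlso before handing it to the RDF parser. The size cutoff and the function name are hypothetical. It also assumes a usable file has rdf:RDF as its root element, which is a simplification because RDF/XML does not strictly require that wrapper.

```python
import xml.etree.ElementTree as ET

MAX_BYTES = 1_000_000  # hypothetical cutoff for "V large"
RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"

def triage(status, body):
    """Decide whether a fetched document is worth parsing as RDF."""
    if status != 200:
        return "skip: HTTP %d" % status
    if len(body) > MAX_BYTES:
        return "skip: too large"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "skip: invalid XML"  # also catches most tag-soup HTML
    if root.tag != RDF_ROOT:
        return "skip: not rdf:RDF"  # e.g. an <rss> or <html> root
    return "parse"

result = triage(200, b'<rdf:RDF xmlns:rdf='
                     b'"http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>')
```

Checking status and size first means the expensive parse is only attempted on files that have a chance of being useful.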

[snip unnecessary apology]

> Now it turns out that there is a large and growing body of FOAF data
> that is machine produced (as opposed to a text editor) and is consistent
> with the FOAF spec (being careful not to say "valid"). The scutter
> described above copes well with most of it. So I'll just discard Libby's
> file as empty and move on to second order problems.

Sounds reasonable.

> Like trying to
> relate a person and some of their data with the file that they actually
> produced. There's probably only 3 ways of doing this so that problem
> shouldn't take too much code...

Heh, that's more like it ;-)

Cheers,
Danny.



