[rdfweb-dev] Time's running out?
victor at vicsoft.co.uk
Sat Aug 9 07:44:09 UTC 2003
Danny, just one point about the process you describe for looking for
Person - knows - ? triples.
As Dan explains in the excellent blog, the type of a subject can be
inferred from a property.
As the domain of foaf:knows is declared as foaf:Person, all subjects in
statements with a foaf:knows predicate (or any other FOAF property with a
domain of foaf:Person) can be assumed to be of type foaf:Person. For
example, looking for foaf:Person first will not work with Libby's FOAF, as
Libby's type is not explicitly declared but implied by her being the
subject of a ? - foaf:name - ? statement.
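A minimal sketch of that inference in Python (the message itself contains no code). It applies the RDFS domain rule: any subject of a triple whose predicate has a declared domain gets an rdf:type triple for that class. Triples are plain tuples, and the DOMAINS table is a hand-written assumption rather than a copy of the real FOAF schema. In particular, the foaf:name entry only follows the reasoning in this message.

```python
# Sketch: RDFS domain inference over plain (subject, predicate, object) tuples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Assumed, hand-written excerpt of domain declarations (not loaded from the
# real FOAF schema). The foaf:name entry follows the reasoning above.
DOMAINS = {
    FOAF + "knows": FOAF + "Person",
    FOAF + "name": FOAF + "Person",
}


def infer_domain_types(triples, domains):
    """Return the triples plus (s, rdf:type, C) for every (s, p, o)
    whose predicate p has domain C."""
    result = set(triples)
    for s, p, o in triples:
        if p in domains:
            result.add((s, RDF_TYPE, domains[p]))
    return result


def instances_of(triples, cls):
    """Subjects that have an rdf:type triple naming cls."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


libby = [("_:libby", FOAF + "name", "Libby")]   # no explicit rdf:type

print(instances_of(libby, FOAF + "Person"))                               # set()
print(instances_of(infer_domain_types(libby, DOMAINS), FOAF + "Person"))  # {'_:libby'}
```

In practice a toolkit's own inference support, where available, would apply this rule. The point here is only that the Person lookup succeeds once the domain-implied type triples are added.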
This of course adds considerably to the amount of processing and code
required. It also makes 'validation' harder, because if we imply a
subject type then we should check that all the other properties of this
subject are consistent with that implied type.
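One possible shape for that consistency check, again only an illustrative sketch. It collects the explicit and domain-implied types of each subject, then reports subjects whose types include a pair of classes declared disjoint. The ex:pageCount property and the disjointness declaration are hypothetical, chosen only to produce a clash.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/vocab#"   # hypothetical vocabulary

# Assumed schema facts, written out by hand for the example.
DOMAINS = {
    FOAF + "knows": FOAF + "Person",
    EX + "pageCount": FOAF + "Document",   # hypothetical property
}
DISJOINT = {frozenset({FOAF + "Person", FOAF + "Document"})}   # assumed


def types_by_subject(triples, domains):
    """Explicit rdf:type values plus types implied via rdfs:domain."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
        elif p in domains:
            types.setdefault(s, set()).add(domains[p])
    return types


def inconsistent_subjects(triples, domains, disjoint):
    """Map each subject to the disjoint class pairs its types fall into."""
    report = {}
    for s, types in types_by_subject(triples, domains).items():
        clashes = [pair for pair in disjoint if pair <= types]
        if clashes:
            report[s] = clashes
    return report


data = [
    ("_:a", FOAF + "knows", "_:b"),
    ("_:a", EX + "pageCount", "12"),   # makes _:a both a Person and a Document
    ("_:c", FOAF + "knows", "_:a"),
]
print(sorted(inconsistent_subjects(data, DOMAINS, DISJOINT)))   # ['_:a']
```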
From: rdfweb-dev-bounces at vapours.rdfweb.org
[mailto:rdfweb-dev-bounces at vapours.rdfweb.org] On Behalf Of Danny Ayers
Sent: 08 August 2003 22:04
To: Julian Bond; rdfweb-dev at vapours.rdfweb.org
Subject: RE: [rdfweb-dev] Time's running out?
Thanks for this. I only hope someone else has used RAP...(Ian he-elp!!)
> OK. I'm using PHP with the RDF-API (RAP). I'm using Ian's very useful
> guide http://www.semanticplanet.com/2003/05/parsingFOAFWithPHP.html as
> a basis. I'm looking specifically for seeAlso, Person and knows, with
> sub-tasks of grabbing or creating an mbox_sha1sum where it's available
> for any Person found. Your basic scutter with the intent of building a
> database of people and the links between them.
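For the mbox_sha1sum sub-task, the FOAF spec defines foaf:mbox_sha1sum as the SHA1 hash of the mailbox URI, mailto: prefix included. Here is a small Python sketch. The person_key helper and the plain dict it takes are illustrative and hypothetical, not part of RAP or FOAF.

```python
import hashlib


def mbox_sha1sum(mbox):
    """Hex SHA1 of a mailbox URI, the value used for foaf:mbox_sha1sum.

    Accepts 'mailto:alice@example.org' or a bare address; a bare address
    is given the mailto: prefix first (an assumption about the input).
    """
    uri = mbox.strip()
    if not uri.startswith("mailto:"):
        uri = "mailto:" + uri
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def person_key(props):
    """Hypothetical helper: prefer a published foaf:mbox_sha1sum,
    otherwise derive one from foaf:mbox, otherwise return None."""
    if props.get("mbox_sha1sum"):
        return props["mbox_sha1sum"]
    if props.get("mbox"):
        return mbox_sha1sum(props["mbox"])
    return None


print(mbox_sha1sum("alice@example.org") == mbox_sha1sum("mailto:alice@example.org"))  # True
```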
> The basic approach is:
> 1 Define a resource for rdfs:seeAlso. Search for all instances
> 2 Define a resource for foaf:Person. Search for all instances
> 3 For each Person search for foaf:mbox or foaf:mbox_sha1sum
> 4 For each Person search for foaf:knows
> 5 For each knows.Person search for foaf:mbox and foaf:mbox_sha1sum
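Here is a rough sketch of those five steps over an in-memory list of (subject, predicate, object) tuples, rather than RAP's own API, whose calls aren't reproduced here. Step 2 only matches explicit rdf:type foaf:Person triples. As a result, a file that uses foaf:knows without declaring a type yields seeAlso links but no people.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
FOAF = "http://xmlns.com/foaf/0.1/"


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def identifiers(triples, subject):
    """foaf:mbox and foaf:mbox_sha1sum values for one subject (steps 3, 5)."""
    return (objects(triples, subject, FOAF + "mbox")
            + objects(triples, subject, FOAF + "mbox_sha1sum"))


def scutter_pass(triples):
    # Step 1: every rdfs:seeAlso target.
    see_also = [o for (s, p, o) in triples if p == RDFS_SEEALSO]
    # Step 2: only subjects with an explicit rdf:type foaf:Person triple.
    persons = {s for (s, p, o) in triples
               if p == RDF_TYPE and o == FOAF + "Person"}
    people = {}
    for person in persons:
        people[person] = {
            "ids": identifiers(triples, person),               # step 3
            "knows": {friend: identifiers(triples, friend)     # steps 4, 5
                      for friend in objects(triples, person, FOAF + "knows")},
        }
    return see_also, people


doc = [
    ("_:me", FOAF + "name", "Me"),
    ("_:me", FOAF + "knows", "_:friend"),
    ("_:friend", FOAF + "mbox_sha1sum", "0123abcd"),   # placeholder, not a real hash
    ("_:me", RDFS_SEEALSO, "http://example.org/friend.rdf"),
]

see_also, people = scutter_pass(doc)
print(see_also)   # ['http://example.org/friend.rdf']
print(people)     # {} because no rdf:type foaf:Person triple is present
```

Adding `("_:me", RDF_TYPE, FOAF + "Person")` to the data, or inferring it from the domain of foaf:knows, makes steps 2 to 5 find the friend.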
> Step 1 works fine. Steps 2, 3, 4 and 5 all find nothing, because there's
> not a single foaf:Person in the file. And yet the data *is* actually in
> the file. In order to find it I've got to follow the indirection from
> http://www.w3.org/1999/02/22-rdf-syntax-ns# to FOAF, something that I
> don't think is possible with the available machine-readable namespace
> documents.
I'm not sure about "define a resource for foaf:Person" - with the Jena
(Java) kit I'd probably load the data in, then look for all resources
whose rdf:type is foaf:Person, then get all triples with foaf:knows and
pull out their objects (which should also be foaf:Persons). At that
point it'd be necessary to retrieve the seeAlso'd files and load them in.
(It doesn't look like Libby knows anyone, though...aw...)
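Retrieving the seeAlso'd files and loading them in amounts to a crawl loop. This minimal breadth-first sketch assumes a caller-supplied fetch_triples(url) function. That function is hypothetical; a real scutter would fetch over HTTP and parse RDF/XML there. Keeping the triples keyed by source URL also records which file each statement came from.

```python
from collections import deque

RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def scutter(start_url, fetch_triples, max_docs=100):
    """Breadth-first crawl that follows rdfs:seeAlso links between documents.

    fetch_triples(url) is supplied by the caller (hypothetical here): it
    should return a list of (subject, predicate, object) triples, or None
    when the document cannot be fetched or parsed.
    """
    queue = deque([start_url])
    seen = {start_url}
    graphs = {}   # url -> triples loaded from that url (keeps provenance)
    while queue and len(graphs) < max_docs:
        url = queue.popleft()
        triples = fetch_triples(url)
        if triples is None:
            continue   # unreachable or unparseable: skip it
        graphs[url] = triples
        for _, p, o in triples:
            if p == RDFS_SEEALSO and o not in seen:
                seen.add(o)
                queue.append(o)
    return graphs


# A fake "web" standing in for HTTP fetching; missing.rdf returns None.
fake_web = {
    "http://example.org/a.rdf": [("_:x", RDFS_SEEALSO, "http://example.org/b.rdf")],
    "http://example.org/b.rdf": [("_:y", RDFS_SEEALSO, "http://example.org/a.rdf"),
                                 ("_:y", RDFS_SEEALSO, "http://example.org/missing.rdf")],
}
print(sorted(scutter("http://example.org/a.rdf", fake_web.get)))
# ['http://example.org/a.rdf', 'http://example.org/b.rdf']
```

The seen set stops the crawl from looping when two files point at each other. The max_docs cap is an arbitrary safety limit.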
I don't understand what you're saying about the indirection; all it
should do is stop any names getting tangled.
> So here's a problem: RAP doesn't actually do anything with the
> namespaces except to de-reference the subscripts. Which means that I, as
> a developer, have to have knowledge of the namespaces I'm interested in
> and all possible versions of them likely to be of interest.
The first part of this isn't too much to expect - if you're interested
in FOAF then you probably want some knowledge about it. The extent of
your knowledge only needs to be as far as what you're interested in,
though - you can ignore triples involving foaf:molarCount unless that
happens to be your thing.
> If there are two or more versions with data in the wild, I'll have to
> iterate over them. And even though I'm working with triples, there's
> just such a strangeness about the way people construct them that the
> problem above is not an isolated instance.
If two or more versions of a person's FOAF file are out there, you
can either choose to ignore the older one, or stay within the RDF model
and accept all the (possibly obsolete) statements as well. Strangeness
is a problem; it might be useful to have a page on a Wiki for strategies
for dealing with it.
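The two options can be sketched as merge policies. This sketch assumes each copy of the file is recorded with a comparable retrieval timestamp, which stands in for "older" here. A real scutter might use the HTTP Last-Modified header instead.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def merge_versions(versions, policy="newest"):
    """Combine copies of one person's FOAF file fetched from different places.

    versions: list of (retrieved_at, triples) pairs, where retrieved_at is
    any comparable timestamp (an assumption standing in for "older").
    "newest": keep only the statements of the most recent copy.
    "union":  keep every statement, including possibly obsolete ones.
    """
    if not versions:
        return set()
    if policy == "newest":
        _, triples = max(versions, key=lambda v: v[0])
        return set(triples)
    if policy == "union":
        merged = set()
        for _, triples in versions:
            merged.update(triples)
        return merged
    raise ValueError("unknown policy: %r" % policy)


old = (1, [("_:me", FOAF + "nick", "vic"), ("_:me", FOAF + "phone", "tel:000")])
new = (2, [("_:me", FOAF + "nick", "vic")])
print(len(merge_versions([old, new])), len(merge_versions([old, new], "union")))  # 1 2
```

The union policy is simply RDF graph merging: nothing is thrown away, at the cost of keeping statements the author may since have dropped.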
> Now, take a look at this: http://www.semanticplanet.com/sources/16/
> "Things Described. Nothing of interest." It seems that Ian is suffering
> from the same problems. In fact it seems to me that this is endemic in
> the process and not particularly any fault of RAP. I could do regex
> searches on the triples, but then I'd get caught out when a feed used
> namespaces to differentiate between two overlapping property names.
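Here is a small illustration of that regex pitfall. It uses a hypothetical second vocabulary (OTHER) that also defines a property called name. Matching on the local name returns both statements, while comparing the full property URI keeps them apart.

```python
import re

FOAF = "http://xmlns.com/foaf/0.1/"
OTHER = "http://example.org/other-vocab#"   # hypothetical second vocabulary

triples = [
    ("_:p", FOAF + "name", "Alice"),
    ("_:p", OTHER + "name", "alice-server-01"),   # same local name, other meaning
]

# Fragile: matches on the local name only, so both properties come back.
by_regex = [t for t in triples if re.search(r"[#/]name$", t[1])]

# Robust: compare the full property URI (namespace + local name).
by_uri = [t for t in triples if t[1] == FOAF + "name"]

print(len(by_regex), len(by_uri))   # 2 1
```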
If you try the RDF Validator (Parse URI, display triples and graph) on
Libby's feed you get a good graph, so the ARP parser has been able to
make sense of it. Better still, try it with Morten's FOAF Explorer (I'm
not sure what he used).
I can't be sure, not being familiar with RAP, but I suspect your app and
Ian's aren't asking questions for which there are answers.
> Now the exercise above is just about the simplest thing I could do. I
> haven't even begun to deal with additional metadata about the people, or
> alternatives to foaf:knows that add properties for the links.
> Finally, now I've got all those seeAlsos stored, I'll go off and fetch
> them. There are a significant number of 404s, HTML web pages, RSS files,
> invalid XML, very large files, and so on. RDF-API, being memory based and
> largely interpreted, gets exceedingly slow on large files. So I've got a
> significant amount of error checking to do, and some files I just have to
> discard to avoid everything grinding to a halt.
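Some of that error checking can happen before a file ever reaches the RDF parser. Below is a sketch of a pre-parse filter in Python. MAX_BYTES is an arbitrary assumed limit. The checks (HTTP status, size, HTML content type, presence of the RDF namespace, XML well-formedness via the standard library's ElementTree) are one possible ordering, not RAP behaviour. RSS 1.0 is itself RDF/XML and passes; RSS 2.0 does not.

```python
import xml.etree.ElementTree as ET

MAX_BYTES = 2_000_000   # hypothetical cut-off; tune to the parser's memory use
RDF_NS = b"http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def skip_reason(status, content_type, body):
    """Return why a fetched seeAlso target should be skipped, or None if it
    looks worth handing to the RDF parser. body is the raw response bytes."""
    if status != 200:
        return "HTTP %d" % status
    if len(body) > MAX_BYTES:
        return "too large"
    ctype = (content_type or "").split(";")[0].strip().lower()
    if ctype == "text/html":
        return "HTML page"
    if RDF_NS not in body:
        return "no RDF namespace"   # e.g. RSS 2.0 or arbitrary XML
    try:
        ET.fromstring(body)         # cheap well-formedness check
    except ET.ParseError:
        return "invalid XML"
    return None


good = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
print(skip_reason(404, "text/html", b""))               # HTTP 404
print(skip_reason(200, "application/rdf+xml", good))    # None
```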
Yep, the joys of spidering!
[snip unnecessary apology]
> Now it turns out that there is a large and growing body of FOAF data
> that is machine produced (as opposed to written in a text editor) and is
> consistent with the FOAF spec (being careful not to say "valid"). The
> scutter described above copes well with most of it. So I'll just discard
> that file as empty and move on to second order problems, like trying to
> relate a person and some of their data with the file that they
> produced. There are probably only 3 ways of doing this, so that problem
> shouldn't take too much code...
Heh, that's more like it ;-)
rdfweb-dev mailing list
rdfweb-dev at vapours.rdfweb.org