[rdfweb-dev] Time's running out?

Victor Lindesay victor at vicsoft.co.uk
Sat Aug 9 07:44:09 UTC 2003


Danny, just one point about the process you describe for looking for
Person - knows - ? triples.

As Dan explains in the excellent blog
http://rdfweb.org/mt/foaflog/archives/000047.html
the type of a subject can be implied from a property. 

Because the domain of foaf:knows is declared as foaf:Person, the subject
of any statement whose predicate is foaf:knows (or any other FOAF
property with a domain of foaf:Person) can be assumed to be of type
foaf:Person. For example, looking for foaf:Person first will not work
with Libby's FOAF: her type is not explicitly declared but implied,
because she is the subject of a ? - foaf:name - ? statement.
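
A minimal sketch of that inference, in Python rather than the PHP or
Java discussed in this thread. Triples are plain (subject, predicate,
object) tuples, and the DOMAINS table is a hand-written assumption for
illustration (including foaf:name, following the example above); a real
scutter would read the domains from the FOAF schema instead.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed rdfs:domain declarations, hand-copied for illustration only.
DOMAINS = {
    FOAF + "knows": FOAF + "Person",
    FOAF + "name": FOAF + "Person",  # assumption, per the example above
}


def infer_types(triples, domains=DOMAINS):
    """Map each subject to its explicit and domain-implied types."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            # Explicit declaration: the object is the type.
            types.setdefault(s, set()).add(o)
        elif p in domains:
            # Implied declaration: the property's domain is the type.
            types.setdefault(s, set()).add(domains[p])
    return types


# Made-up data: a subject with a name but no explicit rdf:type.
triples = [
    ("_:libby", FOAF + "name", "Libby"),
    ("_:libby", FOAF + "mbox", "mailto:libby@example.org"),
]
people = [s for s, ts in infer_types(triples).items()
          if FOAF + "Person" in ts]
```

Here people contains "_:libby" even though the data never states her
type, which is the case a search for explicit foaf:Person misses.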

This of course adds considerably to the amount of processing and code
required. It also makes 'validation' harder: if we infer a subject's
type, then we should check that all other properties of that subject are
consistent with the inferred type.
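
One way such a consistency check might look, as a hedged Python sketch.
The domain table and the disjointness table below are illustrative
assumptions, not copied from the FOAF schema, and check_subject is a
hypothetical helper name.

```python
def check_subject(properties, domains, disjoint):
    """Return (implied types, conflicting type pairs) for one subject.

    properties: predicates used with this subject
    domains:    {predicate: class} domain declarations (assumed)
    disjoint:   set of frozenset({class_a, class_b}) pairs (assumed)
    """
    implied = {domains[p] for p in properties if p in domains}
    conflicts = sorted(
        (a, b)
        for a in implied
        for b in implied
        if a < b and frozenset((a, b)) in disjoint
    )
    return implied, conflicts


# Illustrative tables: assumptions for the sake of the example.
domains = {"foaf:knows": "foaf:Person", "foaf:topic": "foaf:Document"}
disjoint = {frozenset(("foaf:Person", "foaf:Document"))}

implied, conflicts = check_subject(
    ["foaf:knows", "foaf:topic"], domains, disjoint
)
```

A subject that uses both properties gets two implied types, and the
pair shows up in conflicts because the tables declare them disjoint.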


-----Original Message-----
From: rdfweb-dev-bounces at vapours.rdfweb.org
[mailto:rdfweb-dev-bounces at vapours.rdfweb.org] On Behalf Of Danny Ayers
Sent: 08 August 2003 22:04
To: Julian Bond; rdfweb-dev at vapours.rdfweb.org
Subject: RE: [rdfweb-dev] Time's running out?


Hi Julian,

Thanks for this. I only hope someone else has used RAP...(Ian he-elp!!)

> OK. I'm using PHP with the RDF-API (RAP). I'm using Ian's very useful
> guide http://www.semanticplanet.com/2003/05/parsingFOAFWithPHP.html as
> a basis. I'm looking specifically for seeAlso, Person and knows with
> sub-tasks of grabbing or creating an mbox_sha1sum where it's available
> for any Person found. Your basic scutter with the intent of building a
> database of people and the links between them.
>
> The basic approach is
> 1 define a resource for rdfs:seeAlso. Search for all instances
> 2 define a resource for foaf:Person. Search for all instances
>    3 For each Person search for foaf:mbox or foaf:mbox_sha1sum
>    4 For each Person search for foaf:knows
>      5 For each knows.Person search for foaf:mbox and
>        foaf:mbox_sha1sum
>
> 1. Works fine. 2,3,4,5 all find nothing. Because there's not a single
> foaf:Person in the file. And yet the data *is* actually in the file.
> In order to find it I've got to follow the indirection from
> http://www.w3.org/1999/02/22-rdf-syntax-ns# to FOAF. Something that I
> don't think is possible with the available machine-readable namespace
> definitions.

I'm not sure about "define a resource for foaf:Person" - with the Jena
(Java) kit I'd probably load the data in, then look for all resources
where their rdf:type is foaf:Person, then get all triples with
foaf:knows, and pull out their objects (which should also be
foaf:Persons). At that point it'd be necessary to retrieve the
seeAlso'd files and load them in. (Doesn't look like Libby knows anyone
though...aw...).
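
The steps just described can be sketched without any RDF toolkit, again
in Python with triples as plain tuples. This is not Jena's API, only
the shape of the queries; people_and_links is a hypothetical helper and
the example.org data is made up.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def people_and_links(triples):
    """Return (explicit persons, knows links, seeAlso targets)."""
    # Resources whose rdf:type is explicitly foaf:Person.
    persons = {s for s, p, o in triples
               if p == RDF_TYPE and o == FOAF + "Person"}
    # Every foaf:knows triple as a (subject, object) pair.
    links = [(s, o) for s, p, o in triples if p == FOAF + "knows"]
    # Files still to be retrieved and loaded.
    see_also = [o for s, p, o in triples if p == RDFS_SEEALSO]
    return persons, links, see_also


# Made-up data for illustration.
triples = [
    ("_:a", RDF_TYPE, FOAF + "Person"),
    ("_:a", FOAF + "knows", "_:b"),
    ("_:a", RDFS_SEEALSO, "http://example.org/b.rdf"),
]
persons, links, see_also = people_and_links(triples)
```

Note that "_:b" appears in links but not in persons: it has no explicit
rdf:type, so a scutter that filters on persons alone would drop it.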

I don't understand what you're saying about the indirection; all it
should do is stop any names getting tangled.

> So here's a problem that RAP doesn't actually do anything with the
> namespaces except to de-reference the subscripts. Which means that I
> as a developer have to have knowledge of the namespaces I'm interested
> in and all possible versions of them likely to be of interest.

The first part of this isn't too much to expect - if you're interested
in FOAF then you probably want some knowledge about it. The extent of
your knowledge only needs to be as far as what you're interested in
though - you can ignore triples involving foaf:molarCount unless you're
into dentistry ;-)

> Where there are two or more versions with data in the wild, I'll have
> to iterate over them. And even though I'm working with triples there's
> just enough strangeness about the way people construct them that the
> problem above is not an isolated instance.

If two or more versions of a person's FOAF file are out there, you can
either choose to ignore the older one, or stay within the RDF model and
accept all the (possibly obsolete) statements as well. Strangeness
certainly is a problem; it might be useful to have a page on a Wiki for
strategies for dealing with this.

> Now. Take a look at this http://www.semanticplanet.com/sources/16/
> "Things Described. Nothing of interest." It seems that Ian is
> suffering from the same problems. In fact it seems to me that this is
> endemic in the process and not particularly any fault of RAP. I could
> do regex searches on the triples but then I'd get caught when a feed
> used namespaces to differentiate between two overlapping property
> names.

If you try the RDF Validator (Parse URI, display triples and graph) on
Libby's feed you get a good graph. The ARP parser has been able to
extract the info.

http://www.w3.org/RDF/Validator/

Better still, try it with Morten's FOAF Explorer (not sure what he
used):

http://xml.mfd-consult.dk/foaf/explorer/?foaf=http%3A%2F%2Fswordfish.rdfweb.org%2Fpeople%2Flibby%2Frdfweb%2Fwebwho.xrdf

I can't be sure, not being familiar with RAP, but I suspect your app and
Ian's aren't asking questions for which there are answers.

> Now the exercise above is just about the simplest thing I could do. I
> haven't even begun to deal with additional metadata about the people
> or alternatives to foaf:knows that add properties for the links.
>
> Finally, now I've got all those seeAlsos stored, I'll go off and fetch
> them. There's a significant number of 404s, html web pages, RSS files,
> invalid XML, V large files, and so on. RDF-API being memory based and
> largely interpreted gets exceedingly slow on large files. So I've got
> a significant amount of error checking to do and some files I just
> have to discard to avoid everything grinding to a halt.

Yep, the joys of spidering!
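
For what it's worth, the kind of error checking described above could
start from a small triage step run before any RDF parsing. This is a
sketch in Python, not anything RAP provides: triage is a hypothetical
helper, MAX_BYTES is an arbitrary assumed cutoff, and the root-element
test only rejects non-RDF XML such as RSS 0.9x/2.0 (RSS 1.0 is itself
RDF and would pass).

```python
import xml.etree.ElementTree as ET

MAX_BYTES = 2_000_000  # assumed cutoff; tune to what the parser can take
RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def triage(status, content_type, body):
    """Return 'parse', or a short reason for discarding a fetched file.

    status:       HTTP status code of the response
    content_type: value of the Content-Type header
    body:         response body as bytes
    """
    if status != 200:
        return "discard: HTTP %d" % status
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        return "discard: HTML page"
    if len(body) > MAX_BYTES:
        return "discard: too large"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "discard: invalid XML"
    if root.tag != RDF_ROOT:
        return "discard: root element is not rdf:RDF"
    return "parse"


GOOD = (b"<rdf:RDF xmlns:rdf="
        b"'http://www.w3.org/1999/02/22-rdf-syntax-ns#'/>")
result = triage(200, "application/rdf+xml", GOOD)
```

Checking the cheap conditions first (status, media type, size) means
the XML parser only ever sees files that have a chance of being usable.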

[snip unnecessary apology]

> Now it turns out that there is a large and growing body of FOAF data
> that is machine produced (as opposed to a text editor) and is
> consistent with the FOAF spec (being careful not to say "valid"). The
> scutter described above copes well with most of it. So I'll just
> discard Libby's file as empty and move on to second order problems.

Sounds reasonable.

> Like trying to relate a person and some of their data with the file
> that they actually produced. There's probably only 3 ways of doing
> this so that problem shouldn't take too much code...

Heh, that's more like it ;-)

Cheers,
Danny.


_______________________________________________
rdfweb-dev mailing list
rdfweb-dev at vapours.rdfweb.org
wiki: http://rdfweb.org/topic/FoafProject
http://rdfweb.org/mailman/listinfo/rdfweb-dev




