[rdfweb-dev] Re: Syntactic profiling (FOAF document formats)
Julian Bond
julian_bond at voidstar.com
Thu Aug 28 10:47:17 UTC 2003
Edd Dumbill <edd at usefulinc.com> wrote:
>Using an RDF parser to process FOAF is *easier* than regex land.
>Especially if you go further than just using a parser and use a toolkit
>like Redland, Drive or Jena because then your data model is done for you
>as well.
As discussed earlier, none of these three is suitable for me. The best I
can find for my platform is RAP, and that's barfing on large files.
>I say let's write some more
>complex FOAF consuming applications and then see what we did or did not
>require of the syntax.
That's what led me to this.
>I'd like to see us take a break from this argument and talk about issues
>related to processing the data once it's parsed. There are some
>meaningful and deep issues we need to figure out.
Let's talk about those then[1].
A common approach currently used in scutters is to grab all the RDF you
can and dump it into a triple store. Usually the triple store also holds
some secondary data, such as where and when each triple was found. I don't
think this scales unless you have lots of processing power and disk space
available; in the extreme case it might require Google-sized resources.
Even with the current universe of RDF containing FOAF, this is already
leading to triple stores with a million or so triples.
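A minimal sketch of the "triple store plus secondary data" approach, using only the Python standard library. It assumes whatever parser you use has already reduced each fetched document to plain (subject, predicate, object) string tuples; the class and method names (ProvenanceStore, load, match) are hypothetical. Keying the store by source URL means a re-fetch replaces that document's old triples instead of piling up duplicates. That is one small answer to the growth problem, though it does nothing about the raw size of the crawl.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    source: str           # URL the triple was found at
    fetched_at: datetime  # when the scutter fetched that URL


class ProvenanceStore:
    """Hypothetical in-memory store: every triple carries where/when it was found."""

    def __init__(self) -> None:
        self._by_source: dict[str, list[Statement]] = {}

    def load(self, source, triples, fetched_at=None):
        """Replace everything previously harvested from `source` with a fresh copy."""
        when = fetched_at or datetime.now(timezone.utc)
        self._by_source[source] = [Statement(s, p, o, source, when) for s, p, o in triples]

    def match(self, s=None, p=None, o=None):
        """Yield statements matching the pattern; None acts as a wildcard."""
        for stmts in self._by_source.values():
            for st in stmts:
                if ((s is None or st.subject == s)
                        and (p is None or st.predicate == p)
                        and (o is None or st.obj == o)):
                    yield st

    def __len__(self):
        return sum(len(stmts) for stmts in self._by_source.values())


FOAF = "http://xmlns.com/foaf/0.1/"
store = ProvenanceStore()
store.load("http://example.org/alice.rdf", [   # placeholder URL
    ("_:a", FOAF + "name", "Alice"),
    ("_:a", FOAF + "mbox_sha1sum", "0123abcd"),  # placeholder hash
])
for st in store.match(p=FOAF + "name"):
    print(st.obj, "from", st.source)  # prints: Alice from http://example.org/alice.rdf
```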
One solution to this is to do more processing at collection time and
store data-modelled information with links back to the source.
That leads us to the problem of the source of the data and what right
the author of the data has to make the statements they're making. RDF
constructs like foaf:maker, foaf:made and foaf:Document with added DC
(Dublin Core) properties all require additional indirection. It's only
code, but it's code that it isn't immediately obvious you need to write.
It's all too easy to write simpler queries against your triple store that
miss these.
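One way to do that extra indirection, sketched in Python over plain (subject, predicate, object) tuples harvested from a single document. The policy used here is an illustrative assumption, not something the FOAF spec mandates: trust only statements whose subject is the document's maker, found via either foaf:maker or its inverse foaf:made. The function names are hypothetical. Dublin Core properties such as dc:creator are not followed, since they often carry a literal name rather than a node.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def makers_of(triples, doc_uri):
    """Nodes said to have made doc_uri, via either
    <doc> foaf:maker ?agent  or  ?agent foaf:made <doc>."""
    makers = {o for s, p, o in triples if s == doc_uri and p == FOAF + "maker"}
    makers |= {s for s, p, o in triples if p == FOAF + "made" and o == doc_uri}
    return makers


def self_asserted(triples, doc_uri):
    """Keep only statements whose subject is one of the document's makers,
    i.e. what the author says about themselves (hypothetical trust policy)."""
    makers = makers_of(triples, doc_uri)
    return [(s, p, o) for s, p, o in triples if s in makers]


DOC = "http://example.org/foaf.rdf"  # placeholder URL
triples = [
    (DOC, FOAF + "maker", "_:me"),
    ("_:me", FOAF + "name", "Julian"),
    ("_:me", FOAF + "knows", "_:bob"),
    ("_:bob", FOAF + "name", "Bob"),  # a claim about someone else: not self-asserted
]
for t in self_asserted(triples, DOC):
    print(t)
```

A query that skips this step would happily treat "_:bob foaf:name Bob" as carrying the same weight as the author's statements about themselves.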
I don't know, maybe I'm not explaining myself well enough. *I am not*
arguing for a non-RDF syntactic profile that is no longer parsable by
RDF tools. Perhaps all I'm looking for is a best-practice document, a
busy developer's guide or something like that: some way of saying to new
people aiming to auto-generate FOAF from code, "look at this and do it
like that". Maybe it's as trivial as saying "copy the output of
foaf-a-matic while putting in your own data"[2][3].
Returning to processing strategies. Let's say I have a collection of SQL
tables for Person, mbox and feed, with person_mbox, person_feed and
person_person link tables. I shouldn't have to store very much info
against Person because I can always go back to the source feed, except
that without some additional metadata this approach has numerous
problems with data ageing, validity of source, and so on.
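A sketch of those tables in SQLite (Python's sqlite3 module), assuming the missing metadata is carried on each link row as the feed that asserted it and the time it was last seen there. The source_feed_id and last_seen columns, the use of person_person for foaf:knows, and the stale_knows helper are all assumptions for illustration. Dates are stored as ISO-8601 text so plain string comparison orders them correctly.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE mbox   (id INTEGER PRIMARY KEY, sha1sum TEXT UNIQUE);
CREATE TABLE feed   (id INTEGER PRIMARY KEY, url TEXT UNIQUE, fetched_at TEXT);
-- Each link row records which feed asserted it and when it was last seen there.
CREATE TABLE person_mbox (
    person_id INTEGER REFERENCES person(id),
    mbox_id   INTEGER REFERENCES mbox(id),
    source_feed_id INTEGER REFERENCES feed(id),
    last_seen TEXT,
    PRIMARY KEY (person_id, mbox_id, source_feed_id)
);
CREATE TABLE person_feed (
    person_id INTEGER REFERENCES person(id),
    feed_id   INTEGER REFERENCES feed(id),
    source_feed_id INTEGER REFERENCES feed(id),
    last_seen TEXT,
    PRIMARY KEY (person_id, feed_id, source_feed_id)
);
CREATE TABLE person_person (  -- assumed to hold foaf:knows links
    from_id INTEGER REFERENCES person(id),
    to_id   INTEGER REFERENCES person(id),
    source_feed_id INTEGER REFERENCES feed(id),
    last_seen TEXT,
    PRIMARY KEY (from_id, to_id, source_feed_id)
);
"""


def stale_knows(conn, cutoff):
    """foaf:knows links whose source feed hasn't re-asserted them since cutoff."""
    return conn.execute(
        """SELECT p1.name, p2.name, f.url
           FROM person_person pp
           JOIN person p1 ON p1.id = pp.from_id
           JOIN person p2 ON p2.id = pp.to_id
           JOIN feed f ON f.id = pp.source_feed_id
           WHERE pp.last_seen < ?
           ORDER BY p1.name, p2.name""",
        (cutoff,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carol")])
conn.execute("INSERT INTO feed VALUES (1, 'http://example.org/alice.rdf', '2003-08-28')")
conn.executemany("INSERT INTO person_person VALUES (?, ?, ?, ?)",
                 [(1, 2, 1, "2003-08-28"), (1, 3, 1, "2003-06-01")])
print(stale_knows(conn, "2003-08-01"))
# → [('Alice', 'Carol', 'http://example.org/alice.rdf')]
```

With provenance on the link rows, a link that its feed stops asserting can be aged out or re-checked without having kept every raw triple.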
I find myself going round and round on this, bouncing back and forth
between simply storing every triple I ever find and dealing with the
processing problems later, or processing early and then having bad data
that I can't rectify because I've thrown away too much too early. The
whole syntactic profile thing is an attempt to cut this Gordian knot by
forcing a bit more structure onto the source data. The pushback on this
is substantial(!), so the consensus would appear to be that this is
impossible/wrong/misguided/displays a woeful misunderstanding.
[1]This is a threaded mailing list. Feel free to start new threads ;-)
[2]Danbri: your work on the spec is superb. One thing that would make it
even better is a few more examples.
[3]If the idea of syntactic profiles takes root, the next step would be
to put a reference into the file stating which profile this particular
feed is supposed to follow.
--
Julian Bond Email&MSM: julian.bond at voidstar.com
Webmaster: http://www.ecademy.com/
Personal WebLog: http://www.voidstar.com/
M: +44 (0)77 5907 2173 T: +44 (0)192 0412 433