[rdfweb-dev] RDF parser vs. XML parser

Dan Brickley danbri at w3.org
Tue Aug 26 19:56:05 UTC 2003

Hi Diz,

I missed your message last week and only just stumbled across it.
Not much to add to Victor's reply.
Brief comments interspersed...

* Dave Smith <dizzyd at jabber.org> [2003-08-20 11:19-0600]
> I'm new to the RDF community and admittedly still wrapping my head 
> around everything. One thing that I'm not following (esp. on Edd 
> Dumbill's blog) is this distinction between "RDF parsers" and "XML 
> parsers". Obviously, an XML parser really only deals with syntax, not 
> semantics. But I've read in a couple of places (again, on Edd's blog -- 
> particularly http://usefulinc.com/edd/blog/2003/8/8#13:13) that XML 
> "parsers" break when encountering unexpected (or missing) elements. I'm 
> just not following the logic here, and so suspect that I don't 
> understand what is expected of an "RDF parser" (or processor).

Consuming otherwise-unconstrained RDF/XML with SAX/DOM/XSLT-based
software is a pain, since the same RDF graph can be represented in 
various different ways as an XML tree. If you try to anticipate all the 
variations, you find yourself in effect writing an RDF parser. Generally 
it is more sensible to download an existing RDF parser / library. If you 
instead write code which makes assumptions about how exactly the data is 
written in XML, you can end up writing brittle code which might not deal
well with other serializations (i.e. XML encodings) of the same graph.
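
To make the brittleness concrete, here is a rough sketch using only the
Python standard library. The sample data, the name "Alice Example" and the
function naive_names are made up for illustration; both documents are
meant to encode the same two-triple graph (a foaf:Person with a foaf:name),
but code that assumes one particular XML layout only copes with the first.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Serialization A: typed node element with a nested property element.
DOC_A = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <foaf:Person>
    <foaf:name>Alice Example</foaf:name>
  </foaf:Person>
</rdf:RDF>"""

# Serialization B: same graph, written with rdf:Description,
# an explicit rdf:type and the name as a property attribute.
DOC_B = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <rdf:Description foaf:name="Alice Example">
    <rdf:type rdf:resource="{FOAF}Person"/>
  </rdf:Description>
</rdf:RDF>"""


def naive_names(xml_text):
    """Hypothetical consumer that hard-codes the tree shape of DOC_A."""
    root = ET.fromstring(xml_text)
    path = f"{{{FOAF}}}Person/{{{FOAF}}}name"
    return [el.text for el in root.findall(path)]


print(naive_names(DOC_A))  # ['Alice Example']
print(naive_names(DOC_B))  # [] (same graph, but the name is missed)
```

Patching naive_names to handle the second layout, then the next one, and
so on, is the "in effect writing an RDF parser" trap.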

> Of note, we use XML parsers in Jabber and have been able to deal quite 
> nicely with missing/unexpected elements.

Yup, as Victor noted, RDF parsers are generally written on top of XML
parsers, and there's nothing intrinsically wrong with XML parsers. RDF 
just provides a set of conventions for using XML, in particular for
mixing data and namespaces (vocabularies). Because XML-based schema and DTD 
mechanisms are file-format based, they tend to work at a chunkier
granularity than RDF. Non-RDF XML schemas are all about rules for which tags go 
inside which other tags; RDF/XML has a more free-flowing approach, allowing
any tags so long as they're arranged as an encoding of an RDF graph. So
that's one source of the 'brittle' metaphor; we trade flexibility vs
predictability. If you define a data format in XML-based terms, you get
to say a lot about how the actual XML files will look. If you instead
(like FOAF) define a set of terms (i.e. classes/categories and 
properties/attributes/relationships) then you're saying less about the
contents of any specific document, but more about what that document means.

The thread I've just started on 'syntactic profiling' is about one 
approach to getting the best of both worlds.
(...the IM/buddylist example might be of interest there re Jabber btw)

> I'm involved w/ the Jabber community and we are in the process of 
> looking at FOAF/RDF for describing the various entities on our network. 

Cool. I noticed from your weblog :)

I'd like to help...

stpeter has already made some suggestions re vocab (gender etc) which 
are sensible. I'll follow up on those separately.

I recently found http://psi.affinix.com/ (site currently not working)
which looks like a very promising platform for experimentation, not least 
because it has PGP support...

> It's quite promising really, but I need to understand how/if a RDF 
> parser is different from an XML parser. It seems like a RDF parser 
> would simply be something built on top of an XML parser (at least, 
> assuming we're using the RDF/XML encoding).

Yes, it's a layer on top. For an example, see Sean Palmer's rdfxml.py
at http://infomesh.net/2003/rdfparser/ ("rdfxml.py is a standalone Python
module in under 10KB that parses RDF/XML using SAX."), or try feeding RDF
files to the (Jena/ARP-based) online parser and visualisation tool at
http://www.w3.org/RDF/Validator/
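
As a very rough illustration of the "layer on top" idea (this is not how
rdfxml.py or ARP work, just a toy), the sketch below uses Python's
standard XML parser and then reads the resulting tree as triples. The
function names triples and uri, the example.org URI and the sample data are
all made up, and it only understands a tiny subset of RDF/XML: top-level
node elements with rdf:about, property attributes, and property elements
holding either text or rdf:resource. It also makes no distinction between
literal and resource objects.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://example.org/#me"  # hypothetical subject URI

DOC_A = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <foaf:Person rdf:about="{ME}">
    <foaf:name>Alice Example</foaf:name>
  </foaf:Person>
</rdf:RDF>"""

DOC_B = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <rdf:Description rdf:about="{ME}" foaf:name="Alice Example">
    <rdf:type rdf:resource="{FOAF}Person"/>
  </rdf:Description>
</rdf:RDF>"""


def uri(qname):
    """Turn ElementTree's '{namespace}local' name into a plain URI."""
    return qname[1:].replace("}", "", 1)


def triples(xml_text):
    """Toy RDF/XML reader: the XML parser does syntax, this does triples."""
    root = ET.fromstring(xml_text)
    found = set()
    for node in root:
        subject = node.get(f"{{{RDF}}}about")
        # A node element other than rdf:Description implies an rdf:type.
        if node.tag != f"{{{RDF}}}Description":
            found.add((subject, RDF + "type", uri(node.tag)))
        # Namespaced attributes outside the RDF namespace are properties.
        for name, value in node.attrib.items():
            if name.startswith("{") and not name.startswith("{" + RDF + "}"):
                found.add((subject, uri(name), value))
        # Child elements are properties; object is rdf:resource or text.
        for prop in node:
            resource = prop.get(f"{{{RDF}}}resource")
            obj = resource if resource is not None else (prop.text or "")
            found.add((subject, uri(prop.tag), obj))
    return found


# Two different XML trees, one set of triples.
print(triples(DOC_A) == triples(DOC_B))  # True
```

Application code written against the triples no longer cares which XML
layout the sender happened to choose.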

Hope this helps,

