RFC: XML to RDF
From structure to semantics
Last Update
10/16/2005; 1:34:42 AM
Purpose
XML to RDF is a transformation format that describes how RDF can be extracted from a given XML format. Once a transformation for an XML format is defined, RDF can be extracted automatically from every XML file in that format.
Understanding the format
It is clear that few people find RDF easy to understand. Creating a transformation to RDF adds another level of complexity, so XML to RDF will probably be usable by only a small number of people. This is not a problem, because only one transformation has to be defined for each XML format.
XML to RDF requires knowledge of the RDF model and XPath. Because few people like the standard RDF/XML syntax, XML to RDF uses its own simplified RDF/XML syntax. The tutorial first shows how to create RDF in this format, and then shows how to turn the static RDF into an XML to RDF transformation by adding XPath expressions.
Tutorial
Creating RDF data
Resources have properties. The following example shows that the
resource at URI http://www.w3.org/ has a dc:title
property with value "World Wide Web Consortium".
<xr:transform
    xmlns:xr="http://w3future.com/ns/xr"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:h="http://www.w3.org/1999/xhtml">
  <xr:introducing>
    <rdfs:Resource xr:uri="http://www.w3.org/">
      <dc:title>World Wide Web Consortium</dc:title>
    </rdfs:Resource>
  </xr:introducing>
</xr:transform>
A property value can also be another resource.
http://www.w3.org/ also has a dc:rights
property. The value of this property is another resource.
This resource also has its own dc:title property.
(I'll leave out the root element from now on; it's the same every time.)
<xr:introducing>
  <rdfs:Resource xr:uri="http://www.w3.org/">
    <dc:title>World Wide Web Consortium</dc:title>
    <dc:rights>
      <rdfs:Resource xr:uri="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">
        <dc:title>IPR Notice and Disclaimers</dc:title>
      </rdfs:Resource>
    </dc:rights>
  </rdfs:Resource>
</xr:introducing>
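To make the resource/property nesting concrete, the static example can be read as a list of (subject, property, value) statements. The following Python sketch is not part of XML to RDF. It is a rough illustration that assumes each element carrying xr:uri is a subject, each child element is a property, and a nested resource becomes the property's value. The names property_uri and triples_for are hypothetical helpers, and the sketch ignores any typing implied by rdfs:Resource.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xr:uri attribute in ElementTree's '{namespace}local' form.
XR_URI = "{http://w3future.com/ns/xr}uri"

STATIC_XR = """<xr:transform
    xmlns:xr="http://w3future.com/ns/xr"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xr:introducing>
    <rdfs:Resource xr:uri="http://www.w3.org/">
      <dc:title>World Wide Web Consortium</dc:title>
      <dc:rights>
        <rdfs:Resource xr:uri="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">
          <dc:title>IPR Notice and Disclaimers</dc:title>
        </rdfs:Resource>
      </dc:rights>
    </rdfs:Resource>
  </xr:introducing>
</xr:transform>"""


def property_uri(tag):
    """Turn '{namespace}local' into the full property URI 'namespacelocal'."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def triples_for(resource):
    """Yield (subject, property, value) for a resource element and its nested resources."""
    subject = resource.get(XR_URI)
    for prop in resource:
        if len(prop) == 0:
            # Plain text content: the value is a literal.
            yield (subject, property_uri(prop.tag), prop.text)
        else:
            # Nested resource: the value is that resource's URI.
            nested = prop[0]
            yield (subject, property_uri(prop.tag), nested.get(XR_URI))
            yield from triples_for(nested)


root = ET.fromstring(STATIC_XR)
triples = [t for intro in root for res in intro for t in triples_for(res)]
for triple in triples:
    print(triple)
```

The nested example yields three statements: two about http://www.w3.org/ and one about the copyright resource.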
Transforming
Now we'll do the above for all XHTML documents. Let's first create a transformation that extracts the title. The static information from the first example above is replaced by XPath expressions.
- The xr:uri attribute is replaced by an xr:uriSelect attribute. The XPath expression for the URI is $documentURI, which is the URI of the document being transformed.
- The text content of the dc:title property is replaced by an xr:select attribute. The XPath expression selects the title element of the XHTML file.
<xr:introducing>
  <rdfs:Resource xr:uriSelect="$documentURI">
    <dc:title xr:select="/h:html/h:head/h:title" />
  </rdfs:Resource>
</xr:introducing>
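As a rough illustration of what this transformation selects, the Python sketch below evaluates the title path against a small XHTML document. It is not an XML to RDF processor: the sample document and the function extract_title_triple are made up for the example, $documentURI is passed in as an ordinary argument, and because the standard library's ElementTree supports only a subset of XPath, the path is written relative to the root h:html element.

```python
import xml.etree.ElementTree as ET

# Prefix binding for the XHTML namespace, matching xmlns:h in the transformation.
NS = {"h": "http://www.w3.org/1999/xhtml"}
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# Hypothetical input document, used only for this sketch.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>World Wide Web Consortium</title></head>
  <body><p>Hello</p></body>
</html>"""


def extract_title_triple(xhtml_text, document_uri):
    """Return (document URI, dc:title, title text) for one XHTML document."""
    root = ET.fromstring(xhtml_text)  # root is the h:html element
    # /h:html/h:head/h:title, written relative to the root element.
    title = root.findtext("h:head/h:title", namespaces=NS)
    return (document_uri, DC_TITLE, title)


triple = extract_title_triple(SAMPLE_XHTML, "http://www.w3.org/")
print(triple)
```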
The dc:rights property can be extracted from XHTML documents
by looking for <a> elements with a rel='Copyright' attribute.
So the XPath expression for the dc:rights property is //h:a[@rel='Copyright'].
The URI of the copyright webpage is in the href attribute.
The title of that webpage can then be looked up by loading the page with the
document() function from XSLT and selecting its title element.
<xr:introducing>
  <rdfs:Resource xr:uriSelect="$documentURI">
    <dc:title xr:select="/h:html/h:head/h:title" />
    <dc:rights xr:select="//h:a[@rel='Copyright']">
      <rdfs:Resource xr:uriSelect="@href">
        <dc:title xr:select="document(@href)/h:html/h:head/h:title" />
      </rdfs:Resource>
    </dc:rights>
  </rdfs:Resource>
</xr:introducing>
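The lookups in this transformation can be sketched in Python as follows. This is only an approximation under stated assumptions: the PAGES dictionary is a hypothetical in-memory stand-in for fetching documents (the role document() plays in the transformation), the example.org pages are invented, href values are assumed to be absolute URIs, and load_document, title_of and extract_triples are illustrative names rather than part of XML to RDF.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical pages keyed by URI; a real processor would fetch these instead.
PAGES = {
    "http://example.org/": """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Home</title></head>
  <body><p><a rel="Copyright" href="http://example.org/legal">Legal</a></p></body>
</html>""",
    "http://example.org/legal": """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Legal Notice</title></head>
  <body><p>All rights reserved.</p></body>
</html>""",
}


def load_document(uri):
    """Stand-in for document(): parse the page stored under the given URI."""
    return ET.fromstring(PAGES[uri])


def title_of(root):
    """Select h:head/h:title relative to the root h:html element."""
    return root.findtext("h:head/h:title", namespaces=NS)


def extract_triples(document_uri):
    """Collect the title of a page plus its copyright links and their titles."""
    root = load_document(document_uri)
    triples = [(document_uri, DC + "title", title_of(root))]
    # //h:a[@rel='Copyright'], expressed in ElementTree's XPath subset.
    for link in root.findall(".//h:a[@rel='Copyright']", NS):
        rights_uri = link.get("href")
        triples.append((document_uri, DC + "rights", rights_uri))
        triples.append((rights_uri, DC + "title", title_of(load_document(rights_uri))))
    return triples


triples = extract_triples("http://example.org/")
for triple in triples:
    print(triple)
```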
Instead of describing the copyright webpage inside the dc:rights element,
its description can also be moved into a separate xr:introducing element. This is usually more readable, and it is preferred when several
properties point to the same resource.
<xr:introducing>
  <rdfs:Resource xr:uriSelect="$documentURI">
    <dc:title xr:select="/h:html/h:head/h:title" />
    <dc:rights xr:select="//h:a[@rel='Copyright']" />
  </rdfs:Resource>
</xr:introducing>
<xr:introducing xr:select="//h:a[@rel='Copyright']">
  <rdfs:Resource xr:uriSelect="@href">
    <dc:title xr:select="document(@href)/h:html/h:head/h:title" />
  </rdfs:Resource>
</xr:introducing>