RFC: XML to RDF

From structure to semantics

Last Update

10/16/2005; 1:34:42 AM

Purpose

XML to RDF is a transformation format that describes how RDF can be extracted from a given XML format. Once a transformation for an XML format is defined, RDF can be extracted automatically from every XML file in that format.

Understanding the format

It is clear that few people find RDF easy to understand. Creating a transformation to RDF adds another level of complexity, so XML to RDF will probably be usable by only a small number of people. This is not a problem, because only one transformation has to be defined for each XML format.

XML to RDF requires knowledge of the RDF model and of XPath. Because few people like the standard RDF/XML syntax, XML to RDF uses its own simplified RDF/XML syntax. The tutorial first shows how to create RDF in this format, and then shows how to turn the static RDF into an XML to RDF transformation by adding XPath expressions.

Tutorial

Creating RDF data

Resources have properties. The following example shows that the resource at URI http://www.w3.org/ has a dc:title property with value "World Wide Web Consortium".

<xr:transform
  xmlns:xr="https://w3future.com/ns/xr"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:h="http://www.w3.org/1999/xhtml">

  <xr:introducing>
    <rdfs:Resource xr:uri="http://www.w3.org/">
      <dc:title>World Wide Web Consortium</dc:title>
    </rdfs:Resource>
  </xr:introducing>

</xr:transform>

A property value can also be another resource. http://www.w3.org/ also has a dc:rights property whose value is another resource, and that resource has its own dc:title property. (I'll leave out the root element from now on; it is the same every time.)

<xr:introducing>
  <rdfs:Resource xr:uri="http://www.w3.org/">
    <dc:title>World Wide Web Consortium</dc:title>
    <dc:rights>
      <rdfs:Resource xr:uri="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">
        <dc:title>IPR Notice and Disclaimers</dc:title>
      </rdfs:Resource>
    </dc:rights>
  </rdfs:Resource>
</xr:introducing>
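
To make the RDF model behind this example concrete, here is a small Python sketch that writes out the statements the example describes, one line per statement in an N-Triples-like form. It is only my reading of the example: it assumes the nesting means exactly the three statements described in the text above, and it ignores any type statement a processor might add for rdfs:Resource. The names STATEMENTS and to_ntriples are made up for this illustration and are not part of XML to RDF.

```python
# Namespace and URIs taken from the example above.
DC = "http://purl.org/dc/elements/1.1/"
W3C = "http://www.w3.org/"
NOTICE = "http://www.w3.org/Consortium/Legal/ipr-notice#Copyright"

# Each statement is (subject, predicate, object). The object is tagged as
# either a literal (text value) or a resource (another URI).
STATEMENTS = [
    (W3C, DC + "title", ("literal", "World Wide Web Consortium")),
    (W3C, DC + "rights", ("resource", NOTICE)),
    (NOTICE, DC + "title", ("literal", "IPR Notice and Disclaimers")),
]


def to_ntriples(statements):
    """Render statements one per line (hypothetical helper).

    No literal escaping is done; the values used here contain no quotes.
    """
    lines = []
    for subject, predicate, (kind, value) in statements:
        obj = f"<{value}>" if kind == "resource" else f'"{value}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return lines


if __name__ == "__main__":
    for line in to_ntriples(STATEMENTS):
        print(line)
```

The second statement is the interesting one: its object is a URI rather than text, and that same URI is the subject of the third statement.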

Transforming

Now we'll do the same for any XHTML document. Let's first create a transformation that extracts the title. The static information from the first example above is replaced by XPath expressions.

The xr:uri attribute is replaced by an xr:uriSelect attribute. The XPath expression for the URI is $documentURI, which is the URI of the document being transformed.
The text content of the dc:title property is replaced by an xr:select attribute. The XPath expression selects the title element of the XHTML file.

<xr:introducing>
  <rdfs:Resource xr:uriSelect="$documentURI">
    <dc:title xr:select="/h:html/h:head/h:title" />
  </rdfs:Resource>
</xr:introducing>
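
As a rough illustration of what this transformation selects, the Python sketch below evaluates the same path against a small XHTML document using the standard library's xml.etree.ElementTree. This is not the XML to RDF processor, only a hand-written equivalent of this one transformation. The sample document, the extract_title function and the document_uri parameter (standing in for $documentURI) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Prefix mapping equivalent to xmlns:h on the root xr:transform element.
NS = {"h": "http://www.w3.org/1999/xhtml"}
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# Hypothetical input document, used only for this illustration.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>World Wide Web Consortium</title></head>
  <body/>
</html>"""


def extract_title(xhtml_text, document_uri):
    """Return a list with one (subject, predicate, value) tuple, or []."""
    root = ET.fromstring(xhtml_text)  # root is the h:html element
    # ElementTree paths are relative to the root element, so the absolute
    # XPath /h:html/h:head/h:title becomes h:head/h:title here.
    title = root.find("h:head/h:title", NS)
    if title is None or not title.text:
        return []
    return [(document_uri, DC_TITLE, title.text.strip())]


if __name__ == "__main__":
    print(extract_title(SAMPLE, "http://www.w3.org/"))
```

The subject of the resulting statement comes from outside the document (its URI), while the value comes from inside it. That is the same split as between xr:uriSelect and xr:select in the transformation.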

The dc:rights property can be extracted from XHTML documents by looking for <a> elements with a rel='Copyright' attribute, so the XPath expression for the dc:rights property is //h:a[@rel='Copyright']. The URI of the copyright page is in the href attribute. The title of that page can then be looked up by loading the page with the document() function from XSLT and selecting its title element.

<xr:introducing>
  <rdfs:Resource xr:uriSelect="$documentURI">
    <dc:title xr:select="/h:html/h:head/h:title" />
    <dc:rights xr:select="//h:a[@rel='Copyright']">
      <rdfs:Resource xr:uriSelect="@href">
        <dc:title xr:select="document(@href)/h:html/h:head/h:title" />
      </rdfs:Resource>
    </dc:rights>
  </rdfs:Resource>
</xr:introducing>
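
The same steps can be sketched in Python to show which nodes and values are involved. Again, this is only an approximation written for this tutorial, not the XML to RDF processor. Several things here are assumptions: the example.org pages are invented, the load function is a stand-in for document() that reads from an in-memory dictionary instead of fetching anything, and relative href values are resolved against the document URI, which the transformation above does not say anything about.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = {"h": "http://www.w3.org/1999/xhtml"}
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical pages, keyed by URI, so the example runs without a network.
PAGES = {
    "http://example.org/": """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Home</title></head>
  <body><p><a rel="Copyright" href="legal.html">Legal</a></p></body>
</html>""",
    "http://example.org/legal.html": """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Copyright Notice</title></head>
  <body/>
</html>""",
}


def load(uri):
    """Stand-in for document(): parse a page from PAGES."""
    return ET.fromstring(PAGES[uri])


def title_of(root):
    """Text of h:head/h:title under the h:html root, or None."""
    node = root.find("h:head/h:title", NS)
    return node.text.strip() if node is not None and node.text else None


def extract(document_uri):
    """Collect (subject, predicate, object) tuples for one document."""
    root = load(document_uri)
    triples = []
    title = title_of(root)
    if title:
        triples.append((document_uri, DC + "title", title))
    # Counterpart of //h:a[@rel='Copyright']
    for link in root.findall(".//h:a[@rel='Copyright']", NS):
        href = link.get("href")
        if not href:
            continue
        target = urljoin(document_uri, href)  # assumption: resolve relative URIs
        triples.append((document_uri, DC + "rights", target))
        if target in PAGES:
            target_title = title_of(load(target))
            if target_title:
                triples.append((target, DC + "title", target_title))
    return triples


if __name__ == "__main__":
    for triple in extract("http://example.org/"):
        print(triple)
```

Note that the lookup of the second title is relative to each matched <a> element, just as document(@href) in the transformation is evaluated with the selected <a> element as its context.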

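Instead of describing the copyright page inside the dc:rights element, its description can be moved into a separate xr:introducing element. This is usually more readable, and it is preferred when several properties point to the same resource. In the example below, the dc:rights property and the second xr:introducing element use the same XPath expression to select the <a> element.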

<xr:introducing>
  <rdfs:Resource xr:uriSelect="$documentURI">
    <dc:title xr:select="/h:html/h:head/h:title" />
    <dc:rights xr:select="//h:a[@rel='Copyright']" />
  </rdfs:Resource>
</xr:introducing>

<xr:introducing xr:select="//h:a[@rel='Copyright']">
  <rdfs:Resource xr:uriSelect="@href">
    <dc:title xr:select="document(@href)/h:html/h:head/h:title" />
  </rdfs:Resource>
</xr:introducing>