Friday, February 13, 2004 - Sjoerd Visscher's weblog

Pondering those web technologies that may change the future of the world wide web.

Last Update

10/16/2005; 1:29:40 AM

Liberal XML parsing related to personality?

The heat of the discussion on liberal XML parsing has subsided, so this is actually a little late. That's because I wasn't sure if I should post this. But a post by Dave Winer today convinced me to post it anyway. Let me just say up front that I could be completely wrong.

First let me quote the definition of a fatal error in the XML recommendation:

An error which a conforming XML processor MUST detect and report to the application. After encountering a fatal error, the processor MAY continue processing the data to search for further errors and MAY report such errors to the application. In order to support correction of errors, the processor MAY make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor MUST NOT continue normal processing (i.e., it MUST NOT continue to pass character data and information about the document's logical structure to the application in the normal way).

RSS and Atom are both XML file formats; they do not just happen to look like XML. Thus, according to the XML specification, aggregators are not allowed to try to show broken feeds. If you are doing liberal XML parsing, you are not playing by the rules.
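To make the rule concrete, here is a minimal sketch of what playing by the rules could look like in an aggregator. It assumes Python's standard `xml.etree.ElementTree` parser, which stops with a `ParseError` on any well-formedness error; the function names `parse_feed_strictly` and `describe_feed` are hypothetical and only serve as illustration.

```python
import xml.etree.ElementTree as ET


def parse_feed_strictly(xml_text):
    """Return the text of every <title> element, or raise on a fatal XML error."""
    # ET.fromstring raises ET.ParseError on any well-formedness violation,
    # so no partial data from a broken feed reaches the caller.
    root = ET.fromstring(xml_text)
    return [title.text or "" for title in root.iter("title")]


def describe_feed(xml_text):
    """Hypothetical aggregator entry point: show the titles or report the error."""
    try:
        titles = parse_feed_strictly(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple that can be passed on
        # to the feed producer to help them fix the feed.
        line, column = err.position
        return f"feed rejected: not well-formed XML (line {line}, column {column})"
    return "feed accepted: " + ", ".join(titles)


good_feed = (
    '<rss version="2.0"><channel><title>Example</title>'
    "<item><title>First post</title></item></channel></rss>"
)
# The <title> element is never closed, which is a fatal error.
broken_feed = "<rss><channel><title>Example</channel></rss>"

print(describe_feed(good_feed))
print(describe_feed(broken_feed))
```

The broken feed is not shown at all; the only thing that comes out is an error report with a position, which is exactly the information a feed producer needs to repair the feed.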

A lot of people are parsing feeds, or are planning to do so. Most of them do so because they want to do something interesting with the data: it might be an aggregator, but it could also be some cool new application. What they certainly are not interested in is the technology of parsing itself. They simply want to use one of the abundantly available XML parsers. Now there are two ways to do feed parsing. One is to allow only proper XML and to patiently teach feed producers who do not use proper XML tools how to improve. (Almost all feed producers are willing to produce well-formed XML, but they do not get enough help to actually do so.) The other way is to liberally parse anything that vaguely resembles XML, which spoils the fun of using feeds for everybody else: broken feeds stay broken, and anyone relying on a standard XML parser cannot read them. If you are doing liberal XML parsing, you are being inconsiderate.

Now there are only two blogs that I have unsubscribed from because the level of ranting was simply too annoying. The owners of those blogs also happen to be two very vocal advocates of liberal XML parsing.

But this could just be a coincidence.