Sjoerd Visscher's weblog
Last Update
10/16/2005; 1:29:42 AM
Friday, February 13, 2004
Liberal XML parsing related to personality?
The heat of the discussion on liberal XML parsing has subsided, so this is actually a little late. That's because I wasn't sure if I should post this. But a post by Dave Winer today convinced me to post it anyway. Let me just say up front that I could be completely wrong.
First let me quote the definition of a fatal error in the XML recommendation:
An error which a conforming XML processor MUST detect and report to the application. After encountering a fatal error, the processor MAY continue processing the data to search for further errors and MAY report such errors to the application. In order to support correction of errors, the processor MAY make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor MUST NOT continue normal processing (i.e., it MUST NOT continue to pass character data and information about the document's logical structure to the application in the normal way).
RSS and Atom are both XML file formats; they do not just happen to look like XML. Thus, according to the XML specification, aggregators are not allowed to try to show broken feeds. If you are doing liberal XML parsing, you are not playing by the rules.
A lot of people are parsing feeds, or are planning to do so. Most of them do so because they want to do something interesting with the data. It might be an aggregator, but it could also be some cool new application. What they are certainly not interested in is the technology of parsing itself. They simply want to use one of the abundantly available XML parsers. Now there are two ways to do feed parsing. One is to only allow proper XML and patiently teach feed producers who do not use the proper XML tools how to improve. (And almost all feed producers are willing to produce well-formed XML, but they are not helped enough to actually do that.) The other way is to liberally parse anything that vaguely resembles XML and spoil the fun of using feeds for everybody else. If you are doing liberal XML parsing, you are being inconsiderate.
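To make the first way concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree parser. The function name parse_feed_strictly and the two sample feeds are made up for illustration. The point is only that a parse error is reported and nothing is recovered from the broken document.

```python
import xml.etree.ElementTree as ET


def parse_feed_strictly(text):
    """Return (titles, None) for well-formed XML, or (None, message) otherwise.

    Hypothetical helper: on a parse error it reports the problem and
    makes no attempt to salvage data from the broken feed.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return None, f"feed is not well-formed XML: {err}"
    # Only reached for well-formed input.
    titles = [element.text for element in root.iter("title")]
    return titles, None


# Illustrative inputs: the second one never closes its <title> element.
good_feed = "<rss><channel><title>Example</title></channel></rss>"
broken_feed = "<rss><channel><title>Example</channel></rss>"

good_titles, good_error = parse_feed_strictly(good_feed)
broken_titles, broken_error = parse_feed_strictly(broken_feed)
```

The error message can then be passed on to the feed producer instead of being silently papered over.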
Now there are only two blogs that I have unsubscribed from because the level of ranting was simply too annoying. The owners of those blogs also happen to be two very vocal advocates of liberal XML parsing.
But this could just be a coincidence.
Tuesday, February 10, 2004
Providing context in programming (part 2)
(Read part 1 first.)
When reading code, a lot of people like to print out the code and read it on paper. One of the reasons is that on paper it is easier to switch from a function call to the function specification and back. However, functions are supposed to provide abstraction. You should be able to understand the function call without the function specification. Why is this often not possible? Because the function call doesn't provide enough context for the arguments.
Take for example this function from the DOM: parent.insertBefore(nodeA, nodeB). Is nodeA inserted before nodeB, or is nodeB inserted before nodeA? I keep forgetting that. And the more arguments, the more ridiculous it gets: mouseEvent.initMouseEvent("DOMMouseScroll", true, true, view, numLines, refPoint.x, refPoint.y, point.x, point.y, isControl, isAlt, isShift, isMeta, 0, null) (Simplified from the Mozilla source code). It's impossible to tell what each argument does.
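For the record, the DOM signature is insertBefore(newChild, refChild), so nodeA ends up before nodeB. Python's xml.dom.minidom implements the same method, which makes it easy to check. The element names in this small sketch are made up for illustration.

```python
from xml.dom.minidom import parseString

# A parent element that already contains one child, <b/>.
doc = parseString("<list><b/></list>")
parent = doc.documentElement
node_b = parent.firstChild

# Create a new element and insert it with the two-argument DOM call.
node_a = doc.createElement("a")
parent.insertBefore(node_a, node_b)

# Read back the order of the children.
order = [child.tagName for child in parent.childNodes]
```

After this runs, order lists a first and b second, but nothing in the call itself tells you that.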
I think almost any programmer can relate to this problem. But functions with multiple arguments seem to be so fundamental in programming that solutions to this problem never actually involve abandoning their use. Examples of such solutions are IntelliSense and properly named constants. They reintroduce the missing context.
But proper language design can fix this problem more elegantly. Smalltalk provides an interesting solution: each extra argument is introduced by a keyword. For example, parent insert: nodeA before: nodeB is as clear as it can be. The only languages I know that actually restrict the number of arguments are APL and its derivatives. In APL, functions are monadic (one argument, prefix notation) or dyadic (two arguments, infix notation).
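Languages without Smalltalk's message syntax can approximate it. As a rough sketch, Python's keyword-only arguments force the caller to name each extra argument. The function insert and the list standing in for a parent's children are hypothetical, chosen only to mirror the Smalltalk example.

```python
def insert(item, *, before, into):
    """Insert `item` into the list `into`, directly before `before`.

    The bare `*` makes `before` and `into` keyword-only, so every call
    has to spell out the role of each argument.
    """
    into.insert(into.index(before), item)


# A plain list stands in for the parent's child nodes.
children = ["nodeB", "nodeC"]
insert("nodeA", before="nodeB", into=children)

# A purely positional call is rejected instead of silently guessing.
try:
    insert("nodeX", "nodeB", children)
    positional_call_allowed = True
except TypeError:
    positional_call_allowed = False
```

The call site now reads almost like the Smalltalk version, and the ambiguous positional form is not accepted at all.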
That functions with 2 arguments are written with infix notation is very important. Using the DOM example again, but now assuming we don't need the parent: insertBefore nodeA, nodeB is not clear, but nodeA insertBefore nodeB is clear. With infix notation you have more options to choose a clear function name. Note that a method call with one argument on an object can be seen as infix: nodeA.insertBefore(nodeB).
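The last remark can be sketched in Python with a one-argument method, so the call reads left to right like the infix form. The Node class below is a made-up stand-in and not the DOM API. It assumes the sibling already has a parent.

```python
class Node:
    """Tiny illustrative tree node (hypothetical, not a DOM node)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append(self, child):
        """Add `child` as the last child of this node and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def insert_before(self, sibling):
        """Place this node directly before `sibling` under sibling's parent."""
        parent = sibling.parent
        parent.children.insert(parent.children.index(sibling), self)
        self.parent = parent


root = Node("root")
node_b = root.append(Node("b"))
node_a = Node("a")

# Reads like the infix form: nodeA insertBefore nodeB.
node_a.insert_before(node_b)
names = [child.name for child in root.children]
```

The parent no longer appears in the call, because it can be found through the sibling.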
Is a maximum of 2 arguments realistic for everyday programming? I'd say yes, and thinking about this problem actually improved my code! It forces you to really think about the structure of your data. For example, where you'd normally be lazy and pass a point on the screen as two arguments x and y, you're now forced to create a Point object. It's hard to prove this is a good thing, but I've found it improves my code a lot. And again, programming language design comes into play, because in many languages creating simple intermediate objects is too much work, and they often perform badly.
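As an illustration of the Point idea, here is a small Python sketch. NamedTuple keeps the intermediate object cheap to define. The function distance_between is a hypothetical example of a function that would otherwise take four coordinates.

```python
from typing import NamedTuple


class Point(NamedTuple):
    """A point on the screen, passed around as one value."""
    x: float
    y: float


def distance_between(start, end):
    """Euclidean distance between two points.

    Two arguments instead of four: (x1, y1, x2, y2) becomes (start, end).
    """
    return ((end.x - start.x) ** 2 + (end.y - start.y) ** 2) ** 0.5


origin = Point(0, 0)
corner = Point(3, 4)
length = distance_between(origin, corner)
```

With four loose numbers it is easy to swap an x for a y. With two Point values the only thing left to mix up is start and end.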
More later…
Thursday, February 05, 2004
When a Search Engine Isn't Enough, Call a Librarian?
"What's the name of the party that Ross Perot established?" a user wanted to know.
Ms. Tuckerman checked the Internet for a biography of Mr. Perot. Then she quickly switched to an electronic database of biographies to which the library subscribes. But even after scrolling through several screens of text, she was unable to come up with a satisfactory answer.
So she turned to a rotating bookshelf next to her desk and selected a volume of the World Book Encyclopedia. "Sometimes the old-fashioned sources work the best," she said. Within a few minutes she found the answer in the encyclopedia: the Reform Party. [NYTimes via Scripting News]
Type in Google: ross perot party. Answer within a few seconds. Old-fashioned sources only work the best when you use old-fashioned search queries.