Hugh Winkler holding forth on computing and the Web

Wednesday, December 27, 2006

Is Atompub Superfluous?

I hate to suggest this at such a late stage -- Bill and Joe just published the twelfth rev of Atompub -- but do we really need Atompub?

All we need is RFC 4287, plus maybe the service description doc in Atompub.

Atompub tells you how to identify URLs where your application can POST documents of certain types. That's not a whole lot better than some so-called "RESTful" API that tells you how to construct a URL. At design time, I'm learning more about your application than I need to know.

My proposal: Once you've located the URL of what they call an "edit" link, you should be able to GET a form from that link describing what you can POST there. Since we already have well-defined semantics for the XML elements of Atom entries, an edit URL could return an XForms model:

<entry xmlns="">
<content type=""/>
</entry>
<submission id="form1"/>

1. This pattern (getting a form, filling it out, and submitting it) differentiates the web's REST style from RPC. In RPC, the programmer learns an API at design time; on the web, clients discover the "API" at run time.

2. We sidestep lots of arguments on the APP mailing list that are driven by different ideas about expected behavior; here, the server tells you its behavior. The server in this example could not honor an atom:id should you submit one, so it doesn't ask you for one.

3. Extensions understood by the server? Don't worry: if the server doesn't ask you for it, it won't honor it. Example: Atom Threading Extension. A very nice server could offer a catchall element (to be invented) where an APP client might stuff all the custom extensions it really wants preserved.

4. We can evolve the Atom syntax without argument. Old servers will never request newly defined elements so will never have to deal with them. I'm guessing this is where Mark is going with reference to semweb.
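A minimal Java sketch of the client side of this idea follows. It assumes, for simplicity, that the edit URL returns a bare Atom entry skeleton whose empty child elements are the fields the server will accept; a real XForms model would wrap that skeleton in an instance alongside a submission element. The `AtomFormClient` class and the sample template are hypothetical. The client fills in only the fields the form asks for, so a value the server didn't request (here, an atom:id) is simply never sent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical client that treats an Atom entry skeleton from the edit URL as a form.
class AtomFormClient {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Parse namespace-aware, so Atom elements are recognized by namespace, not prefix.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("form template is not well-formed XML", e);
        }
    }

    // The fields the server asks for: the Atom child elements of the skeleton entry.
    static List<String> requestedFields(Document form) {
        List<String> names = new ArrayList<>();
        for (Element e : atomChildren(form)) {
            names.add(e.getLocalName());
        }
        return names;
    }

    // Fill in only the requested fields; values the form doesn't ask for are dropped.
    static Document fill(Document form, Map<String, String> values) {
        for (Element e : atomChildren(form)) {
            String value = values.get(e.getLocalName());
            if (value != null) {
                e.setTextContent(value);
            }
        }
        return form;
    }

    private static List<Element> atomChildren(Document form) {
        List<Element> out = new ArrayList<>();
        for (Node n = form.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ATOM_NS.equals(n.getNamespaceURI())) {
                out.add((Element) n);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // This form asks for a title and content, but not for an atom:id.
        Document form = parse("<entry xmlns='http://www.w3.org/2005/Atom'>"
                + "<title/><content type='text'/></entry>");
        System.out.println(requestedFields(form)); // [title, content]
        fill(form, Map.of("id", "tag:example.org,2006:1", "title", "Hello", "content", "First post"));
        System.out.println(form.getDocumentElement().getTextContent()); // HelloFirst post
    }
}
```

The client's knowledge of Atom stays generic; the form, fetched at run time, decides which of those fields matter to this particular server.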

Saturday, December 23, 2006

MVC Considered Harmful

Struts 2, the popular Java webapp framework, advertises a Model-View-Controller architecture, as do many other frameworks. MVC seems to be an item on a checklist that frameworks think they have to have. Now, MVC is a great architecture for building desktop applications, with a user interface posting messages through a controller to query or update a model. But the skeptic would ask: does the web really feel like a desktop application?

What is wrong with this picture?

[Figure: Struts architecture]

Notice that the central concept of a Struts application is the Action. Building a Struts app means constructing a bunch of Action classes. The Struts controller accepts requests and maps each URL to an Action. Ahem. The web is not built upon Uniform Action Locators, is it? An Action implies a verb, and that is the way developers model applications under Struts and other MVC webapp frameworks. In fact, Struts typically identifies Action URLs by the extension ".do", as in the verb "to do".

Struts misses the concept of Resource. You can't really model a web application without conceiving of it as a collection of resources, identified by URI. I think Django is on to this. In Django, you furnish the framework with a map of regular expressions to callback functions. The Django docs call these callback functions your "views", but of course, if done correctly, they are really your resources. (Interestingly, the Django FAQ addresses the question "Is Django an MVC framework?" and has revised its answer over time. They can't bring themselves to acknowledge a complete break with some form of MVC.)

A well-architected webapp will organize around Resources and Representations. There are typically four methods on a Resource. Omitting some details, you have:

Representation Resource.get()
Representation Resource.post(Representation)
Representation Resource.put(Representation)
void Resource.delete()

To handle a request, you

1. Identify the resource given its URI, and instantiate it
2. Call the method, passing the request representation if any.
3. Respond with the returned representation.
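Those three steps can be sketched in Java. This is an illustration under stated assumptions, not Struts' or Django's API: `Representation` is reduced to a media type plus a string body, the route table maps regular expressions to resource factories in the spirit of Django's URL map, and `OrderResource` is a hypothetical example. Any method a resource doesn't override answers 405.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified representation: a media type and a string body.
final class Representation {
    final String mediaType;
    final String body;
    Representation(String mediaType, String body) {
        this.mediaType = mediaType;
        this.body = body;
    }
}

class NotFound extends RuntimeException {
    NotFound(String uri) { super("404 Not Found: " + uri); }
}

class MethodNotAllowed extends RuntimeException {
    MethodNotAllowed() { super("405 Method Not Allowed"); }
}

// The four uniform methods; a resource overrides only the ones it supports.
interface Resource {
    default Representation get() { throw new MethodNotAllowed(); }
    default Representation post(Representation r) { throw new MethodNotAllowed(); }
    default Representation put(Representation r) { throw new MethodNotAllowed(); }
    default void delete() { throw new MethodNotAllowed(); }
}

class Dispatcher {
    // Regex -> resource factory, tried in registration order.
    private final Map<Pattern, Function<Matcher, Resource>> routes = new LinkedHashMap<>();

    Dispatcher route(String regex, Function<Matcher, Resource> factory) {
        routes.put(Pattern.compile(regex), factory);
        return this;
    }

    Representation handle(String method, String uri, Representation request) {
        for (Map.Entry<Pattern, Function<Matcher, Resource>> route : routes.entrySet()) {
            Matcher m = route.getKey().matcher(uri);
            if (!m.matches()) continue;
            // 1. Identify the resource from its URI and instantiate it.
            Resource resource = route.getValue().apply(m);
            // 2. Call the method, passing the request representation if any.
            // 3. Return the representation to send back (null for DELETE, i.e. no body).
            switch (method) {
                case "GET": return resource.get();
                case "POST": return resource.post(request);
                case "PUT": return resource.put(request);
                case "DELETE": resource.delete(); return null;
                default: throw new MethodNotAllowed(); // other methods: 405 here, for brevity
            }
        }
        throw new NotFound(uri);
    }
}

// Hypothetical resource: a single order, identified by /orders/{id}.
class OrderResource implements Resource {
    private final String id;
    OrderResource(String id) { this.id = id; }

    @Override
    public Representation get() {
        return new Representation("text/plain", "order " + id);
    }
}

class DispatchDemo {
    public static void main(String[] args) {
        Dispatcher dispatcher = new Dispatcher()
                .route("/orders/(\\d+)", m -> new OrderResource(m.group(1)));
        System.out.println(dispatcher.handle("GET", "/orders/42", null).body); // order 42
    }
}
```

Because every resource shares the same four methods, the dispatcher never needs an application-specific verb; contrast one Struts Action per verb.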

Resources are probably not much like your domain model. You'll have to map Representations to Resource instances, and Resource instances to, say, database queries. A good architecture abstracts out dispatching to methods based on types of Resources and Representations, and affords a loose coupling between your domain model and the Resource layer. These are sound principles that will also help your webapp play well with the REST architecture of the web. But none of this dispatching and loose coupling is MVC.
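To illustrate that loose coupling, here is a small hypothetical Java sketch. A `Customer` domain object and a `CustomerRepository` interface (which in practice might wrap database queries) know nothing about URIs or representations, while `CustomerResource` translates between the two. Representations are plain strings in an assumed "name,email" format to keep the example short; all class names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Domain model: no URIs, no HTTP, no media types.
final class Customer {
    final String name;
    final String email;
    Customer(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

// Hypothetical persistence port; a real one might wrap SQL queries.
interface CustomerRepository {
    Customer find(String id);
    void save(String id, Customer customer);
}

// Resource layer: maps the resource at /customers/{id} onto domain calls, and an
// assumed "name,email" text representation onto Customer objects and back.
final class CustomerResource {
    private final String id;
    private final CustomerRepository repository;

    CustomerResource(String id, CustomerRepository repository) {
        this.id = id;
        this.repository = repository;
    }

    String get() {
        Customer c = repository.find(id);
        if (c == null) throw new NoSuchElementException("404 Not Found: customer " + id);
        return c.name + "," + c.email;
    }

    String put(String representation) {
        String[] fields = representation.split(",", 2);
        if (fields.length != 2) throw new IllegalArgumentException("400 Bad Request: expected name,email");
        repository.save(id, new Customer(fields[0].trim(), fields[1].trim()));
        return get();
    }
}

// In-memory repository, for demonstration only.
final class InMemoryCustomers implements CustomerRepository {
    private final Map<String, Customer> rows = new HashMap<>();
    public Customer find(String id) { return rows.get(id); }
    public void save(String id, Customer customer) { rows.put(id, customer); }
}

class CustomerDemo {
    public static void main(String[] args) {
        CustomerResource alice = new CustomerResource("7", new InMemoryCustomers());
        System.out.println(alice.put("Alice, alice@example.org")); // Alice,alice@example.org
    }
}
```

Swapping the repository for a database-backed one, or the text format for XML, touches only one side of the boundary.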

Wednesday, December 20, 2006

Google Boomerang

Blogger, the Google-owned engine behind this blog, just went final with a major new rev. But it continues to generate massively invalid XHTML, and there's nothing authors can do about it: the bugs are in the new templating system. This page you're reading, for example, gets 864 validation errors as of today.

Arne Bleijs made this pithy observation:

I suddenly remember some quotes from the annual conference of the American Association for Artificial Intelligence (AAAI), this summer in Boston. After Sir Tim Berners-Lee held a keynote speech about the Semantic Web. Peter Norvig, Director of Research at Google challenged him afterwards:

"What I get a lot is: 'Why are you against the Semantic Web?' I am not against the Semantic Web. But from Google's point of view, there are a few things you need to overcome, incompetence being the first," Norvig said. Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user.

"We deal with millions of Web masters who can't configure a server, can't write HTML. It's hard for them to go to the next step."

Somehow this sounds like a Google Boomerang hitting Blogger Beta. How can it be possible that Google complains about incompetent webmasters and is not able to produce valid XHTML?

Reverse Spam

This isn't a new phenomenon but it's happening ten times more frequently to me this week: I am receiving bounced mail notifications from mail I never sent. I funnel all mail sent to my domain, no matter the recipient, to my main account.

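Spammers send mail all over the planet using random domain names and generated account names as forged sender addresses. As spam filters get better, they are bouncing these messages. But the bounces go back to the forged sender, and when the random domain happens to be mine, my catch-all delivers them to me. So I'm getting other people's spam!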

Monday, December 18, 2006

Roy Fielding is the lead author of RFC 2616, the HTTP spec. I've seen his personal URL for years. What is this?

My wife recently left her job of 11 years assembling books for a book production company. She's acquired quite a bit of esoteric knowledge about printing. Tonight she pronounced, out of nowhere: "Roy Gee Biv". She does this sometimes. Her neurons have been reordered so that occasionally she blurts out mnemonic devices she used in that past life.

This mnemonic device helps you remember the colors of the rainbow: ROYGBIV (red, orange, yellow, green, blue, indigo, violet).


Wednesday, December 13, 2006

Message Level Security in HTTP: How hard can it be?

A recent flurry of interest in message-level security for Plain Old HTTP: Gunnar makes valid criticisms (though he confounds REST with HTTP), and Robert takes the point. And the issue pops up again on www-tag.

The criticism is this: Using TLS, each intermediary decrypts the message, and re-encrypts it to send along to the next leg of its journey. The message is vulnerable to prying eyes at those transition points.

How hard can it be to graft message-level security (MLS) onto HTTP without the baggage of WS-Security, XML Encryption, and XML DSig? Somebody must have done this long ago, right? I couldn't find it with a quick Google search, but it has to look like this:

Define a content transfer encoding for public-key (PK) encryption, and pass the cert bits or a URL:

Content-transfer-encoding: RSA; cert="...base64 bits here..."
Content-transfer-encoding: RSA; certref=""

You'd also use these values in "Accept-Encoding" headers.

And we're done, as far as HTTP is concerned.
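As a rough sketch of what a UA or server might do with such an encoding, the Java below uses only the JDK's standard crypto APIs. Two assumptions are worth flagging. First, RSA by itself can only encrypt a few hundred bytes, so the sketch substitutes the usual hybrid scheme: a fresh AES-GCM key encrypts the entity, and RSA-OAEP wraps that key. Second, a bare RSA public key stands in for the certificate bits. The entity is encrypted to the recipient's key, while the header carries the sender's own key so a reply can be encrypted back. `Envelope`, `MessageLevelCrypto`, and the envelope layout are hypothetical; the post doesn't specify how the wrapped key travels with the body.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical envelope: the RSA-wrapped AES key, the GCM nonce, and the encrypted entity.
final class Envelope {
    final byte[] wrappedKey;
    final byte[] nonce;
    final byte[] ciphertext;
    Envelope(byte[] wrappedKey, byte[] nonce, byte[] ciphertext) {
        this.wrappedKey = wrappedKey;
        this.nonce = nonce;
        this.ciphertext = ciphertext;
    }
}

final class MessageLevelCrypto {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES = "AES/GCM/NoPadding";
    private static final SecureRandom RANDOM = new SecureRandom();

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Header value in the post's proposed syntax; DER-encoded key bytes stand in for cert bits.
    static String headerValue(PublicKey ownKey) {
        return "RSA; cert=\"" + Base64.getEncoder().encodeToString(ownKey.getEncoded()) + "\"";
    }

    // Encrypt the entity for the recipient: AES-GCM for the body, RSA-OAEP to wrap the AES key.
    static Envelope encrypt(byte[] entity, PublicKey recipient) {
        try {
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(256);
            SecretKey aesKey = keyGen.generateKey();
            byte[] nonce = new byte[12];
            RANDOM.nextBytes(nonce);
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.ENCRYPT_MODE, aesKey, new GCMParameterSpec(128, nonce));
            byte[] ciphertext = aes.doFinal(entity);
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.ENCRYPT_MODE, recipient);
            return new Envelope(rsa.doFinal(aesKey.getEncoded()), nonce, ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recipient side: unwrap the AES key with the private key, then decrypt and authenticate.
    static byte[] decrypt(Envelope envelope, PrivateKey privateKey) {
        try {
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.DECRYPT_MODE, privateKey);
            SecretKey aesKey = new SecretKeySpec(rsa.doFinal(envelope.wrappedKey), "AES");
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.DECRYPT_MODE, aesKey, new GCMParameterSpec(128, envelope.nonce));
            return aes.doFinal(envelope.ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair server = newKeyPair();
        Envelope env = encrypt("title=Hello".getBytes(StandardCharsets.UTF_8), server.getPublic());
        System.out.println(new String(decrypt(env, server.getPrivate()), StandardCharsets.UTF_8)); // title=Hello
    }
}
```

Only the endpoints hold private keys, so an intermediary that relays the envelope sees ciphertext throughout, which is the property TLS's hop-by-hop decryption lacks.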

Applications are in charge of trust. Maybe HTML5 could enhance the <form> element so you could have:

<form method="POST" action="action" certref="">

and user agents would then encode the entity using PK encryption, and deliver the cert bits in the header as above. As with TLS, the UAs could generate temporary one-off local certs for this purpose.

To GET a secure page, you'd have to have sent the proper Accept-Encoding header with your cert as a parameter. If you failed to send this header, you'd just get a 406 (Not Acceptable), and you'd have to retry with your cert.

This has all been done before somewhere, right?
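The server side of the 406 rule for secure GETs can be sketched just as briefly. Assuming the client advertises its key in the proposed form, `Accept-Encoding: RSA; cert="..."`, this hypothetical helper extracts the cert parameter and decides between serving the page and answering 406. The parsing is deliberately naive: it splits on commas and semicolons, which is safe for base64 values but not for arbitrary quoted strings.

```java
import java.util.Optional;

final class SecureNegotiation {
    // Find the cert parameter of an "RSA" coding in an Accept-Encoding header, if any.
    // Naive parsing: fine for base64 values, which contain no commas or semicolons.
    static Optional<String> clientCert(String acceptEncoding) {
        if (acceptEncoding == null) return Optional.empty();
        for (String coding : acceptEncoding.split(",")) {
            String[] parts = coding.split(";");
            if (!parts[0].trim().equalsIgnoreCase("RSA")) continue;
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                int eq = param.indexOf('=');
                if (eq <= 0 || !param.substring(0, eq).trim().equalsIgnoreCase("cert")) continue;
                String value = param.substring(eq + 1).trim();
                // Strip surrounding quotes; base64 padding '=' inside the value is kept.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                if (!value.isEmpty()) return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    // Status for a GET of a protected page: 200 if we can encrypt to the client, else 406.
    static int statusFor(String acceptEncoding) {
        return clientCert(acceptEncoding).isPresent() ? 200 : 406;
    }

    public static void main(String[] args) {
        System.out.println(statusFor("gzip, deflate"));                // 406
        System.out.println(statusFor("gzip, RSA; cert=\"MIIBIjAN\"")); // 200
    }
}
```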

Thursday, December 07, 2006

Lisp Is Too An Acceptable Lisp (with caveats)

To get my hands dirty with Lisp, I've been putting together a RESTful HTTP framework. It runs with Apache 2.0 and mod_lisp. Over the past year, I've been researching Lisp: absorbing Practical Common Lisp, reading Paul Graham and Bill Clementson and lemonodor and others. There's a consensus out there that Lisp rules, but there's a permathread lamenting the balkanized libraries many think prevent widespread uptake of Lisp. As Steve Yegge puts it in Lisp is not an Acceptable Lisp:
Every single non-standard extension, everything not in the spec, is "wrong" with Common Lisp. This includes any support for threads, filesystem access, processes and IPC, operating system interoperability, a GUI, Unicode, and the long list of other features missing from the latest hyperspec.
Point taken, to some degree. But that problem doesn't directly affect you if you're an architect building an application or web service -- you just pick Allegro or LispWorks or CMUCL or SBCL or CLISP, and you're done. Each Lisp has its own APIs for those services. It's true your code won't port trivially from one platform to another. That's usually not much of a concern, if the platform you choose runs on all the hardware and OSs you need.

But the balkanization indirectly affects you. It impedes the development and distribution of free libraries. If you're like me, you've become addicted to free libraries. If you're developing in Java, a Jabber component, a message queue, or a task scheduler is just a Google search away.

To build a cross-platform library, a Lisp developer has a fair amount of work to do. At a minimum, if you use any extensions, like sockets or Gray streams, you have to find or write an abstraction that may do little more than hide the package name of your implementation. You may not be able to abstract away some issues: does each sockets implementation know how to send out-of-band TCP data (MSG_OOB)? Which methods do the underlying Gray streams really require you to override?

Common Lisp has been frozen since 1994, and as best I can tell, there is no process by which Gray streams, sockets, IPC, or anything else could be added to the language now. Contrast that with the Java Community Process, which, whatever its warts[1], has enabled the language to evolve uniformly across implementations. You never have to worry that the Java 5 code you developed using Sun's JDK won't run on JRockit or IBM's JVM. It just works.

Lisp needs an ongoing, JSR-like process (not another eight-year, big-bang ANSI revision) to standardize the changes we take for granted in all other environments. And a benevolent dictator would help.

[1] How ironic that Peter Seibel contrasts the long JSR process against the capability Lisp gives you to extend the language with macros.