Messages not Models: 2006

Wednesday, December 27, 2006

Is Atompub Superfluous?

I hate to suggest this at such a late stage -- Bill and Joe just published the twelfth rev of Atompub -- but do we really need Atompub?

All we need is RFC 4287, plus maybe the service description doc in Atompub.

Atompub tells you how to identify URLs where your application can POST documents of certain types. That's not a whole lot better than some so-called "RESTful" API that tells you how to construct a URL. At design time, I'm learning more about your application than I need to know.

My proposal: Once you've located the URL of what they call an "edit" link, you should be able to GET a form from that link, describing what you can POST there. Since we already have well-defined semantics for the XML elements of Atom entries, an edit URL could return an XForms model:


<model xmlns="http://www.w3.org/2002/xforms">
  <instance>
    <entry xmlns="http://www.w3.org/2005/Atom">
      <link/>
      <updated/>
      <summary/>
      <content type=""/>
    </entry>
  </instance>
  <submission id="form1"
              action="entries"
              method="post"/>
</model>

1. This pattern -- getting a form, filling it out, and submitting it -- differentiates the web's REST style from RPC. In RPC, the programmer learns an API at design time; on the web, clients discover the "API" at run time.

2. We sidestep lots of arguments on the APP mailing list that are driven by different ideas about expected behavior; here, the server tells you its behavior. This server would not honor an atom:id should you submit one, so it doesn't ask you for one.

3. Extensions understood by the server? Don't worry: if the server doesn't ask you for it, it won't honor it. Example: Atom Threading Extension. A very nice server could offer a catchall element (to be invented) where an APP client might stuff all the custom extensions it really wants preserved.

4. We can evolve the Atom syntax without argument. Old servers will never request newly defined elements so will never have to deal with them. I'm guessing this is where Mark is going with reference to semweb.
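Here is a minimal client-side sketch of points 2 and 3, in Python with only the standard library's ElementTree. It assumes the edit URL returned a form like the one above, with the entry template wrapped in an XForms instance element; the FORM string and the fill_form helper are hypothetical, not part of Atompub. The client fills in only what the form asks for, so the atom:id it would like to send never leaves the client.

```python
import xml.etree.ElementTree as ET

XF = "http://www.w3.org/2002/xforms"
ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom elements without a prefix

# Hypothetical form, as a server might return it from an "edit" URL.
FORM = """<model xmlns="http://www.w3.org/2002/xforms">
  <instance>
    <entry xmlns="http://www.w3.org/2005/Atom">
      <link/>
      <updated/>
      <summary/>
      <content type=""/>
    </entry>
  </instance>
  <submission id="form1" action="entries" method="post"/>
</model>"""


def fill_form(form_xml, values):
    """Fill in the entry template the server sent; return (method, action, xml).

    values maps Atom element names to text, or to a dict of attributes for
    elements like atom:link that carry their data in attributes. Fields the
    template does not ask for (e.g. "id") are never sent, and template
    fields we have no value for are dropped.
    """
    model = ET.fromstring(form_xml)
    entry = model.find(f"{{{XF}}}instance/{{{ATOM}}}entry")
    submission = model.find(f"{{{XF}}}submission")
    for child in list(entry):
        name = child.tag.split("}", 1)[1]  # local name without namespace
        value = values.get(name)
        if value is None:
            entry.remove(child)
        elif isinstance(value, dict):
            child.attrib.update(value)
        else:
            child.text = value
    entry.tail = None  # drop the whitespace that followed </entry> in the form
    body = ET.tostring(entry, encoding="unicode")
    return submission.get("method"), submission.get("action"), body


method, action, body = fill_form(FORM, {
    "id": "urn:uuid:not-requested",  # ignored: the form never asked for it
    "updated": "2006-12-27T12:00:00Z",
    "summary": "Is Atompub superfluous?",
})
print(method, action)  # post entries
print(body)
```

The client needs no design-time knowledge of which Atom elements this server supports; it learns that from the form at run time.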

Saturday, December 23, 2006

MVC Considered Harmful

Struts 2, the popular Java webapp framework, like other frameworks, advertises a Model-View-Controller architecture. MVC seems to be an item on a checklist that frameworks think they have to have. Now, MVC is a great architecture for building desktop applications, with a user interface posting messages through a controller, to query or update a model. But the skeptic would ask, Does the web really feel like a desktop application?

What is wrong with this picture?

Struts architecture

Notice the central concept of a Struts application is the Action. Building a Struts app is constructing a bunch of Action classes. The Struts controller accepts requests and maps the URL to an Action. Ahem. The web is not built upon Uniform Action Locators, is it? An Action implies a verb, and that is the way developers model applications under Struts and other MVC webapp frameworks. In fact, Struts typically identifies Action URLs by the extension ".do" -- the verb "to do".

Struts misses the concept of Resource. You can't really model a web application without conceiving of it as a collection of resources, identified by URI. I think Django is on to this. In Django, you furnish the framework with a map of regular expressions to callback functions. The Django docs call these callback functions your "views", but of course, if done correctly, they are really your resources. (Interestingly, the Django FAQ addresses the question "Is Django an MVC framework?" and has evolved the answer, from this to this. They can't bring themselves to acknowledge a complete break with some form of MVC.)

A well architected webapp will organize around Resources and Representations. There are typically four methods on a Resource. Omitting some details, you have

Representation Resource.get()
Representation Resource.post(Representation)
Representation Resource.put(Representation)
void Resource.delete()

To handle a request, you

1. Identify the resource given its URI, and instantiate it
2. Call the method, passing the request representation if any.
3. Respond with the returned representation.
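The three steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the Representation and Resource classes, the Order resource, the ORDERS dict standing in for a domain model, and the regex route table (loosely in the spirit of Django's URL map) are all hypothetical, not any framework's API.

```python
import re
from dataclasses import dataclass


@dataclass
class Representation:
    media_type: str
    body: str


class MethodNotAllowed(Exception):
    """Raised when a resource does not support the requested method."""


class Resource:
    """The four uniform methods; subclasses override the ones they support."""

    def __init__(self, **params):
        self.params = params  # values captured from the URI

    def get(self):
        raise MethodNotAllowed

    def post(self, rep):
        raise MethodNotAllowed

    def put(self, rep):
        raise MethodNotAllowed

    def delete(self):
        raise MethodNotAllowed


# Hypothetical domain model, deliberately kept separate from the resource layer.
ORDERS = {"42": "two widgets"}


class Order(Resource):
    def get(self):
        return Representation("text/plain", ORDERS[self.params["order_id"]])

    def put(self, rep):
        ORDERS[self.params["order_id"]] = rep.body
        return rep

    def delete(self):
        del ORDERS[self.params["order_id"]]


# URI patterns map to resources (nouns), not to actions (verbs).
ROUTES = [(re.compile(r"^/orders/(?P<order_id>\d+)$"), Order)]


def handle(method, path, rep=None):
    # 1. Identify the resource given its URI, and instantiate it.
    for pattern, cls in ROUTES:
        match = pattern.match(path)
        if match:
            resource = cls(**match.groupdict())
            break
    else:
        return 404, None
    # 2. Call the method, passing the request representation if any.
    try:
        if method in ("POST", "PUT"):
            result = getattr(resource, method.lower())(rep)
        elif method in ("GET", "DELETE"):
            result = getattr(resource, method.lower())()
        else:
            return 501, None
    except MethodNotAllowed:
        return 405, None
    except KeyError:  # no such order in the domain model
        return 404, None
    # 3. Respond with the returned representation (204 if there is none).
    return (200, result) if result is not None else (204, None)


print(handle("GET", "/orders/42"))
```

The route table picks the noun and the HTTP method picks the verb, so there is no Action class anywhere; the domain model only ever sees plain data.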

Resources are probably not much like your domain model. You'll have to map Representations to Resource instances, and Resource instances to, say, database queries. A good architecture abstracts out dispatching to methods based on types of Resources and Representations, and affords a loose coupling between your domain model and the Resource layer. These are sound principles that will also help your webapp play well with REST web architecture. But dispatch and loose coupling are not MVC.

Wednesday, December 20, 2006

Google Boomerang

Blogger, the Google-owned engine behind this blog, just went final with a major new rev. But it continues to generate massively invalid XHTML, and there's nothing authors can do about it: the bugs are in the new templating system. This page you're reading, for example, gets 864 validation errors as of today.

Arne Bleijs made this pithy observation:

I suddenly remember some quotes from the annual conference of the American Association for Artificial Intelligence (AAAI), this summer in Boston. After Sir Tim Berners-Lee held a keynote speech about the Semantic Web. Peter Norvig, Director of Research at Google challenged him afterwards:
"What I get a lot is: 'Why are you against the Semantic Web?' I am not against the Semantic Web. But from Google's point of view, there are a few things you need to overcome, incompetence being the first," Norvig said. Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user."
"We deal with millions of Web masters who can't configure a server, can't write HTML. It's hard for them to go to the next step. "
Somehow this sounds like a Google Boomerang hitting Blogger Beta. How can it be possible that Google complains about incompetent webmasters and is not able to produce valid XHTML?

Reverse Spam

This isn't a new phenomenon but it's happening ten times more frequently to me this week: I am receiving bounced mail notifications from mail I never sent. I funnel all mail sent to my domain, no matter the recipient, to my main account.

Spammers send mail all over the planet using random domain names and generated account names. As spam filters get better, they are bouncing these messages. But they are bouncing them back to me. So I'm getting other people's spam!

Monday, December 18, 2006

gbiv.com

Roy Fielding co-authored RFC 2616, the HTTP/1.1 spec. I've seen his personal URL for years: http://roy.gbiv.com/. What is this gbiv.com?

My wife recently left her job of 11 years assembling books for a book production company. She's acquired quite a bit of esoteric knowledge about printing. Tonight she pronounced, out of nowhere: "Roy Gee Biv". She does this sometimes. Her neurons have been reordered so that occasionally she blurts out mnemonic devices she used in that past life.

This mnemonic device helps you remember the colors of the rainbow: ROYGBIV

red
orange
yellow
green
blue
indigo
violet

Wednesday, December 13, 2006

Message Level Security in HTTP: How hard can it be?

A recent flurry of interest in message level security for Plain Old HTTP: Gunnar makes valid criticisms (though he conflates REST with HTTP); and Robert takes the point. And the issue pops up again on www-tag.

The criticism is this: Using TLS, each intermediary decrypts the message, and re-encrypts it to send along to the next leg of its journey. The message is vulnerable to prying eyes at those transition points.

How hard can it be to graft MLS onto HTTP without the baggage of WS-Security, XML Encryption, and XML DSig? Somebody must have done this long ago, right? Couldn't find it with a quick google, but it has to look like this:

Define a content transfer encoding for public-key (PK) encryption, and pass the cert bits, or a URL:

Content-transfer-encoding: RSA; cert="...base64 bits here..."
or
Content-transfer-encoding: RSA; certref="http://foo.bar/cert"

You'd also use these values in "Accept-Encoding" headers.

And we're done, as far as HTTP is concerned.

Applications are in charge of trust. Maybe HTML5 could enhance the <form> element so you could have:

<form method="POST" action="action" certref="http://mybank.com/cert">

and user agents would then encode the entity using PK encryption, and deliver the cert bits in the header as above. As with TLS, the UAs could generate temporary one-off local certs for this purpose.

To GET a secure page, you'd have to have sent the proper Accept-Encoding with your cert as a parameter. If you failed to send this header, you'd just get a 406, and you'd have to retry with your cert.
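A minimal Python sketch of the server side of that negotiation, assuming the header syntax proposed above: an "RSA" coding with a cert or certref parameter. That coding is this post's proposal, not a registered HTTP content coding. The helper names are hypothetical, q-values are ignored, and the actual public-key encryption is left out.

```python
def split_unquoted(text, sep):
    """Split on sep, ignoring separators inside double-quoted strings."""
    parts, current, quoted = [], [], False
    for ch in text:
        if ch == '"':
            quoted = not quoted
        if ch == sep and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_codings(header):
    """'gzip, RSA; certref="http://..."' -> {'gzip': {}, 'rsa': {'certref': ...}}"""
    codings = {}
    for item in split_unquoted(header, ","):
        fields = split_unquoted(item, ";")
        name = fields[0].strip().lower()
        if not name:
            continue
        params = {}
        for field in fields[1:]:
            key, _, value = field.partition("=")  # base64 "=" padding survives
            params[key.strip().lower()] = value.strip().strip('"')
        codings[name] = params
    return codings


def negotiate(accept_encoding, secure):
    """Decide how to answer a GET; q-values are ignored in this sketch."""
    if not secure:
        return 200, None
    rsa = parse_codings(accept_encoding or "").get("rsa")
    if rsa is None or not ("cert" in rsa or "certref" in rsa):
        return 406, None  # Not Acceptable: the client must retry with its cert
    # A real server would now encrypt the entity for that key (not shown).
    return 200, rsa


print(negotiate("gzip", secure=True))  # (406, None)
print(negotiate('gzip, RSA; certref="http://foo.bar/cert"', secure=True))
```

The point of the sketch is how little HTTP itself has to change: one new coding name, two parameters, and the existing 406 status.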

This has all been done before somewhere, right?

Thursday, December 07, 2006

Lisp Is Too An Acceptable Lisp (with caveats)

To get my hands dirty with Lisp, I've been putting together a RESTful HTTP framework. It runs with Apache 2.0 and mod_lisp. Over the past year, I've been researching Lisp: absorbing Practical Common Lisp, reading Paul Graham and Bill Clementson and lemonodor and others. There's a consensus out there that Lisp rules, but there's a permathread lamenting the balkanized libraries many think prevent widespread uptake of Lisp. As Steve Yegge puts it in Lisp is not an Acceptable Lisp:

Every single non-standard extension, everything not in the spec, is "wrong" with Common Lisp. This includes any support for threads, filesystem access, processes and IPC, operating system interoperability, a GUI, Unicode, and the long list of other features missing from the latest hyperspec.

Point taken, to some degree. But that problem doesn't directly affect you if you're an architect building an application or web service -- you just pick Allegro or LispWorks or CMUCL or SBCL or CLISP, and you're done. Each Lisp has its own APIs for those services. It's true your code won't port trivially from one platform to another. That's usually not much of a concern, if the platform you choose runs on all the hardware and OSs you need.

But the balkanization indirectly affects you. It impedes the development and distribution of free libraries. If you're like me, you've become addicted to free libraries. A Jabber component, a message queue, or a task scheduler is just a Google away if you're developing in Java.

To build a cross-platform library, a Lisp developer has a fair amount of work to do. At a minimum, if you use any extensions, like sockets or Gray streams, you have to find or write an abstraction that may do little more than hide the package name of your implementation. You may not be able to abstract away some issues: does each sockets implementation know how to send out-of-band TCP data (MSG_OOB)? Which methods do the underlying Gray streams really require you to override?

Common Lisp has been frozen since 1994, and as best I can tell, there is no process by which Gray streams, sockets, IPC, or anything else could be added to the language now. Contrast that with the Java Community Process, which, whatever its warts[1], has enabled the language to evolve uniformly across implementations. You never have to worry that the Java 5 code you developed using Sun's JDK won't run on JRockit or IBM's JVM. It just works.

Lisp needs some JSR-like, ongoing process -- not another eight-year, big-bang ANSI revision -- to standardize changes we take for granted in all other environments. And a benevolent dictator would help.

[1] How ironic that Peter Seibel contrasts the long JSR process against the capability Lisp gives you to extend the language with macros.

Monday, November 06, 2006

Edgy + Beryl = Love

I spent the weekend upgrading my laptop from Ubuntu Dapper to Edgy. It was painful, relative to the usual seamless Ubuntu intra-release updating process. Painful, but surmountable. To reward myself after the pain, I installed the Beryl desktop. I LOVE BERYL! It renews my love affair with my computer! If you're unfamiliar with Beryl/Compiz, check out some of these videos... but they don't do the user experience justice, trust me. It's worth the effort to try out Beryl.

The rest of this posting is really for Googlers trying to solve upgrade issues from Dapper to Edgy, and also a little Beryl guidance. Welcome, visitors!

After I upgraded using apt-get dist-upgrade, my system failed to boot into the new kernel. If you are booting and it seems to hang, wait three or four minutes, and you may see something like this:

ALERT! /dev/disk/by-uuid/ed4395c8-e15c-4b20-8716-76ceff89614e does not exist. Dropping to a shell!

BusyBox v1.1.3 (Debian 1:1.1.3-2ubuntu3) Built-in shell (ash)
Enter 'help' for a list of built-in commands.

Don't panic. Since you're upgrading, you still have your old kernel you can boot into. Boot it, change to your /boot directory, and do this:

sudo dpkg-reconfigure linux-image-2.6.17-10-generic

(use the same name suffix, e.g. "2.6.17-10-generic", as the new kernel you upgraded to). This command builds you a new initrd image file, and you will be able to boot.

My second issue was particular to my laptop, a Dell Precision M70 with an Nvidia card. You really need your Nvidia card to work if you're going to use Beryl. Something in my default upgrade linked to the latest Nvidia driver (1.0-8776), but that driver did not come in the upgrade; X failed to start. Thanks to Alberto Milone for these excellent notes on upgrading the driver without disabling your wireless card.

I had a fair amount of trouble getting Compiz/Beryl to work. I had tried Compiz on Dapper, and never got a working install. After upgrading to Edgy and getting it stable, I investigated Beryl, and had similar trouble: Lots of garbage bits on the screen, windows with no titlebars or borders -- unusable and depressing.

Finally I fixed all my problems by twiddling my xorg.conf. Here are the relevant twiddles (ymmv):

Section "Device"
Identifier "NVIDIA Corporation NV41 [Quadro FX Go1400]"
Driver "nvidia"
BusID "PCI:1:0:0"
Option "RenderAccel" "true"
Option "AllowGLXWithComposite" "true"
Option "Triplebuffer" "true"
Option "AddARGBGLXVisuals" "True"
EndSection
...
Section "Screen"
Identifier "Default Screen"
Device "NVIDIA Corporation NV41 [Quadro FX Go1400]"
Monitor "Generic Monitor"
# depth had been 16:
DefaultDepth 24

...

Sunday, August 27, 2006

Tim Bray on Ruby and Python vs Lisp

Tim Bray: "If Lisp’s audience had been harried sysadmins rather than AI researchers, it’d rule the world by now."

Monday, August 07, 2006

The Ruby Conspiracy

The Ruby Conspiracy: "Who are those who are benefiting from Ruby on Rails? Answer: O'Reilly Publishing, the authors Bruce Tate and Dave Thomas and a handful of consultants....We have two production applications running on Ruby. And how is it. Well, despite being perhaps no more than 5% of the functionality of our applications, Ruby on Rails is the number one consumer of Oracle CPU and logical gets....After all, the productivity benefits of Ruby are so much greater than Java you will save all of the money in development. Or do you. Our experience was that Ruby on Rails took longer than Java would have. And what about maintenance. Well we just refactor as things change. Or do we? There are no Ruby tools that support refactoring. And nor are they are expected due to the difficulties of implementing refactoring tools for Dynamic Languages, or so I am advised."

Interpreted... tons of magic^H^H^H^H^H abstraction. Right.

Thursday, August 03, 2006

Voidstar - Desktop clients

Julian Bond references Martin Geddes on desktop clients -- the browser, the messenger, the "manager":

All the portals are focussed on collecting everything you might read in one place. The "My Page". Nobody focusses on the reverse, collecting everything I create in one place *for other people* to read about me.

Analyzing along this line is getting closer to what I had in mind.

Wednesday, August 02, 2006

YouOS

My first reaction to YouOS: Cool. A webtop that looks the same from wherever I log in.

Second reaction: Hey, we had this in the 80's! Log in to any Sun on the LAN and your NFS shares are all in the same structure, your desktop just as you left it.

Third reaction: Did we invent the web only to repro the 1984 Mac desktop yet again?

Fourth reaction: There isn't hyperlink number one on this desktop. Shouldn't I be able to get the hyperlink of an object and send it to someone so they can see it or get it?

I'm not sure what I'm looking for, but it needs to be more webby. It needs to enable something I can't do now with a desktop. There's some new metaphor we're groping for. I'll know it when I see it.

Sunday, July 30, 2006

Understand the measurements

I'm sort of a cycling nut. The events of the last week, and the last couple of years, inspired me to put down these critical thoughts on anti-doping hysteria:

Bayesian Analysis for Dummies
My training is in geophysics. I have no expertise in biology. But as a geophysicist, I have worked with measurements a lot, and I know how to assess them.

Here's a simple example. Suppose a dangerous disease affects one of every 100,000 people. A lab develops a test that is always positive if you have the disease, but that gives a false positive in one percent of cases. Your test returned positive... do you have the disease?

Well, you might. But chances are, you don't. In fact, your chances of having the disease are about one in a thousand. The “prior probability” of one in 100,000 dominates the test result. The test would have to show far fewer false positives to be a useful tool in diagnosing the disease.

Recent accusations against cyclists, notably Floyd Landis, Lance Armstrong, and Tyler Hamilton, have been based on biological measurements. The measurements are valuable and largely trustworthy. But the meaning of any measurement needs to be assessed in light of the errors and uncertainties surrounding it. A newspaper has reported that Landis's testosterone to epitestosterone ratio (T/E) exceeded the allowed limit of 4. Let's examine the measurements.

First, go back to my earlier example. Substitute “testosterone abuse” for “dangerous disease”, and assume one out of every ten cyclists is an abuser, with the same test as before. Is Landis guilty? Probably... but not certainly: his probability of guilt would be less than 92%. So about eight times out of a hundred we would be wrong to take away his victory.

But wait: the T/E test doesn't always give a positive result for abusers. Lots of abusers can pass that test. If we think 50% of abusers can scrape past, does that affect Landis's odds? Yep: now he's only 85% likely to have abused. Still want to apply a two-year exile from the sport?

Here's the article that details that argument: Inferences about Testosterone Abuse among Athletes. They make this point: “Conclusions about the likelihood of testosterone doping require consideration of three components: specificity and sensitivity of the testing procedure, and the prior probability of use. As regards the T/E ratio, anti-doping officials consider only specificity. The result is a flawed process of inference.” In other words, the WADA procedures assume the test catches all abusers, and don't account for the known prevalence of abuse, so they're wrong.
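The arithmetic in this section is just Bayes' theorem. A short Python sketch reproduces the three numbers above, assuming, as the examples do, a one percent false positive rate (a specificity of 99%); the function name is illustrative.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test), by Bayes' theorem.

    sensitivity         = P(positive | condition)
    false_positive_rate = P(positive | no condition) = 1 - specificity
    """
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1 - prior)
    return true_positives / (true_positives + false_positives)


# The disease: 1 in 100,000, a test that never misses, 1% false positives.
print(round(1 / posterior(1e-5, 1.0, 0.01)))  # 1001, i.e. about 1 in 1000
# One cyclist in ten abusing, same test.
print(f"{posterior(0.1, 1.0, 0.01):.1%}")     # 91.7%
# Same, but half of all abusers slip past the test.
print(f"{posterior(0.1, 0.5, 0.01):.1%}")     # 84.7%
```

All three of the paper's components appear as arguments: drop any one of them and the answer changes.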

Landis's Eleven
Now how about those lab results, anyway? So far, we've just accepted the lab's numbers as golden. I have heard a ratio of 11:1 for Landis. But all measurements are uncertain. How certain is that 11:1? We want error bars around that number, 11. Is there some non-zero chance the ratio is 10? 15? Even 4? If you ask WADA, it is just: Eleven. (As if Dick Pound would understand the question, or even hear you out.)

How tall are you? Can you tell me to the 32nd of an inch, or to the millimeter? How about to the nanometer? At some level of granularity, you can no longer resolve a difference in distance. And how about that yardstick you used to measure? Pretty sure it's accurate to a millimeter? So instrument resolution and instrument accuracy are both sources of uncertainty.

Uncertainty is OK! We just have to know how large the potential errors are.

In the case of the T/E ratio test, there are a lot of systems involved. Gas chromatography is well understood, and there are uncertainty estimates available for the systems they use at LNDD (Laboratoire National de Dépistage du Dopage, the Lab testing Landis's samples). The process is temperature sensitive, so we'd really want to know the uncertainty bounds on the actual temperature program they used. The instrument documentation might give us some idea how to translate temperature variations into variations at the mass spectrometer output. There might also be some pressure control program pushing material through the column; how accurately do we understand the effect of uncertainty in the pressure? The mass spectrometer itself, only a subsystem of the whole, has its own uncertainty analysis.

Below is an example of the mass spectrometer output for a similar experiment, taken from a recent paper on screening for steroids. If these peaks were epitestosterone and testosterone, this would be a picture similar to the analysis of Landis's sample.

I think they get the T/E ratio by calculating the area under each of the two peaks, and dividing one by the other. So first of all, any uncertainty in the temperature and pressure would affect these areas. Secondly, the process has to separate the peaks far enough apart so that the two “hills” don't bleed into one another. Thirdly, somebody has to decide where the hills “start and stop”. See that little bump at 12.30 above? Is it part of hill 2 or not? Judgment call.
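To make the judgment call concrete, here is a Python sketch that integrates two peaks with the trapezoid rule and divides the areas. Everything in it is assumed: the trace is synthetic (Gaussian peaks with invented positions and heights, not LNDD data and not the figure above), and area-ratio integration is only a guess, as noted, at how the lab computes T/E. Moving the end of the T “hill” past the small bump changes the ratio with nothing else changed.

```python
import math


def gaussian(t, center, height, width):
    return height * math.exp(-0.5 * ((t - center) / width) ** 2)


def peak_area(ts, ys, start, stop):
    """Trapezoid-rule area under the sampled trace between start and stop."""
    area = 0.0
    for i in range(len(ts) - 1):
        if ts[i] >= start and ts[i + 1] <= stop:
            area += 0.5 * (ys[i] + ys[i + 1]) * (ts[i + 1] - ts[i])
    return area


# Synthetic trace (invented numbers, not LNDD data), sampled every 0.01 min:
# an E peak at 11.0, a T peak at 12.0, and a small bump at 12.3.
ts = [10.0 + 0.01 * i for i in range(301)]
ys = [gaussian(t, 11.0, 1.0, 0.08)
      + gaussian(t, 12.0, 8.0, 0.08)
      + gaussian(t, 12.3, 0.6, 0.05) for t in ts]

e_area = peak_area(ts, ys, 10.7, 11.3)
# The judgment call: does the bump at 12.3 belong to the T "hill" or not?
for t_stop in (12.2, 12.45):
    ratio = peak_area(ts, ys, 11.7, t_stop) / e_area
    print(f"T peak ends at {t_stop}: T/E = {ratio:.2f}")
```

Real chromatography software also subtracts a baseline and may fit peak shapes, each of which adds its own uncertainty on top of the boundary choice.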

So if we know how variations in the temperature and pressure affect the shape of the picture above, and if we can estimate the uncertainty in the temperature and pressure during this test on Landis's sample, then we'd have some decent error bars on Landis's “Eleven”.

Once you have error bounds on Landis's T/E ratio, you revisit the Bayesian analysis. Any appreciable uncertainty will decrease the likelihood that Landis abused testosterone.

The analysis of Landis's sample won't stop with the T/E ratio test, of course. The next step, evidently, may be an IRMS analysis of the ratio of two carbon isotopes. As the news emerges, you should ask how each test works, and what its uncertainties are.

Armstrong and Hamilton
The same goes for the charges against Armstrong made by l'Equipe last year, and for those against Tyler Hamilton in 2004. In Armstrong's case, the uncertainty begins with the chain of custody. Then you have an experimental test, applied to six-year-old samples. In the Hamilton case, the anti-doping agencies placed full faith and credit in a new test, published only months earlier, with documented repeatability problems. The actual procedures in executing these tests are more complex and sensitive than the procedures for T/E. Yet all we get from the lab is “positive”. These positives are the results of judgments, made in good faith presumably, but not subject to review by the athletes or the public (see e.g. the Vrijman report).

My point isn't that these men are innocent. My point is that the probability of their guilt is far less than the public assumes from news reports. We can pardon the public, and even the press, for putting more faith in the numbers than they deserve. But the bureaucrats at the anti-doping agencies irresponsibly accuse athletes. They pay lip service to “science” but know nothing of it. They make great displays of being the most earnest of witch burners, lest they endanger their jobs for lack of vigor in pursuing dopers. Cycling pays the price.

Additional Reading
Can we handle the truth? We are our own worst enemy -- Knowledgeably explores the madness and injustice of the anti-doping regimes, from someone inside the sport.

Saturday, July 29, 2006

Deep Quote

I've wanted a service like this for a long time!

Tuesday, July 25, 2006

We'll never get O/R mapping right

Ted Neward calls O/R mapping a "quagmire" in The Vietnam of Computer Science. He's right (although the Vietnam analogy is a distraction).

He considers approaches to licking the problem, and he almost gets it right with "Integration of relational concepts into the languages". Yeah -- but he limits his consideration to tweaks to mainstream languages (dismissing "fringe" languages like Ruby... not that Ruby's doing it right).

We're not going to get this O/R mapping thing right... ever. It's the "O" in "O/R" that's the problem. We need languages that think in terms of tables or relations. The object languages have us whirling in a Sapir-Whorf spiral. Relations -- tables -- are the language of data. Objects attempt to deal with data but in an ad-hoc way, with no grounding theory that enforces integrity and consistency; you enforce all that in procedural code. Objects may be a handy idiom for writing simulation programs. But the 99% of other programs out there that traffic in data, even transiently, need to be written from the ground up in a relation-oriented language.

Relation Oriented Programming: I've written about it before. I believe the right answer is going to be to construct a relational idiom in the fringe language Common Lisp. I've already begun! Expect results about the time Arc is ready.

Sunday, June 18, 2006

But is it enterprisey class?

Via Alex Russell, who calls attention to Sun's Project Phobos:

The goal of Project Phobos is to show that Java is an excellent platform for server-side scripting, allowing dynamic-language developers to leverage the power of Java SE and EE. The initial focus for Project Phobos is JavaScript, but the design supports the use of other dynamic languages as well....Project Phobos attempts to learn from Rails, but is not limited to the use of any particular programming language and may prove to have a different sweet spot

I checked... the word "enterprise" does appear on that page, but only in the breadcrumbs ("Projects > java-enterprise > enterprise-incubator > phobos").

Rich JavaScript Apps

Two developments are really going to accelerate rich JavaScript web apps -- they sure have around here...

1. Dojo is a framework for building js + html + css widgets. A widget ends up looking like this in your html:

<div turboalign="left" dojoType="TurboSplitter"></div>

That is a splitbar widget from TurboAjax, who extend the Dojo framework, and it really is that simple to put a sliding splitbar on a web page.

2. Google has open sourced Excanvas. This is a one-line include that lets you script the <canvas> element in IE. <canvas> is an element available in Firefox, Safari, and Opera, and now standardized by the WHATWG, that lets you script vector drawing primitives. Excanvas just implements all the Canvas APIs in VML. Finally, a unified model for vector drawing!

Tuesday, June 06, 2006

IT Consultants Underestimate the Long Tail. Again.

From Marketplace's story on Google Spreadsheets:

I.T. consultant Gerald Murphy won't recommend it to his clients....

GERALD MURPHY: The only people I think this would have any applicability at all to are very small companies that have extremely limited budgets who wanna pay no money at all.

Um, yeah. Sorta like the way AdWords lets all these very small companies advertise for pennies.

Sunday, May 28, 2006

Getcher Web 2.0 Certification here.

Requirements:

Big fonts
Oversized input fields
Silly or misspelled name
"Beta"
AJAX
Community content
"Something"-sharing
Bright colors and/or pink
Rounded corners
Use of Google maps
Founder has a blog
RSS
Tagging
Creative commons
Wiki
Podcast/video/mobile content

Saturday, May 27, 2006

Peter Seibel Video

Peter Seibel, author of the excellent Practical Common Lisp, gave this hour-long presentation at Google, evangelizing Common Lisp. And here are the top three reasons CL rules and Java sort of sucks:

3. He gives a good worked example of using CL generics to replace miles of Java code implementing the visitor pattern. The key is the "double dispatching" in the pattern: You visit each node, calling some polymorphic method; you pass the method a reference to the calling object, which may itself be polymorphic. So the number of cases to dispatch is m * n, where m and n are the number of possible types for each of the two objects involved in the call. In CL, you just create a generic with m * n implementations; in Java, well, you have to touch a lot more code.

2. He contrasts ordinary try/catch exception handling with CL's conditions mechanism. Java unwinds the stack, losing all the state; CL notifies handlers up the stack without yet unwinding the stack, so that a handler up the stack can ask a function lower down to restart.

1. CL macros let you abstract out syntax patterns.
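Point 3 can be approximated in Python, which has no built-in multiple dispatch (functools.singledispatch looks only at the first argument). This sketch keys a table on the pair of argument types, roughly what a CL generic function specialized on two parameters gives you; the Node, Printer, and Evaluator classes are hypothetical and not Seibel's example.

```python
class Node:
    pass


class Literal(Node):
    def __init__(self, value):
        self.value = value


class Add(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right


class Visitor:
    pass


class Printer(Visitor):
    pass


class Evaluator(Visitor):
    pass


# (node type, visitor type) -> implementation: one entry per m * n case.
_methods = {}


def defmethod(node_type, visitor_type):
    """Register an implementation specialized on both argument types."""
    def register(fn):
        _methods[(node_type, visitor_type)] = fn
        return fn
    return register


def visit(node, visitor):
    """Pick the most specific method, most specific on the node first."""
    for nt in type(node).__mro__:
        for vt in type(visitor).__mro__:
            fn = _methods.get((nt, vt))
            if fn is not None:
                return fn(node, visitor)
    raise TypeError(f"no method for {type(node).__name__}, {type(visitor).__name__}")


@defmethod(Literal, Printer)
def _(node, visitor):
    return str(node.value)


@defmethod(Add, Printer)
def _(node, visitor):
    return f"({visit(node.left, visitor)} + {visit(node.right, visitor)})"


@defmethod(Literal, Evaluator)
def _(node, visitor):
    return node.value


@defmethod(Add, Evaluator)
def _(node, visitor):
    return visit(node.left, visitor) + visit(node.right, visitor)


tree = Add(Literal(1), Add(Literal(2), Literal(3)))
print(visit(tree, Printer()))    # (1 + (2 + 3))
print(visit(tree, Evaluator()))  # 6
```

Each of the m * n cases is one small registered function, and the node classes never grow an accept method; adding a new visitor touches no existing class.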

It's the kind of video you can mostly listen to and only watch occasionally, as you get other work done -- just the way you'd do if you were at the presentation.

Monday, May 01, 2006

80/20 REST at Amazon

The Register has short interviews with Tim O'Reilly and Amazon's Jeff Barr on WS-* vs REST, and unsurprisingly, REST "wins". Barr says they see "20 per cent SOAP, 80 per cent REST."

Thursday, April 27, 2006

The Enterprisey Web Style

Oh this is too funny: SOA integration with Flickr and del.icio.us.

We picked a messaging bus that uses the industry standard WS-HTTP and the emerging WS-Blog protocol.

But you know this is how the consulting crowd are going to sell web style, starting very soon.

Monday, April 24, 2006

GData Optimistic concurrency

Google Data APIs beat Atompub to the punch and stamped legitimacy on REST. I hope Atompub will end up looking as nearly like GData as possible.

Of note: this is a Hi-Rest API, and now, we can safely say, it's the way the web actually works. Not just the way some very dedicated people aspire for the web to work.

So enough rejoicing. Let's begin analyzing GData to death.

First technical question, optimistic concurrency: What's wrong with plain old HTTP If-Unmodified-Since/If-Match? That's how you guard against concurrent updates in HTTP, right? Why invent a new protocol for that?
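For comparison, here is what plain-HTTP optimistic concurrency looks like, as a minimal in-memory Python sketch. The Store class and its methods are hypothetical, and this is not GData's mechanism; it uses only ETags and the If-Match precondition from RFC 2616, answering 412 Precondition Failed when the client's copy is stale.

```python
import hashlib


class Store:
    """In-memory resources guarded with plain HTTP ETags and If-Match."""

    def __init__(self):
        self.docs = {}

    @staticmethod
    def etag(body):
        # Strong entity tag derived from the content (quoted, as HTTP requires).
        return '"' + hashlib.sha1(body.encode("utf-8")).hexdigest() + '"'

    def get(self, uri):
        body = self.docs[uri]
        return 200, {"ETag": self.etag(body)}, body

    def put(self, uri, body, headers):
        current = self.docs.get(uri)
        if_match = headers.get("If-Match")  # a single tag or "*" in this sketch
        if if_match is not None:
            if current is None or (if_match != "*" and if_match != self.etag(current)):
                return 412, {}, None  # Precondition Failed: someone got there first
        self.docs[uri] = body
        return (200 if current is not None else 201), {"ETag": self.etag(body)}, None


store = Store()
store.docs["/entries/1"] = "first draft"
tag = store.get("/entries/1")[1]["ETag"]  # Alice and Bob both GET the entry
print(store.put("/entries/1", "Alice's edit", {"If-Match": tag})[0])  # 200
print(store.put("/entries/1", "Bob's edit", {"If-Match": tag})[0])    # 412
```

If-Unmodified-Since works the same way, comparing Last-Modified dates instead of entity tags; ETags avoid the one-second granularity of HTTP dates.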

Thursday, April 06, 2006

SOAP+REST: Why?

mnot declares

I’m a little confused by Mark Baker’s stance regarding SOAP; he seems to encourage the Web services world to use SOAP on top of HTTP in a fashion compatible with HTTP.

MarkB responds that he's offering a face-saving out for SOAP vendors. Is that the best reason? I can't think of a single reason to wrap the messages I traffic in a SOAP envelope. What are the use cases for RESTful SOAP, please?

Slightly More Complex List Extensions

Using the MS Simple List Extensions for RSS and Atom, you can really make a "feed" be anything you want it to be. Anything you can put between the angle brackets of a <cf:treatAs> tag, you can represent in a feed. I call it SMCLE (Slightly More Complex List Extensions).

Here is how it will work:

We'll define several new values for the <cf:treatAs> element. Currently the only defined value is "list". Now we will have:
<cf:treatAs>Bag</cf:treatAs> : the items form an unordered collection.
<cf:treatAs>Queue</cf:treatAs> : the items form a fifo queue.
<cf:treatAs>Stack</cf:treatAs> : the items form a lifo queue.
<cf:treatAs>CircularQueue</cf:treatAs> : when you get to the end of the collection, start over again at the top.
<cf:treatAs>OneBigString</cf:treatAs> : the items aren't items at all, in the RSS/Atom sense. The content of items really should be concatenated to form one big string.

Additionally, we'll allow extension namespaces so you can define your own "treatAs" values, e.g.:


<cf:treatAs ns="http://my.example.com/smcle">JavaScriptStatements</cf:treatAs>

Now, I know your aggregator is going to suck if it can't process all these variations correctly. Not to worry: the MS Feeds API will handle it all for you! In fact, the MS Feeds API will pretty much become the definition of a feed. If it doesn't work with the API, you don't need it!

Wednesday, April 05, 2006

Simple List Extensions: Invidious?

The Microsoft Simple List Extensions for RSS + Atom seem like a great idea. You declare attributes for your entries -- like Artist, Date Added, Price, Sales Rank -- and then set the attribute values on each entry. It's a tiny little schema language, with a type system. IE7 can then sort and filter the entries based on those attribute values.
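The following is a minimal sketch of the client-side work: it filters entries by one declared property and sorts them by another. The `ListView` and `Entry` names are hypothetical. This is not the SLE markup or the IE7 Feeds API. It only shows why the declared data type matters, because sorting the same Price values as numbers and as text gives different orders.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class ListView {
    /** Hypothetical entry model: a title plus the per-entry values the feed declared (Artist, Price, ...). */
    static final class Entry {
        final String title;
        final Map<String, String> props;

        Entry(String title, Map<String, String> props) {
            this.title = title;
            this.props = props;
        }
    }

    /**
     * Keeps entries whose groupProp equals groupValue (all entries if groupValue is null),
     * then sorts by sortProp, numerically or as text. Assumes every entry has a sortProp value.
     */
    static List<Entry> filterAndSort(List<Entry> entries, String groupProp, String groupValue,
                                     String sortProp, boolean numeric) {
        Comparator<Entry> byValue;
        if (numeric) {
            byValue = Comparator.comparingDouble(e -> Double.parseDouble(e.props.get(sortProp)));
        } else {
            byValue = Comparator.comparing(e -> e.props.get(sortProp));
        }
        List<Entry> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (groupValue == null || groupValue.equals(e.props.get(groupProp))) {
                kept.add(e);
            }
        }
        kept.sort(byValue);
        return kept;
    }
}
```

With Price values "10" and "9", a numeric sort puts 9 first and a text sort puts "10" first. That is the kind of detail every SLE-aware reader has to get right.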

The problem is that once you have that capability, you come to rely on it. If your site has a thousand entries, you can serve it up as a single feed with attributes allowing the client to filter. But you'd organize your site differently if clients didn't have that capability. So, for feed readers not supporting SLE, you'd want to break up your site into several feeds, maybe one for each Artist.

I'm sure every feed reader will have to support these extensions. Otherwise they would appear to suck on feeds that rely on SLE.

Thursday, March 30, 2006

IE7 Transforms Atom to RSS 2

I put a little mileage on IE7 feeds today.

First, this Wellstorm Atom feed. Of note: a) it validates on the Feed Validator; b) it uses an extension namespace; c) it specifies a CSS stylesheet.

I've pasted in the relevant files below.

Observations:

IE7 uses its own stylesheet, ignoring mine. That's OK, I guess; it's a smart client, so it should style the feed to suit its own purposes.

Before you subscribe, View Source: it displays the Atom XML content.

After you subscribe, View Source: IE7 has transformed the Atom to RSS 2.0.

Unfortunately, within IE7 you can never again view the original Atom file! You can't enter the URL and View Source; it always shows it transformed to RSS. Weird.

Here is the Atom feed:


<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="http://www.hughw.net:8080/witsml/styles/atom.css" type="text/css"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:wixp="http://www.wixp.org/ns/atom">
  <id>http://www.hughw.net:8080/witsml/atom/Default%20Repository/!well</id>

  <title>WITSML Repository</title>

  <updated>2006-03-17T14:01:14.935-06:00</updated>

  <link rel="self" href="http://www.hughw.net:8080/witsml/atom/Default%20Repository/!well"/>

  <entry>
    <id>http://www.hughw.net:8080/witsml/atom/Default%20Repository/well.W-12</id>
    <title>6507/7-A-42</title>
    <link href="http://www.hughw.net:8080/witsml/atom/Default%20Repository/well.W-12.rby" rel="http://www.wixp.org/rel/contains"/>
    <updated>2001-05-31T08:15:00.000000+00:00</updated>
    <published>2001-04-30T08:15:00.000000+00:00</published>

    <author><name>John Smith</name></author>
    <wixp:ObjectType>well</wixp:ObjectType>
    <wixp:LastModified>2006-03-17T14:01:14.935-06:00</wixp:LastModified>
    <wixp:Name>6507/7-A-42</wixp:Name>
    <wixp:uidWell>W-12</wixp:uidWell>
    <wixp:nameSource>John Smith</wixp:nameSource>

    <wixp:dTimStamp>2001-05-31T08:15:00.000000+00:00</wixp:dTimStamp>
    <wixp:dTimCreation>2001-04-30T08:15:00.000000+00:00</wixp:dTimCreation>
    <wixp:dTimLastChange>2001-05-31T08:15:00.000000+00:00</wixp:dTimLastChange>
    <wixp:itemState>Plan</wixp:itemState>
    <wixp:comments>These are the comments associated with the Well data object.</wixp:comments>
    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><a href="http://www.hughw.net:8080/witsml/atom/Default%20Repository/well.W-12">well 6507/7-A-42</a> <a href="http://www.hughw.net:8080/witsml/atom/Default%20Repository/well.W-12.rby">[contents]</a></div></summary>

    <content src="http://www.hughw.net:8080/witsml/atom/Default%20Repository/well.W-12" type="application/x.witsml+xml"/>
  </entry>
</feed>

Here is the IE7 persisted version:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005">
    <channel>
        <guid isPermaLink="false">http://www.hughw.net:8080/witsml/atom/Default%20Repository/!well</guid>
        <title cf:type="text">WITSML Repository</title>
        <pubDate>Fri, 17 Mar 2006 14:01:14 GMT</pubDate>
        <atom:link href="http://www.hughw.net:8080/witsml/atom/Default%20Repository/!well"
            rel="self"/>
        <item>
            <guid isPermaLink="false"
                >http://www.hughw.net:8080/witsml/atom/Default%20Repository/well.W-12</guid>
            <title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"
                >6507/7-A-42</title>
            <atom:link xmlns:atom="http://www.w3.org/2005/Atom"
                href="http://www.hughw.net:8080/witsml/atom/Default%20Repository/well.W-12.rby"
                rel="http://www.wixp.org/rel/contains"/>
            <pubDate>Thu, 31 May 2001 08:15:00 GMT</pubDate>
            <atom:published xmlns:atom="http://www.w3.org/2005/Atom">Mon, 30 Apr 2001 08:15:00 GMT</atom:published>
            <author>John Smith</author>
            <atom:author xmlns:atom="http://www.w3.org/2005/Atom">
                <atom:name>John Smith</atom:name>
            </atom:author>
            <wixp:ObjectType xmlns="http://www.w3.org/2005/Atom"
                xmlns:wixp="http://www.wixp.org/ns/atom">well</wixp:ObjectType>
            <wixp:LastModified xmlns="http://www.w3.org/2005/Atom"
                xmlns:wixp="http://www.wixp.org/ns/atom">2006-03-17T14:01:14.935-06:00</wixp:LastModified>
            <wixp:Name xmlns="http://www.w3.org/2005/Atom" xmlns:wixp="http://www.wixp.org/ns/atom"
                >6507/7-A-42</wixp:Name>
            <wixp:uidWell xmlns="http://www.w3.org/2005/Atom"
                xmlns:wixp="http://www.wixp.org/ns/atom">W-12</wixp:uidWell>
            <wixp:nameSource xmlns="http://www.w3.org/2005/Atom"
                xmlns:wixp="http://www.wixp.org/ns/atom">John Smith</wixp:nameSource>
            <wixp:dTimStamp xmlns="http://www.w3.org/2005/Atom"
                xmlns:wixp="http://www.wixp.org/ns/atom">2001-05-31T08:15:00.000000+00:00</wixp:dTimStamp>
            <wixp:dTimCreation xmlns="http://www.w3.org/2005/Atom"
                xmlns:wixp="http://www.wixp.org/ns/atom">2001-04-30T08:15:00.000000+00:00</wixp:dTimCreation>
            <wixp:dTimLastChange xmlns="http://www.w3.org/2005/Atom"
                xmlns:wixp="http://www.wixp.org/ns/atom">2001-05-31T08:15:00.000000+00:00</wixp:dTimLastChange>
            <wixp:itemState xmlns="http://www.w3.org/2005/Atom"
                xmlns:wixp="http://www.wixp.org/ns/atom">Plan</wixp:itemState>
            <wixp:comments xmlns="http://www.w3.org/2005/Atom"
                xmlns:wixp="http://www.wixp.org/ns/atom">These are the comments associated with the
                Well data object.</wixp:comments>
            <description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"
                ><div><a href="http://www.hughw.net:8080/witsml/atom/Default
                Repository/well.W-12">well 6507/7-A-42</a> <a
                href="http://www.hughw.net:8080/witsml/atom/Default
                Repository/well.W-12.rby">[contents]</a></div></description>
            <content src="http://www.hughw.net:8080/witsml/atom/Default%20Repository/well.W-12"
                type="application/x.witsml+xml" xmlns="http://www.w3.org/2005/Atom"
                xmlns:wixp="http://www.wixp.org/ns/atom"/>
            <cf:id>0</cf:id>
            <cf:read>true</cf:read>
        </item>
    </channel>
</rss>
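Along the way, the transformation has to map Atom's RFC 3339 timestamps onto RSS 2.0's RFC 822-style dates. Compare the `updated` and `published` values with the `pubDate` values in the two listings. Here is a minimal sketch of that mapping with `java.time`, assuming Java 8 or later. `AtomToRssDate` is a hypothetical name, and this is not a claim about how IE7 does it internally. Doing the arithmetic, the feed's `2006-03-17T14:01:14.935-06:00` is 20:01:14 GMT. The persisted channel `pubDate` reads 14:01:14 GMT, so IE7 appears to have dropped the offset rather than applied it.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class AtomToRssDate {
    /** Converts an RFC 3339 timestamp (Atom) to an RFC 1123 date in GMT (RSS 2.0 pubDate). */
    static String toPubDate(String atomTimestamp) {
        OffsetDateTime t = OffsetDateTime.parse(atomTimestamp);    // keeps the -06:00 offset
        return t.withOffsetSameInstant(ZoneOffset.UTC)              // same instant, expressed in UTC
                .format(DateTimeFormatter.RFC_1123_DATE_TIME);      // prints "GMT" for a zero offset
    }
}
```

The two entry timestamps are already in UTC, so they come out as `Thu, 31 May 2001 08:15:00 GMT` and `Mon, 30 Apr 2001 08:15:00 GMT`, matching IE7's `pubDate` and `atom:published` values.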

Monday, March 27, 2006

Lisp for Entrepreneurs

Yet another "Lisp is for Entrepreneurs" link -- this one from Bill Clementson. Programming languages are converging on Lisp, so I'm bypassing the deep dive on Ruby...

I've gone deep on Lisp recently and I have to tell you: maybe I'm not smart enough to be a Lisp programmer, at least not quickly. I've written a functioning, reasonably complex program (a Gibbs sampler for a Bayesian network), but I can hardly read the thing. It's all (car (cdr (assoc 'xxx ...))). That design is nobody's fault but my own, but it's just the path of least resistance as I'm writing the thing. I have begun to see a few glimmers of possibilities I don't have in other languages. I've written a macro, and I get it. I've passed lambdas to functions. This code is a lot terser than a Java version I wrote years ago. Terse is good, right? But I have to comment the code more extensively because it's not really "self-describing."

Maybe I'm the sort of weak-natured guy who needs an early-binding language enforcing the discipline up front. 'Cause I have to admit, strong typing makes me write better programs. It's a character flaw.

Wednesday, March 22, 2006

Microcontent Three-Way Processes

We're all going to have to package our services as microcontent (via Sean McGrath). For those of us in the business of delivering "two-way" data services, our webapps will wither. Good!

Now we have to figure out how to get paid. I can deliver some critical information to you via Atom or RSS. And you have some client subscribing, consuming, and processing that data on your behalf. I need to get paid by subscription, or by the piece. I'll no longer have a webapp where you subscribe or agree to purchase single items, and where I meter your consumption. Aggregators, and successor services like them, will have to do that. So we need extensions to APP to compose three-way transactions.

So you will initiate a subscription via your aggregating service. I'm not necessarily talking about a feed aggregator; this might be some specialized service in a vertical, like stock quotes. It will collect your personal and payment information. It will contact me to subscribe you. It will send me your PayPal, or whatever, payment authorization. I will respond with approval or denial. When retrieving information on your behalf, your aggregator definitely has to tell me you are retrieving that information, so I can authorize it. Contrast that with current practice: aggregators retrieve a feed on behalf of hundreds of anonymous users. And we have to address the issue of whether I trust the aggregator service to tell me the truth, or whether we can design protocols that prevent it from paying for one subscription when it has sold 20.

To make that happen, we'll need a microcontent business process specification that can compose three-way transactions. And it has to be as simple and transparent as RSS and Atom. I'm not talking about BPSS, although there's a lot to be learned there.
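The publisher's side of the flow in the previous two paragraphs can be sketched under loud assumptions. `MeteredPublisher`, its methods and its stand-in payment check are all hypothetical, and no such APP extension exists. The sketch also does nothing about the trust problem of an aggregator under-reporting its subscribers. It shows only the two obligations described: subscriptions arrive with a payment authorization, and each retrieval names the subscriber it is made for.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical publisher that only serves, and meters, retrievals made for known subscribers. */
final class MeteredPublisher {
    private final Set<String> subscribers = new HashSet<>();
    private final Map<String, Integer> retrievals = new HashMap<>();

    /** The aggregator forwards a subscription plus a payment authorization; approve or deny. */
    boolean subscribe(String subscriberId, String paymentAuthorization) {
        // Stand-in for a real call to a payment service: any non-empty token is "approved".
        boolean approved = paymentAuthorization != null && !paymentAuthorization.isEmpty();
        if (approved) {
            subscribers.add(subscriberId);
        }
        return approved;
    }

    /** Each retrieval names the subscriber it is on behalf of, so it can be authorized and counted. */
    boolean authorizeRetrieval(String subscriberId) {
        if (!subscribers.contains(subscriberId)) {
            return false;
        }
        retrievals.merge(subscriberId, 1, Integer::sum);
        return true;
    }

    int retrievalCount(String subscriberId) {
        return retrievals.getOrDefault(subscriberId, 0);
    }
}
```

The key difference from today's feeds is the `subscriberId` on every retrieval. An anonymous fetch on behalf of hundreds of users has nothing to authorize or meter.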

Spooky Rojo Mojo

Alerted by Mark's post that he switched to Rojo, I just now signed up to give it a try. After going through the wizard, I land on my new home feeds page. And my very first "Rojo Feed Recommendation" is: Mark's other blog, Integrate This. And no, I haven't imported my feeds from Bloglines yet!

Thursday, March 16, 2006

URIs are opaque, except when they're not

Roy jumps in to clarify the law about URI opaqueness:

The key is that the server is free to tell the client that there does exist structure in a given URI-space, and then the client is free to make use of that knowledge in future requests. That is how server-side imagemaps worked -- HTML says that the URI is structured such that appending "?X,Y" to that URI, where X and Y are non-negative integers, corresponds to points on a map that can respond to future GET requests.

Thus, one way for the server to tell the client that a given URI is structured is to provide the URI in a standard element of a standard media type that has been defined as such. Another is to include the URI in a response header field.

I may be having my own, belated moment of total, violent, zen clarity.
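Roy's imagemap example is easy to make concrete. Here is a minimal sketch, assuming the client has already learned that the URI is an imagemap from an HTML `ismap` image inside a link. `ImagemapLink.pointQuery` is a hypothetical helper, not a library API. The point is that the client constructs the structured URI only because the media type told it that structure exists.

```java
import java.net.URI;

final class ImagemapLink {
    /** Appends "?X,Y" to a URI the server advertised (via HTML ismap) as a server-side imagemap. */
    static URI pointQuery(URI mapUri, int x, int y) {
        if (x < 0 || y < 0) {
            throw new IllegalArgumentException("imagemap coordinates must be non-negative");
        }
        if (mapUri.getRawQuery() != null || mapUri.getRawFragment() != null) {
            throw new IllegalArgumentException("expected a map URI without query or fragment");
        }
        return URI.create(mapUri + "?" + x + "," + y);
    }
}
```

For example, a click at (10, 20) on `http://example.org/map` becomes a GET on `http://example.org/map?10,20`.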

Saturday, March 04, 2006

Patterns Problem

Peter Seibel's book has a nice analysis of the "patterns problem":

The reason both these functions start the same way is because they're both test functions. The duplication arises because, at the moment, test function is only half an abstraction. The abstraction exists in your mind, but in the code there's no way to express "this is a test function" other than to write code that follows a particular pattern.

Unfortunately, partial abstractions are a crummy tool for building software. Because a half abstraction is expressed in code by a manifestation of the pattern, you're guaranteed to have massive code duplication with all the normal bad consequences that implies for maintainability. More subtly, because the abstraction exists only in the minds of programmers, there's no mechanism to make sure different programmers (or even the same programmer working at different times) actually understand the abstraction the same way. To make a complete abstraction, you need a way to express "this is a test function" and have all the code required by the pattern be generated for you. In other words, you need a macro.

Lisp, again.

Wednesday, February 22, 2006

IE7 Quick Take

Curious about IE7, I installed the beta preview.

The browser itself looks OK. Basically it's the Firefox look and feel. Tabs. Wow. I'm interested in exploring the RSS/Atom support but didn't get the chance to exercise it much. They do have a little "subscribe" icon that looks just like Firefox's. It makes you wonder: if it takes the equivalent of a space program to catch up to dinky little Firefox and feed aggregators, how is MS going to keep up with whatever comes next? They'll be starting out a nose behind. The little guys will be blazing new continents before MS ships Vista.

About ten hours later, I had to do a System Restore to get back to IE6. Principally, that's because I was working on a project using somewhat unfamiliar tools (Nullsoft Installer and VS 2003) and needed the online help a lot. After installing IE7, almost no pages were visible in the Nullsoft CHM, and all the pages in the VS online help lost their styling.

Demonstrating the tightly coupled Microsoft world: Install IE7 and you'll be "upgrading" several other products that used to work just fine.

Wednesday, February 15, 2006

Nokia, Ericsson gonna slap you upside the head

Fear, uncertainty, and doubt being spread by Iron Mountain to get webheads to enroll in their seminar:

In May 2006 the new global .MOBI top-level domain extension is expected to arrive. It’s not the usual "yada-yada" new domain extension. It’s backed by major players in the mobile world, like Ericsson, GSM Association, Hutchison, Microsoft, Nokia, Samsung Electronics, Syniverse Technologies, Telefonica Moviles, TIM, T-Mobile and Vodafone. These investors are serious about bringing a much better and consistent mobile browsing experience to all users no matter what device they may be using. They plan on enforcing certain web site formatting requirements. Failure to comply may mean you are shut out reaching mobile users, even if you meet all legal requirements for registering a .MOBI domain name.

Tuesday, February 14, 2006

Oracle acquires Sleepycat

I lost a whole day today because a Berkeley DB file hit 2^31-1 bytes, jamming our subversion repository. That's when I discovered svn has migrated to using its own fsfs filesystem (we skipped two svn revs).

Then at the end of the day we get the news: Oracle acquires Sleepycat, maintainer of BDB, as part of its continuing rollup of open source (InnoDB, Sleepycat) and small private (Times Ten) database providers. I'm no fan of Sleepycat, and I admire Oracle DB; but this is giving me the creeps.

Dear Hugh,

I'm pleased to announce today that Sleepycat Software has been acquired by Oracle.

By joining the leading database company in the world, I expect that we will be able to serve our customers and the open source community better. With the additional expertise, resources and reach of Oracle, we'll be able to accelerate innovation, offer you greater choice, and provide more complete solutions. For Oracle, we fill a gap in the product portfolio for high performance embedded/edge databases, an area which we believe is a significant and growing opportunity....We look forward to working with you as part of Oracle!
....

Regards,
Mike Olson
Vice President, Oracle
Former President and CEO
Sleepycat Software

Nice job change, Mike.

Sunday, February 12, 2006

We can hack this too

Via Kim Cameron: Companies are injecting employees with RFID chips.

My pet theory is that the more confidence authorities place in technologies that nail down identities, the more hackable those technologies become. It's a corollary to Edward Luttwak's thesis in Strategy: The Logic of War and Peace, summarized in one review as

The crucial question to be asked of any new tactic, strategy, or technology is not: “how will this affect battle?” but “how will the enemy react?”

Wednesday, February 08, 2006

Patterns: Another Way To Say "Bloat"

A quote I'd missed at the very end of this oft-linked Paul Graham essay:

...in the OO world you hear a good deal about "patterns". I wonder if these patterns are not sometimes evidence of case (c), the human compiler, at work. When I see patterns in my programs, I consider it a sign of trouble.

It's elementary information theory, sir. Any pattern, or predictable part, of the signal is non-informative bit bloat. Compression programs work by identifying predictable patterns and encoding them simply: if the pattern recurs, you don't have to repeat it verbatim. You can just say, e.g. "Pattern 3 again" to substitute for the longer sequence.

It's the same with programming. Coding patterns is cut and paste. Eclipse will even help you do it, in Java; it has all sorts of nifty shortcut keys to do patterns like "return an array from a collection instance." That's reverse compression: You type Ctrl-Shift-J or whatever, and Eclipse expands it into the pattern for you.

Here's a pattern that gripes me in Java: for data access, I usually make two methods for each thing I want to access. Both methods accept all the parameters you need to select the thing, but one method also accepts a JDBC connection parameter, while the other constructs and destroys the connection and calls into the first method. The second method is a convenience procedure, while the first one allows you to make the call as part of a series of actions within the same transaction. You know...


   public static long getMetainfoId(UID uid, String objectType){
       try {
           // notice the pattern within a pattern below (service locator)
           Connection conn = ServiceLocator.getInstance().getDataSource().getConnection();

           try {
               return getMetainfoId(conn, uid, objectType);
           } finally {
               conn.close();
           }

       } catch (Exception e) {
           throw new RuntimeException(e);
       }
   }

   public static long getMetainfoId(Connection conn, UID uid, String objectType){
       // ... do the real work
   }

I have cut and pasted that snippet about five hundred times.

I want a way to avoid both compile time and run time bloat. I don't just want a magic macro that saves source code; I want the actual wrapper code to be generic. I asked my friend and colleague Brad to sketch a Lisp solution:

the wrapper function would be something like this:

(defgeneric resource-close (resource)
 (:documentation "Release RESOURCE. Specialize this for each kind of resource."))

(defun wrapper (resource-creator resource-using-function &rest args)
 "resource creator is a function that creates the resource we will
 temporarily use. We can also parameterize it if necessary at compile
 or run time using lambdas depending upon needs"
 (let ((resource (funcall resource-creator)))
   (unwind-protect
     (apply resource-using-function resource args)
     (resource-close resource)))) ; resource-close is a generic function here;
                                  ; it could be parameterized as well


when defining a wrappable function:

(define-wrapped get-uid (connection metainfold)
 (create-connection p1 p1) ; this line specifies the resource creation function
 (your code here)) ; the body that does the work and gets wrapped
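For comparison, here is a minimal sketch of the same generic wrapper in Java. It assumes Java 8+ lambdas and `try`-with-resources, which postdate the snippet above. `ResourceFactory`, `ResourceUser` and `ResourceWrapper.withResource` are hypothetical names. The source code shrinks, but unlike a Lisp macro it still does not generate the two overloads for you.

```java
/** Creates a resource; may throw, like DataSource.getConnection(). */
interface ResourceFactory<R> {
    R create() throws Exception;
}

/** Uses a resource to compute a result; may throw, like the JDBC-taking overload. */
interface ResourceUser<R, T> {
    T use(R resource) throws Exception;
}

final class ResourceWrapper {
    /**
     * Creates a resource, hands it to the user function, and always closes it,
     * wrapping checked exceptions in RuntimeException as the hand-written pattern does.
     */
    static <R extends AutoCloseable, T> T withResource(ResourceFactory<R> factory,
                                                       ResourceUser<R, T> user) {
        try (R resource = factory.create()) {   // close() runs even if use() throws
            return user.use(resource);
        } catch (RuntimeException e) {
            throw e;                            // don't double-wrap unchecked exceptions
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Since `java.sql.Connection` is `AutoCloseable`, the convenience overload could shrink to one line: `return ResourceWrapper.withResource(() -> ServiceLocator.getInstance().getDataSource().getConnection(), conn -> getMetainfoId(conn, uid, objectType));`.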

I'm going to learn Lisp!

Monday, January 23, 2006

Java Infrastructure

Bill de hÓra: JORAM in particular is a quality JMS implementation.
Yup. Wellstorm uses JORAM internally and it has proven solid. (The ObjectWeb logger did hijack our logging, an issue we still need to work out). I really haven't caught any JORAM errors, so it meets the ultimate test of infrastructure: It's just there.

If you can't say anything else for Java, you can say it has a great choice among infrastructure packages. Last weekend I added a scripting capability using Rhino. Got the idea at lunch Friday. Took an hour to get something working, the rest of the time framing a decent web UI. Demoed it today.

Encoding the XML infoset in HTML forms

Every month or two I find myself remarking on the awkwardness of constructing XML on the client side. Are HTML clients forever condemned to using XMLHttpRequest? Must we await widespread support for XForms?

In the spirit of The Simplest Thing That Could Possibly Work: Why don't we just standardize encoding the XML infoset in HTML forms?

Here's how you could POST an Atom entry (compare to the example in draft 07):


<form action="http://localhost/foo" method="POST"> 
<input type="hidden" name="/entry/@xmlns" value="http://www.w3.org/2005/Atom"/>
Title:   <input type="text" name="/entry/title">
Link:    <input type="text" name="/entry/link/@href"/>
Id:      <input type="text" name="/entry/id"/>
Updated: <input type="text" name="/entry/updated">
Summary: <input type="text" name="/entry/summary">
<input type="submit" name="submit" value="Submit">
</form>

That form encodes as


%2Fentry%2F%40xmlns=http%3A%2F%2Fwww.w3.org%2F2005%2FAtom&%2Fentry%2Ftitle=Atom-Powered+Robots+Run+Amok&%2Fentry%2Flink%2F%40href=http%3A%2F%2Fexample.org%2F2003%2F12%2F13%2Fatom03&%2Fentry%2Fid=urn%3Auuid%3A1225c695-cfb8-4ebb-aaaa-80da344efa6a&%2Fentry%2Fupdated=2003-12-13T18%3A30%3A02Z&%2Fentry%2Fsummary=Some+text.&submit=Submit
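That is ordinary `application/x-www-form-urlencoded`. The following minimal sketch reproduces it with `java.net.URLEncoder`, assuming Java 10+ for the `Charset` overload. `FormEncoder` is a hypothetical helper name. If you feed it the form's fields in order, with the values from the draft's example entry, it should produce exactly the string above, which is also what a browser produces for these characters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

final class FormEncoder {
    /** Encodes name/value pairs as application/x-www-form-urlencoded, in map iteration order. */
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            // "/" becomes %2F, "@" becomes %40, ":" becomes %3A, and a space becomes "+"
            body.add(URLEncoder.encode(f.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(f.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

Use a `LinkedHashMap` so the fields come out in form order.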

The pattern is: Name your HTML form elements using XPath syntax.
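On the server side, this pattern means the XML can be rebuilt mechanically from the field names. The following is a minimal sketch that assumes only a tiny XPath subset: `/a/b` for element text, `/a/b/@attr` for attributes, and `/root/@xmlns` for the default namespace. It does not handle repeated elements. `FormToXml.decode` is a hypothetical name, and real XPath is far richer than this.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class FormToXml {
    private static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Builds a DOM from a form body whose field names are simple XPath-style paths. */
    static Document decode(String body) {
        String[][] pairs = parse(body);
        String ns = null;
        for (String[] p : pairs) {
            if (p[0].endsWith("/@xmlns")) ns = p[1];   // default namespace for every element
        }
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Map<String, Element> byPath = new HashMap<>();
        for (String[] p : pairs) {
            String name = p[0];
            // Skip non-path fields (e.g. the submit button) and the namespace declaration.
            if (!name.startsWith("/") || name.endsWith("/@xmlns")) continue;
            int at = name.indexOf("/@");
            Element el = element(doc, byPath, at >= 0 ? name.substring(0, at) : name, ns);
            if (at >= 0) el.setAttribute(name.substring(at + 2), p[1]);
            else el.setTextContent(p[1]);
        }
        Element root = doc.getDocumentElement();
        if (root != null && ns != null) root.setAttributeNS(XMLNS_NS, "xmlns", ns);
        return doc;
    }

    /** Returns the element at path, creating it and any missing ancestors. */
    private static Element element(Document doc, Map<String, Element> byPath, String path, String ns) {
        Element found = byPath.get(path);
        if (found != null) return found;
        int slash = path.lastIndexOf('/');
        Element el = doc.createElementNS(ns, path.substring(slash + 1));
        if (slash == 0) doc.appendChild(el);   // a second, different root would be rejected by the DOM
        else element(doc, byPath, path.substring(0, slash), ns).appendChild(el);
        byPath.put(path, el);
        return el;
    }

    /** Splits and URL-decodes name=value pairs, keeping their order. */
    private static String[][] parse(String body) {
        String[] fields = body.split("&");
        String[][] pairs = new String[fields.length][];
        for (int i = 0; i < fields.length; i++) {
            int eq = fields[i].indexOf('=');
            String k = eq >= 0 ? fields[i].substring(0, eq) : fields[i];
            String v = eq >= 0 ? fields[i].substring(eq + 1) : "";
            pairs[i] = new String[] {URLDecoder.decode(k, StandardCharsets.UTF_8),
                                     URLDecoder.decode(v, StandardCharsets.UTF_8)};
        }
        return pairs;
    }
}
```

Applied to the encoded body from the form above, this yields an `entry` element in the Atom namespace with `title`, `link` (carrying `href`), `id`, `updated` and `summary` children. Serializing the `Document` with a `javax.xml.transform.Transformer` gives you the XML text.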

(What if all Atom-powered blogs supported POSTing of form data in addition to the Atom syndication format? I guess I could construct a website that exposes a form like this and lets you direct your entry to any blog service. Not sure how useful that would be, but it doesn't seem any less useful than enabling a rich client to do it ;) ).

I'm just reiterating the point I made here some time ago: form data is a perfectly good hypermedia representation, and it has a ton of support already in place.