Hugh Winkler holding forth on computing and the Web

Monday, December 26, 2005

Java SWT rich clients: Azureus, Oxygen

The Azureus BitTorrent client is a pretty impressive example of a Java rich client based on SWT. It's responsive and graphically rich. Did they finally get the equivalent of Java2D in SWT? Azureus has lots of scrolling realtime graphs to diagnose why your 2GB download goes so slowly. .NET isn't your only choice for a rich client.

I should also mention Oxygen, a desktop XML IDE also based on SWT. It's capable and snappy, and has a great XSLT debugger, though it's not as feature-loaded as XML Spy.

A nice scrolling real time graph from Azureus

Friday, December 23, 2005

Oldest Mozilla Bug (Not)

Every software organization has its oldest bug. The older bugs get, the less likely you are to fix them. If I remember correctly, at my last company the oldest unfixed bug after three years was number 11. Usually a bug becomes irrelevant after a number of years.

Three years ago today I reported a Mozilla CSS rendering bug. The details won't interest you, unless you're really into using CSS to replace tables. The workaround to the bug is: Use a table. Surely it's not the oldest bug in Mozilla, but it may be up there. And web developers are still actively encountering it.

Most interesting is the dynamic of getting a bug fixed in Mozilla. I reported the bug three years ago today, but it was a duplicate: A guy originally reported it on December 28, 2000. Today I thought it worthwhile to commemorate my third anniversary, and the approaching fifth anniversary, of Bug 63895:
I am celebrating today my third anniversary as a reporter of this bug. I reported it Dec 23, 2002, and was soon notified that it was a duplicate of this one.

In just a few days the bug will pass its fifth anniversary since originally reported by Stephen Clouse on Dec. 28, 2000, near the end of the last century.

I've subscribed to the "progress" of the bug over the years since then. Here are my picks for each year's highlights:

o 2001: A comment by Hixie. I didn't understand this one (I don't understand any of the technical comments, really), but it seems really cool that Hixie has been on the case.

o 2002: Derek: "This seems like a pretty significant bug, considering that CSS2 positioning is supposed to end dependence on table-based layouts."

o 2003: Boris, complaining about "pushy bug reporters who demand things as their right without thinking about the fact that ..." yada yada.

o 2004: Joe: "Wow... 12-28-2000... Don't hold my breath eh?", followed by Martin's riposte: "Please keep those remarks to yourself, this doesn't help."

o 2005: ATom: "How are advantages of this behavior? According my opinion is it only disadvantage. How much authors use this behavior? How many pages can change of this behavior cause regression?"

I'm looking forward to lots more analysis and opinion on this bug in 2006!

Friday, November 25, 2005

Common sense

RESTful design is not an end in itself. If you find yourself adding complexity to honor the constraints of REST, weigh the benefit of that complexity, and pay for it only if the benefit is clear. Here's a case of a guy asking a perfectly ordinary question: how can I use POST to execute a query whose query string is too long to stuff into a URL? This question pops up once a year on the rest-discuss list.

Some responded with helpful suggestions to have the POST create a temporary resource which, when you GET it, returns the query results. Mark doesn't think this is RESTful. But I do: the client and server understand that the semantics of whatever document told the client where to POST the query demand that the POSTed document be a query document, and that the Location header in the response be the URI where the client can retrieve the query results. It's all in the MIME type of the document containing that link.

But that's neither here nor there. It's wrong to create this intermediate resource if you simply want the query results, as if your URL had been short enough for a GET. It doesn't enhance interoperability, and it adds complexity to the server design, which now has to manage the lifecycle of these temporary resources. I'm not saying returning 303 is a bad thing; just that when you really want to emulate GET and work around an artificial limitation, just POST the query and return the result in the entity body. This is perfectly RESTful, but that's beside the point: keeping it simple always ranks higher than perfect RESTful design, as long as you don't degrade interoperability.
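The simple alternative, in miniature. Everything here is illustrative: the 2048-byte limit, the URL, and the query function are invented for the sketch, not drawn from any spec.

```python
# Sketch: treat POST as a long-URL-safe GET. handle the query the same
# way either method arrives; no temporary resource, no Location header.

MAX_URL = 2048  # conservative, hypothetical limit for old proxies/servers

def run_query(query_string):
    """Stand-in for the application's query engine."""
    data = {"wells": ["W-1", "W-2", "W-3"]}
    key = query_string.split("=", 1)[-1]
    return data.get(key, [])

def fetch(base_url, query_string):
    """Choose GET when the URL fits, otherwise POST the query document.
    Either way the same representation comes back in the entity body."""
    url = base_url + "?" + query_string
    method = "GET" if len(url) <= MAX_URL else "POST"
    return method, run_query(query_string)
```

The point of the sketch is that the client's choice of method is an artifact of URL length, not of semantics: the resource returned is identical.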

Thursday, September 29, 2005

Free vs Safe in semweb

Ian Davis is having an RDF breakdown. Seems Dublin Core can't seem to get dc:creator quite right:

That's when my crisis struck. I was sitting at the world's foremost metadata conference in a room full of people who cared deeply about the quality of metadata and we were discussing scraping data from descriptions! Scraping metadata from Dublin Core! I had to go check the dictionary entry for oxymoron just in case that sentence was there! If professional cataloguers are having these kinds of problems with RDF then we are fucked.... A simpler RDF could take a lot of this pain away and hit a sweet spot of simplicity versus expressivity

In the free vs safe debate, looks like he's making a run at freedom.

Monday, September 26, 2005

Technorati. Sigh.

If I do a Technorati search for some URL (say, this blog), I get the Technorati search page. In the upper right I notice an image, "Add to watch list". That's right: if I click the link, the browser will do an HTTP GET on that URL, and change my watch list. I guess it must be a good thing that the URL modifies your watch list, not mine, if you click it. Although a lot of people would call it a bad thing for your identity cookie to make a URL identify one resource for me and another resource for you. I guess if you're going to introduce side effects for GET, you might as well fix it by making the URI identify multiple resources.

Update: I could be off base with the side effects argument. The side effect of clicking the link N times really is the same as clicking it once, which is all RFC 2616 asks. Something in me reacted to having clicking a link change some state, but I guess I shouldn't get my panties in a bunch about it. Still, the URL identifies your watch list for you, and my watch list for me. That's wrong, isn't it?

Saturday, September 24, 2005

Search is the new Forms

I'm late to the Atom web services party. Why didn't someone tell me the Atom Publishing Protocol covers all of the territory I've been discussing for RESTful WITSML web services?

Now OpenSearch promises that I can expose a search URL over my Atom service that can be used by search aggregators to do sort of intelligent searching.

If anybody can search my web service using a standard protocol, to discover URLs of resources they are interested in, it's the poor man's equivalent of having a forms language. Aren't 90% of the forms you fill out on the web some form of search?

I can force a lot of the semantics of my web service into search. What can't you model as a search? You can model UDDI as search. You can model any web catalog as a search. Heck, you can model solving a differential equation as a search.

Search may substitute for a really articulate, unconstrained forms language.

Without a forms language, REST web services are little more useful than RPC style web services. That's because the guy programming the service client has to understand, at design time, the semantics of each URL. Example: You learn the algorithm for constructing URLs and write your program to build them given some parameters you collect from a user. It's the same idea as calling a remote procedure. Other "REST" services might just supply you with a menu of URLs, whether they honor GET or POST, and the media types you can send or receive. Again, you're doing it at design time.

RPC services force clients to understand them at design time. You have to read some documentation and construct your program so that it calls functions in some order that makes sense to that service.

REST services use "hypermedia as the engine of application state." One realization of that idiom is HTML forms. Forms are how the service bypasses the browser. The guy who wrote the browser does not understand what is in the form. But he knows it is an HTML form and he has the browser render it for you to complete. The form tells the browser how to serialize the fields you complete and POST them to the service. It is HTML forms that enable you to order a plane ticket, or a book, using the same piece of compiled software: the browser. The form is a little program the browser downloads and executes at run time. The result of executing the program is a string, or a multipart message, the browser can submit to the service to obtain some other resource representation -- which, like all the other HTML it traffics in, the browser does not "understand".
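The "form as a little downloaded program" idea can be shown in miniature. Every name below is invented: the point is that the client code knows nothing about books or plane tickets, it just executes whatever form the service hands it.

```python
from urllib.parse import urlencode

# A hypothetical, minimal "form" the service sends down at run time:
# field names plus a target URL and method.
book_form = {
    "action": "http://example.com/orders",
    "method": "POST",
    "fields": ["title", "quantity"],
}

def submit(form, answers):
    """Serialize the completed fields the way an HTML form would,
    returning the method, target URL, and encoded body to send."""
    body = urlencode({f: answers[f] for f in form["fields"]})
    return form["method"], form["action"], body
```

The same `submit` function works for an airline's form or a bookstore's form; only the downloaded form differs, which is the whole trick.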

So how can a web service enable the same dynamic capability for machine, as opposed to human, agents? Here's the use case: You're dropping your own service into a brew of services running in some environment. And your service needs the results of other services as input to its own. It needs to locate those services in the brew. And it needs to invoke those services correctly and interpret the results. At design time, you don't understand how any of the other services work, or which services will be available; but you do understand the documents they traffic.

If you had a really intelligent automaton on the client side, it could retrieve a form document from any service telling it what parameters to retrieve and how to serialize them. But I'm pretty sure we're not going to have the intelligent automatons I outlined in a previous blue sky piece.

Instead, you have the capability to search. It's a lot like completing a form. It's more constrained than that, though. It's the kind of form that can only do one thing, for all applications.

So you're programming the travel reservations application. Your app can search a directory for the airline, auto rental, and hotel reservation services. It searches the airline service for flights from Austin to Atlanta leaving Monday, returning Wednesday. It searches for mid-size rental cars available in Atlanta. It searches for hotels in downtown Atlanta in a certain price range. Because we've standardized search, you program each of these interactions using the same model.
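If the services expose OpenSearch-style URL templates, the client-side mechanics reduce to template substitution. A sketch; the template and parameter names here are invented for illustration, not taken from the OpenSearch spec.

```python
import re

def fill_template(template, params):
    """Substitute {name} placeholders in an OpenSearch-style URL template."""
    return re.sub(r"\{(\w+)\}", lambda m: params[m.group(1)], template)

# Hypothetical search template published by an airline service:
flight_search = "http://air.example.com/search?from={origin}&to={dest}&out={depart}"

url = fill_template(flight_search,
                    {"origin": "AUS", "dest": "ATL", "depart": "2005-09-26"})
```

The hotel and rental-car searches would use the same `fill_template`, each with its own published template: one interaction model for all three services.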

To complete the transaction, you would use the APP to create a purchase order document with the service.

It would be cool if services could annotate the search terms with RDF properties. OpenSearch doesn't try to get that sophisticated, and good for it. But to complete an airline reservation you're going to need to know how to search for "flight", and not have the search return some other object. You could draw the search terms from an airline ontology.

But if I suggest SPARQL as an alternative to OpenSearch, Bosworth and the free vs safe libertines will jump my shit. Maybe rightly so. I'm still re-educating.

Friday, September 23, 2005

Are data models passé?

In the free vs safe debate, free is winning. That's a debate not limited to programming languages. You see the same meme in the web services debates. Google does "free" for data. Adam Bosworth's pitching open, "dumb" search standards; so is Joe. Do we need formal logical data models?

We won't be able to impose them. No data architect will design a master schema or ontology over domains like, say, process control, or auctions. Instead, mediators like Google will infer models from content. Or each of us will contribute our bit to the global model by social bookmarking and tagging. The excitement in search will be in augmenting the raw results served up by the dumb search protocols with the value a mediator like Google adds in imposing its inferred models, making searches faster or more accurate.

Monday, August 29, 2005

World's most useful blog

It's sobering to realize how black is the information hole around New Orleans right now. Hurricane Katrina. Cell towers down and powerless. Land lines inoperative. We have isolated little spotlights from CNN and Fox News, but they sent their teams to some downtown hotels, same as any corporate traveler, so these reports have focused on downtown. It's a lot like reporting from Baghdad hotels about the invasion of Iraq. N.O. is huge and dense. So check out the Times-Picayune's blog for the real dope. They deserve a Pulitzer for getting the information out to those of us really needing it. They're assembling reports there from all over the city. I'm a New Orleans expatriate, and right now I'm here in Austin awaiting the arrival of some family refugees who made it out, and are headed to live with us for... who knows how long. The power's going to be off for a month, and you can't get a drink of water. On that blog, and nowhere else really, I've been able to get some glimmer of info about my relatives' and friends' neighborhoods.

Friday, August 19, 2005

I Need a New Language: Rel?

I'm working through a Ruby experiment, sans Rails, and it's going well. I've also lately done a little Python CGI, and I've been burning bucks at Amazon buying the Lisp and Ruby books.

But, ho hum. Oh yeah, dynamic languages are great, you can modify classes at runtime, yeah, yeah. Blah blah closures blah blah.

People, we need to get past the object oriented paradigm, or model, or whatever it is. I'm groping for the one true programming model. Short of that, I'm hoping for one where I never have to do the stupid object-relational mapping again. Hibernate's popularity is emblematic of our decline. If you've got a ton of O/R mapping in your program, you've got a ton of dead code that does nothing for you. Every web app on the planet does this:
  • Make some queries to the db;
  • Build some objects possibly having object references;
  • Iterate over the objects and flatten to tabular HTML.
Um, the data were tables to begin with, right? And tables (relations, rather), unlike objects, are a logically coherent model for data.
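The three-step round trip above, in cartoon form. All names are hypothetical; the point is that tabular data gets wrapped in objects only to be flattened straight back to a table.

```python
def query_db():
    # 1. Make some queries to the db -- the data is already tabular.
    return [("W-1", "Austin"), ("W-2", "Atlanta")]

class Well:
    # 2. Build some objects possibly holding references to each other.
    def __init__(self, name, city):
        self.name = name
        self.city = city

def to_html(wells):
    # 3. Iterate over the objects and flatten back to tabular HTML.
    rows = "".join(f"<tr><td>{w.name}</td><td>{w.city}</td></tr>" for w in wells)
    return f"<table>{rows}</table>"

html = to_html([Well(*row) for row in query_db()])
```

The `Well` class is pure ceremony here: nothing in it survives the round trip from rows to rows.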

It's a failure of the languages we use, that we need application code to map the tables in which we store data, to some in-memory constructs called objects. And it's a failure of the architectures we use, that we end up flattening those objects into HTML representations.

I want a language for table programming. I think you can write programs in this language that do everything we expect of an application programming language -- building GUIs, reacting to mouse events, listening to sockets -- everything. Don't model your domain as objects. Model it as relations.

We need to explore how we can construct soup to nuts programs using the language of data. I haven't yet looked at Rel. But I wanted it to exist -- I wanted some implementation of Tutorial D to exist. Using that as a starting point, I can imagine programs that have a relvar (Date's term for a relational variable: essentially a table or a view) for MouseState. Whenever the mouse moves, the system adds a new row to MouseState. Yes, a lot of rows! We don't really need to persist them all. When a row gets added to a table or a view the system should invoke some code -- a trigger. The trigger code modifies other relvars in the system. And from that trigger code other events flow.
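A toy sketch of the relvar-plus-trigger idea. Nothing here is Rel or Tutorial D; the class and names are invented to show the shape: inserting a row fires code that modifies other relvars, and events flow from there.

```python
class Relvar:
    """A toy relational variable: a bag of rows plus insert triggers."""
    def __init__(self, *attributes):
        self.attributes = attributes
        self.rows = []
        self.triggers = []

    def on_insert(self, fn):
        self.triggers.append(fn)

    def insert(self, **row):
        self.rows.append(row)
        for fn in self.triggers:   # state transition locates the code to run
            fn(row)

mouse_state = Relvar("x", "y")
hovered = Relvar("widget")

def track_hover(row):
    # Trigger: derive the hovered widget from the new mouse position.
    hovered.insert(widget="button" if row["x"] > 100 else "canvas")

mouse_state.on_insert(track_hover)
mouse_state.insert(x=150, y=40)   # the mouse moved; a row appears
```

Dispatch here is driven by which relvar changed, not by the shape of an object, which is the evolution of polymorphism the paragraph above is groping for.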

A programming language built around locating the right bit of code to invoke in response to some state transition -- that would be a useful evolution of OO polymorphism, which locates the right code to invoke based only on the shape of an object. Let's get off of this rutted OO cartpath. We're not inventing anything new here.

Thursday, August 04, 2005

Spam Silver Lining

When I'm waiting for that really important email, and nothing appears in my inbox for hours, and I suspect Norton AV has again croaked and is blocking, I check my spam folder for recent messages, to verify that the mail is still getting through. It's a constant, incessant ping!

Friday, July 29, 2005

SPARQL Last Call

W3C have issued the last call for SPARQL comments. They feel compelled to tell us that you pronounce SPARQL "sparkle". It is not too late to request a change in pronunciation to "sparquel" before it goes final.
I just don't think I can convert to "sparkle" now. Nevertheless, I used to think I could never convert to "lih-nix" from "lye-nix", but I've been overwhelmed by society (and by Linus).

Saturday, July 09, 2005

Reliable POST

A couple of proposals for lightweight reliable POST are circulating: Mark Nottingham's Post Once Exactly (POE) and Paul Prescod's Reliable Delivery in HTTP. Similar in spirit, both techniques propose that servers generate one-off URLs for clients to POST to. Generally the pattern is

-> GET url
<- entity and/or header containing one-off URL
-> POST one-off-url

The one-off URL behaves specially: it only changes application state the first time you POST to it.

The two proposals differ in how the server responds to multiple POSTs. Paul proposes the server simply return the same response it returned when it processed the first request
The response of subsequent POSTs should be the same as if there had been only one POST so that the client can get the correct response even if there is a network outage in the middle of the first response.

Mark proposes that under POE the server return 405 Method Not Allowed on the second and subsequent POSTs to a POE resource:
If the server had received and accepted the first request, it will respond with
S: HTTP/1.1 405 Method Not Allowed
Allow: GET
If the response status is "405 Method Not Allowed" the client can infer that the earlier POST succeeded. A 2xx response indicates that earlier POST did not succeed, but that this one has. When the client receives either of these responses, it knows that the request has been accepted, and it can stop retrying.

Under Paul's proposal, if the first POST failed (e.g. 401 Unauthorized), then even after the user corrected the problem, POSTing the corrected form to the resource would still return the same error status. POE's approach really does permit exactly one POST to succeed. POE does seem to impose a new semantic over HTTP: if a client receives a 405, it can stop retrying. In practice, naive clients not understanding POE would never retry after receiving a 405 anyway. It is a new feature on the HTTP landscape that a resource could return success or various failure status codes until, at some point, it changes state and complains that POST is not allowed. Nothing wrong with that, and it won't break any clients. But it has not been common behavior.

What about those special POE headers? The proposal acknowledges they're unnecessary -- so why use them? If your web site has a form page that POSTs to a POE resource, you can now put text next to the submit button saying "Press repeatedly!" The problem there is that if your browser displays a 405 error, you would not understand that your POST had succeeded -- unless the server also returned a comforting HTML entity telling you so.

It's a minor problem that POE overloads the semantics of 405, because it's not really a failure. If you are a POE-aware agent, the special POE headers tell you to interpret the 405 slightly differently: the operation really did succeed! But it succeeded before this latest POST. I'd prefer that the superfluous operation return a success code.

Maybe we need a synthesis of the two approaches. Paul's instincts were right: a second POST to a resource that had earlier successfully processed a POST should return the same 2XX or 3XX code, and the same entity, as the first one. After all, does a client really need to know that this second POST was superfluous? Separately, POE proposes that a GET to the POE resource return the created entity, if any, and we should retain that behavior. This way an HTML page could have a simple hyperlink to the POE URL that would return an entity indicating "transaction succeeded". POE does not say what the server should return in response to a GET if the POST has not yet been processed. A 404 Not Found would not be helpful. I suppose it has to be a 200 OK with an explanatory entity: an HTML page saying "Still waiting..." or some such.

If the initial POST to the resource fails with a 4XX or 5XX, the server ought to continue to accept POST attempts until one of them finally results in a 2XX or 3XX success. The semantic we want is that the POST succeed exactly once.
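That synthesis as server-side logic, sketched in Python. The class, the validation rule, and the status strings are all hypothetical; the behavior is the point: failures may be retried, and once one POST succeeds, later POSTs replay the same success response instead of re-executing.

```python
class OneOffResource:
    """A one-off POST target: exactly one POST ever takes effect."""
    def __init__(self):
        self.success_response = None

    def post(self, doc):
        if self.success_response is not None:
            # Replay: same status and entity as the first success.
            return self.success_response
        if "amount" not in doc:
            # Retryable failure; the resource is still live for POSTs.
            return (400, "Bad Request: missing amount")
        self.success_response = (201, f"Created order for {doc['amount']}")
        return self.success_response

r = OneOffResource()
first = r.post({})                    # fails; the client may correct and retry
second = r.post({"amount": "100"})    # succeeds, exactly once
third = r.post({"amount": "999"})     # replayed, not re-executed
```

A client that never saw the second response (network outage) just POSTs again and receives it verbatim, with no 405 to puzzle over.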

[Security concern about one-off URLs: servers must prevent malefactors from predicting one-off URLs and hijacking them. It's probably good enough to generate very long random numbers as part of the URL.]

Tuesday, June 28, 2005

Tagging is easy. Semweb is hard.

Clay Shirky's Ontology is Overrated post is a great survey of why tagging is hot and semweb is not.

But there is a bit of a strawman in the argument. Clay sets up ontologies as hierarchical organizations of concepts having firm inside/outside boundaries, and easily shows how inadequate that scheme is for describing the web.

But the semweb people actually designed in many of the features Clay likes about tagging. Anyone can make a new OWL ontology describing resources in an idiosyncratic way. OWL concepts overlap: a resource can belong to thousands of concepts. And most definitely, the link topologies are not restricted to hierarchies: an RDF graph looks just like the web.

Tagging is not taking off because it describes the web better than semweb does. Tagging is taking off because it's easy, and semweb is hard. wouldn't have gotten very far if you had to define your own classes and properties.

But everything you do in could be done using semweb. A tag is an OWL class with a name and that's about it: no other properties, but the members of the class (your tagged URIs) imply something about the class. You could take each user's tags and create his own personal ontology. No need to adopt anyone else's ontology. But to expand on Clay's "mind reading" analogy: if I determine that another user says "movies" to mean the same thing I do when I say "cinema", I could make that mapping through an OWL equivalence. Then all that user's "movies" tags become trusted indicators for movies in my searches.
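The "movies" equals "cinema" mapping can be as little as this sketch. An OWL reasoner would derive it declaratively from an equivalence axiom; here it's just a lookup table, and every tag and URL is invented.

```python
# My declared equivalences: other users' tags mapped to my canonical term.
equivalences = {"movies": "cinema"}

def canonical(tag):
    return equivalences.get(tag, tag)

# A pooled set of (url, tag) pairs from several users, all hypothetical.
tagged_urls = [
    ("http://example.com/kubrick", "movies"),   # another user's tag
    ("http://example.com/welles", "cinema"),    # my tag
    ("http://example.com/rails", "ruby"),
]

def search(tag):
    """Return every URL whose tag is equivalent to the requested one."""
    want = canonical(tag)
    return [url for url, t in tagged_urls if canonical(t) == want]
```

Once the equivalence is declared, the other user's "movies" bookmarks surface in my "cinema" searches with no change to their tagging habits.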

Clay is on the money with the observation that the real meanings of the terms are emerging from the statistical fog. You can make observations about the relations between tags based on the URLs that share them. "This URL probably represents a movie, as you think of 'movie'" is the function I need.

Monday, June 27, 2005

Service specific operations and machines

Service independent operations are valuable when the agent invoking them is the kind of agent that talks to lots of different services: a web browser + a human to make sense of what he browses and make choices accordingly.

If the agent is service specific, then heck, just design in all the specific operations you want.

Machine to machine conversations are almost always service specific. You have to program the client to understand how to proceed through the legal application states. The guy programming the client needs to know... First you do this to get this result, then you use that result to make a second query, and so on. It's no help to have a service independent operation set if you're using it that way.

Service independent operations would be valuable in the machine to machine case if you could invent a surrogate for the human: an intelligent machine agent able to make choices, given some output from the last operation. I've said that before, I know. Just thought it would be useful to state it a little differently.

(Caveat: written after a 16 hour day constructing SOAP services for machine to machine cases).

Tuesday, June 21, 2005

WSDL 2.0

Dave Orchard's remarks on WSDL 2 encourage me to replace my own cooked up SDL with WSDL 2. Sounds a little daunting though: "WSD WG decided that the specs were for toolkit authors not wsdl document authors."

I will report experimental results here.

RESTful Web Service Descriptions

I've fleshed out some of the details I left dangling in my previous post about WITSML. The WITSML stuff is so specialized, I've created a separate blog for it.

But if you are interested in the web description language (web-http-desc) discussion there is some meat here for you as well. This draft documents RESTful access to WITSML services, and writing it was a great exercise for the description discussion. (I didn't write it as an exercise; our product implements some of that protocol right now; but I had never written it down in one place).

WITSML is the "Wellsite Information Transfer Standard" markup language. It defines not only document formats, but a SOAP API whose operations amount to store CRUD, along these lines: AddToStore, GetFromStore, UpdateInStore, DeleteFromStore.

Look familiar? Well, GetFromStore is a query, and has a query string parameter, but generally you can do 90% of WITSML just by GET, PUT, and DELETE on objects, and the other 10% are vanilla things you could do with POST.

If you are interested in the web description language discussion, I invite you to have a look at the WITSML+REST draft and comment here or on the list.

Wednesday, May 18, 2005

Messages not Models

Visiting the WITSML SIG meeting and expo last week, sponsored by POSC, called to mind the evolution that oil and gas E&P IT systems are undergoing. Unlike lots of other industries, the E&P guys have tried in the past to normalize on an industry-wide logical data model. One standards effort, promulgated by POSC, was the Epicentre data model -- enormous, beautiful, and unused. PPDM is a separate standards effort. Landmark and Schlumberger each offer proprietary models, OpenWorks and GeoFrame. Yet each company using one of these models populates the database assigning slightly different semantics to properties, and many have tweaked the models to suit their own practices. So interoperability has suffered.

Enter WITSML. Oh, it doesn't (yet) attempt to address the full range of E&P information -- just a small segment concerned with drilling operations. But it is the very first message exchange system for this industry -- not just an interchange file format, but a message format together with a protocol.

Now that we agree on the messages, who cares about the models? You keep your model, I'll keep mine, and we agree we'll exchange resource representations, not models. It's a great use case for REST.

An E&P application can sit on top of whatever model it prefers and expose its services to the world using the REST prescription:

1. Identification of resources: assign a URI to resources like well, wellbore, etc.

2. Self-describing messages: Accept and deliver XML descriptions of well, wellbore, etc. The representations do have to map to components of the underlying logical data model.

3. Hypertext as the engine of application state: This is the key. In the past, the trouble has been caused by applications maintaining state in the database. A client has to understand the model and has to keep up with the evolution of that model. Using hypertext as the engine of application state, the server in effect programs the client to proceed through the states of the application correctly. As I've pointed out elsewhere, whether the client is a human operating a browser, a human operating a rich client, or an automaton exercising a web service, the server delivering instructions (a "form") to the client effectively is programming that client: "Here are the next steps you can take from here: a) do a GET on this URL or b) serialize these parameters and do a POST to this URL".
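A sketch of what the three points might look like for a WITSML-ish service. The service root, URI layout, and representation shape are all invented for illustration; the idea is that each object gets a URI, and each representation carries the hypertext telling the client where it can go next.

```python
BASE = "http://example.com/witsml"   # hypothetical service root

def well_uri(well_id):
    # 1. Identification of resources: every well gets its own URI.
    return f"{BASE}/well/{well_id}"

def wellbore_uri(well_id, wellbore_id):
    return f"{BASE}/well/{well_id}/wellbore/{wellbore_id}"

def well_representation(well_id, wellbore_ids):
    """2 and 3 together: a self-describing message carrying the links
    (the 'form') that program the client's next steps."""
    return {
        "uri": well_uri(well_id),
        "wellbores": [wellbore_uri(well_id, wb) for wb in wellbore_ids],
    }

rep = well_representation("w1", ["b1", "b2"])
```

A client needs to understand the representation's media type, not the server's data model: it follows the `wellbores` links it is handed rather than computing them from schema knowledge.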

So WITSML could be the beginnings of a pattern for extracting the industry from slavery to models.

Oh, except for that silly SOAP API they've crufted together. What I've outlined here is only the possibility enabled by having defined the messages. But because of the RPC style of the WITSML SOAP exchanges, resources aren't identified by URIs, and hypertext most definitely is not the engine of application state. These problems are solvable. More on that later.

Friday, April 01, 2005

Origins of the WS stack

Mark is right on the money:

Web services were created because it was felt that Web architecture wasn't sufficient to integrate disparate applications together over the Internet. Actually, that's not quite right. The explanation that seems to better reflect reality is that the Web was never considered as a platform suitable for meeting the objectives of Web services, as can be demonstrated by the numerous articles talking about how Web services evolved from the likes of CORBA, DCOM, RMI, etc.., without mentioning the Web!! The Web just didn't resemble what folks knew a distributed computing solution to look like, so it just never registered in the heads to consider it.

The guys who built SOAP, meaning Don Box principally, were DCOM guys solving problems in distributed objects. Mark's post prompted me to revisit the old DCOM list. Here's a great post from Don in 1998 reasoning that XML's "self-describing" nature would address the type problems DCOM had. Ah, yes. ITypeInfo was an interface you could expose on an object describing its methods and properties. I think it was hard to use it sensibly for multiple interfaces on an object. Since it was parameterized by a type library description that was installed on the client computer, I guess it was also fragile if you changed the interfaces of the server object. That all seems... so... far... away now.

A parallel trend was to overload port 80 and tunnel DCOM through it, since port 80 was perceived to be "open" everywhere. That was a nifty trick that wasn't very useful, since you had to persuade your firewall to permit garbage to come over port 80 (by tricking it into believing the garbage was SSL).

All we really wanted was DCOM over the Internet.

Those two trends, self describing RPC payloads and leveraging the web, came together as SOAP.

Friday, March 25, 2005

Good web services make webapps programmable

Just as exposing COM automation interfaces on desktop applications made Excel and Visio more powerful by enabling people to write a Visual Basic script to build composite applications from the two, exposing your webapp as a web service enables people to integrate your webapp with others.

Recipe for a good web service: Start with a good webapp like Amazon or Google or Travelocity. Expose the functionality in the webapp as POSTing and GETing self descriptive messages. That's about it. Now anyone can script your application and coordinate reserving an airline ticket with ordering a book.

Saturday, March 12, 2005

Link resolvers

PURL -- persistent URL -- has been around a while, but I only just discovered I could make one myself. The PURL will forever redirect to this blog, they think. If I move the blog, I have to update the pointer.

(TinyURL is another useful service for link resolution. Its purpose in life is to shorten long URLs to a manageable length you can actually type; but if you move your resource, you can't update the tiny URL to point to the new location. A TinyURL gets you here too.)

Why don't we embed a unique identifier into each web page or searchable resource? As I have pointed out (also here), a search engine will eventually turn up the moved resource. Couldn't we use META tags to do that? Embed this in your file:

<META scheme="UUID" name="identifier" content="cce89bf0-92e6-11d9-9669-0800200c9a66">

Then a persistent link to the resource might be
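Minting such a tag is a one-liner with any UUID library. A Python sketch, using a random (version 4) UUID rather than the time-based one shown above:

```python
import uuid

def identifier_meta():
    """Mint a fresh UUID and wrap it in the META tag proposed above."""
    uid = uuid.uuid4()
    return f'<META scheme="UUID" name="identifier" content="{uid}">'

tag = identifier_meta()
```

Stamp the tag into the page once, at creation time, and the identifier survives any number of moves for a search engine to find.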

Monday, March 07, 2005

NetKernel: RESTful Application server

I discovered NetKernel at a very late date and played with it for about half a day last weekend. These guys have taken uniform semantics to a new level. Every service, local or remote, is represented by a URI, and you compose services using a declarative scripting language.* You invoke verbs like SOURCE and SINK (analogous to GET and POST) on these URIs, and the kernel understands dependencies and can cache results intelligently.

You create your own services and map them to addresses in the URI namespace.
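The pattern can be sketched generically. To be clear, this is not NetKernel's actual API; the URI scheme, registry, and caching below are simplified illustrations of "services addressed by URI, resolved through a kernel that caches results":

```python
# Generic sketch of URI-addressed services with result caching, in the
# spirit of a SOURCE verb. Not NetKernel's real API.
services = {}
cache = {}

def register(uri_prefix, handler):
    """Map a region of the URI namespace to a service handler."""
    services[uri_prefix] = handler

def source(uri):
    """Resolve a URI to its service; cache the result by URI."""
    if uri in cache:
        return cache[uri]
    for prefix, handler in services.items():
        if uri.startswith(prefix):
            result = handler(uri)
            cache[uri] = result
            return result
    raise KeyError(uri)

register("res:/greet/", lambda uri: "Hello, " + uri.rsplit("/", 1)[1])
print(source("res:/greet/world"))
```

A real kernel would also track which source resources each result depends on, so it can invalidate the cache when they change.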

They also put a lot of emphasis on their available XML pipeline service, and if used properly, there's your engine of application state.

I hope this product can get some attention from REST architects. It's really a unique, REST friendly way to build services.

*My first brush with that language leaves two impressions: a) it looks kind of procedural to me, and b) XML is really too verbose to use as a framework for a scripting language.

Thursday, March 03, 2005

MDA is so February

I'm riding the crest of the Ontology Driven Architecture wave that's sweeping the industry. Grady Booch et al. say your legacy appserver identifies you as from the last century... you need an Ontology-based application server. Joseki anyone? (Thanks Andrew Newman).

The perfect distributed application

The perfect distributed application: The client GETs a URL returning an enhanced RDF Form -- enhanced so that some RDQL accompanies each parameter. The RDQL describes how the client is to retrieve data from its own data model and populate the "form" parameters. The RDQL references RDF types and instances defined in the service's own ontology. Now a) we cannot require the client's data repository be physically stored as RDF, and b) even if it were, we cannot require that the client's ontology be the same as the one described in the service's ontology. So there is a model mapping problem ahead of us. But for the moment, presume we have solved that problem -- our client understands, somehow, the service's ontology, and can honor RDQL requests made using terms from that ontology.

So devise a client agent analogous to a web browser, and furnish it with a reference to a callback interface it uses to satisfy RDQL requests. The callback interface is analogous to the human user, who, reading an ordinary HTML form page, knows how to populate the fields. This callback interface accepts RDQL queries, and honors them from your data repository. Voila! Instant, resilient distributed application engine. The service is free to change even the parameters it needs to satisfy any request. Say the airline reservation service evolves, and now, due to new TSA requirements, must have the passenger's Social Security Number to complete the reservation. No problem for our engine. The service simply adds a new element to the RDF Form, and supplies the RDQL to populate it. Our client agent passes the additional RDQL to the callback, and obtains the SSN.
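The callback's job reduces to a one-time mapping from the service's ontology terms to queries against your own data model. A minimal sketch of that idea, with every name (the table, the ontology terms, the queries) hypothetical:

```python
# Sketch: each ontology term the service's form can request is mapped
# once to a SQL query against the client's own schema. All names are
# hypothetical illustrations.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (name TEXT, ssn TEXT)")
conn.execute("INSERT INTO passengers VALUES ('Alice', '123-45-6789')")

ONTOLOGY_TO_SQL = {
    "airline:passengerName": "SELECT name FROM passengers",
    "airline:socialSecurityNumber": "SELECT ssn FROM passengers",
}

def fill_form(required_terms):
    """The callback: fill RDF-form parameters from the local data model."""
    return {term: conn.execute(ONTOLOGY_TO_SQL[term]).fetchone()[0]
            for term in required_terms}

print(fill_form(["airline:passengerName", "airline:socialSecurityNumber"]))
```

When the service adds the SSN field, only the mapping table grows; the client agent itself never changes.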

This system is a poor man's mobile agent. We're not going to use ObjectSpace Voyager, as cool as that is. Instead, our limited mobile agent performs local RDQL queries and sends messages back to the server. The enhanced RDF Form is the code for this agent. You download the form to your client agent and run it.

Now about that elephant in the room. How to map the service's ontology to our own? And don't we have to map every service's ontology to our own? Well, yes. Let's investigate how you'd attack that problem. (sound of dozens of shuffling feet leaving the room).

If your data model is an RDF store, the problem is mapping your ontology to the service's. The airline reservation service's ontology defines terms like Flight, Seat, City. Your own business's ontology has no concept of Flight or Seat. And it has an idea of City that's maybe a lot different from the airline's.

So part of creating this mapping is augmenting your ontology with terms required by the airline. Sure, you had city names in your model, but you didn't have airport codes like AUS for Austin. So wherever possible, you use constructs like owl:equivalentClass to map the airline's classes to existing classes in your ontology; elsewhere, add properties like airline:airportCode to your classes. You had to do all this work to invoke the service anyway, you know -- this is just a methodology for organizing it.
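In Turtle, that augmentation might look like this. Both namespaces are hypothetical, and the terms are only illustrations of the owl:equivalentClass and added-property moves described above:

```
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix my:      <http://example.com/mybusiness#> .
@prefix airline: <http://example.com/airline#> .

# Map the airline's City class onto our existing one...
my:City owl:equivalentClass airline:City .

# ...and augment our instances with the airline's properties.
my:Austin a my:City ;
    airline:airportCode "AUS" .
```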

Now, if your data model is not stored as RDF, consider exposing an RDF interface to it. It would be straightforward to map SQL rows to RDF statements and column names to OWL predicates.
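That straightforward mapping can be sketched directly: each row gets a URI built from its primary key, and each remaining column value becomes a statement whose predicate is derived from the column name. Table, schema, and base URI are illustrative:

```python
# Sketch of the SQL-to-RDF mapping: each column value becomes a
# (row-URI, column-predicate, value) statement. Names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, airport TEXT)")
conn.execute("INSERT INTO city VALUES (1, 'Austin', 'AUS')")

def rows_to_triples(conn, table, base="http://example.com/"):
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        subject = f"{base}{table}/{row[0]}"     # row URI from primary key
        for col, value in zip(cols[1:], row[1:]):
            predicate = f"{base}schema#{col}"   # column name -> predicate
            triples.append((subject, predicate, value))
    return triples

for t in rows_to_triples(conn, "city"):
    print(t)
```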

And in passing I'll mention that Service Data Objects are pretty cool. Programs operate on abstract object graphs, and you supply a Data Mediator Service that maps graphs to your own data model, be that in SQL, an XML repository, RDF repository, the file system, some remote EIS... wherever. If you've got a DMS, you've already mapped an arbitrary graph to your data model. So you can map an RDF graph to your data model. I'm not saying that work's done for you; just saying this problem is isomorphic to that one -- literally.

(I notice Patrick Logan remarking, "...if your restful interpreter and mine can understand some of the same state then they can cooperate. This is the idea behind these interpreters sharing partial ontologies or even being able to translate parts of one to another." Right on.)

Sunday, February 27, 2005

More on using RDF Forms to maintain application state in machine to machine services: If RDF forms contained RDQL to constrain parameters, then the client would know more completely how to fill in the "form". Say the form required submitting two flight numbers: departure and return. Both have rdf:type FlightNumber, but you better put the right one in the right field. A little RDQL, drawing on terms from the service's ontology, could constrain the destination city of the departure flight to be the same as the origination city of the return flight.
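A hedged sketch of what such a constraint might look like in RDQL; the airline vocabulary and namespace are hypothetical. The shared variable ?city is what forces the departure flight's destination to equal the return flight's origination:

```
SELECT ?city
WHERE (?departureFlight, <airline:destinationCity>, ?city),
      (?returnFlight, <airline:originationCity>, ?city)
USING airline FOR <http://example.com/airline#>
```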

Saturday, February 26, 2005

When you sit down to write a description language for REST services (an IDL or WSDL for REST), you discover that doing so is unnecessary. "Hypermedia as the engine of application state" means that the service, not the client, constructs the URLs the client needs to invoke via GET, POST, etc. The heart of an IDL or WSDL is that it is a set of instructions for clients invoking the service.

Example: a reservations system. In the RPC-style case, an IDL or WSDL might declare a method "NewReservation", and it would tell you some parameters: NewReservation (name, flight number, airline, date); you write client programs that collect that information from a user, and invoke the RPC.

An HTTP/HTML reservation system, however, constructs a form with input elements named "name", "flightnumber", "airline", and "date"; the user fills in the form, and pressing the Submit button sends the information to the service. So the client program, a web browser, never knew the semantics of the service. The human operator did, of course: he read the descriptive text in the form and put his name in the proper box.
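The form just described might look like this; the action URL is illustrative:

```html
<form action="/reservations" method="POST">
  Name:          <input name="name">
  Flight number: <input name="flightnumber">
  Airline:       <input name="airline">
  Date:          <input name="date">
  <input type="submit" value="Reserve">
</form>
```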

Now, that's fine for human operators. But what about machine to machine operations? In the RPC case, the client program is essentially the same as in the human case, only now the program collects the parameters from a database rather than from a live user. The client program, as before, is compiled against the IDL or WSDL. In the RPC case, we presume we can just invoke the service "out of the blue." We needn't have obtained any information from the service beforehand; we just connect and invoke.

In a REST-style service, we want an analogy to the self-describing hypermedia we have in the HTML scenario. First let's assume we'll use some form of XML as our hypermedia. It's easy to imagine an XML document. Maybe even more machine friendly would be an RDF document -- a bunch of RDF statements. Your client invokes GET on the well known URL of the service, and receives an RDF form. The RDF form describes the names of parameters and how to serialize them. So just as in the user driven HTML case, the client needs no foreknowledge of what the necessary parameters do, or even what their names are. But we still need an automated "user" to fill out those parameters. Since the RDF form describes the parameters in RDF, your client can map the RDF types of those parameters to elements in its data model. Your client has to "understand" the service's ontology, sure. But that is a one-time mapping of ontology elements to, say, SQL queries.

Could you have done all of this using an RPC-style architecture? Maybe. You could have retrieved WSDL from some well known service. You would do that each time you want to invoke the service, to emulate the self-describing part. Then you could dynamically construct the RPC call -- the serialization bit wouldn't be hard -- if you also had a mapping of the RPC parameters to your data model. How would you do that? There would need to be some semantic description of the service parameters and you would need a mapping of that description to your own data model. Could you use some RDF to describe these semantics? Probably. But it's not a system designed from the ground up to be self describing.

Tuesday, February 01, 2005

A "key" difference between a primary key and an object identifier is that the primary key is part of the table row -- it's part of the information content of the thing itself. An object identifier is metainformation, information about the object.

A reference to information in the row moves around wherever you move the thing. A reference to an object identifier has to be updated when you move the thing.

A query "Find me the page containing terms 'Hughw' and 'blog'" searches the information content of the thing itself. ('Hughw' and 'blog' are not primary keys of course, just ordinary "column information"). A URI to this page is like an object reference. If I move the page to another server, I have to update all links to it. And so do you.
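In SQL terms, the contrast looks like this; the table and column names are hypothetical:

```sql
-- Reference by information content: survives a move or reload.
SELECT url FROM pages
 WHERE content LIKE '%Hughw%' AND content LIKE '%blog%';

-- Reference by object identifier: breaks when the row is recreated.
SELECT url FROM pages WHERE rowid = 42;
```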

Which do you think is a better technique for persisting references?

Monday, January 31, 2005

Someday all links on the web will look like this link:

Hughw's Blog

The URL above is a Google query; that is, the link is the result of a search. Google returns 302 FOUND with the redirect to this blog in the Location header.

What's the big deal? Well, the link above will never break until Google does. This feature is precisely the reason Codd avoided handling pointers in the relational model. You never obtain a reference to a row in the relational model (and Oracle REFs are decidedly not relational). Instead, you can only specify rows by requiring column values to be equal, or in some other relation.

Someday the query URLs we use may be more structured. We'll pass RDF queries to Google.

To treat the web as an information retrieval mechanism, these query URLs will be the only sensible way to store references in web pages! Today, every site has broken links, even to pages under its own control.

Hmm, Google will crawl pages and discover links that call its own search function... weird... there is a recursive issue there, for them to solve :)
