Hugh Winkler holding forth on computing and the Web

Thursday, September 29, 2005

Free vs Safe in semweb

Ian Davis is having an RDF breakdown. Seems Dublin Core can't seem to get dc:creator quite right:

That's when my crisis struck. I was sitting at the world's foremost metadata conference in a room full of people who cared deeply about the quality of metadata and we were discussing scraping data from descriptions! Scraping metadata from Dublin Core! I had to go check the dictionary entry for oxymoron just in case that sentence was there! If professional cataloguers are having these kinds of problems with RDF then we are fucked....A simpler RDF could take a lot of this pain away and hit a sweet spot of simplicity versus expressivity

In the free vs safe debate, looks like he's making a run at freedom.

Monday, September 26, 2005

Technorati. Sigh.

If I do a Technorati search for some URL (say, this blog), I get the Technorati search page. In the upper right I notice an image "Add to watch list". That's right, if I click the link, the browser will do an HTTP GET on URL, and change my watch list. I guess it must be a good thing that the above URL modifies your watchlist, not mine, if you click it. Although a lot of people would call it a bad thing for your identity cookie to make an URL identify one resource for me, another resource for you. I guess if you're going to introduce side effects for GET, you might as well fix it by making the URI identify multiple resources.

Update: I could be off base with the side effects argument. The side effect of clicking the link N times really is the same as clicking it once, which is all RFC 2616 asks. Something in me reacted to having clicking a link change some state, but I guess I shouldn't get my panties in a bunch about it. Still, the URL identifies your watch list for you, and my watch list for me. That's wrong, isn't it?

Saturday, September 24, 2005

Search is the new Forms

I'm late to the Atom web services party. Why didn't someone tell me the Atom Publishing Protocol covers all of the territory I've been discussing for RESTful WITSML web services?

Now OpenSearch promises that I can expose a search URL over my Atom service that can be used by search aggregators to do sort of intelligent searching.

If anybody can search my web service using a standard protocol, to discover URLs of resources they are interested in, it's the poor man's equivalent of having a forms language. Aren't 90% of the forms you fill out on the web some form of search?

I can force a lot of the semantics of my web service into search. What can't you model as a search? You can model UDDI as search. You can model any web catalog as a search. Heck, you can model solving a differential equation as a search.

Search may substitute for a really articulate, unconstrained forms language.

Without a forms language, REST web services are little more useful than RPC style web services. That's because the guy programming the service client has to understand, at design time, the semantics of each URL. Example: You learn the algorithm for constructing URLs and write your program to build them given some parameters you collect from a user. It's the same idea as calling a remote procedure. Other "REST" services might just supply you with a menu of URLs, whether they honor GET or POST, and the media types you can send or receive. Again, you're doing it at design time.

RPC services force clients to understand them at design time. You have to read some documentation and construct your program so that it calls functions in some order that makes sense to that service.

REST services use "hypermedia as the engine of application state." One realization of that idiom is HTML forms. Forms are how the service bypasses the browser. The guy who wrote the browser does not understand what is in the form. But he knows it is an HTML form and he has the browser render it for you to complete. The form tells the browser how to serialize the fields you complete and POST them to the service. It is HTML forms that enable you to order a plane ticket, or a book, using the same piece of compiled software: the browser. The form is a little program the browser downloads and executes at run time. The result of executing the program is a string, or a multipart message, the browser can submit to the service to obtain some other resource representation -- which, like all the other HTML it traffics in, the browser does not "understand".

So how can a web service enable the same dynamic capability for machine, as opposed to human, agents? Here's the use case: You're dropping your own service into a brew of services running in some environment. And your service needs the results of other services as input to its own. It needs to locate those services in the brew. And it needs to invoke those services correctly and interpret the results. At design time, you don't understand how any of the other services work, or which services will be available; but you do understand the documents they traffic.

If you had a really intelligent automaton on the client side, it could retrieve a form document from any service telling it what parameters to retrieve and how to serialize them. But I'm pretty sure we're not going to have the intelligent automatons I outlined in a previous blue sky piece.

Instead, you have the capability to search. It's a lot like completing a form. It's more constrained than that, though. It's the kind of form that can only do one thing, for all applications.

So you're programming the travel reservations application. Your app can search a directory for the airline, auto rental, and hotel reservation services. It searches the airline service for flights from Austin to Atlanta leaving Monday, returning Wednesday. It searches for mid-size rental cars available in Atlanta. It searches for hotels in downtown Atlanta in a certain price range. Because we've standardized search, you program each of these interactions using the same model.

To complete the transaction, you would use the APP to create a purchase order document with the service.

It would be cool if services could annotate the search terms with RDF properties. OpenSearch doesn't try to get that sophisticated, and good for it. But to complete an airline reservation you're going to need to know how to search for "flight", and not have the search return some other object. You could draw the search terms from an airline ontology.

But if I suggest SPARQL as an alternative to OpenSearch, Bosworth and the free vs safe libertines will jump my shit. Maybe rightly so. I'm still re-educating.

Friday, September 23, 2005

Are data models passé?

In the free vs safe debate, free is winning. That's a debate not limited to programming languages. You see the same meme in the web services debates. Google does "free" for data. Adam Bosworth's pitching open, "dumb" search standards; so is Joe. Do we need formal logical data models?

We won't be able to impose them. No data architect will design a master schema or ontology over domains like, say, process control, or auctions. Instead, mediators like Google will infer models from content. Or each of us will contribute our bit to the global model by social bookmarking and tagging. The excitement in searches will be in augmenting the raw search results served up by the dumb search protocols, with the value a mediator like Google or adds in imposing their inferred models, making the searches faster or more accurate.