Hugh Winkler holding forth on computing and the Web

Showing posts with label web. Show all posts

Thursday, May 29, 2008

Self-appointed Guardians of Truth

SiteTruth has given my company's web site a big red do not enter sign.

Even though we don't sell products electronically, they wish we had a certificate. And we've never put our office address on that site -- an oversight when we moved, not a scam. So they give us a big red "Do Not Enter" sign, indicating our site is dangerous to enter. From their "about" page:

Every on-line commerce web site must display the name and address of the business behind the site. That's the law in much of the developed world. SiteTruth tries to identify that business, then find information about it. That check is used to influence search rankings. That's SiteTruth. (emphasis mine)


We're not an "on-line commerce web site", but their system can't detect that, so they're selling technology that will lower our search rankings?

We'll fix our site to please them, of course; why not? But their technology doesn't seem to increase the safety of the web, and it is likely to piss off other legit site owners, some of whom may even feel litigious. Could you blame them?

Monday, March 03, 2008

Rule of Least Power: Bah!

The Rule of Least Power, a W3C TAG finding, posits: "Powerful languages inhibit information reuse." They're observing that it's easy to scrape documents written declaratively using HTML. The problem with using more powerful languages like Javascript, they say, is that "you typically cannot determine what a program in a Turing-complete language will do without actually running it."

So? As long as the output is a DOM, just run the program and inspect the DOM.

You already have to use a good HTML parser, right? Now, just run all the script elements on the page too -- obviously, in a restricted environment.

I'm sure Google and friends must do this. They're not going to leave valuable information on the table.

Friday, November 16, 2007

What is wrong with this picture?

Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
content-disposition: attachment; filename=Hill%20Country%20water%20issues[1].ppt
Content-Type: application/unknown

IIS servers don't come provisioned with the .ppt extension mapped to application/vnd.ms-powerpoint? Microsoft server? Microsoft application?

What hope can there be for authoritative metadata?

My Firefox browser figures it out just fine, and launches OpenOffice. Presumably FF went through all this first, just to determine it to be application/octet-stream; then ran a further sniffer to identify it as a PowerPoint file.
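A minimal sketch of that kind of fallback, in Python. This is not Firefox's actual algorithm: the function names, the two-entry extension table, and the rule of checking the OLE2 container signature are all assumptions made for illustration. It trusts an informative Content-Type, and otherwise falls back to the filename in Content-Disposition plus a look at the first bytes of the body.

```python
from urllib.parse import unquote

# Assumption: a tiny extension table for illustration only.
EXTENSION_TYPES = {
    ".ppt": "application/vnd.ms-powerpoint",
    ".gif": "image/gif",
}

# Declared types that tell the client nothing useful about the payload.
UNINFORMATIVE = {"", "application/unknown", "application/octet-stream"}

# Signature of the OLE2 compound file container used by legacy .ppt, .doc and .xls files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def filename_from_disposition(header: str) -> str:
    """Pull the percent-decoded filename out of a Content-Disposition value."""
    for part in header.split(";"):
        name, _, value = part.strip().partition("=")
        if name.lower() == "filename":
            return unquote(value.strip('"'))
    return ""


def guess_media_type(content_type: str, disposition: str, body: bytes) -> str:
    """Trust an informative declared type; otherwise use extension plus magic bytes."""
    declared = content_type.split(";")[0].strip().lower()
    if declared not in UNINFORMATIVE:
        return declared
    filename = filename_from_disposition(disposition).lower()
    for ext, media_type in EXTENSION_TYPES.items():
        if filename.endswith(ext):
            # The signature only proves an OLE2 container, so the extension breaks the tie.
            if ext == ".ppt" and not body.startswith(OLE2_MAGIC):
                continue
            return media_type
    return "application/octet-stream"
```

Fed the headers above and a body that starts with the OLE2 signature, the sketch settles on application/vnd.ms-powerpoint, which is what the server should have said in the first place.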

Tuesday, October 02, 2007

That killer web platform

Here we go again with this specious argument that the Web isn't rich enough. Joel thinks there's a new killer platform out there to be invented, that will seize control of web applications as Windows seized the desktop.

Ain't gonna happen. Or, if you prefer: Already happened.

Right there in his own essay is the reason.


And that’s exactly where we are with Ajax development today. Sure, yeah, the usability is much better than the first generation DOS apps, because we’ve learned some things since then. But Ajax apps can be inconsistent, and have a lot of trouble working together — you can’t really cut and paste objects from one Ajax app to another, for example, so I’m not sure how you get a picture from Gmail to Flickr. Come on guys, Cut and Paste was invented 25 years ago.


See, Ajax gives you the capability to turn a perfectly good hypertext application into a miserable facsimile of a 1980s PC application. And you're not going to fix Ajax by adding a bunch of new APIs. Applications need more constraints, not fewer.

Think how absurd it is that you can't copy a picture from Gmail to Flickr. The tools are right there, but the application designers do not leverage them. a) Right click photo in Gmail. b) "Copy link location". c) Paste hyperlink into Flickr. d) Flickr either downloads photo from Gmail or references it. No new APIs needed -- it's all just hyperlinks.

It's great, and necessary, to extend HTML with rich widgets. We'll never capture them all, declaratively, in a common HTML. I am, even as we speak, constructing a Flash widget. But the web is the platform. Any time I push information deep into my widget -- text that could be searchable, graphics that could be linkable -- and hide it from the web, I've failed to leverage the platform.

Friday, May 04, 2007

RIA Not Advancing the Ball

Rich widgets obscure the semantics of hypertext. Only the code behind the form knows what the widget really does. Contrast to HTML 5, and in particular Web Forms 2. These guys are extending HTML to capture what we really do on the web. As a consequence, client programs can (or, have a chance to) understand the meanings of hypertext documents from the web.

Example: you want to write a script to automate some remote bookmark service, as part of your mashup. But this service, unlike del.icio.us, has no documented "API". So you have to download its form, complete it programmatically, and POST an entity.

Case 1: The form uses Plain Old HTML. You're golden. All the semantics are right there for you to parse, or read. You identify the name of the text box where you stick the URL, and the name of the text box where you add a description. You compose the URL encoded form data, and POST it to the action URI.

Case 2: The "form" uses Javascript to modify the DOM on the fly: the onload() method adds text boxes, and a submit button, to an empty DOM. In fact, it might not even use the submit button as a form element; when you press the button, its onclick() might send a custom XMLHttpRequest back to the server. Your code will never automate this interaction.

Case 3: The "form" uses XAML + Silverlight plugin. An exacerbated case of (2).
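Case 1 can be sketched in a few lines of Python using only the standard library. The class and function names, the sample form, and the bookmarks.example.com URLs are hypothetical. The sketch reads the first form on a page, fills in the named fields, and produces the URL-encoded body to POST to the action URI. It does not send the request.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormReader(HTMLParser):
    """Collect the first form's action, method, and named input fields."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag in ("input", "textarea") and self._in_form and attrs.get("name"):
            # Keep any default value the form declares for this field.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_submission(page_url, html, values):
    """Return (absolute action URI, method, URL-encoded body) for the first form."""
    reader = FormReader()
    reader.feed(html)
    data = {**reader.fields, **values}
    return urljoin(page_url, reader.action), reader.method, urlencode(data)


# Hypothetical bookmark form, standing in for the one downloaded from the service.
SAMPLE = """
<form action="/bookmarks" method="post">
  <input type="text" name="url">
  <input type="text" name="description">
  <input type="submit" value="Save">
</form>
"""

target, method, body = build_submission(
    "http://bookmarks.example.com/add",
    SAMPLE,
    {"url": "http://example.org/", "description": "An example"},
)
```

From here, sending `body.encode()` to `target` with `urllib.request` finishes the job. Nothing comparable works for cases 2 and 3, because the field names never appear in the markup the server sends.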

In contrast, Web Forms 2 attempts to capture the semantics of what we do with forms. Because browsers will understand more of the semantics of the form, we can do declaratively what we now have to do in Javascript. For example, lots of HTML forms now have to use script to add a row of controls to a form on the fly ("Click here to add another bookmark"). WF2 captures that as repeating control groups, and the browser can handle it.

(Then again, I am a documented forms nut.)

Mike Dierken justly analogizes: "RIA is to user interfaces as RPC is to messaging interfaces". And notice, it's Rich Internet, not Rich Web Applications. These technologies pay lip service to the web, but they're not advancing the ball toward building more and better links throughout the web information space.

P.S. Wonderful rant by Mark Pilgrim!

Update: fixed a link.

Thursday, May 03, 2007

RIA -- Fill 'er Up!

I'm having an ongoing email exchange with my friend Peter. He's convinced MS and Adobe herald a new age of Rich Internet Applications. He pointed me to this guy who's backed up a tanker to the Kool-Aid trough.

Sure, MS and Adobe have to sell something as the Next Thing -- what else have they got? But we've had RIA ever since Java 1.1 applets. We have Flash. We have <embed> and <object>. Do you really think what's been holding RIA back is the technology?

Users have voted with their mice, and they've voted for the web experience -- exploring the web information space using hyperlinks -- as far more important than whizzy UI. Ask eBay. Ask MySpace.

Flash, applets, Silverlight, Javascript -- the more you use them, the suckier your web apps are at exploring the web information space. I don't think it has to be this way, but it takes a design discipline few seem to have. These programming models are from the 80s. They have web APIs, but they're not web oriented. Programs end up as little desktop applications, not web apps. I don't see Silverlight changing that. It is good to have super expressive widgets -- hear hear. But if you're not pushing a bunch of hypertext down to my browser, you're not helping me explore the space.

Tuesday, April 03, 2007

The cure is worse than the disease

This paper from Fortify makes the case that sending sensitive information using JSON exposes it to cross-site maliciousness. Gmail sent your contact list down as JSON and evaled it. Turns out, any old site could do the same: just put a <script> tag referencing that contact list, and install some interceptor code that overrides the setter for, say, the "email" property on every object. That lets the malicious code see the values in the JSON.

Here are a couple of their proposed measures:

1. "Add the session cookie to the request as a parameter." Knee-slapper, that. See, the exploit only works because vulnerable sites put your identity into the cookie, and use a single URL for all users to download the object; the server uses the cookie to send you your personalized contact list. So the attacker just has to hardcode <script src="http://yoursite.com/contact-list">. The paper proposes uniquifying the URL. Here's an idea: design your app so that each user's info is at a unique URL in the first place!

2. Send all legitimate requests for JSON data using HTTP POST! That way you know any GET requests are malicious ones from <script> tags. They do concede that "The use of GET for better performance is encouraged by Web application experts from Sun and elsewhere". There's no use for this measure if you use unique URLs, of course.
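One way to read "unique URL" is sketched below in Python. It assumes the URL must carry something a third-party site cannot guess. That assumption is mine, not the paper's, and the path layout, function names, and secret handling are all hypothetical. An HMAC of the user id under a server-side secret gives each user a stable path that cannot be hardcoded into a <script> tag on another site.

```python
import hashlib
import hmac

# Assumption: a server-side secret; in practice it would come from configuration.
SERVER_SECRET = b"replace-with-a-real-secret"


def contact_list_path(user_id: str) -> str:
    """Return a per-user path whose last segment is an HMAC of the user id."""
    token = hmac.new(SERVER_SECRET, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"/users/{user_id}/contacts/{token}"


def is_valid_path(user_id: str, path: str) -> bool:
    """Compare the requested path with the expected one in constant time."""
    expected = contact_list_path(user_id).encode("utf-8")
    return hmac.compare_digest(expected, path.encode("utf-8"))
```

The server would still authenticate the request as usual. The point is only that a GET to this resource stays a plain, cacheable GET, with no cookie-in-the-query-string or POST-only contortions.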

So yeah, this is a serious problem, but not for apps using best web architecture practices. Millions of web developers read papers like that and then crap all over the web.

Monday, February 05, 2007

Web 0.9

Sure took a long time to pay the car note this evening. I wish the developers had read Mark's caching tutorial (or rather that their J2EE framework developers had). Below are headers representative of about a hundred .gif, .css, and .js resources used on the page:


GET /navigation/images/global/company.gif HTTP/1.1
Host: www.financecompany.example.com

...

HTTP/1.x 200 OK
Server: IBM_HTTP_Server/2
Last-Modified: Tue, 20 Sep 2005 18:24:36 GMT
Etag: "31e467-450-2c11c100"
So far, so good

Accept-Ranges: bytes
Content-Length: 1104
Content-Type: image/gif
Expires: Mon, 05 Feb 2007 07:31:41 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Mon, 05 Feb 2007 07:31:41 GMT
Huh? It's a GIF that hasn't changed in a year.

Connection: keep-alive
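For contrast, here is a minimal Python sketch of what a server could send for a static GIF like that one. The function names and the choice of max-age are assumptions, and this is not the finance company's framework or any particular server's API. The first function builds headers that let the browser reuse the image; the second answers a conditional GET with 304 when the ETag still matches.

```python
from email.utils import formatdate


def cacheable_headers(etag: str, last_modified: float, now: float, max_age: int) -> dict:
    """Headers that let browsers and proxies reuse a static resource for max_age seconds."""
    return {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified, usegmt=True),
        "Date": formatdate(now, usegmt=True),
        "Expires": formatdate(now + max_age, usegmt=True),
        "Cache-Control": f"public, max-age={max_age}",
    }


def status_for(request_headers: dict, etag: str) -> int:
    """Answer 304 Not Modified when the client's validator still matches, else 200."""
    candidates = [v.strip() for v in request_headers.get("If-None-Match", "").split(",")]
    return 304 if etag in candidates or "*" in candidates else 200
```

With headers along these lines, a repeat visit costs at most a handful of tiny 304 responses instead of a hundred full downloads.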