/this/url/sucks/cgi-bin/showArticle2.php?
articleID=9487220&sessionID=394582849
&phaseOfMoon=…


Is there a reason I should care what scripting language your site is implemented in this week?

Is there a reason I should care what variable names your script uses this week?

Is there a reason I should care what directory you store your script in this week?

Is there a reason why I should see any implementation details at all, or be forced to try to cut and paste a 494-byte URL when I want to recommend your site to a friend?

And should it be harder to make a sensible URL than a ludicrous one?


These questions have been bugging me for a while now, and I’m certainly not the only one. The problem is twofold. First, many systems create ugly URLs by default; one of the most hideous examples I’ve seen recently was the otherwise interesting Exhibition Engine, which actually does generate 494-byte URLs. Second, even a small reorg of your site will break user bookmarks unless you invest time in awkwardly-implemented URL-redirection libraries.

So I guess the real title of this rant is “Implementation Details Considered Harmful.” Now, how do we fix it?

People familiar with my other sites will understand why I’m interested in the Gallery project on SourceForge. I can’t really use it without separating the page layout from the code and ripping out the JavaScript and pop-ups, but it’s a cool package, and you can strip a lot of the implementation details out of the URLs. Well, if you’re running Apache 1.x with mod_rewrite, anyway.

What to do?

"The similarity to unix and other disk operating system filename conventions should be taken as purely coincidental, and should not be taken to indicate that URIs should be interpreted as file names." —W3C URI specs

For static web sites, the “URL as path” model made sense. You had to put your HTML files somewhere on the disk, after all, so a straight mapping from one to the other avoided all sorts of problems. It also created a few, like adopting the Unix ~user convention for personal areas on a server.

With all the scripting going on these days, though, it’s become a major design flaw, so deeply embedded that every web server has at least one workaround for it. I think Apache has at least a dozen, of which mod_rewrite is probably both the most arcane and the most widely used.

The fix? Stop looking at dynamic web sites as “scripts with arguments,” and change the server so they can be easily configured as “scripted name-spaces.” The Gallery folks are on the right track with their mod_rewrite hack: replace /gallery/view_album.php?set_albumName=fishtank&id=guppy4 with /gallery/fishtank/guppy4.
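To make that mapping concrete, here is a minimal sketch in Python of the translation a rewrite rule performs. It is not Gallery's actual rule or code; the function name rewrite_gallery_path is hypothetical, and it handles only the two-segment album/picture form shown above.

```python
from urllib.parse import urlencode


def rewrite_gallery_path(path, prefix="/gallery/"):
    """Translate a clean URL into the script-with-arguments form.

    Hypothetical helper: only the /gallery/<album>/<picture> shape from
    the example is handled; anything else returns None.
    """
    if not path.startswith(prefix):
        return None
    # Drop the prefix and ignore empty segments from stray slashes.
    parts = [p for p in path[len(prefix):].split("/") if p]
    if len(parts) != 2:
        return None
    album, picture = parts
    # The parameter names are the ones visible in the original ugly URL.
    query = urlencode({"set_albumName": album, "id": picture})
    return f"{prefix}view_album.php?{query}"


print(rewrite_gallery_path("/gallery/fishtank/guppy4"))
```

The visitor only ever sees the short form; the script name and its variable names stay on the server side, where they can change without breaking anyone's links.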

But don’t stop there. If someone links to that picture today, will their bookmark work next year? We have complete control over the /gallery/ name-space, so if we can’t find an exact match, we can turn it into a search. In fact, we can treat all requests as searches, creating galleries dynamically in response to user requests. I can type /gallery/guppy on your site, and you can type /gallery/models/blondes on mine. As long as there’s something on the site that matches, we’ll get what we were looking for, and saved bookmarks will work.
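A rough sketch of the "every request is a search" idea, again in Python. The resolve function, the in-memory set of item names, and the substring matching rule are all assumptions made for illustration; a real site would search whatever index or database it already has.

```python
def resolve(path, items, prefix="/gallery/"):
    """Return the items that answer a request path.

    An exact match wins. Otherwise each path segment is treated as a
    search term, and an item matches if every term appears somewhere in
    its name (case-insensitive, order-independent).
    """
    if not path.startswith(prefix):
        return []
    wanted = path[len(prefix):].strip("/").lower()
    by_key = {name.lower(): name for name in items}
    if wanted in by_key:
        return [by_key[wanted]]
    terms = [t for t in wanted.split("/") if t]
    return sorted(
        name for key, name in by_key.items() if all(t in key for t in terms)
    )


# Hypothetical gallery contents, named album/picture.
ITEMS = {"fishtank/guppy4", "fishtank/guppy7", "fishtank/tetra1"}

print(resolve("/gallery/fishtank/guppy4", ITEMS))  # the one exact match
print(resolve("/gallery/guppy", ITEMS))            # every guppy picture
```

Because the fallback is a search rather than a 404, an old bookmark keeps returning something useful as long as its terms still match items on the site, even after the albums have been reorganized.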

There’s CPU usage to deal with, and state to maintain, of course, but I’ve got a few ideas for those, as well. That’s another rant.

Update: Since I originally wrote this, I’ve migrated my online cookbook to work this way. Try it out, and pay attention to the URLs, especially when entered with errors.

Update: To no great surprise, someone else has proposed something similar in concept but considerably smaller in scope. He has, however, given it a catchy name, Slash Forward. I don’t actually like the name, even setting aside the fannish connotation of “slash”; for some reason it reminds me of the obscure and poorly-articulated artistic movement FoundView. There’s no obvious relationship between the two, particularly with regard to how they’re explained, but I get cranky sometimes.