Thursday, June 1 2006


I want a better text editor. What I really, really want, I think, is Gnu-Emacs circa 1990, with Unicode support and a fairly basic Cocoa UI. What I’ve got now is the heavily-crufted modern Gnu-Emacs supplied with Mac OS X, running in, and when I need to type kanji into a plain-text file.

So I’ve been trying out TextWrangler recently, whose virtues include being free and supporting a reasonable subset of Emacs key-bindings. Unfortunately, the default configuration is J-hostile, a number of settings can’t be changed for the current document (only for files opened later), and its many configuration options are “less than logically sorted”.

What don’t I like?

First, the “Documents Drawer” is a really stupid idea, and turning it off involves several checkboxes in different places. What’s it like? Tabbed browsing with invisible tabs; it’s possible to have half a dozen documents open in the same window, with no visual indication that closing that window will close them all, and the default “close” command does in fact close the window rather than a single document within it.

Next, I find the concept of a text editor that needs a “show invisibles” option nearly as repulsive as a “show invisibles” option that doesn’t actually show all of the invisible characters. Specifically, if you select the default Unicode encoding, a BOM character is silently inserted at the beginning of your file. “Show invisibles” won’t tell you; I had to use /usr/bin/od to figure out why my furiganizer was suddenly off by one character.

Configuring it to use the same flavor of Unicode as TextEdit and other standard Mac apps is easy once you find it in the preferences, but fixing damaged text files is a bit more work. TextWrangler won’t show you this invisible BOM character, and /usr/bin/file doesn’t differentiate between Unicode flavors. I’m glad I caught it early, before I had dozens of allegedly-text files with embedded 文字化け. The fix is to do a “save as…”, click the Options button in the dialog box, and select the correct encoding.
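For more than a handful of files, the same repair could be scripted. This is only a sketch, and it assumes the unwanted flavor is UTF-8 with a leading BOM (the three bytes EF BB BF); the helper names has_utf8_bom, strip_utf8_bom, and fix_file are made up for illustration and have nothing to do with TextWrangler.

```perl
use strict;
use warnings;

# UTF-8 encoding of U+FEFF, the byte-order mark (assumed flavor).
my $UTF8_BOM = "\xEF\xBB\xBF";

# True if the raw bytes begin with a UTF-8 BOM.
sub has_utf8_bom {
    my ($bytes) = @_;
    return substr($bytes, 0, 3) eq $UTF8_BOM ? 1 : 0;
}

# Return a copy of the bytes with one leading UTF-8 BOM removed, if present.
sub strip_utf8_bom {
    my ($bytes) = @_;
    $bytes =~ s/\A\xEF\xBB\xBF//;
    return $bytes;
}

# Hypothetical helper: rewrite a file in place without its BOM.
# Returns 1 if the file was changed, 0 if it had no BOM.
sub fix_file {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "can't read $path: $!";
    my $bytes = do { local $/; <$in> };
    close $in;
    $bytes = '' unless defined $bytes;
    return 0 unless has_utf8_bom($bytes);
    open my $out, '>:raw', $path or die "can't write $path: $!";
    print {$out} strip_utf8_bom($bytes);
    close $out or die "can't finish writing $path: $!";
    return 1;
}

# In-memory demonstration: 3 BOM bytes plus 5 bytes of text.
my $damaged = $UTF8_BOM . "kanji";
printf "%d -> %d bytes\n", length($damaged), length(strip_utf8_bom($damaged));
# prints "8 -> 5 bytes"
```

To run it over real files, something like fix_file($_) for @ARGV; at the end would do, preferably on copies first.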

Basically, over the course of several days, I discovered that a substantial percentage of the default configuration settings either violated the principle of least surprise or just annoyed the living fuck out of me. I think I’ve got it into a “mostly harmless” state now, but the price was my goodwill; where I used to be lukewarm about the possibility of buying their higher-end editor, BBEdit, now I’m quite cool: what other unpleasant surprises have they got up their sleeves?

By contrast, I’m quite fond of their newest product, Yojimbo, a mostly-free-form information-hoarding utility. It was well worth the price, even with its current quirks and limitations.

Speaking of quirks, my TextWrangler explorations yielded a fun one. One of its many features, shared with BBEdit, is a flexible syntax-coloring scheme for programming languages. Many languages are supported by external modules, but Perl is built in, and their support for it is quite mature.

Unfortunately for anyone writing an external parser, Perl’s syntax evolved over time, and was subjected to some peculiar influences. I admit to doing my part in this, as one of the first people to realize that the arguments to the grep() function were passed by reference, and that this was really cool and deserved to be blessed. I think I was also the first to try modifying $a and $b in a sort function, which was stupid, but made sense at the time. By far the worst, however, from the point of view of clarity, was Perl poetry. All those pesky quotes around string literals were distracting, you see, so they were made optional.
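For anyone who hasn’t run into the grep() behavior, here is a minimal sketch of what “passed by reference” means in practice: inside the block, $_ is an alias for each element of the list, so changing $_ changes the original array. The variable names are just for illustration.

```perl
use strict;
use warnings;

my @words = ('foo', 'bar', 'baz');

# $_ is aliased to each element of @words, so the substitution
# edits the array in place; grep keeps the elements where it succeeded.
my @matched = grep { s/^b/B/ } @words;

print "words:   @words\n";    # prints "words:   foo Bar Baz"
print "matched: @matched\n";  # prints "matched: Bar Baz"
```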

This is still the case, and while religious use of use strict; will protect you from most accidental barewords, there are places where unquoted string literals are completely unambiguous, and darn convenient as well. Specifically, when an unquoted string literal appears immediately to the left of the syntactic sugar “=>” [ex: (foo => "bar")], and when it appears by itself inside the braces of a hash subscript [ex: $x{foo}].
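A short sketch of both cases running under use strict. The keys are deliberately named after Perl operators (y for transliteration, x for repetition, s for substitution) to show that Perl itself treats them as plain strings in these two positions; the hash name is arbitrary.

```perl
use strict;
use warnings;

# Left of "=>": each word is quoted automatically, even operator names.
my %count = (y => 1, x => 2, s => 3);

# Alone inside hash-subscript braces: same as $count{'y'}.
$count{y} += 10;

# Anywhere else, strict refuses to compile a bareword:
#   my $oops = foo;   # Bareword "foo" not allowed while "strict subs" in use

my $summary = join ',', map { "$_=$count{$_}" } sort keys %count;
print "$summary\n";   # prints "s=3,x=2,y=11"
```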

TextWrangler and BBEdit are blissfully unaware of these “bareword” string literals, and make no attempt to syntax-color them. I think that’s a reasonable behavior, whether deliberate or accidental, but it has one unpleasant side-effect: interpreting barewords as operators.

Here’s the stripped-down example I sent them, hand-colored to match TextWrangler’s incorrect parsing:


use strict;

my %foo;
$foo{a} = 1;
$foo{x} = 0;

my %bar = (y=>1,z=>1,x=>1);

$foo{y} = f1() + f2() + f3();

sub f1 {return 0}
sub f2 {return 1}

sub f3 {return 2}