Fontographer 5.0 is out. I knew they’d done a cleanup release after acquiring the old code, but I hadn’t expected the FontLab folks to do major new development on it.
In this widely-linked news story, a team of researchers has reported success at curing erectile dysfunction with shockwaves. In describing how much pressure is being applied to the penis, they chose a very revealing comparison:
"These are very, very low energy shock waves," Vardi said. Each shockwave applied roughly 100 bar of pressure — some 20 times the air pressure in a bottle of champagne, but less than the pressure exerted by a woman in stiletto heels who weighs 132 lbs. (60 kg).
Apparently medical research is now being performed in full dominatrix gear. Who knew?
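For the record, the comparison holds up. Champagne is bottled at roughly 5 to 6 bar, so 20 times that is right around 100 bar, and if you put all of her weight briefly onto one stiletto tip of about half a square centimeter (my assumption; the article gives no heel size), the heel wins:

    \[ P = \frac{mg}{A} = \frac{60\,\mathrm{kg} \times 9.8\,\mathrm{m/s^2}}{0.5\,\mathrm{cm^2}} \approx 11.8\,\mathrm{MPa} \approx 118\,\mathrm{bar} > 100\,\mathrm{bar} \]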
Emacs 23 natively uses Unicode. This means I can run it in a Terminal window, like God intended, and still have full Japanese support. Previous versions did funky Shift-JIS conversions that made its behavior… “eccentric” on a Mac, especially with cut-and-paste.
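For reference, pinning a terminal session to UTF-8 now takes only a few lines of elisp. This is my sketch, not anything from the release notes, and it assumes the terminal emulator itself is set to UTF-8:

    ;; ~/.emacs: keep a tty session entirely in UTF-8
    (prefer-coding-system 'utf-8)        ; default coding system for files and buffers
    (set-terminal-coding-system 'utf-8)  ; what Emacs writes to the terminal
    (set-keyboard-coding-system 'utf-8)  ; what Emacs reads from the keyboard

Since Emacs 23 stores text as Unicode internally, that should be the whole story: no Shift-JIS round trips anywhere.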
Now all I have to do is strip out all of the cruft from the elisp directory, and I’ll have the perfect text editor. Actually, it’ll be easier to delete everything and just add back the non-cruft as needed. There’s not much that I don’t consider cruft, so it will be pretty darn small.
[side note: a release comment says something to the effect that the internal encoding is a superset of Unicode with four times the space, which would make it a 34-bit system. WTF? Update: ah, I see; UTF-32 has a lot of empty space, with only a bit over 20 bits allocated in the Unicode standard. UTF-8 was also designed with considerable headroom, which is no surprise, given that it was invented during dinner by Ken Thompson.]
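Spelling out the arithmetic behind that correction: Unicode tops out at U+10FFFF, so

    \[ \mathrm{0x10FFFF} + 1 = 1{,}114{,}112 \approx 2^{20.09} \text{ code points}, \qquad 4 \times 1{,}114{,}112 = 4{,}456{,}448 \approx 2^{22.09}, \]

which means four times the allocated code space still fits comfortably in 22 bits. You only get 34 bits if you start from the full 32-bit UTF-32 range instead of the code points Unicode actually assigns.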
Here’s your definitive manual’s complete comparison of Perforce to Mercurial:
Perforce has a centralised client/server architecture, with no client-side caching of any data. Unlike modern revision control tools, Perforce requires that a user run a command to inform the server about every file they intend to edit.
The performance of Perforce is quite good for small teams, but it falls off rapidly as the number of users grows beyond a few dozen. Modestly large Perforce installations require the deployment of proxies to cope with the load their users generate.
In order, I say, “bullshit”, “feature”, “buy a server, dude”, and “you’re doing it wrong”.
In fairness, the author admits up front that his comments about other tools are based only on his personal experience and biases, and the inline comments for this section point out its flaws. Still, it’s clear that his personal experience with Perforce was… limited. Also, he’s either not aware of the features it has that Mercurial lacks, or simply discounts them as “not relevant to the way Our Kind Of People work”.
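For anyone who hasn’t used Perforce, that “inform the server” step the book sneers at is also what enables the information-sharing mentioned below: the server knows who has which files open, and anyone can ask it. A rough sketch of the workflow; the commands and flags are real, the file name is invented:

    p4 edit src/parser.c   # check the file out, telling the server I intend to change it
    p4 opened -a           # list the files currently opened by anyone, on any client
    p4 submit              # send the finished change back to the depot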
I’m not criticizing the tool itself, mind you; I’ve tried out several distributed SCMs in the past few years, and Mercurial seems to be fast, stable, easily extensible, and well-supported. I’m switching several of my Japanese projects over from Bazaar, and Mercurial imported them cleanly. It also handles Unicode file names and large files a lot better, both of which had been causing me grief in the old tool.
There are things I can’t do in Mercurial that I do in Perforce, though, and some of them will likely never be possible, given the design of the tool. [Update: for-instance deleted; it appears that if you always use the -q option to hg status, you avoid walking the file system, and you can set it as a default option on a per-repository basis. If the rest of the commands play nice, that will work. The real value of explicit checkouts, even in that example, is the information-sharing, something that devs often value less than Operations does.]
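If you want to try that yourself, the per-repository default goes in the repository’s hgrc. A minimal sketch, using the [defaults] section that Mercurial supports as of this writing:

    # .hg/hgrc, inside the repository
    [defaults]
    status = -q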
Just for amusement…

VLC hits version 1.0. Now they can start working on the user interface!
[Update: if you save your work in the FineReader-specific format, then changes you make after that point will automatically be saved when you exit the application; this is not clear from the documentation, and while it’s usually what you want, it may lead to unpleasant surprises if you decide to abandon changes made during that session.]
After several days with the free demo, in which I tested it with sources of varying quality and tinkered with the options, I bought a license for FineReader Pro 9.0 (at the competitive upgrade price for anyone who owns any OCR product). I then spent a merry evening working through an album where the liner notes were printed at various angles on a colored, patterned background. Comments follow:
I’ve gotten pretty good at transcribing Japanese stories and articles, using my DS Lite and Kanji sonomama to figure out unfamiliar kanji words, but it’s still a slow, error-prone process that can send me on half-hour detours to figure out a name or obsolete character. So, after googling around for a while, I downloaded the free 15-day demo of FineReader Pro and took it for a spin. Sadly, this is Windows software, so I had to run it in a VMware session; the only company that claims to offer a kanji-capable Mac product has terrible reviews and shows no sign of recent updates.
First test: I picked up a book (Nishimura’s murder mystery collection Ame no naka ni shinu), scanned a two-page spread at 600-dpi grayscale, and imported it into FineReader. I had to shut off the auto-analysis features, turn on page-splitting, and tell it the text was Japanese. It then correctly located the two vertically-written pages and the horizontally-written header, deskewed the columns (neither page was straight), recognized the text, and exported to Word. Then I found the option to have it mark suspect characters in the output, and exported to Word again. :-)
Results? Out of 901 total characters, there were 10 errors: 6 cases of っ as つ, one あ as ぁ, one 「 as ー, one 呟 as 眩, and one 駆 as 蚯. There were also two extra periods inserted due to marks on the page, and a few places where text was randomly interpreted as boldface. Both of the actual kanji errors were flagged as suspect, so they were easy to find, and the small-tsu error is so common that you might as well check every large tsu in the text (in this case, the correct counts were 28 っ and 4 つ). It also managed to locate and correctly recognize 3 of the 9 instances of furigana in the scan, ignoring the others.
I’d already typed in that particular section, so I diffed mine against theirs until I had found every error. In addition to FineReader’s ten errors, I found two of mine, where I’d accepted the wrong kanji conversion for words. They were valid kanji for those words, but not the correct ones, and multiple proofreadings hadn’t caught them.
The second test was a PDF containing scanned pages from another book, whose title might be loosely translated as “My Youth with Ultraman”, by the actress who played the female team member in the original series. I’d started with 600-dpi scans, carefully tweaked the contrast until they printed cleanly, then used Mac OS X Preview to convert them to a PDF. It apparently downsampled them to something like 243 dpi, but FineReader was still able to successfully recognize the text, with similar accuracy. Once again, the most common error was small-tsu, the kanji errors were flagged as suspect, and the others were easy to find.
For amusement, I tried Adobe Acrobat Pro 9.1’s language-aware OCR on the same PDF. It claimed success and looked good on-screen, but every attempt to export the results produced complete garbage.
Both tests were nearly best-case scenarios, with clean scans, simple layouts, and modern fonts at a reasonable size. I intend to throw some more difficult material at it before the trial expires, but I’m pretty impressed. Overall, the accuracy was 98.9%; excluding the small-tsu errors raises it to 99.6%, and counting only the actual kanji errors brings it to 99.8%.
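Spelling that out, using the counts from the first test:

    \[ \frac{901-10}{901} \approx 98.9\%, \qquad \frac{901-4}{901} \approx 99.6\%, \qquad \frac{901-2}{901} \approx 99.8\% \]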
List price is $400, but customers with a valid license for any OCR product qualify for a $180 competitive upgrade. Since basically every scanner sold comes with low-quality OCR software, there’s no reason for most people to spend the extra $220. They use an activation scheme to prevent multiple installs, but it works flawlessly in a VMware session, so even if I didn’t own a Mac, that’s how I’d install it.