Thursday, May 18, 2006

PDF::API2, Preview.app, kanji fonts, and me

I’d love to know why this PDF file displays its text correctly in Acrobat Reader, but not in Preview.app (compare to this one, which does). Admittedly, the application generating it is including the entire font, not just the subset containing the characters used (which is why it’s so bloody huge), but it’s a perfectly reasonable thing to do in PDF. A bit rude to the bandwidth-impaired, perhaps, but nothing more.

While I’m on the subject of flaws in Preview.app, let me point out two more. One that first shipped with Tiger is the insistence on displaying and printing Aqua data-entry fields in PDF files containing Acrobat forms, even when no data has been entered. Compare and contrast with Acrobat, which only displays the field boundaries while that field has focus. Result? Any page element that overlaps a data-entry field is obscured, making it impossible to view or print the blank form. How bad could it be? This bad (I’ll have to make a screenshot for the non-Preview.app users…).

The other problem is something I didn’t know about until yesterday (warning: long digression ahead). I’ve known for some time that only certain kanji fonts will appear in Preview.app when I generate PDFs with PDF::API2 (specifically, Kozuka Mincho Pro and Ricoh Seikaisho), but for a while I was willing to work with that limitation. Yesterday, however, I broke down and bought a copy of the OpenType version of DynaFont’s Kyokasho, specifically to use it in my kanji writing practice. As I half expected, it didn’t work.

[Why buy this font, which wasn’t cheap? Mincho is a Chinese-derived print style used in books, magazines, etc.; it doesn’t show strokes the way you’d write them by hand. Kaisho is a brush-calligraphy style that shows strokes clearly, but they’re not always the same strokes. Kyoukasho is the official style used to teach kanji writing in primary-school textbooks in Japan. (I’d link to the nice page in the sci.lang.japan FAQ that shows all of them at once, but it’s not there any more, and several of the new pages are just editing stubs; I’ll have to make a sample later.)]

Anyway, what I discovered was that if you open the un-Preview-able PDF in the full version of Adobe Acrobat, save it as PostScript, and then let Preview.app convert it back to PDF, not only does it work (see?), but the file size drops from 4.2 megabytes to 25 kilobytes. And the whole round trip takes only a few seconds.

Wouldn’t it be great to automate this task using something like AppleScript? Yes, it would. Unfortunately, Preview.app is not scriptable. Thanks, guys. Fortunately, Acrobat Distiller is scriptable and just as fast.

On the subject of “why I’m doing this in the first place,” I’ve decided that the only useful order in which to learn new kanji is the order in which they appear in the textbooks I’m stuck with for the next four quarters. The authors don’t seem to have any sensible reasons for the order they’ve chosen, but they average 37 new kanji per lesson, so at least they’re keeping track. Since no one else uses the same order, and the textbooks provide no support for actually learning kanji, I have to roll my own.

There are three Perl scripts involved, which I’ll clean up and post eventually: the first reads a bunch of vocabulary lists and figures out which kanji are new to each lesson, sorted by stroke count and dictionary order; the second prints out the practice PDF files; the third is for vocabulary flashcards, which I posted a while back. I’ve already gone through the first two lessons with the Kaisho font, but I’m switching to the Kyoukasho now that I’ve got it working.
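The core of the first script is simple enough to sketch. This is a minimal Python illustration, not the Perl script itself, and it assumes a few things: each lesson’s vocabulary is already a list of words, a stroke-count table is available (the hypothetical `stroke_counts` mapping; a real one would come from a kanji database), and the optional, equally hypothetical `dict_order` mapping supplies dictionary entry numbers. When `dict_order` is omitted, the Unicode code point stands in as the tie-breaker.

```python
def is_kanji(ch: str) -> bool:
    # Only the basic CJK Unified Ideographs block (U+4E00-U+9FFF);
    # kana, punctuation, and the iteration mark 々 are skipped.
    return "\u4e00" <= ch <= "\u9fff"


def new_kanji_by_lesson(lessons, stroke_counts, dict_order=None):
    """For each lesson (a list of vocabulary words), return the kanji not seen
    in any earlier lesson, sorted by stroke count, then by dictionary order.

    stroke_counts: hypothetical mapping of kanji -> stroke count.
    dict_order: hypothetical mapping of kanji -> dictionary entry number;
    if omitted, the Unicode code point is used as a stand-in tie-breaker.
    """
    def sort_key(ch):
        tie = dict_order[ch] if dict_order is not None else ord(ch)
        return (stroke_counts[ch], tie)

    seen = set()
    result = []
    for vocab in lessons:
        # Collect kanji from this lesson that no earlier lesson introduced.
        fresh = {ch for word in vocab for ch in word
                 if is_kanji(ch) and ch not in seen}
        seen |= fresh
        result.append(sorted(fresh, key=sort_key))
    return result


# Toy data: two lessons and the standard stroke counts for their kanji.
lessons = [["日本", "学生", "先生"], ["大学", "日本語", "本"]]
stroke_counts = {"日": 4, "本": 5, "生": 5, "先": 6, "学": 8, "大": 3, "語": 14}

for n, kanji in enumerate(new_kanji_by_lesson(lessons, stroke_counts), start=1):
    print(f"Lesson {n}: {' '.join(kanji)}")
```

For the toy data this prints `Lesson 1: 日 本 生 先 学` and `Lesson 2: 大 語`. Building a set per lesson means a kanji that shows up in several words of the same lesson is listed only once, and the running `seen` set keeps it out of later lessons.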

Putting it all together, my study sessions look like this. For each new kanji, look it up in The Kanji Learner’s Dictionary to get the stroke order, readings, and meaning; trace the Kyoukasho sample several times while mumbling the readings; write it out 15 more times on standard grid paper; write out all the readings on the same grid paper, with on-yomi in katakana and kun-yomi in hiragana, so that I practice both. When I finish all the kanji in a lesson, I write out all of the vocabulary words as well as the lesson’s sample conversation. Lather, rinse, repeat.

My minimum goal is to catch up on everything we used in the previous two quarters (~300 kanji), and then keep up with each lesson as I go through them in class. My stretch goal is to get through all of the kanji in the textbooks by the end of the summer (~1000), giving me an irregular but reasonably large working set, and probably the clearest handwriting I’ve ever had. :-)