The least-essential kanji


JMdict combines the results of several research projects to determine whether or not a word should be counted as “common”. There are currently 15,572 words on this list that include at least one kanji character. Out of the 1,945 standard Jouyou kanji, 1,894 show up. The 51 rare ones are listed below:

舟 悦 抄 附 殉 禍 硝 酌 鋳 錬 剛 頒 芳 薫 尼 庸 廉 逝 遵 某 貞 滋 墾 蚕 侯 仙 孔 吏 嗣 謁 宵 丙 壱 弐 升 匁 勺 隻 脹 坑 痘 曹 恭 詔 朕 錘 銑 塑 虞 繭 璽

Of course, if you’ve mastered the other 1,894, it seems pretty silly to ignore these, especially when there are 576 non-Jouyou kanji on the list:

其 此 頃 惚 呆 呂 剥 袖 脇 拭 兎 儲 伊 鹿 馴 霞 雀 闇 蜜 蔑 苛 苑 胡 纏 繋 籤 瓜 爪 濡 潰 溜 洒 旦 掻 捻 戴 憧 宛 塵 吃 叩 凄 儘 俄 鱈 鬱 骸 馳 餅 頬 須 韓 雛 隙 阪 鍋 錆 迄 辻 輿 跨 賭 賑 貪 謎 誰 詫 蝶 蜂 藤 薩 薔 薇 蕎 葵 葛 萩 萎 艶 臆 股 耶 綻 綴 籠 箸 稽 禿 碌 眩 眉 痺 痕 痒 瓦 狙 犇 煽 烏 汲 汎 槌 椅 梯 柏 暢 暈 斐 擽 揉 揃 捲 挫 拳 拉 或 徨 彷 巾 屑 屏 尻 尤 妬 妖 塞 噛 噌 嘘 喧 喋 呪 吊 叶 只 匂 勿 剃 僻 俯 仄 亀 乍 乃 丼 齎 鼾 鼠 黴 黍 麺 麹 麟 麓 麒 鹸 鷹 鷲 鶴 鴨 鳩 鰻 鰯 鯵 鯛 鯉 鮪 鬘 髭 騙 駒 饉 饂 餡 餌 餃 飴 飩 颯 顎 頷 頓 頁 韆 鞦 鞄 靨 靡 雁 隼 隈 阿 阜 閾 閏 閃 鐵 鎌 鍵 錫 錨 錦 鋸 鋏 銚 釜 釘 醤 醒 酎 那 遽 遥 遡 遜 逞 這 迦 迚 辿 辰 轢 軈 躾 躓 躊 躇 蹴 蹲 蹌 踵 踉 跪 贔 贅 賂 貶 貰 貌 讐 謳 謂 諺 諦 諄 詣 訣 訛 覗 襖 襁 褪 褓 褄 裾 袴 蟻 蟹 螺 蝿 蝕 蝋 蝉 蜻 蜘 蜀 蛸 蛛 蛙 蛋 蛉 虹 虎 藍 藁 薐 蕾 蔭 蔓 蓮 蓙 蓋 蒙 蒔 葱 葡 萌 萄 菩 菠 莫 荻 茹 茸 茣 茄 苺 苔 芯 舵 舐 舅 臥 臍 膳 膝 腺 腫 腎 脛 脆 肘 聳 聊 耽 翔 羨 罹 罵 罠 罅 縺 縞 縋 縊 緞 絨 絆 紐 糺 糊 粥 簾 箪 箒 箋 筐 筈 笹 笥 笠 竿 窶 窄 稍 稀 秤 禄 祇 磯 瞼 瞳 瞭 瞑 睨 睦 皺 皰 癌 瘤 痩 痣 疹 畿 畏 甦 甥 璧 瑞 瑚 琶 琵 琲 琢 珊 珈 玩 獅 猪 狼 狡 狐 牙 牌 爽 爺 燵 燭 燕 熊 煎 煌 煉 焜 焚 焉 炬 炒 灘 灌 澱 漕 漑 滲 溺 溢 湧 渾 渚 淵 淀 涎 浚 沙 汰 氾 殆 歪 櫛 檻 檎 橙 橘 樽 樺 槍 楷 楯 楠 楕 楊 椿 椒 椀 棲 棘 梨 梁 桶 桐 桂 桁 栗 柿 柴 柑 枕 杜 杖 杏 李 曙 曖 晦 晒 昧 昏 旁 斡 斑 敲 撰 撫 撥 撒 摸 摯 揶 揄 掴 掬 掟 掏 捩 捧 捗 挽 挨 拶 拵 戚 愈 惧 惣 悉 恰 怯 忽 彦 彙 弥 弛 弄 庵 庇 幡 巽 巴 巳 嵐 嵌 崖 峙 屹 屡 屓 屁 尖 寅 宥 孰 孕 嬉 嫋 嫉 婉 娼 姪 姑 奢 夥 壕 堰 堆 埃 垢 囁 囀 嚏 噂 嘸 嘴 嘲 嘩 嘗 嗽 嗟 嗅 喩 喉 唾 唸 唄 哨 咳 咎 咄 呟 吠 叱 卿 匙 勾 剪 凭 凧 几 凌 凋 冴 兜 儚 僑 僅 偖 倶 俺 俣 俎 侶 佇 佃 伎 些 也 乞

What this little bit of late-night script-hacking really demonstrates (if anything) is that the Jouyou kanji selection is actually pretty good, with only 51 “useless” characters, only one of which is taught as part of the 1,006 grade-school set (蚕 = silkworm). The Kanji In Context books also fare pretty well in this test; only one “useless” character is among the first 1,000 taught (舟, a variant of 船 = ship), and more than half are among the last 100.
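The comparison itself boils down to set arithmetic. Here is a minimal Python sketch of the idea, not the script that produced the numbers above. The names `is_kanji` and `split_kanji`, the toy word list, and the five-character Jouyou subset are all assumptions for illustration, and a "kanji" is taken to be any code point in the CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
def is_kanji(ch):
    """True for code points in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def split_kanji(common_words, jouyou):
    """Return (Jouyou kanji never used, non-Jouyou kanji used) for a word list."""
    used = {ch for word in common_words for ch in word if is_kanji(ch)}
    return jouyou - used, used - jouyou


# Toy data (hypothetical): a real run would read the "common" entries
# from JMdict and the full 1,945-character Jouyou list.
jouyou_subset = set("船舟食蚕学")
words = ["食べる", "船", "学ぶ", "誰", "カタカナ"]

unused, extra = split_kanji(words, jouyou_subset)
print(sorted(unused))  # Jouyou kanji that no common word uses
print(sorted(extra))   # kanji in common words that are not Jouyou
```

With this toy data, 舟 and 蚕 land in the first set and 誰 in the second; kana are ignored entirely.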

Dear Hello!Project Costume Designers,


A while back, I was watching an H!P concert video at Scott’s house, and I said, “ah, it’s time for the trashy outfits”. Scott eagerly came over to take a look, and then realized that he had misinterpreted what I’d meant by trashy. From a certain point of view, I suppose you deserve some praise for always seeking innovations in costume design, but seriously, I’ve got to ask.

Why is it suddenly too late for CADS?


For some time now, I’ve been writing Perl scripts to work with kanji. While Perl supports transparent conversion from pretty much any character encoding, internally it’s all Unicode, and since that’s also the native encoding for a Mac, all is well.

Except that for backwards compatibility, Perl doesn’t default to interpreting all input and output as Unicode. There are still lots of scripts out there that assume all operations work in terms of 8-bit characters, and you don’t want to silently corrupt their results.
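To see why the distinction matters, here is a small sketch (in Python rather than Perl, purely as an illustration) of the same UTF-8 input viewed first as 8-bit characters and then as decoded text. The sample character is an arbitrary choice.

```python
raw = "蚕".encode("utf-8")        # the bytes as they sit in a file or pipe

as_bytes = raw.decode("latin-1")  # 8-bit view: every byte becomes one "character"
as_text = raw.decode("utf-8")     # Unicode view: a single kanji

print(len(as_bytes), len(as_text))
```

A script that counts, splits, or pattern-matches on the 8-bit view sees three unrelated characters where the Unicode view sees one kanji, so neither interpretation can safely be swapped in for the other behind a script's back.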

The solution introduced in 5.8 was the -C option, which takes half a dozen or so flags for deciding exactly what should be treated as Unicode. I use -CADS, which Uni-fies @ARGV (A), the default encoding for all open calls for input/output (D), and STDIN, STDOUT, and STDERR (S).

Until today. My EEE is running Fedora 9, and when I copied over my recently-rewritten dictionary tools, they refused to run, insisting that it’s ‘Too late for “-CADS” option at lookup line 1’. In Perl 5.10, you can no longer specify Unicode compatibility level on a script-by-script basis. It’s all or nothing, using the PERL_UNICODE environment variable.

That, as they say, sucks.

The claim in the release notes is:

The -C option can no longer be used on the #! line. It wasn't working there anyway, since the standard streams are already set up at this point in the execution of the perl interpreter. You can use binmode() instead to get the desired behaviour.

But this isn’t true, since I’ve been running gigs of kanji in and out of Perl for years this way, and they didn’t work without either putting -CADS into the #! line or crufting up the scripts with explicitly specified encodings. Obviously, I preferred the first solution.

In fact, just yesterday I knocked together a quick script on my Mac to locate some non-kanji characters that had crept into someone’s kanji vocabulary list, using \p{InCJKUnifiedIdeographs}, \p{InHiragana}, and \p{InKatakana}. I couldn’t figure out why it wasn’t working correctly until I looked at the #! line and realized I’d forgotten the -CADS.
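For a rough idea of what such a check looks like, here is a Python analogue, not the Perl script described above. Python's built-in `re` module has no `\p{...}` properties, so this sketch tests code points against the three block ranges directly. The names `BLOCKS`, `block_of`, and `find_strays` and the sample vocabulary list are hypothetical.

```python
# Code point ranges for the three Unicode blocks named above.
BLOCKS = {
    "CJKUnifiedIdeographs": (0x4E00, 0x9FFF),
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
}


def block_of(ch):
    """Return the name of the block containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


def find_strays(lines):
    """Yield (line number, character) for each character outside the three blocks."""
    for lineno, line in enumerate(lines, start=1):
        for ch in line.rstrip("\n"):
            if block_of(ch) is None:
                yield lineno, ch


# Hypothetical vocabulary list: a Latin "a" and a full-width space have crept in.
vocab = ["食べる\n", "学a校\n", "カタカナ\u3000\n"]
print(list(find_strays(vocab)))
```

This only works because the strings are already decoded text. Fed raw bytes, every kanji would show up as three stray characters, which is essentially the failure described above when the -CADS is missing.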

What I think the developer means in the release notes is that it didn’t work on systems with a non-Unicode default locale. So they broke it everywhere.

Don't try this at home, kids


Leave it to trained professionals…

…or, in this case, the idol group Idoling!!!, manufactured by Fuji TV for a weekly variety show. I think the wrestling matches were a fan request…

Warning: they do not lip-sync their songs on the show (live vs studio). You may have already figured this out if you watched the embedded clip…

A club that will have me as a member...


I would guess that somewhere in the neighborhood of 100,000 people have been qualified to wear this t-shirt. Approximately 99,950 of them paid for the privilege of membership, and the other 50 would have gotten the joke. Of those, perhaps a dozen will still remember it.

Bryant7

[and no, it’s not actually funny unless you’re one of those dozen, so I’m not going to explain]

[Update: Rory suggested a slight modification to improve the reference, and further decrease the number of people who’ll get it]

Oh, yeah, I bought it.


I’m not a big fan of celebrity gossip rags, even though I’ve been published in one. Between not really caring who’s been caught with who and hating the whole paparazzi sleazebucket style of photography, I generally don’t give them a second look.

The Japanese weekly magazine Friday is known to me only as the folks who ran “image-betraying” (read: having a private life) pictures that derailed (and in one case, destroyed) the careers of several members of Hello!Project. I’d never seen an issue, and wasn’t aware that they leaned more towards the Celebrity Sleuth model, interested as much in publishing posed shots of wannabe starlets as in stalking big-name stars to ruin them.

I know this now, because I spotted an issue at Kinokuniya when I was up in San Francisco on Sunday, and found a magazine I could not not buy.

Google Maps street view in Japan


The folks at Akihabara News pass on the good word.

Here, for instance, is the deer park in Nara. And here’s Togetsukyou, the bridge that crosses the river in Arashiyama. How about Kyoto Tower, and the much more interesting Tokyo Tower? This could be fun…

[it looks like they went through Akihabara before all the stores opened in the morning.]

Older Witches


Looking at the covers of this two-volume manga series (1, 2), I’m forced to ask, “older than what, precisely?”

[Cover images: Older Witches, vol 1; Older Witches, vol 2]

And, yes, I’m quite certain it’s porn. I’ve seen samples of the mangaka’s work.
