Japan

Super bad!


Today’s Japanese slang word is 激ヤバ (“gekiyaba”). 激 means violent or intense, and ヤバ comes from やばい, slang for dangerous, terrible, cool, etc. So, “really bad” or “really good”, depending on the context.

The specific context I found it in was the phrase “激ヤバ援交”, with 援交 (“enkou”) abbreviated from the well-known 援助交際 “enjo kousai” (paid dating, also known as “schoolgirl prostitution”). A quick search on Amazon Japan suggests that in sexual contexts, gekiyaba means “extreme”. So, either the young lady in question was willing to do more than usual, or the resulting video had little or no censorship, or perhaps both.

Memory grass and naughty wives


I started out on an innocent quest: find something short and interesting to prep for the upcoming quarter’s Japanese reading class. I still have some leftovers from Spring (a song and the preface to a biography), but I wanted to try something different. I thought a short travel piece would be nice, and when I was visiting my sister in Chicago, I found a 30-year-old tourist guide in a used-book store. It’s a guide to Kyoto, and judging from the ads, it’s aimed primarily at female travelers.

It’s full of short blurbs about neighborhoods, temples, and shrines, and I picked the section on Arashiyama to scan in and prepare a vocabulary list for. At the bottom of the last page, in small, blue print, I found the following footnote:

直指庵では女の子がジッとダマッテ「想出草」を見て何時間も座っているのです。オソロシー!

Vaguely translated, “At Jikishian Temple, girls stay quiet, look at ‘omoidegusa’, and remain seated for many hours. Dreadful!”

Omoidegusa (想出草) does not appear in any of my dictionaries. Literally translated, it would be “memory grass”, but the third kanji is also used to refer to handwritten notes. Using the Japanese search engine goo.ne.jp, I found a few pages that mentioned it in the context of letters written by women, with a hint of confession.

So I searched Amazon Japan, to see where it might turn up. First thing on the list:


Dear Ai Kago,


No, honey; just… no.

(NSFW)


Dear Japan,


No comment:

"We wanted to do something that would market augmented reality in a way that's... meaningful. We were like, wouldn't it be awesome if you could look up her skirt, or take off her clothes?"

(via BBG)

Dear Japan,


Stop, you’re killing me.

Titled "Koisuru Hello Kitty," the play is described as a "school love comedy" that deals with romance and friendship. The main character is a Hello Kitty doll that turns into a human...

Something in the water, perhaps...


There are times when the only joy I can extract from the work of the Hello!Project costume designers is the thought that they’re an aberration, and that outside of Harajuku, most people involved in costume design in Japan at least try to achieve some sort of tasteful enhancement.

Yeah, well, not so much.


Words of Wisdom...


…sort of. So speaks Izumi Kojima, in the song “Ah ~ yokatta”:

There is nothing for us to lose.
Sure, I can say. I can say.
Nobody knows what it means,
"Hung in there!"
But I'll be right beside you from now on.
So on...

Using Abbyy FineReader Pro for Japanese OCR


[Update: if you save your work in the FineReader-specific format, then changes you make after that point will automatically be saved when you exit the application. This is not clear from the documentation, and while it’s usually what you want, it may lead to unpleasant surprises if you decide to abandon changes made during that session.]

After several days with the free demo, in which I tested it with sources of varying quality and tinkered with the options, I bought a license for FineReader Pro 9.0 (at the competitive upgrade price for anyone who owns any OCR product). I then spent a merry evening working through an album where the liner notes were printed at various angles on a colored, patterned background. Comments follow:

  • Turn off all the auto features when working with Japanese text.
  • In the advanced options, disable all target fonts for font-matching except MS Mincho and Times New Roman. Don't let it export as MS Gothic; you'll never find all of the ー/一 errors.
  • Get the cleanest 600-dpi scan you can. This is sufficient for furigana-sized text on a white background.
  • Set the target language to Japanese-only if your source is noisy or you're sure there's no random English in the text. Otherwise, it's safe to leave English turned on.
  • Manually split and deskew pages if the separation isn't clean in the scan.
  • Adjust the apparent resolution of scans to set the output font size, before you tell it to recognize the text.
  • Manually draw recognition areas if there's anything unusual about your layout.
  • Rearrange the windows to put the scan and the recognized text side-by-side.
  • Don't bother with the spell-checker; it offers plausible alternative characters based on shape, but if the correct choice isn't there, you have to correct it in the main window anyway. Just right-click as you work through the document to see the same data in context.
  • You can explicitly save in a FineReader-specific format that preserves the entire state of your work, but it creates a new bundle each time, and it won't overwrite an existing one with the same name. This makes it very annoying when you want to simply save your progress as you work through a long document; each new save includes a complete copy of the scans, which adds up fast.
  • If you figure out how to get it to stop deleting every full-width kanji whitespace character, let me know; it's damned annoying when you're trying to preserve the layout of a song.
  • Once you've told it to recognize the text, search the entire document for these common errors:
    • っ interpreted as つ and vice-versa
    • ー interpreted as 一 and vice-versa; check all other nearby katakana for "small-x as x" errors while you're at it
    • 日 interpreted as 曰
    • Any English-style punctuation other than "!", ":", "…", or "?"; most likely, they should be the katakana center-dot, but it might have torn a character apart into random fragments (rare, unless your background is noisy).
    • The digits 0-9; if your source is noisy, random kanji and kana can be interpreted as digits, even when English recognition is disabled.
  • Delete any furigana it happens to recognize, unless you're exporting to PDF; it just makes a mess in Word.
  • In general, export to Word as Formatted Text, with the "Keep line breaks" and "Highlight uncertain characters" options turned on.
  • If your text is on a halftoned background and you're getting a lot of errors, load up the scan in Photoshop, use the Strong Contrast setting in Curves, then try out the various settings under Black & White until you find one that gets rid of most of the remaining halftone dots (I had good luck with Neutral Density). After that, you can Despeckle to get rid of most of the remaining noise, and use Curves again to force the text to a solid black.
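
The "search the entire document for these common errors" step above could also be scripted once the text is exported. What follows is a rough sketch in Python, not anything FineReader provides: the names `CHECKS` and `find_suspects` are made up for illustration. It cannot tell a correct つ from a wrong one, so it lists every occurrence of the ambiguous characters for a human to review. It only looks at ASCII punctuation, treats full-width digits as suspect along with 0-9, and does not attempt the katakana "small-x as x" check.

```python
import re
import string

# ASCII punctuation the checklist treats as fine; anything else is suspect.
ALLOWED_PUNCT = set("!:?")
SUSPECT_PUNCT = "".join(c for c in string.punctuation if c not in ALLOWED_PUNCT)

# (label, pattern) pairs, one per item in the checklist. Labels are hypothetical.
CHECKS = [
    ("small or large tsu", re.compile("[っつ]")),
    ("long mark or kanji one", re.compile("[ー一]")),
    ("day or iwaku", re.compile("[日曰]")),
    ("punctuation", re.compile("[" + re.escape(SUSPECT_PUNCT) + "]")),
    ("digit", re.compile("[0-9０-９]")),
]


def find_suspects(text):
    """Return sorted (line, column, label, char) tuples for characters to review."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in CHECKS:
            for match in pattern.finditer(line):
                # Columns are 1-based to match what an editor displays.
                hits.append((line_no, match.start() + 1, label, match.group()))
    return sorted(hits)


if __name__ == "__main__":
    sample = "カ一ド\nきつと 3日"
    for line_no, col, label, char in find_suspects(sample):
        print(f"line {line_no}, col {col}: {char} ({label})")
```

Every hit is only a candidate. Plenty of them will be correct as recognized, so this narrows down where to look rather than fixing anything automatically.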

“Need a clue, take a clue,
 got a clue, leave a clue”