A while back, I upgraded my laptop by replacing the DVD drive with a 240GB SSD. This has been very, very nice, and gave me just shy of 750GB of disk space, a third of it silly-fast.
So naturally I couldn’t resist replacing the 500GB Seagate hybrid with a Western Digital 750GB 7200rpm drive, giving me just shy of a terabyte. And I carry another terabyte around in the form of a WD hardware-encrypted USB drive.
If this future we live in had flying cars and catgirls, it would be perfect.
Amusingly, despite the fact that this laptop (and its daily backups…) is the center of my electronic universe, I will likely not be taking it to Japan with me at the end of March. My sister and I are only going to be in Kyoto for a week, and time spent in the hotel is time wasted. I’ll take my little Win7 netbook (can VPN to work in an emergency) and my Kindle (v3 has kanji support, and 3G Whispernet works all over Japan), but the bulk of the weight in my carry-on will consist of cameras and lenses.
The Kindle is the reason for several of my seemingly-unrelated recent entries and sidebar links, by the way, including an upcoming discussion of my grand kit-bashing project that mixes Aozora Bunko, MeCab, JMdict, MongoDB, pLaTeX, dviasm, and pdftk, welded together with a few hundred lines of Perl to produce ebooks with personalized levels of furigana and matching per-page vocabulary lists. More on that soon.
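The "personalized levels of furigana" part can be sketched roughly like this. It is not the Perl pipeline described above, just an illustration in Python under some loud assumptions: the text has already been tokenized (by MeCab, say) into hypothetical `(surface, reading, lemma)` tuples with readings already in hiragana, a plain set stands in for the reader's known words, and a plain dictionary stands in for the JMdict lookup. Every function and variable name here is made up for the example.

```python
import html
import re
from typing import Dict, Iterable, List, Set, Tuple

# Kanji: CJK ideographs plus the iteration/abbreviation marks 々, 〆, ヶ.
KANJI = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff々〆ヶ]")

Token = Tuple[str, str, str]       # hypothetical (surface, reading, lemma) from the tokenizer
VocabEntry = Tuple[str, str, str]  # (lemma, reading, gloss)


def annotate(tokens: Iterable[Token], known: Set[str],
             glosses: Dict[str, str]) -> Tuple[str, List[VocabEntry]]:
    """Return ruby-annotated HTML for one page plus that page's vocabulary list."""
    parts: List[str] = []
    vocab: List[VocabEntry] = []
    listed: Set[str] = set()
    for surface, reading, lemma in tokens:
        if KANJI.search(surface) and lemma not in known:
            # Unfamiliar word written with kanji: give it furigana...
            parts.append(f"<ruby>{html.escape(surface)}<rt>{html.escape(reading)}</rt></ruby>")
            # ...and put it on this page's vocabulary list, once per page.
            if lemma not in listed:
                listed.add(lemma)
                vocab.append((lemma, reading, glosses.get(lemma, "")))
        else:
            # Kana, punctuation, or a word the reader already knows: plain text.
            parts.append(html.escape(surface))
    return "".join(parts), vocab


tokens = [("吾輩", "わがはい", "吾輩"), ("は", "は", "は"), ("猫", "ねこ", "猫"),
          ("で", "で", "だ"), ("ある", "ある", "ある")]
page, vocab = annotate(tokens, known={"猫"}, glosses={"吾輩": "I, me (archaic)"})
print(page)   # <ruby>吾輩<rt>わがはい</rt></ruby>は猫である
print(vocab)  # [('吾輩', 'わがはい', 'I, me (archaic)')]
```

The known-word set is what makes the output personal: as words move into it, their furigana and vocabulary entries drop out of later pages.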
In addition to Aozora’s out-of-copyright literature, it’s easy to find much more contemporary work marked up in their format. One should of course only download such things if one is already in legitimate possession of the printed book, but once that hurdle is cleared, my version will be much easier to work through. The scripts can take a complete light novel from raw text to completed PDFs in about 10 seconds, and it only takes a few passes to find all of the unusual vocabulary and definitions, so my reading speed will be improving quite a bit soon.
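For reference, Aozora Bunko marks readings with 《》 after the word: the reading attaches to the run of kanji immediately before it, and a full-width ｜ marks where the base starts when that run isn't enough. A minimal Python sketch that turns that notation into HTML `<ruby>` follows; the function name is hypothetical, the kanji-run rule is an approximation, and ［＃…］ layout notes are simply discarded rather than interpreted.

```python
import html
import re

KANJI_RUN = r"[\u3400-\u4dbf\u4e00-\u9fff々〆ヶ]+"

# ｜base《reading》: the full-width bar marks exactly where the ruby base starts.
EXPLICIT_RUBY = re.compile(r"｜([^｜《》]+)《([^《》]+)》")
# base《reading》 with no bar: the base is the run of kanji just before 《.
IMPLICIT_RUBY = re.compile(f"({KANJI_RUN})《([^《》]+)》")
# ［＃...］ editorial/layout notes; dropped in this sketch.
ANNOTATION = re.compile(r"［＃[^］]*］")


def aozora_to_html(line: str) -> str:
    """Convert Aozora Bunko ruby notation in one line of text to HTML <ruby>."""
    line = ANNOTATION.sub("", line)
    line = html.escape(line, quote=False)  # protect stray & < > before adding tags
    line = EXPLICIT_RUBY.sub(r"<ruby>\1<rt>\2</rt></ruby>", line)
    line = IMPLICIT_RUBY.sub(r"<ruby>\1<rt>\2</rt></ruby>", line)
    return line


print(aozora_to_html("吾輩《わがはい》は猫である"))
print(aozora_to_html("一番｜食堂《しょくどう》"))
```

The explicit form is handled first so that a ｜-marked base is never re-matched by the looser kanji-run rule.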
Which is good, because, as I said, I’m finally going back to Japan!
Since a new version of the free-as-in-fork LibreOffice package was just released, I thought I’d take a look and see if it’s gotten any easier to import formatted text.
The answer: “kinda”.
Good: It imports simple HTML and CSS.
Bad: …into a special “HTML” document type that must be exported to disk in ODT format and then reopened. Otherwise, any formatting that isn’t available for web use either disappears from the menus and dialog boxes, silently fails, or is deleted when you save (which is generally what happens to formatting pasted in from another document).
[Note that the Mac version crashed half a dozen times as I was exploring these behaviors, but it usually managed to open the documents on the second try.]
Sadly, furigana are not considered compatible with HTML, so they’re stripped on import, making it rather a moot point that you can’t edit them in HTML mode. The only way to import text marked up with furigana is to generate a real XML-formatted, Zip-archived ODT file.
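To give a sense of what generating that file involves, here is a minimal Python sketch of an ODT containing furigana. The `text:ruby`, `text:ruby-base`, and `text:ruby-text` elements and the packaging rules (a `mimetype` entry first and uncompressed, plus `META-INF/manifest.xml`) come from the OpenDocument format; everything else is an assumption, including the hypothetical `build_odt` name and the `(base, reading)` tuple convention. Real documents also carry `styles.xml` and `meta.xml`, which are left out here on the assumption that the importer falls back to default styles.

```python
import io
import zipfile
from typing import Sequence, Tuple, Union
from xml.sax.saxutils import escape

Segment = Union[str, Tuple[str, str]]  # plain text, or (base, reading) for ruby

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"

MANIFEST = f"""<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">
 <manifest:file-entry manifest:full-path="/" manifest:media-type="{ODT_MIMETYPE}"/>
 <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""

CONTENT_HEAD = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
 xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
 xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
 office:version="1.2">
<office:body><office:text>
"""
CONTENT_TAIL = "</office:text></office:body></office:document-content>\n"


def odt_paragraph(segments: Sequence[Segment]) -> str:
    """Render one paragraph, wrapping (base, reading) pairs in text:ruby."""
    parts = []
    for seg in segments:
        if isinstance(seg, tuple):
            base, reading = seg
            parts.append("<text:ruby>"
                         f"<text:ruby-base>{escape(base)}</text:ruby-base>"
                         f"<text:ruby-text>{escape(reading)}</text:ruby-text>"
                         "</text:ruby>")
        else:
            parts.append(escape(seg))
    return "<text:p>" + "".join(parts) + "</text:p>\n"


def build_odt(paragraphs: Sequence[Sequence[Segment]]) -> bytes:
    """Return the bytes of a minimal Zip-archived ODT file."""
    content = CONTENT_HEAD + "".join(odt_paragraph(p) for p in paragraphs) + CONTENT_TAIL
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ODF requires "mimetype" to be the first entry, stored uncompressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/manifest.xml", MANIFEST, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("content.xml", content, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


# Usage: write the bytes to disk and open the result in LibreOffice.
# with open("sample.odt", "wb") as f:
#     f.write(build_odt([[("吾輩", "わがはい"), "は猫である"]]))
```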
Just spent a merry, no wait, hellish few hours fighting to get a LaTeX distribution up and running for the sole purpose of running a single script that uses it to convert marked-up Japanese text to PDF in convenient ebook sizes.
I failed. Or, more precisely, I got all the way to a DVI file that could be displayed quite nicely on screen, with all the kanji and furigana intact, but then the PDF converter that was part of the same TeX package that had generated it started barfing all over my screen, and I refused to spend more time on the project. I simply have no desire to navigate the layers and layers and layers of crap that TeX has acquired in its hacked-together support for modern fonts and encodings.
Honestly, if I want to generate cleanly-formatted Japanese text as a PDF, with furigana and vertical layout and custom page sizes, it takes 10,000 times less effort to spit out bog-standard HTML+CSS and feed it to Microsoft Word.
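The "bog-standard HTML+CSS" side really is small. A minimal Python sketch, assuming the text has already been split into plain strings and `(base, reading)` pairs; the function names and the 90×120mm default page size are arbitrary placeholders, not values from any real pipeline. It uses standard CSS (`writing-mode: vertical-rl` and `@page { size: … }`); whether a given version of Word's HTML importer honors those exact properties is worth checking rather than assuming, since Word also has its own `mso-` style extensions.

```python
from html import escape
from typing import Sequence, Tuple, Union

Segment = Union[str, Tuple[str, str]]  # plain text, or (base, reading)


def render(segments: Sequence[Segment]) -> str:
    """Turn one paragraph's segments into HTML, using <ruby> for furigana."""
    out = []
    for seg in segments:
        if isinstance(seg, tuple):
            base, reading = seg
            out.append(f"<ruby>{escape(base)}<rt>{escape(reading)}</rt></ruby>")
        else:
            out.append(escape(seg))
    return "".join(out)


def ebook_html(title: str, paragraphs: Sequence[Sequence[Segment]],
               width_mm: int = 90, height_mm: int = 120, vertical: bool = True) -> str:
    """Wrap ruby-annotated paragraphs in a standalone HTML page with page-size CSS."""
    flow = "writing-mode: vertical-rl;" if vertical else ""
    body = "\n".join(f"<p>{render(p)}</p>" for p in paragraphs)
    return f"""<!DOCTYPE html>
<html lang="ja">
<head>
<meta charset="utf-8">
<title>{escape(title)}</title>
<style>
@page {{ size: {width_mm}mm {height_mm}mm; margin: 5mm; }}
body {{ font-family: serif; {flow} }}
rt {{ font-size: 50%; }}
</style>
</head>
<body>
{body}
</body>
</html>
"""


print(ebook_html("吾輩は猫である", [[("吾輩", "わがはい"), "は猫である"]]))
```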
[Note to the MS-allergic: performing the equivalent import into OpenOffice is possible, but not reasonable. Getting basic unstyled plaintext+furigana wasn’t too bad, but anything more complicated would be an exercise in tedious XML debugging.]
[Update: gave it another go, and eventually discovered, by running dvipdfmx with KPATHSEA_DEBUG=-1 in the environment, that it was using a completely different search path from the one the kpsewhich tool used. Copying share/texmf/web2c/texmf.cnf.ptex to etc/texmf/texmf.cnf made all the problems go away. At least until the next time I upgrade something in MacPorts that recursively depends on something that obsoletes a recursive dependency of pTeX and hoses half my tools.
And, no, I can’t use the self-contained and centrally-managed TeX Live distribution (or the matching GUI-enabled MacTeX). That was the first hour I wasted. Its version of pTeX is apparently incompatible with one of the style files I needed.]
As a slightly early birthday present to myself, I bought an OWC Data Doubler with a 240GB SSD in a post-Thanksgiving sale. After making three full backups of my laptop, I installed it, and I’ve been enjoying it quite a bit. The kit installs the SSD as a second drive in place of the optical drive, so you can use it alongside the standard drive, which in my case is a 500GB Seagate hybrid. I’ve set up the SSD as the boot drive and the 500GB drive as /Users/Shared, and moved my iTunes, iPhoto, and Aperture libraries there, as well as my big VMware images and an 80GB Windows partition.
[side note: the Seagate hybrid drives do provide a small-but-visible improvement in boot/launch performance, but the bulk of your data doesn’t gain any benefit from it, and software updates erase the speed boost until the drive adjusts to the new usage pattern. Dual-boot doesn’t help, either. An easy upgrade, but not a big one, IMHO.]
Good:
Bad:
So, file this little experiment under “expensive but worth it”. I do watch DVDs on my laptop, but only at home or in hotels, so the external drive isn’t a daily-carry accessory. The SSD has a SandForce chipset and 7% over-provisioning, and is less than half full, so there’s no sign of performance degradation, and I don’t expect any. Aperture supports multiple libraries, so I can edit fresh material on the SSD, then move it to the hard drive when I’m done with it. Honestly, unless Apple releases MacBook Pro models that will take more than 8GB of RAM, I really see no need to buy a new one for quite a while.
When you send me a SQL statement that updates a 600,000-record table based on a join to a 900,000-record table, please make sure there are indexes involved. Also, please don’t test on a toy database.
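Before running a statement like that, ask the database for its query plan and check that the join column is being looked up through an index rather than scanned. A sketch using Python's built-in `sqlite3` module; the table names (`target`, `source`), the index name, and the row counts are made up, and it uses a correlated subquery instead of a join because that form works in every SQLite version (MySQL's `UPDATE … JOIN` and PostgreSQL's `UPDATE … FROM` have their own `EXPLAIN` output). A plan on a toy table is only a hint: the optimizer can choose differently on a real 600,000-row table, which is the point of the second request.

```python
import sqlite3
from typing import List, Tuple


def indexed_update(n: int = 1000) -> Tuple[sqlite3.Connection, List[str]]:
    """Build two toy tables, index the join column, and run an UPDATE driven by the other table."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE target (key TEXT, val INTEGER);
        CREATE TABLE source (key TEXT, val INTEGER);
    """)
    con.executemany("INSERT INTO target VALUES (?, 0)", [(f"k{i}",) for i in range(n)])
    con.executemany("INSERT INTO source VALUES (?, ?)", [(f"k{i}", i) for i in range(0, n, 2)])

    # Without this index, each target row could mean a full scan of source.
    con.execute("CREATE INDEX idx_source_key ON source(key)")

    update = """
        UPDATE target
           SET val = (SELECT s.val FROM source AS s WHERE s.key = target.key)
         WHERE EXISTS (SELECT 1 FROM source AS s WHERE s.key = target.key)
    """
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    plan = [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + update)]
    con.execute(update)
    con.commit()
    return con, plan


con, plan = indexed_update()
for line in plan:
    print(line)  # look for idx_source_key in the SEARCH lines
```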
Just got a complaint from a user about a Perl script that wasn’t handling regular expressions correctly. Specifically, when he typed:
ourspecial-cat | grep 'foo\|bar'
he got a match on “foo” or “bar”, but when he typed:
ourspecial-grep 'foo\|bar'
he got nothing at all.
My surprise came from the fact that the normal grep worked, when everyone knows that you need to use egrep for that kind of search, and in any case, since the entire regular expression was in single-quotes, you don’t need the backslash. Removing the backslash made our tool do what he wanted, but broke grep.
Sure enough, if you leave out the backslash, you need to use egrep or grep -E, but if you put it in, you can use grep. What makes it really fun is that they’re the same program, GNU Grep 2.5.1, and running egrep should be precisely the same as running grep -E.
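My best guess at the explanation, assuming ourspecial-grep hands the pattern straight to Perl's regex engine: GNU grep without -E reads the pattern as a basic regular expression (BRE), where `\|` is a GNU extension meaning alternation and a bare `|` is an ordinary character; egrep and grep -E read it as an extended one (ERE), where a bare `|` is alternation. Perl sides with ERE here, and `\|` matches a literal pipe, so the script was faithfully searching for the string "foo|bar". The sketch below uses Python's `re`, which follows the same convention as Perl for these characters, as a stand-in; `gnu_bre_to_python` is a hypothetical helper that swaps the escaped and unescaped forms of the operators that differ.

```python
import re

# Characters whose escaped and unescaped meanings are swapped between
# GNU basic regular expressions (plain grep) and Perl/Python-style regexes.
SWAPPED = set("|(){}+?")


def gnu_bre_to_python(pattern: str) -> str:
    """Hypothetical helper: rewrite a GNU BRE so Python's re (or Perl) reads it the same way.

    Deliberately simple: bracket expressions such as [\\(] and a leading
    literal * are not treated specially.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # BRE "\|" is an operator; Python spells it "|". Other escapes pass through.
            out.append(nxt if nxt in SWAPPED else c + nxt)
            i += 2
        elif c in SWAPPED:
            # A bare "|" in a BRE is a literal character, so escape it for Python.
            out.append("\\" + c)
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


# Perl and Python treat "\|" as a literal pipe, so this finds no match in "foo".
print(re.search(r"foo\|bar", "foo") is None)                       # True
print(re.search(gnu_bre_to_python(r"foo\|bar"), "foo") is not None)  # True
```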
Makes me wonder what other little surprises are hidden away in the tools I use every day…
Three basic rules to keep in mind when trying to index that massive crapload of data you just shoved into MongoDB:
The current generation of the S12 with the ION graphics chipset has been discontinued, and all remaining inventory is now in the Lenovo Outlet Store for $399. I’ve been quite happy with mine. The ION gives it decent performance for HD video and light gaming, and it has a full-sized keyboard and a bright, crisp screen with decent resolution.
[Update: they also have hundreds of brand-new power supplies for $16. For that price, I can have one at home, one at the office, and one in the trunk of the car, and never carry one around. They also have a hundred or so of the 10-inch netbooks in a major scratch-and-dent sale ($220), and some refurbished 10-inch tablet netbooks.]