Thursday, June 7 2012

Odd layout limitation in Word: columns

Microsoft Word has very good standards-based HTML import. It’s much more capable than OpenOffice’s, and advanced layout can be set with custom CSS. It’s robust enough that the majority of Word features will survive a round-trip through the HTML exporter. Your brain may explode if you try to edit the exported HTML, which IMHO is only useful for figuring out how to use Word’s custom CSS; for sanity and source control, write in HTML, lay out with Word, and print to PDF.

[and only use LaTeX if you need to do tricky things like post-process DVI and AUX files to generate cross-referenced vocabulary lists and delete content without repaginating…]

[Update: an odd limit to the importer is its handling of indentation in nested lists. At a certain point, it reverts to left-indented.]

My current effort involves making detailed notes on Shinkendo techniques as I learn them, and my preferred layout is landscape, two-column, half-inch margins, 14pt Times Roman for English and romanized Japanese, 14pt Hiragino Mincho for kanji and kana, and strategically placed column breaks so that kata sequences don’t get split up. All of this is trivial to do with Word’s custom CSS, except the column breaks.

Why? Because columns are still second-class citizens in Word. You can’t define a paragraph style that says “start at the top of a new column”. Page, yes; you can do all sorts of styling at the page level, but column breaks have to be inserted individually everywhere they’re used. In the HTML importer, this means they’re mapped to the <BR> tag, which of course adds a gratuitous blank line wherever you use it. For page breaks, you can just add page-break-before:always to any block style, but mso-column-break-before:always only works on <BR> tags. I tried a number of variations, but in the end, the only way to get everything I wanted was to put up with the occasional extra blank line at the top of a column.
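To show the difference in isolation, here is a minimal sketch of the two kinds of break side by side. The class name newpage and the placeholder text are hypothetical, made up for illustration; the two properties are the ones named above, and the behavior noted in the comments is only what I saw in my own tests.

```html
<style>
/* Page break: can be attached to any block style (class name is hypothetical) */
h2.newpage {page-break-before:always}
/* Column break: in my tests, only honored on a <br> tag */
br.col {mso-column-break-before:always}
</style>

<h2 class="newpage">This heading starts a new page</h2>
<p>...</p>
<!-- The break itself; it also leaves a blank line at the top of the new column -->
<br class="col">
<p>This paragraph starts a new column</p>
```

The page break rides along with the heading's style, so it costs nothing extra in the markup. The column break needs its own element every time, which is where the stray blank line comes from.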

<!DOCTYPE html>
<html lang="ja">
<head>
<meta charset="utf-8">
<style>
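body {
	font-size:14pt;
	line-height: 18pt;
	/* Browsers: Times for English, Hiragino Mincho for kanji and kana */
	font-family:Times,"ヒラギノ明朝 Pro W3";
	/* Word: set the East Asian font so it doesn't default to a Chinese one */
	mso-fareast-font-family:"ヒラギノ明朝 Pro W3";
	/* Word: treat line-height as exact rather than as a minimum */
	mso-line-height-rule:exactly;
}
/* Column breaks only work on <br>, so they get their own class */
br.col {mso-column-break-before:always}
/* Landscape page, half-inch margins, two even columns with a rule between */
@page Section1 {
	margin:0.5in 0.5in 0.5in 0.5in;
	size:11in 8.5in;
	mso-page-orientation:landscape;
	mso-header-margin:.4in;
	mso-footer-margin:.5in;
	mso-columns:2 even 10.0pt;
	mso-column-separator:solid;
}
/* Attach the page setup to the wrapper div */
div.Section1 {page:Section1}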
</style>
</head>
<body>
<div class="Section1">
...
<br class="col">
...
</div>
</body>
</html>

Notes: putting Times and Hiragino Mincho in the font-family property, in that order, is enough to get normal browsers to use the right fonts; the HTML version is in my Dropbox, and looks great on my laptop, iPhone, and Sony Tablet. Word, unfortunately, tends to default to a Chinese font when it hits the first kanji, and then sticks with it even when it sees hiragana or katakana. Setting mso-fareast-font-family dodges that bullet.