"Oh, FFS, MasterCook!"

MasterCook, currently at version 15, is still the best recipe management software around, mostly because it supports sub-recipes. Most recipe-database software maintainers will give you blank stares when you mention this, even the ones who claim to import MasterCook format; some of them don’t even know about sub-title support in ingredient lists. While the software has changed hands several times over the past 25 years, functionally it hasn’t changed much since version 6. Licensed cookbooks come and go; the most significant improvement has been OS compatibility. (Disclaimer: I haven’t tested the pretty clouds in v15 yet.)

There are tens of thousands of recipes on the Internet in the two major MC export formats, MXP and MX2. I recently dug up one of the biggest collections to play with; it is only available through the Wayback Machine.

MXP is a text file meant to be printed out in a fixed-width font, but the format is well-structured enough that it’s easy to import into other software, with some minor loss of information. If you’ve downloaded any recipes off the Internet in the past 20 years, you’ve probably seen the string “* Exported from MasterCook *”.
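As a rough illustration of how little it takes to start on an import, here is a minimal Python sketch that cuts an MXP export into one chunk per recipe, using only the marker string quoted above. The function name `split_mxp` is made up for this example, and the sketch assumes every recipe begins with a line containing that marker; it does not parse titles, ingredients, or directions.

```python
MARKER = "* Exported from MasterCook *"

def split_mxp(text):
    """Split MXP export text into one string per recipe.

    Assumption: each recipe starts with a line containing MARKER.
    Anything before the first marker is ignored.
    """
    recipes = []
    current = None  # lines of the recipe being collected; None before the first marker
    for line in text.splitlines():
        if MARKER in line:
            # A new recipe starts: flush the previous one, if any.
            if current is not None:
                recipes.append("\n".join(current).strip())
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        recipes.append("\n".join(current).strip())
    return recipes
```

The marker line itself is dropped from each chunk, since it carries no recipe data.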

MX2, introduced in 1999’s MasterCook 5, is not XML. Yes, it looks like XML, and even has an external DTD, but feeding it through standard XML tools will trigger explosions visible from half a mile. If you want to work with it, your best bet is the Swiss-army-knife conversion tool cb2cb. It’s Windows-only, written in Java, and “quirky”, but it handles both MXP and MX2, as well as some other formats, and has built-in cleanup and merge support. Pity it’s not open source, because I suspect its source contains dozens of comments with some variation of “Oh, for fuck’s sake, MasterCook!”.

What’s wrong with the “XML” and DTD?

  1. The XML header line is invalid, because encoding must come before standalone. This:
    <?xml version="1.0" standalone="yes" encoding="ISO-8859-1"?>
    must be changed to:
    <?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
  2. Two characters are not escaped in attribute values: " and &
  3. The text contains non-printable control characters (ASCII 2, 5, 17, 18, and 31, in particular), which XML 1.0 does not allow.
  4. The mx2.dtd file supplied in every version since 1999 has obviously never been tested, because it is incorrect and incomplete, in several different ways.
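
For illustration only (this is not the Perl script mentioned below), here is a minimal Python sketch of the first three repairs. The name `repair_mx2` is hypothetical. It assumes the file is read as raw bytes, that the control characters can simply be dropped, and that any ampersand not already starting an entity or character reference should become `&amp;`. It deliberately leaves unescaped double quotes inside attribute values alone, because telling a stray quote from a closing one needs format-specific heuristics.

```python
import re

# The XML declaration at the very start of the file, whatever its contents.
BAD_DECL = re.compile(rb'^\s*<\?xml[^>]*\?>')
GOOD_DECL = b'<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>'

# Characters XML 1.0 forbids: everything below 0x20 except tab, LF, and CR.
CONTROL = re.compile(rb'[\x00-\x08\x0b\x0c\x0e-\x1f]')

# An ampersand that does not already begin an entity or character reference.
BARE_AMP = re.compile(rb'&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)')

def repair_mx2(data: bytes) -> bytes:
    """Apply the header, control-character, and ampersand fixes to raw MX2 bytes."""
    data = BAD_DECL.sub(GOOD_DECL, data, count=1)  # problem 1: header order
    data = CONTROL.sub(b'', data)                  # problem 3: assumption, drop them
    data = BARE_AMP.sub(b'&amp;', data)            # problem 2: ampersands only
    return data
```

Working on bytes sidesteps any decoding questions until the file is well-formed enough for a real XML tool to re-encode it.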

Of course, anyone who knows me will correctly guess that I’ve gone to the trouble to fix all of these problems, with a Perl script that massages MX2 into proper UTF-8 XML that validates against a corrected mx2.dtd; part of that script dates back to my old cookbook project from 2002, so yes, this is the first step to reviving that. The script uses xmllint to fix the encoding and double-check that it’s valid XML. I’ve validated over 450 converted MX2 files against the corrected DTD, a total of around 120,000 recipes.
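
The xmllint step could look roughly like the Python sketch below. The exact invocation the script uses is not shown here, so this is an assumption built from standard xmllint options (`--encode`, `--output`, `--noout`, `--dtdvalid`); the function names and the file names in the usage note are placeholders.

```python
import subprocess

def xmllint_commands(src, dst, dtd):
    """Build two xmllint invocations: re-encode src to UTF-8 as dst, then validate dst."""
    return [
        ["xmllint", "--encode", "UTF-8", "--output", dst, src],
        ["xmllint", "--noout", "--dtdvalid", dtd, dst],
    ]

def convert_and_validate(src, dst, dtd):
    """Run both steps; return True only if every command exits with status 0."""
    for cmd in xmllint_commands(src, dst, dtd):
        if subprocess.run(cmd).returncode != 0:
            return False
    return True
```

Something like `convert_and_validate("fixed.mx2", "recipes.xml", "mx2.dtd")` would then report whether the converted file validates against the corrected DTD, assuming xmllint is on the PATH.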

Update: When converting MXP to MX2, many of the options in cb2cb mangle the output. Best to turn them all off, and do some basic cleanup with a script like this one, which splits directions on CRLF pairs and safely moves most of the non-direction text into Notes. There are still a few rare errors in the conversion process, but in my case that amounted to 4 ingredient lines in over 10,000 recipes, detected by their failure to validate during the XML conversion.
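
The direction-splitting half of that cleanup can be sketched in a few lines of Python. This is not the linked script: `split_directions` is a hypothetical name, and it assumes that any run of one or more CRLF sequences separates two steps. Moving non-direction text into Notes relies on heuristics that are not reproduced here.

```python
import re

def split_directions(text):
    """Split a block of directions into steps on runs of CRLF, dropping empty pieces."""
    steps = re.split(r'(?:\r\n)+', text)
    return [s.strip() for s in steps if s.strip()]
```

Treating a run of CRLFs as a single separator means blank lines between steps do not produce empty directions.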
