My baker’s percentage script has reached the point of being quite useful, but the code is currently a mess because I was exploring the problem space as I went (“code-doodling”). The grams-to-volume conversion has shaped up nicely, after the initial hurdle of dealing with the limitations of floating-point numbers; I’d input “2 pounds”, convert it to grams, and then convert it back to “1 pound 16 ounces”. After a few failed fudge factors, I decided to simply multiply the weight by 1.0001, which is just enough to flip the right bits without skewing the results.
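A minimal sketch of that round-trip problem and the 1.0001 fudge factor. The function name and the two-decimal gram value are my assumptions for illustration, chosen so the lost precision is easy to reproduce; the real script may lose it differently.

```perl
use strict;
use warnings;

use constant GRAMS_PER_POUND => 453.59237;

# Convert grams to (pounds, ounces), rounding ounces to the nearest whole.
# $fudge is the multiplier applied before conversion (1 means "no fudge").
sub grams_to_pounds_ounces {
    my ($grams, $fudge) = @_;
    my $pounds_exact = $grams * $fudge / GRAMS_PER_POUND;
    my $pounds = int($pounds_exact);
    my $ounces = sprintf("%.0f", ($pounds_exact - $pounds) * 16);
    return ($pounds, $ounces + 0);
}

# "2 pounds" stored as grams with two decimals (assumed): 907.18
my $grams = sprintf("%.2f", 2 * GRAMS_PER_POUND);

for my $fudge (1, 1.0001) {
    my ($lb, $oz) = grams_to_pounds_ounces($grams, $fudge);
    print "fudge $fudge: $lb pound(s) $oz ounce(s)\n";
}
```

Without the fudge, the value lands a hair under two pounds, so the integer part is 1 and the remainder rounds up to 16 ounces; the multiplier pushes it just over the boundary.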

My solution to the excessive-precision problem (“1/3 cup + 1 tbsp + ½ tsp + 1/16 tsp”) was to cut it off when the residue is less than 1/16 of the total weight. That is, if you need a full cup of something, you don’t care about adding something smaller than a tablespoon. I’ve considered adding “scant”, “rounded”, and “heaping” modifiers, but then I’d have to track which ingredients are liquid, because a heaping tablespoon of olive oil is… “messy”. I also decided that 1/16 tsp is so tiny that it’s not worth printing unless it’s the only measure (which means it usually only shows up for strong powdered ingredients or scaled-down recipes).
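Roughly how that cutoff could work, as a sketch rather than the script's actual code. It assumes quantities are held as whole sixteenths of a teaspoon (to dodge more floating-point trouble) and applies the 1/16 rule to volume; the measure list and the function name are made up for the example.

```perl
use strict;
use warnings;

# Measures in sixteenths of a teaspoon, largest first (assumed list).
my @MEASURES = (
    ['cup', 768],     ['3/4 cup', 576], ['2/3 cup', 512],
    ['1/2 cup', 384], ['1/3 cup', 256], ['1/4 cup', 192],
    ['tbsp', 48],     ['tsp', 16],      ['1/2 tsp', 8],
    ['1/4 tsp', 4],   ['1/8 tsp', 2],   ['1/16 tsp', 1],
);

# Greedy breakdown that stops once the residue is under 1/16 of the total,
# and only prints 1/16 tsp when it would be the sole measure.
sub volume_measures {
    my ($total) = @_;
    my $rem = $total;
    my @parts;
    for my $m (@MEASURES) {
        my ($name, $size) = @$m;
        last if @parts && $rem * 16 < $total;   # residue too small to matter
        next if $size == 1 && @parts;           # skip a trailing 1/16 tsp
        my $count = int($rem / $size);
        next unless $count;
        $rem -= $count * $size;
        push @parts, $name =~ /^\d/ ? $name : "$count $name";
    }
    return join(' + ', @parts);
}

# 1/3 cup + 1 tbsp + 1/2 tsp + 1/16 tsp is 313 sixteenths of a teaspoon
print volume_measures(313), "\n";
```

The over-precise example above comes out as just "1/3 cup + 1 tbsp", because the leftover 9/16 tsp is under 1/16 of the total.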

Why convert back to volume measurements in the first place? Because they’re a lot faster, and at home bread-baking sizes, only the flour really has to be weighed for consistent results. Bulky ingredients like seeds, nuts, raisins, or chocolate chips should be weighed, but won’t break the recipe if they’re off a bit. Water and milk are fine if you actually have decent measuring cups (I replaced my hit-or-miss glass ones with OXO’s squeezable silicone cups, which are accurate and pour better).

The three remaining features I want to add are conversion to/from tangzhong/yudane, scaling to preset sizes like “six large hamburger buns”, and slightly tweaking relative proportions while keeping the total weight constant, so you don’t end up with something like 1.2 eggs. The last one is the hardest, because it breaks the currently linear flow of the script, so I’ll have to create some objects and methods to encapsulate everything.

Then comes the web version, which will initially just be a standard POSTed form with pulldown menus for ingredients. Kind of messy, since I’m up to 100 distinct ingredients, and I’ll need an option for custom ingredients. I could go all AJAX-y on it, but I’m getting better at recognizing epicycles before I start working on them.

Here’s sample output for King Arthur Flour’s Japanese Milk Bread, which uses a tangzhong starter for softness and improved shelf life. They’re definitely soft, although I’ve never had a batch last long enough to test shelf life…

  15.0g bread flour (2 tbsp) -- tangzhong
  44.4g water (3 tbsp) -- tangzhong
  45.0g whole milk (3 tbsp) -- tangzhong
 120.0g whole milk (1/2 cup)
  56.7g unsalted butter, melted (4 tbsp)
  50.0g large egg (1)
  17.5g baker's dried milk (2 tbsp)
  49.5g sugar (1/4 cup)
   6.0g salt (1 tsp)
 300.0g bread flour (2 1/2 cup)
   9.3g instant yeast (1 tbsp)

Type         Grams  Baker's Percentage
----         -----  ------------------
TOTAL        713.5  226.5%  1 pound 9 ounces
flour        315.0  100.0%
water        236.6   75.1%
salt           6.0    1.9%
yeast          9.3    3.0%  (>1.5% too much?)
fat           55.4   17.6%
sugar         49.5   15.7%  sweet (use sugar-tolerant yeast)
egg           50.0   15.9%
tangzhong    104.4   33.1%  4.8% of flour, 1:5.6 ratio

You can see from the comments that they significantly increased the yeast to compensate for the high sugar content. Also, despite the high hydration, this isn’t a sticky dough, nor does it produce the sort of airy, irregular crumb that you’d expect, because much of the extra liquid is captured in the tangzhong (which is the whole point).
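For reference, the percentage column is just each weight divided by the total flour weight. A minimal sketch using the ingredient list above; the data layout and function name are assumptions, and it only does the flour and total rows, not the water/fat breakdown that needs per-ingredient composition data.

```perl
use strict;
use warnings;

# [name, grams, counts-as-flour?], copied from the sample output above
my @recipe = (
    ['bread flour (tangzhong)',  15.0, 1],
    ['water (tangzhong)',        44.4, 0],
    ['whole milk (tangzhong)',   45.0, 0],
    ['whole milk',              120.0, 0],
    ['unsalted butter',          56.7, 0],
    ['large egg',                50.0, 0],
    ["baker's dried milk",       17.5, 0],
    ['sugar',                    49.5, 0],
    ['salt',                      6.0, 0],
    ['bread flour',             300.0, 1],
    ['instant yeast',             9.3, 0],
);

# Returns (flour grams, total grams, [[name, grams, percent], ...]).
sub bakers_percentages {
    my (@items) = @_;
    my ($flour, $total) = (0, 0);
    for my $item (@items) {
        $total += $item->[1];
        $flour += $item->[1] if $item->[2];
    }
    die "no flour in recipe\n" unless $flour > 0;
    my @rows = map { [$_->[0], $_->[1], 100 * $_->[1] / $flour] } @items;
    return ($flour, $total, \@rows);
}

my ($flour, $total, $rows) = bakers_percentages(@recipe);
printf "%-26s %6.1f %6.1f%%\n", @$_ for @$rows;
printf "%-26s %6.1f %6.1f%%\n", 'TOTAL', $total, 100 * $total / $flour;
```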

The Respun Marches

As vaguely promised, here’s what happens when I take the sector data for The Spinward Marches and run it through my scripts to create a completely new sector.

But first, a slight tweak: the Importance stat I mentioned ranges from -3 to 5, which is sufficient for my basic goal of separating the wheat from the chaff, but I decided it doesn’t have enough granularity to truly spread the best systems around the sector, so I multiplied it by another derived stat, Resource Units (“stuff to grab”), which ranges from rather negative to quite positive. Avoiding zeroes for both, the exact scoring method used was: ($ix + 4) * ($ru||1)
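Spelled out as a function, with a few made-up systems to show how the zero-avoidance behaves (the names and stat values are hypothetical, not real sector data):

```perl
use strict;
use warnings;

# Importance runs from -3 to 5, so adding 4 keeps that factor at 1 or more;
# a Resource Units value of 0 is treated as 1 so it can't zero out the score.
sub system_score {
    my ($ix, $ru) = @_;
    return ($ix + 4) * ($ru || 1);
}

# Hypothetical systems: [name, Ix, RU]
my @systems = (
    ['Backwater',  -3,    0],
    ['Hub',         5, 1200],
    ['Wasteland',   1,  -90],
    ['Middling',    2,  300],
);

# Best systems first
for my $s (sort { system_score(@$b[1, 2]) <=> system_score(@$a[1, 2]) } @systems) {
    printf "%-10s %6d\n", $s->[0], system_score(@$s[1, 2]);
}
```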

(yes, the Importance stat is called “ix”, or more precisely {Ix}, because Marc Miller apparently has an affection for chartjunk; there’s also an (Ex), and a [Cx], and if he ever writes another version of Traveller, I imagine it will have <Xx>, ‹Yx›, and «Zx» as well)

Now, let’s respin The Spinward Marches, with data and PDFs courtesy of The Traveller Map!


PDF::Cairo update

New version of PDF::Cairo uploaded to CPAN and Github. Mostly fixes for bugs uncovered by automated testing on Linux/BSD distros that have really old libraries, but also new autosize() and extents() text methods, as well as the hello-my-name-is script example (still fiddling with the command-line option processing on that one…).

Fun with software testing

Before I released PDF::Cairo, I gave it a basic test suite to make sure that everything at least loaded correctly. I wanted to fully exercise the various methods, but that wouldn’t tell me if they actually worked or not. I needed to put ink on the page, and then compare it with a reference page. Automatically, using tools likely to be available (or at least easily installed) on the target platforms. I decided to generate an N-page PDF with small pages, draw a single test per page, and convert the results to a series of PNG files; if they were byte-for-byte identical to the files generated from the reference PDF, the tests passed.

ImageMagick’s convert utility was out, even though I currently use it to import non-PNG images into PDF::Cairo, because it can choose between multiple PDF backends, and I didn’t want the added test complexity. I considered Ghostscript, which I have a few decades of practice with, but then I came across Poppler’s pdftocairo utility. Not only does it have the functionality I need, but it’s built on Cairo, FreeType, and Fontconfig, the same libraries I’m generating the PDFs with in the first place.

Here’s what it looks like:

my $pdf = PDF::Cairo->new(
    width => in(2),
    height => in(2),
    file => $OUT,
);

# ... (excerpt: one test per page, each recording its description first)
push(@test_desc, "bezier curves");
$pdf->move(10, 10);
$pdf->curve(20, 120, 40, 40, 140, 60);
$pdf->rel_curve(-10, 10, -40, 40, -100, -30);
$pdf->linedash([12, 4, 8], 2);

# ... (excerpt: converting both PDFs to PNG and comparing page by page)
  my $tmp = `pdftocairo -v 2>&1` || '';
  skip("need poppler's pdftocairo to compare images")
    unless $tmp =~ /pdftocairo/;
  my $PDFTOCAIRO = "pdftocairo -png -r 200 -antialias gray";
  system("$PDFTOCAIRO t/02-cairo.pdf $TMP/ref");
  system("$PDFTOCAIRO $OUT $TMP/02");
  foreach my $i (1..@test_desc) {
    $i = sprintf("%02d", $i);
    my $test = "page $i: " . shift(@test_desc);
    subtest $test => sub {
      plan tests => 3;
      ok(-s "$TMP/ref-$i.png", "reference page non-empty?");
      ok(-s "$TMP/02-$i.png", "page non-empty?");
      ok(compare("$TMP/02-$i.png", "$TMP/ref-$i.png") == 0,
        "page matches reference?");
    };
  }

Works quite nicely on Mac and Linux, and the small page size speeds up the PNG conversion. To test text methods, I had to add a free TTF font from Google Fonts. I can’t really test the effect of Fontconfig font substitution, which also means that I can’t really test Pango font-handling; currently, the work-in-progress Pango tests only run if you install the fonts included in the tarball.

Naturally, the act of writing the tests smoked out half a dozen bugs, so once I had decent coverage of the main module, I pushed out a new release to Github and CPAN.

PDF::Cairo progress

No complaints on PrePAN, and I’ve done a lot of cleanup, bug-fixing, and documentation, along with creating an ugly-but-useful new example. Testing the various features together has shaken out several bugs and identified some missing API features that I’ve added.

I figure I’ll poke at it some more over the weekend, then create a 1.02 tarball and officially release it on CPAN.

Dear Perl maintainers,

Many years ago, I took the advice of the perlunicode manpage and defined a custom regular-expression class, usable with the \p{...} construct, for identifying strings that contain only kana and no kanji:

sub InKana {
	return <<END;

This (and all the other methods in the docs) stopped working somewhere between 5.20.2 and 5.28.0, and there is not even the slightest hint anywhere as to why.

Why? Because there is now a predefined InKana class, not equivalent to the above definition, and the above code is simply ignored. You have to change the name to something, anything else; I chose InHirakata, and my code started working again.
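For anyone hitting the same wall, here is a minimal sketch of a renamed class. The ranges are my assumption (the standard Hiragana and Katakana blocks, as in the perlunicode example), not necessarily the exact definition from my original code.

```perl
use strict;
use warnings;

# User-defined property: the name must start with "In" or "Is", and the sub
# returns one tab-separated hex range per line.
# Assumed ranges: Hiragana (U+3040-309F) and Katakana (U+30A0-30FF).
sub InHirakata {
    return <<END;
3040\t309F
30A0\t30FF
END
}

# True if the string consists only of kana
sub is_all_kana {
    my ($str) = @_;
    return $str =~ /^\p{InHirakata}+$/ ? 1 : 0;
}

print is_all_kana("\x{3072}\x{3089}\x{30AB}\x{30CA}") ? "kana\n" : "not kana\n";
print is_all_kana("\x{6F22}\x{5B57}")                 ? "kana\n" : "not kana\n";
```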

And that’s an hour of my life I want back.

Bug filed; RT #134146.

PDF::Cairo unleashed!

On Github now, and up for review on PrePAN.

I made sure to install it and run the examples at least once on a different OS. 😁


Looks like if I mix Image::CairoSVG with the Recording surface support I added recently, I can get most simple SVG files to render correctly. It’s pretty simple-minded, not supporting text or filters, but I can live with that. And there’s not much to it, so I could enhance it and send a pull request.

Epicycles: poster-making

So, as I prepare to inflict PDF::Cairo on the world, I’ve been sensibilizing the documentation and adding example code. My latest effort was dusting off my 15-year-old Traveller sector map generator and cleaning up the hex-grid calculations to use my Box library and the regular-polygon method I recently added. Works great.

Then I had an idea for Yet Another Epicycle: tiling a large image across multiple pages. I did this by hand in the original sec2pdf script, splitting sector maps across 4 or 16 pages, but it would be cool to render the entire sector once and then simply ‘cut out’ the pieces I need for each page. It would also get rid of a lot of special-case code for handling things that are partially outside of the current page.

I tried it out by hand, and it worked great. I drew a big box on a recording surface that was 17x22, and replayed it four times onto 8.5x11 pages with different offsets. Not lightning fast, since you end up rendering the entire drawing N+1 times, but it worked.

For that very simple example.
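The offsets for that hand test are simple enough to compute. A sketch, assuming everything is in points with the origin at one corner of the drawing; tile_offsets() is a made-up helper, not part of PDF::Cairo, and it ignores any overlap between pages.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Return one [x, y] offset per page, row by row, for tiling a large drawing
# across pages of the given size. Each offset shifts the drawing so that a
# different page-sized piece of it lands on the page.
sub tile_offsets {
    my ($draw_w, $draw_h, $page_w, $page_h) = @_;
    my $cols = ceil($draw_w / $page_w);
    my $rows = ceil($draw_h / $page_h);
    my @offsets;
    for my $row (0 .. $rows - 1) {
        for my $col (0 .. $cols - 1) {
            push @offsets, [0 - $col * $page_w, 0 - $row * $page_h];
        }
    }
    return @offsets;
}

# 17x22 inch drawing on 8.5x11 inch pages, at 72 points per inch
my @offsets = tile_offsets(17 * 72, 22 * 72, 8.5 * 72, 11 * 72);
print join(' ', map { "$_->[0],$_->[1]" } @offsets), "\n";
```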

Unfortunately, it looks like there’s internal state that doesn’t get reset completely, so that when I integrated it into the module and told it to make a really big hex grid, only the first replay was 100% successful. The rest were incomplete (one row/column of hexes, with no text labels). It didn’t matter what order I replayed them in.

Cairo apparently has scripting surfaces and proxy surfaces in recent versions, but no one has updated the Perl module to expose those, and I’m not sure they’d help (scripting surfaces are basically a debugging tool that writes to disk and gets replayed with a standalone CLI tool).

I still want to do it, so I may simply add a simple-minded recording of my API calls and do a high-level replay. Even more overhead than using a recording surface (especially for things like loading fonts and images), but then I’ll know it can be replayed more than once.

Meanwhile, the recording()/replay() methods work at least once, so I’m leaving them in for now.


Wow, that was an easy fix. After walking away for a few hours, I went back and added a single cairo_surface_flush() right before each set_source_surface() call. Renders perfectly, as many times as I want. See? I just need to tweak the clipping regions a bit, to capture the overlap between pages.

Performance is actually pretty good, now that I have something to measure: it took only 28% longer to generate the drawing and replay it four times on different pages. Output size was 146% larger than generating the same drawing on a single large sheet, but that’s a small price to pay for the convenience.

If I were really going to revive sec2pdf (and I might, but not this weekend!), what I’d do to get the most efficient output is use a recording to draw all of the sector-wide elements (regional borders, X-boat routes, etc) and replay them onto the individual pages, after drawing each page’s grid but before adding text and system data. That would significantly reduce the amount of wasted drawing outside each page’s boundaries, and only repeat the elements that might appear on multiple pages.

“Need a clue, take a clue,
 got a clue, leave a clue”