Fun with PDFs...


While waiting on a fix for my Perl PDF font woes, I started thinking about some of the other long-standing issues I’ve had with PDF::API2. I actually use PDF::API2::Lite most of the time, with a few extra functions (like clip) merged back in, because I don’t need color spaces, bar codes, metadata, outlines, etc., etc. I just need all of the drawing functions and decent font handling, including proper subsetting in the output file.

Which it doesn’t do. Pro tip: the easiest way to clean up a PDF file full of cruft like giant embedded fonts is to run it through the ps2pdf utility supplied with Ghostscript. Despite the name, it works just fine as a pdf-to-pdf converter and optimizer.
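For anyone who wants to script that cleanup, here is a minimal sketch that drives it from Python's `subprocess` module. It assumes Ghostscript's `ps2pdf` wrapper is on the PATH and uses its plain `ps2pdf [options] input output` form. The helper names `ps2pdf_command` and `rewrite_pdf` are hypothetical, and any extra Ghostscript `-d` options are left to the caller.

```python
import shutil
import subprocess
from pathlib import Path


def ps2pdf_command(src, dst, extra_options=()):
    """Build the argv for `ps2pdf [options] input.pdf output.pdf`."""
    return ["ps2pdf", *extra_options, str(src), str(dst)]


def rewrite_pdf(src, dst, extra_options=()):
    """Re-distill an existing PDF through Ghostscript's ps2pdf wrapper.

    Returns the (old, new) file sizes in bytes so any savings are visible.
    """
    if shutil.which("ps2pdf") is None:
        raise FileNotFoundError("ps2pdf not found; is Ghostscript installed?")
    # check=True raises CalledProcessError if Ghostscript reports a failure.
    subprocess.run(ps2pdf_command(src, dst, extra_options), check=True)
    return Path(src).stat().st_size, Path(dst).stat().st_size
```

Usage would look like `before, after = rewrite_pdf("calendar.pdf", "calendar-small.pdf")`, with hypothetical file names. Writing to a separate output file keeps the original around in case the re-distilled version looks wrong.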

Bottom line, PDF::API2 has been getting staler every year, and the font thing encouraged me to look elsewhere. Phil Perry is taking a stab at it with his fork PDF::Builder, but he’s still feeling his way around, and it’s a big job.

But I’ve written a lot of code over the years, and the PDF APIs I’ve looked at in other languages are convoluted, limited in function (report writers and form fillers, mostly), just plain missing, or, in at least one case, commercial. The same generally applies to the other modules I rely on, like DateTime, the Swiss Army Knife of date handling.

By happy accident, I discovered that the Cairo library can generate PDF files and integrates nicely with FreeType for robust font support, and both have decent Perl modules. Being C libraries, they also have some performance advantages, but the most important thing they have is ongoing support.

Documentation and examples are pretty limited for the Perl modules, but I just finished converting my calendar generator to use them, and it was only moderately annoying. The two biggest issues were that Cairo puts (0,0) at the upper-left corner of the page, where PDF puts it at the lower left, and that Cairo tracks only one color in its graphics state, compared to PDF’s separate stroke and fill colors. Shimming around these problems added about 55 lines of code. Text positioning is improved thanks to Font::FreeType (and could be improved a lot more by exploiting the API fully), and output with embedded fonts is significantly smaller. Also, all my fonts work.
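The two shims can be sketched as a thin wrapper class. This is a hypothetical Python illustration, not the author's Perl code: `PdfStyleCanvas` and its method names are invented, and the only assumption about the wrapped object is that it provides Cairo's `move_to`, `line_to`, `rectangle`, `set_source_rgb`, `fill`, `fill_preserve` and `stroke`, as pycairo's `cairo.Context` does. A small recording stand-in takes the place of a real context so the sketch runs without Cairo installed.

```python
class PdfStyleCanvas:
    """PDF-style drawing calls on top of a Cairo-like context (hypothetical shim).

    * Coordinates: callers use PDF's lower-left origin with y growing upward;
      each y is mirrored into Cairo's upper-left origin with y growing downward.
    * Colors: PDF keeps separate stroke and fill colors while Cairo has a single
      source, so both are remembered here and applied just before painting.
    """

    def __init__(self, ctx, page_height):
        self.ctx = ctx                  # e.g. a cairo.Context on a PDFSurface
        self.page_height = page_height  # page height in points
        self.stroke_rgb = (0.0, 0.0, 0.0)
        self.fill_rgb = (0.0, 0.0, 0.0)

    def _y(self, y):
        # Mirror a PDF y coordinate into Cairo's top-down coordinate space.
        return self.page_height - y

    def set_stroke_color(self, r, g, b):
        self.stroke_rgb = (r, g, b)

    def set_fill_color(self, r, g, b):
        self.fill_rgb = (r, g, b)

    def move(self, x, y):
        self.ctx.move_to(x, self._y(y))

    def line(self, x, y):
        self.ctx.line_to(x, self._y(y))

    def rect(self, x, y, w, h):
        # PDF's (x, y) is the rectangle's lower-left corner; Cairo wants the
        # top-left corner, which after mirroring is page_height - (y + h).
        self.ctx.rectangle(x, self._y(y + h), w, h)

    def stroke(self):
        self.ctx.set_source_rgb(*self.stroke_rgb)
        self.ctx.stroke()

    def fill(self):
        self.ctx.set_source_rgb(*self.fill_rgb)
        self.ctx.fill()

    def fill_stroke(self):
        # fill_preserve() keeps the current path so it can be stroked next.
        self.ctx.set_source_rgb(*self.fill_rgb)
        self.ctx.fill_preserve()
        self.ctx.set_source_rgb(*self.stroke_rgb)
        self.ctx.stroke()


class RecordingContext:
    """Stand-in for cairo.Context that just records the calls it receives."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only called for attributes that don't exist, i.e. the drawing methods.
        def record(*args):
            self.calls.append((name, args))
        return record
```

With pycairo, the real context would come from something like `cairo.Context(cairo.PDFSurface("out.pdf", 612, 792))`, with `surface.finish()` called at the end. Converting each y coordinate individually, rather than applying a global `ctx.translate(0, h)` and `ctx.scale(1, -1)`, avoids mirroring text glyphs along with everything else.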

Now that I have a working example, I’ve started work on a wrapper module that works like PDF::API2::Lite, so that I can convert my old scripts by just editing a few lines at the top. Shouldn’t be too much work, and I’m tentatively calling it PDF::Cairo.

There’s also a Pango module for advanced text layout, but integrating that is definitely a “phase 2” item.

Update

As far as I can tell, Pango offers the exact opposite of what I want, turning font lookup into a game where the penalty for guessing wrong is having all your text rendered in Verdana.
