Friday, February 25 2011

Appending metadata to a PDF file

The Kindle has generally excellent support for reading PDF files, but absolutely terrible support for displaying embedded metadata. If FOO5419.pdf contains properly-specified Title and Author fields, it will appear on your Kindle as, you guessed it, FOO5419. It might show the Author on the right-hand side of the screen, and it might show Title and Author on the detail screen, but likely not.

It will work if you generate PDF version 1.3 with a self-contained Info dictionary (that is, “/Title(My Book)”, but not “13 0 obj (My Book) … /Title 13 0 R”). It will work if you do an append-only update to a v1.3 file in Adobe Acrobat Pro. It will work if you do a rewrite of a v1.3 file with pdftk.

What should work, for all PDF files, is an append-only update that uses only v1.3-ish features to create a self-contained Info dictionary. I hadn’t hacked PDF by hand since 1993, but I dusted off my reference manuals and wrote a script that correctly implements the spec.

It doesn’t work on a Kindle. Acrobat sees my data, Mac OS X Preview sees it, pdftk sees it, and every other tool I’ve tried agrees that my script generates valid PDF files with updated metadata. However, if I use my tool and then ask pdftk to convert the append-only update into a rewrite, the Kindle can see it (but only if it started out as v1.3).

I therefore declare their parser busted. The actual PDF viewer works fine, but whatever cheesy hack they’re using to quickly scan for metadata, it ain’t the good cheese.