Wednesday, March 17 2004

iEatBrainz: not ready for prime time

The idea is sound: identify music by acoustic fingerprints instead of relying on the clumsy CD-hashing approach used by CDDB, which not only produces a lot of collisions, but relies on the presence of the CD in a compatible drive. The poor quality of the CDDB database is a separate issue, one that MusicBrainz doesn’t obviously solve.

iEatBrainz is a beta Mac client for the MusicBrainz database, and if the results it produces are representative of the fingerprinting technology and current database, it’s not worth my time right now.

For a simple test, I fed it a 97-song playlist out of my iTunes library, consisting of tracks ripped to MP3 and AAC from my CD collection over the past few years.

4 songs were identified incorrectly, with no alternatives available:

  • “The Scotsman” as Aaliyah’s “Try Again”,
  • “Girl Fight Tonight!” as Redman’s “So Ruff”,
  • “Mean Green Mother From Outerspace” as “Sominex/Suppertime II” (which is the correct title on the Broadway cast album, but not the movie soundtrack),
  • “Poor Unfortunate Souls” as a track from a Disney compilation, not the original soundtrack album.

Since other tracks from all four albums were correctly identified, the lack of alternatives suggests a problem in the beta client. It’s either not getting hash collisions from the database, or there’s a problem with the code used to display them.

2 songs returned “Error generating acoustic fingerprint”. I don’t know what this means, but for the record, they were Cindy Lee Berryhill’s “Baby Should I Have The Baby” and Andy M. Stewart & Manus Lunny’s “At It Again”.

14 songs returned “Not in MusicBrainz Database”. Some of those are no surprise (Chloe Liked Olivia, Sunshine ‘n’ Water, and Pop Goes The World are relatively uncommon albums, for instance), but in most cases, other tracks from the same albums were correctly identified. That suggests a problem in either the fingerprinting or the matching algorithms.

23 songs returned “Needs to be looked up”, which suggests a client problem. That leaves 54 correct matches, and here at least I can say something positive: the data quality was better than CDDB. No spelling errors, no novel capitalization, no misused fields, and consistent identification of multi-CD releases.

Unfortunately, a 56% success rate with 4% false positives and at least 5% false negatives effectively rules out this tool for retagging the 6000+ songs in my iTunes library, especially since it takes about 30 seconds per song. Worth keeping an eye on, though.
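For anyone who wants to check the arithmetic, here is a rough sketch of how those figures fall out of the tallies above. The category names are just labels for this example, and the library size uses 6000 as a lower bound with a flat 30 seconds per song, so the time estimate is approximate at best.

```python
# Tallies from the 97-song test playlist described above.
total = 97
failures = {
    "misidentified": 4,        # wrong match, no alternatives offered
    "fingerprint_error": 2,    # "Error generating acoustic fingerprint"
    "not_in_database": 14,     # "Not in MusicBrainz Database"
    "needs_lookup": 23,        # "Needs to be looked up"
}

# Whatever is left over was matched correctly.
correct = total - sum(failures.values())


def pct(count, whole=total):
    """Return count as a whole-number percentage of the playlist."""
    return round(100 * count / whole)


success_rate = pct(correct)
false_positive_rate = pct(failures["misidentified"])

# Assumptions: at least 6000 songs, roughly 30 seconds each.
library_size = 6000
seconds_per_song = 30
hours = library_size * seconds_per_song / 3600

print(f"{correct} correct ({success_rate}%), "
      f"{false_positive_rate}% false positives, "
      f"about {hours:.0f} hours for the whole library")
```

In other words, a full pass over the library would take something like two solid days of processing before any of the cleanup even starts.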