Friday, January 3 2003

Need a cookie? Take a cookie.

So, I just got another notice about a sleazy bastard printing out my photographs and selling them on eBay. Joy. This is the sort of behavior that led me to stop posting large JPEGs a while back.

They’ll all be hosted here soon (including the large ones), as part of a completely rewritten jgreely.com, but I plan to take steps to make certain forms of abuse more difficult.

The most common excuse I hear is “I didn’t know they were yours” (second most common is “I thought they were public domain,” which is always a howler coming from a veteran eBay dealer). Digital watermarking does rude things to the image quality and offers no real legal advantages, and visible watermarks are either easy to strip out or just plain ugly. Both help prevent the “innocent infringement” defense, however, so I may include small visible watermarks in the future.

There’s another way to cut down on “innocent infringement,” and that’s to guarantee that the person downloading the images has been presented with a clear statement explaining your ownership and the rights you’re granting. Lots of people think they’re doing this on the web, but they’re mostly just playing security-through-obscurity games that can be defeated by deep-linking, forged referer headers, or just turning off JavaScript. Extreme cases involving slicing up your images and/or embedding them in Java applets are just moronic.

The other abuse, which contributes a great deal to my eBay problems, is people and ’bots who download the entire contents of your site and either put it up on theirs or sell it on CD-ROM (several of the allegedly-honest dealers I’ve caught selling my pictures insist they bought them on a “copyright-free” CD; none of them have yet been able to provide any evidence that the CD exists, however).

I want to solve both problems. The solution that I think will work involves an often-hated technology, browser cookies. Since I don’t care who looks at my pictures, simply that they’ve read and acknowledged my rights, the less-invasive session cookies will suffice.

So, let’s say that I’ve got a photo archive stored under the URL /photo. I only care about protecting images that are large enough to be interesting, so thumbnails (size 0) and small previews (size 1) can be outside of the protected space. That is, /photo/0/fish.jpg can be retrieved by anyone without seeing a special copyright page, but it’s only 108x72 pixels, so they can’t do much with it. /photo/2/fish.jpg, on the other hand, is 1200x800 (suitable for full-screen display, half-decent 4x6 prints, or lousy 8x10s), so it triggers the protection.
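A minimal Python sketch of that path test, assuming URLs of the form /photo/<size>/<name>. Treating every size other than 0 and 1 as protected (rather than naming size 2 explicitly) is a choice made for the sketch, so a later, larger size would be covered automatically; the function name is hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: only thumbnails (0) and small previews (1) are public.
PUBLIC_SIZES = {"0", "1"}


def is_protected(url_path: str) -> bool:
    """True if a request under /photo/<size>/... needs the rights cookie."""
    parts = PurePosixPath(url_path).parts  # "/photo/2/fish.jpg" -> ('/', 'photo', '2', 'fish.jpg')
    if len(parts) < 4 or parts[1] != "photo":
        return False  # not an image inside the photo archive
    return parts[2] not in PUBLIC_SIZES  # default-deny any non-public size


print(is_protected("/photo/0/fish.jpg"))  # False
print(is_protected("/photo/2/fish.jpg"))  # True
```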

When a request comes in for any image under /photo/2, the server checks for a session cookie. No cookie? You’re redirected to the rights page, which explains the deal and contains a form to fill in and submit. If you agree, the server sends you a cookie containing a timestamp, your IP address, a random number (to tell apart folks who share an address behind NAT or a proxy server), and an encrypted string containing a delay counter (initially set to 0). Each time a cookie is presented with a request, the server caches it for later reference, so the same cookie can’t be used twice.
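Here’s one way the cookie-issuing step might look in Python, as a sketch rather than a spec. The `timestamp|ip|nonce|delay|signature` layout, the function names, and the random key are assumptions. It also swaps in a technique: Python’s standard library has no symmetric cipher, so the delay counter is protected by an HMAC signature instead of encryption. That leaves the counter readable but makes any tampering detectable, which is the property that stops someone from forging a zero-delay cookie.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical server secret; a real deployment would load a fixed key
# from configuration so cookies survive a server restart.
SECRET_KEY = secrets.token_bytes(32)


def sign(payload: str) -> str:
    """HMAC-SHA256 of the payload, hex-encoded."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_cookie(client_ip: str, delay: int = 0, now: Optional[float] = None) -> str:
    """Build a cookie value: timestamp|ip|nonce|delay|signature.

    The signature stands in for the encrypted delay string: the counter
    stays readable, but any change to the payload breaks the signature.
    """
    ts = int(time.time() if now is None else now)
    nonce = secrets.token_hex(8)  # separates clients sharing one NAT/proxy address
    payload = f"{ts}|{client_ip}|{nonce}|{delay}"
    return f"{payload}|{sign(payload)}"
```

After the rights form is accepted, the return value of `issue_cookie(request_ip)` would be sent as a session cookie, meaning one with no Expires or Max-Age attribute, so the browser discards it at the end of the session.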

If you send a cookie with your request, there are a number of possible responses. Here are the failures, which send you to an error page:

  • Non-matching IP address.
  • Timestamp in the future.
  • Decryption failure.
  • Cookie present in cache.
  • Timestamp more than 10 minutes old.
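The failure checks, in the same order, might be sketched as a single Python function. The cookie layout (`timestamp|ip|nonce|delay|signature`), the fixed key, and the use of an HMAC signature in place of real encryption are all assumptions; a failed signature check plays the role of “decryption failure,” and the `used` set stands in for the server-side cache.

```python
import hashlib
import hmac
import time
from typing import Optional, Set

SECRET_KEY = b"hypothetical-server-secret"  # assumption: same key the issuer signs with
MAX_AGE_SECONDS = 10 * 60                   # ten-minute total cookie lifetime


def sign(payload: str) -> str:
    """HMAC-SHA256 of the payload, hex-encoded."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def check_cookie(value: str, client_ip: str, used: Set[str],
                 now: Optional[float] = None) -> Optional[str]:
    """Return the first failed check, or None if the cookie may be redeemed."""
    now = time.time() if now is None else now
    try:
        ts, ip, nonce, delay, sig = value.split("|")
        stamp = int(ts)
    except ValueError:
        return "malformed cookie"
    if ip != client_ip:
        return "non-matching IP address"
    if stamp > now:
        return "timestamp in the future"
    expected = sign(f"{ts}|{ip}|{nonce}|{delay}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return "signature check failed"  # plays the role of "decryption failure"
    if value in used:
        return "cookie present in cache"
    if now - stamp > MAX_AGE_SECONDS:
        return "timestamp more than 10 minutes old"
    return None
```

A caller would send the visitor to the error page whenever this returns a string, and otherwise add `value` to the cache before serving the image. Comparing digests with `hmac.compare_digest` avoids leaking timing information about how much of a forged signature matched.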

These are successes, which return the requested image and set an updated cookie, possibly waiting several seconds first based on the value of the delay counter:

  • Timestamp more than 1 minute old — decrement the delay counter if it’s greater than 0.
  • Timestamp less than 30 seconds old — increment the delay counter.
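The counter adjustment for the two success cases fits in a few lines of Python. The thresholds are the ones in the list; leaving the counter alone for cookies between 30 seconds and 1 minute old is an assumption about the gap between the two rules, and the function name is hypothetical.

```python
def update_delay_counter(counter: int, age_seconds: float) -> int:
    """New delay counter for a successful request, given the cookie's age."""
    if age_seconds > 60:        # waited over a minute: relax by one step
        return max(counter - 1, 0)
    if age_seconds < 30:        # came back within 30 seconds: tighten by one step
        return counter + 1
    return counter              # 30 to 60 seconds: unchanged (assumed)


# A burst of quick requests ratchets the counter up; one slow one eases it back.
counter = 0
for age in (5, 5, 5, 90):
    counter = update_delay_counter(counter, age)
print(counter)  # 2
```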

The cookie cache expires automatically every few minutes (using the Perl Cache::SharedMemoryCache module, most likely), so it shouldn’t get too cluttered. It only exists to keep people from reusing the same cookie to speed up their download (either by hand or with some sort of parallel-download tool), so it doesn’t need to be remembered for very long.
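Python’s standard library doesn’t ship a ready-made equivalent of Cache::SharedMemoryCache, so this sketch of the used-cookie cache is a plain in-process dictionary with a time-to-live. The class name and the five-minute default are hypothetical, and a server running several worker processes would need a genuinely shared store instead.

```python
import time
from typing import Dict, Optional


class UsedCookieCache:
    """Remembers recently presented cookies for a fixed time-to-live.

    Single-process stand-in for a shared-memory cache.
    """

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._seen: Dict[str, float] = {}  # cookie value -> time first presented

    def _purge(self, now: float) -> None:
        """Drop entries older than the TTL so the cache stays small."""
        expired = [c for c, t in self._seen.items() if now - t >= self.ttl]
        for c in expired:
            del self._seen[c]

    def check_and_add(self, cookie: str, now: Optional[float] = None) -> bool:
        """True the first time a cookie is seen; False if reused within the TTL."""
        now = time.time() if now is None else now
        self._purge(now)
        if cookie in self._seen:
            return False
        self._seen[cookie] = now
        return True
```

Since any cookie older than ten minutes is rejected by its timestamp anyway, there’s no benefit in setting the TTL longer than that.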

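Here’s the fun bit: the delay counter is an exponent. Try to download ten pictures in a row without a reasonable delay, and the tenth picture takes on the order of a thousand times as long as the first (with a delay that doubles each time, 2^9 = 512 times as long). Humans will give up long before this happens, and automated tools will either wait forever or choke. Note that for this to work, the timestamp in the cookie has to be computed at the end of the delay period; otherwise a long wait would itself push the timestamp past the one-minute mark and earn a decrement on the next request.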
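A sketch of the exponential stall in Python, assuming a one-second base and a doubling delay (neither number is specified). The `sleep` and `clock` parameters exist only so the function can be tested without actually waiting; the names are hypothetical. The point it illustrates is the ordering: stall first, then read the clock for the new cookie.

```python
import time
from typing import Callable

BASE_SECONDS = 1.0  # hypothetical: delay when the counter is 0
GROWTH = 2          # hypothetical: each increment doubles the delay


def delay_for(counter: int) -> float:
    """Seconds to stall before serving a protected image."""
    return BASE_SECONDS * GROWTH ** counter


def serve_after_delay(counter: int,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.time) -> int:
    """Stall for the counter's delay, then return the new cookie timestamp.

    The timestamp is read after sleeping, so a long stall doesn't age
    the cookie past the one-minute mark and earn a decrement.
    """
    sleep(delay_for(counter))
    return int(clock())


# Delays for ten back-to-back requests (counter 0 through 9).
print([delay_for(n) for n in range(10)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0]
```

Whether a counter of 0 should mean a one-second stall or none at all is a judgment call; using `BASE_SECONDS * (GROWTH ** counter - 1)` instead would make requests at counter 0 free while keeping the same doubling growth.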

There are a couple of special cases to consider. First, if you cancel a request before your browser accepts the updated cookie, your next request for a protected image will send you to the failure page, because your cookie is in the cache. Unless you wait long enough for the cache to expire, you’ll have to go through the rights page again to see more pictures. A bit clunky, but if you pick the timeouts well, there shouldn’t be many false positives.

The other special case is shared or public computers. If someone accepts the rights agreement and then walks away without ending the browser session, the next person to come along would be able to view the protected pictures without ever seeing the agreement. That’s the reason for the ten-minute total lifetime of each cookie. I’d like to make it shorter, but I think that would produce a significant increase in false positives.

The biggest weakness of the scheme is automated form-submission. If someone takes the time to figure out how to process the rights page with a program, they can get an unlimited number of valid zero-delay cookies for the same IP address. Each one can be used only once, but there’s an endless supply. This is a real problem, and the proposed solutions are out of scope for this project.

My workaround is to assume that the site is not currently being slashdotted, and so there won’t be a large number of requests for the rights form. Another simple cache can track recent requests for the form, and insert delays accordingly. These don’t have to grow exponentially, and can probably just be somewhere between 5 and 15 seconds.
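One possible reading of “insert delays accordingly” as Python: count rights-form requests in a sliding window and stall each new one by an amount that rises with recent traffic, clamped to the 5 to 15 second range. The 60-second window, the three free requests, the one-second step, and the class name are all made-up parameters for illustration.

```python
import time
from collections import deque
from typing import Deque, Optional


class FormThrottle:
    """Tracks recent rights-form requests and suggests a delay for the next one.

    Hypothetical policy: the first `free` requests inside the window are
    served immediately; after that the delay starts at `min_delay` and
    rises one second per extra request, capped at `max_delay`.
    """

    def __init__(self, window: float = 60.0, free: int = 3,
                 min_delay: float = 5.0, max_delay: float = 15.0):
        self.window = window
        self.free = free
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._recent: Deque[float] = deque()  # request times, oldest first

    def delay_for_request(self, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        while self._recent and now - self._recent[0] >= self.window:
            self._recent.popleft()  # forget requests older than the window
        busy = len(self._recent) - self.free  # how far past the quiet allowance
        self._recent.append(now)
        if busy < 0:
            return 0.0
        return min(self.min_delay + busy, self.max_delay)
```

A linear ramp with a ceiling matches the “no need to grow exponentially” idea: a human filling in the form waits at most 15 seconds, while a script hammering the form pays that on every request.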

Other than that, I think it’s pretty solid. The cookie payload is protected by strong crypto, so you can’t forge a working cookie that has no delay. You can’t download two protected images at the same time. You can’t reuse an existing cookie that has a small delay. You can’t view any protected images until you’ve received a cookie, and the only way to get one is by accepting the agreement presented on the rights page.

Need a cookie, take a cookie;
got a cookie, leave a cookie.