Thursday, August 17 2006

They still don’t get it…

[last update: the root cause of the Linux loopback device problem described below turns out to be simple: there’s no locking in the code that selects a free loop device. So it doesn’t matter whether you use mount or losetup, and it doesn’t matter how many loop devices you configure; if you try to allocate two at once, one of them will likely fail.]

Panel discussions at LinuxWorld (emphasis mine):

“We need to make compromises to do full multimedia capabilities like running on iPod so that non-technical users don’t dismiss us out of hand.”

“We need to pay a lot more attention to the emerging markets; there’s an awful lot happening there.”

But to truly popularize Linux, proponents will have to help push word of the operating system to users, panelists said.

… at least one proponent felt the Linux desktop movement needed more evangelism.

Jon “Maddog” Hall, executive director of Linux International, said each LinuxWorld attendee should make it a point to get at least two Windows users to the conference next year

I’m sorry, but this is all bullshit. These guys are popping stiffies over an alleged opportunity to unseat Windows because of the delays in Vista, and not one of them seems to be interested in sitting down and making Linux work.

Not work if you have a friend help you install it, not work until the next release, not work with three applications and six games, not work because you can fix it yourself, not work if you can find the driver you need and it’s mostly stable, not work if you download the optional packages that allow it to play MP3s and DVDs, and definitely not work if you don’t need documentation. Just work.

[disclaimer: I get paid to run a farm of servers running a mix of RedHat 7.3 and Fedora Core 2/4/5. The machine hosting this blog runs on OpenBSD, but I’m toying with the idea of installing a minimal Ubuntu and a copy of VMware Server to virtualize the different domains I host. The only reason the base OS will be Linux is because that’s what VMware runs on. But that’s servers; my desktop is a Mac.]

Despite all the ways that Windows sucks, it works. Despite all the ways that Linux has improved over the years, and despite the very real ways that it’s better than Windows, it often doesn’t. Because, at the end of the day, somebody gets paid to make Windows work. Paid to write documentation. Paid to fill a room with random crappy hardware and spend thousands of hours installing, upgrading, using, breaking, and repairing Windows installations.

Open Source is the land of low-hanging fruit. Thousands of people are eager to do the easy stuff, for free or for fun. Very few are willing to write real documentation. Very few are willing to sit in a room and follow someone else’s documentation step-by-step, again and again, making sure that it’s clear, correct, and complete. Very few are interested in, or good at, ongoing maintenance. Or debugging thorny problems.

For instance, did you know that loopback mounts aren’t reliable? We have an automated process that creates EXT2 file system images, loopback-mounts them, fills them with data, and unmounts them. This happens approximately 24 times per day on each of 20 build machines, five days a week, every week. About twice a month it fails, with the following error: “ioctl: LOOP_SET_FD: Device or resource busy”.

Want to know why? Because mount -o loop is an unsupported method of setting up loop devices. It’s the only one you’ll ever see anyone use in their documentation, books, and shell scripts, but it doesn’t actually work. You’re supposed to do this:

LOOP=`losetup -f`
losetup $LOOP myimage
mount -t ext2 $LOOP /mnt
umount /mnt
losetup -d $LOOP

If you’re foolish enough to follow the documentation, eventually you’ll simply run out of free loop devices, no matter how many you have. When that happens, the mount point you tried to use will never work again with a loopback mount; you have to delete the directory and recreate it. Or reboot. Or sacrifice a chicken to the kernel gods.

Why support the mount interface if it isn’t reliable? Why not get rid of it, fix it, or at least document the problems somewhere other than, well, here?

[update: the root of our problem with letting the Linux mount command auto-allocate loopback devices may be that the umount command isn’t reliably freeing them without the -d option; it usually does so, but may be failing under load. I can’t test that right now, with everything covered in bubble-wrap in another state, but it’s worth a shot.]

[update: no, the -d option has nothing to do with it; I knocked together a quick test script, ran it in parallel N-1 times (where N was the total number of available loop devices), and about one run in three, I got the dreaded “ioctl: LOOP_SET_FD: Device or resource busy” error on the mount, even if losetup -a showed plenty of free loop devices.]