Unix

They still don't get it...


[last update: the root cause of the Linux loopback device problem described below turns out to be simple: there’s no locking in the code that selects a free loop device. So it doesn’t matter whether you use mount or losetup, and it doesn’t matter how many loop devices you configure; if you try to allocate two at once, one of them will likely fail.]

Panel discussions at LinuxWorld (emphasis mine):

"We need to make compromises to do full multimedia capabilities like running on iPod so that non-technical users don’t dismiss us out of hand."

"We need to pay a lot more attention to the emerging markets; there’s an awful lot happening there."

But to truly popularize Linux, proponents will have to help push word of the operating system to users, panelists said.

... at least one proponent felt the Linux desktop movement needed more evangelism.

Jon “Maddog” Hall, executive director of Linux International, said each LinuxWorld attendee should make it a point to get at least two Windows users to the conference next year...

I’m sorry, but this is all bullshit. These guys are popping stiffies over an alleged opportunity to unseat Windows because of the delays in Vista, and not one of them seems to be interested in sitting down and making Linux work.

Not work if you have a friend help you install it, not work until the next release, not work with three applications and six games, not work because you can fix it yourself, not work if you can find the driver you need and it’s mostly stable, not work if you download the optional packages that allow it to play MP3s and DVDs, and definitely not work if you don’t need documentation. Just work.

[disclaimer: I get paid to run a farm of servers running a mix of RedHat 7.3 and Fedora Core 2/4/5. The machine hosting this blog runs on OpenBSD, but I’m toying with the idea of installing a minimal Ubuntu and a copy of VMware Server to virtualize the different domains I host. The only reason the base OS will be Linux is because that’s what VMware runs on. But that’s servers; my desktop is a Mac.]

Despite all the ways that Windows sucks, it works. Despite all the ways that Linux has improved over the years, and despite the very real ways that it’s better than Windows, it often doesn’t. Because, at the end of the day, somebody gets paid to make Windows work. Paid to write documentation. Paid to fill a room with random crappy hardware and spend thousands of hours installing, upgrading, using, breaking, and repairing Windows installations.

Open Source is the land of low-hanging fruit. Thousands of people are eager to do the easy stuff, for free or for fun. Very few are willing to write real documentation. Very few are willing to sit in a room and follow someone else’s documentation step-by-step, again and again, making sure that it’s clear, correct, and complete. Very few are interested in, or good at, ongoing maintenance. Or debugging thorny problems.

For instance, did you know that loopback mounts aren’t reliable? We have an automated process that creates EXT2 file system images, loopback-mounts them, fills them with data, and unmounts them. This happens approximately 24 times per day on each of 20 build machines, five days a week, every week. About twice a month it fails, with the following error: “ioctl: LOOP_SET_FD: Device or resource busy”.

Want to know why? Because mount -o loop is an unsupported method of setting up loop devices. It’s the only one you’ll ever see anyone use in their documentation, books, and shell scripts, but it doesn’t actually work. You’re supposed to do this:

LOOP=`losetup -f`
losetup $LOOP myimage
mount -t ext2 $LOOP /mnt
...
umount /mnt
losetup -d $LOOP

If you’re foolish enough to follow the documentation, eventually you’ll simply run out of free loop devices, no matter how many you have. When that happens, the mount point you tried to use will never work again with a loopback mount; you have to delete the directory and recreate it. Or reboot. Or sacrifice a chicken to the kernel gods.

Why support the mount interface if it isn’t reliable? Why not get rid of it, fix it, or at least document the problems somewhere other than, well, here?

[update: the root of our problem with letting the Linux mount command auto-allocate loopback devices may be that the umount command isn’t reliably freeing them without the -d option; it usually does so, but may be failing under load. I can’t test that right now, with everything covered in bubble-wrap in another state, but it’s worth a shot.]

[update: no, the -d option has nothing to do with it; I knocked together a quick test script, ran it in parallel N-1 times (where N was the total number of available loop devices), and about one run in three, I got the dreaded “ioctl: LOOP_SET_FD: Device or resource busy” error on the mount, even if losetup -a showed plenty of free loop devices.]

Dear netpbm maintainers,


I hope I am not the first to point out just how pompous and wrong-headed the following statement is:

In Netpbm, we believe that man pages, and the Nroff/Troff formats, are obsolete; that HTML and web browsers and the world wide web long ago replaced them as the best way to deliver documentation. However, documentation is useless when people don't know where it is. People are very accustomed to typing "man" to get information on a Unix program or library or file type, so in the standard Netpbm installation, we install a conventional man page for every command, library, and file type, but all it says is to use your web browser to look at the real documentation.

Translation: We maintain a suite of tools used by shell programmers, and we think that being able to read documentation offline or from the shell is stupid, so rather than maintain our documentation in a machine-readable format, we just wrote HTML and installed a bunch of “go fuck yourself” manpages.

On the bright side, they wrote their own replacement for the “man” command that uses Lynx to render their oh-so-spiffy documentation (assuming you’ve installed Lynx, of course), but they don’t even mention it in their fuck-you manpages. Oh, and the folks at darwinports didn’t know about this super-special tool, so they didn’t configure it in their netpbm install.

A-baka: “Hey, I know what we’ll do with our spare time! We can reinvent the wheel!”

B-baka: “Good idea, Dick! No one’s ever done that before, and everyone will praise us for its elegance and ideological purity, even though it’s incompatible with every other wheel-using device!”

A-baka: “We’re so cool!”

Update!: it keeps getting better. Many shell tools have some kind of help option that gives a brief usage summary. What do the Enlightened Beings responsible for netpbm put in theirs?

%  pnmcut --help
pnmcut: Use 'man pnmcut' for help.

Assholes.

It smells like... victory


Last July, I knocked together a small perl script to monitor my Apache logs for virus probes, rude robots, and other annoyances, and automatically add their IP addresses to my firewall’s block list.

Today I spotted a very unusual entry at the bottom of my referrer report. I was morbidly curious what someone at a commercial web site devoted to she-males would be linking to, but it turns out the answer is “nothing”. Someone in China was running a robot that pretended to be a Windows 98 box while recursively downloading my site, no doubt to encourage My Loyal Readers (all six of them) to visit this fascinating site.

Unfortunately for my hopeful new friend, his robot tripped my log monitor and triggered a block, preventing him from getting more than a few hits. Even more unfortunately, I don’t display recent referrers anywhere on this site, so I’m the only person who knows what site he’s being paid to direct traffic to.

And I’m not going to tell. But it’s registered to someone named Dmitri Kukushkin in Delaware, who owns at least one other fetish domain.

Latchkey Zombies in Solaris


A funny thing happened when we upgraded our servers from Solaris 2.5.1 several years ago: when we killed a process, frequently its parent wouldn’t notice. This was annoying, since a lot of our Operations processes were built around killing and restarting services so they’d notice changes in a controlled fashion.

more...

A perfectly reasonable panic


Once every three months, we sent the whole company home while we tore the computer room apart and did all sorts of maintenance work. During my first quarterly downtime, the top item on my list was installing a new BOSS controller into the Solbourne that was our primary Oracle database server. Like any good database, it needed an occasional disk infusion to keep it happy, and there was no room on the existing SCSI controllers.

So I had a disk tray, a bunch of shiny new disks, a controller card, and media to upgrade the OS with. The BOSS was only supported in the latest version, and this being the server that kept the books, it was upgraded only when necessary.

more...

Sanitizing Apache with PF


About 45 minutes elapsed between the moment that I first turned this server on and the arrival of the first virus/worm/hacker probes. It was obvious that most of them were looking for Windows-based web servers, so they were harmless to me.

Still, I like to review the logs occasionally, and the sheer volume of this crap was getting annoying. Later, when I raised munitions.com from the dead, I discovered that it was getting more than 30,000 hits a day for a file containing the word “ok”. Worst of all, as I prepare to restore my photo archives, I know that I can’t afford to pay for the bandwidth while they’re slurped up by every search engine, cache site, obsessive collector, Usenet reposter, and eBay scammer on the planet.

Enter PF, the OpenBSD packet filter.

more...

“Need a clue, take a clue,
 got a clue, leave a clue”