About 45 minutes elapsed between the moment that I first turned this server on and the arrival of the first virus/worm/hacker probes. It was obvious that most of them were looking for Windows-based web servers, so they were harmless to me.
Still, I like to review the logs occasionally, and the sheer volume of this crap was getting annoying. Later, when I raised munitions.com from the dead, I discovered that it was getting more than 30,000 hits a day for a file containing the word “ok”. Worst of all, as I prepare to restore my photo archives, I know that I can’t afford to pay for the bandwidth while they’re slurped up by every search engine, cache site, obsessive collector, Usenet reposter, and eBay scammer on the planet.
Enter PF, the OpenBSD packet filter.
Once every three months, we sent the whole company home while we tore the computer room apart and did all sorts of maintenance work. During my first quarterly downtime, the top item on my list was installing a new BOSS controller into the Solbourne that was our primary Oracle database server. Like any good database, it needed an occasional disk infusion to keep it happy, and there was no room on the existing SCSI controllers.
So I had a disk tray, a bunch of shiny new disks, a controller card, and media to upgrade the OS with. The BOSS was only supported in the latest version, and this being the server that kept the books, it was upgraded only when necessary.
A funny thing happened when we upgraded our servers from Solaris 2.5.1 several years ago: when we killed a process, frequently its parent wouldn’t notice. This was annoying, since a lot of our Operations processes were built around killing and restarting services so they’d notice changes in a controlled fashion.
Last July, I knocked together a small perl script to monitor my Apache logs for virus probes, rude robots, and other annoyances, and automatically add their IP addresses to my firewall’s block list.
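For readers who want the general shape of such a log monitor, here is a minimal sketch in Python (the script described above was Perl, and this is not a port of it). It assumes Apache's combined log format, a hand-picked list of hypothetical probe signatures, and a PF table named "badhosts" that pf.conf already blocks; all three are illustrative assumptions. It only builds the pfctl command lines rather than running them.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) \S+'
)

# Hypothetical probe signatures; tune these to whatever shows up in your logs.
PROBE_PATTERNS = [
    re.compile(r"cmd\.exe", re.I),
    re.compile(r"default\.ida", re.I),
    re.compile(r"/scripts/\.\.", re.I),
]


def find_offenders(lines, threshold=1):
    """Return the sorted IPs with at least `threshold` probe-like requests."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        if any(p.search(m.group("request")) for p in PROBE_PATTERNS):
            hits[m.group("ip")] += 1
    return sorted(ip for ip, n in hits.items() if n >= threshold)


def pf_commands(ips, table="badhosts"):
    """Build (but do not run) pfctl commands adding each IP to a PF table.

    The table name is an assumption; pf.conf must define the table and
    a rule that blocks traffic from it.
    """
    return [f"pfctl -t {table} -T add {ip}" for ip in ips]
```

Feeding it the tail of an access log and printing the result of pf_commands(find_offenders(lines)) gives a list you can review before handing anything to the firewall.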
Today I spotted a very unusual entry at the bottom of my referrer report. I was morbidly curious what someone at a commercial web site devoted to she-males would be linking to, but it turns out the answer is “nothing”. Someone in China was running a robot that pretended to be a Windows 98 box while recursively downloading my site, no doubt to encourage My Loyal Readers (all six of them) to visit this fascinating site.
Unfortunately for my hopeful new friend, his robot tripped my log monitor and triggered a block, preventing him from getting more than a few hits. Even more unfortunately, I don’t display recent referrers anywhere on this site, so I’m the only person who knows what site he’s being paid to direct traffic to.
And I’m not going to tell. But it’s registered to someone named Dmitri Kukushkin in Delaware, who owns at least one other fetish domain.
I hope I am not the first to point out just how pompous and wrong-headed the following statement is:
In Netpbm, we believe that man pages, and the Nroff/Troff formats, are obsolete; that HTML and web browsers and the world wide web long ago replaced them as the best way to deliver documentation. However, documentation is useless when people don’t know where it is. People are very accustomed to typing “man” to get information on a Unix program or library or file type, so in the standard Netpbm installation, we install a conventional man page for every command, library, and file type, but all it says is to use your web browser to look at the real documentation.
Translation: We maintain a suite of tools used by shell programmers, and we think that being able to read documentation offline or from the shell is stupid, so rather than maintain our documentation in a machine-readable format, we just wrote HTML and installed a bunch of “go fuck yourself” manpages.
On the bright side, they wrote their own replacement for the “man” command that uses Lynx to render their oh-so-spiffy documentation (assuming you’ve installed Lynx, of course), but they don’t even mention it in their fuck-you manpages. Oh, and the folks at darwinports didn’t know about this super-special tool, so they didn’t configure it in their netpbm install.
A-baka: “Hey, I know what we’ll do with our spare time! We can reinvent the wheel!”
B-baka: “Good idea, Dick! No one’s ever done that before, and everyone will praise us for its elegance and ideological purity, even though it’s incompatible with every other wheel-using device!”
A-baka: “We’re so cool!”
Update!: it keeps getting better. Many shell tools have some kind of help option that gives a brief usage summary. What do the Enlightened Beings responsible for netpbm put in theirs?
% pnmcut --help
pnmcut: Use 'man pnmcut' for help.
[last update: the root cause of the Linux loopback device problem described below turns out to be simple: there’s no locking in the code that selects a free loop device. So it doesn’t matter whether you use mount or losetup, and it doesn’t matter how many loop devices you configure; if you try to allocate two at once, one of them will likely fail.]
Panel discussions at LinuxWorld (emphasis mine):
“We need to make compromises to do full multimedia capabilities like running on iPod so that non-technical users don’t dismiss us out of hand.”
“We need to pay a lot more attention to the emerging markets; there’s an awful lot happening there.”
But to truly popularize Linux, proponents will have to help push word of the operating system to users, panelists said.
… at least one proponent felt the Linux desktop movement needed more evangelism.
Jon “Maddog” Hall, executive director of Linux International, said each LinuxWorld attendee should make it a point to get at least two Windows users to the conference next year…
I’m sorry, but this is all bullshit. These guys are popping stiffies over an alleged opportunity to unseat Windows because of the delays in Vista, and not one of them seems to be interested in sitting down and making Linux work.
Not work if you have a friend help you install it, not work until the next release, not work with three applications and six games, not work because you can fix it yourself, not work if you can find the driver you need and it’s mostly stable, not work if you download the optional packages that allow it to play MP3s and DVDs, and definitely not work if you don’t need documentation. Just work.
[disclaimer: I get paid to run a farm of servers running a mix of RedHat 7.3 and Fedora Core 2/4/5. The machine hosting this blog runs on OpenBSD, but I’m toying with the idea of installing a minimal Ubuntu and a copy of VMware Server to virtualize the different domains I host. The only reason the base OS will be Linux is because that’s what VMware runs on. But that’s servers; my desktop is a Mac.]
Despite all the ways that Windows sucks, it works. Despite all the ways that Linux has improved over the years, and despite the very real ways that it’s better than Windows, it often doesn’t. Because, at the end of the day, somebody gets paid to make Windows work. Paid to write documentation. Paid to fill a room with random crappy hardware and spend thousands of hours installing, upgrading, using, breaking, and repairing Windows installations.
Open Source is the land of low-hanging fruit. Thousands of people are eager to do the easy stuff, for free or for fun. Very few are willing to write real documentation. Very few are willing to sit in a room and follow someone else’s documentation step-by-step, again and again, making sure that it’s clear, correct, and complete. Very few are interested in, or good at, ongoing maintenance. Or debugging thorny problems.
For instance, did you know that loopback mounts aren’t reliable? We have an automated process that creates EXT2 file system images, loopback-mounts them, fills them with data, and unmounts them. This happens approximately 24 times per day on each of 20 build machines, five days a week, every week. About twice a month it fails, with the following error: “ioctl: LOOP_SET_FD: Device or resource busy”.
Want to know why? Because mount -o loop is an unsupported method of setting up loop devices. It’s the only one you’ll ever see anyone use in their documentation, books, and shell scripts, but it doesn’t actually work. You’re supposed to do this:
LOOP=`losetup -f`
losetup $LOOP myimage
mount -t ext2 $LOOP /mnt
...
umount /mnt
losetup -d $LOOP
If you’re foolish enough to follow the documentation, eventually you’ll simply run out of free loop devices, no matter how many you have. When that happens, the mount point you tried to use will never work again with a loopback mount; you have to delete the directory and recreate it. Or reboot. Or sacrifice a chicken to the kernel gods.
Why support the mount interface if it isn’t reliable? Why not get rid of it, fix it, or at least document the problems somewhere other than, well, here?
[update: the root of our problem with letting the Linux mount command auto-allocate loopback devices may be that the umount command isn’t reliably freeing them without the -d option; it usually does so, but may be failing under load. I can’t test that right now, with everything covered in bubble-wrap in another state, but it’s worth a shot.]
[update: no, the -d option has nothing to do with it; I knocked together a quick test script, ran it in parallel N-1 times (where N was the total number of available loop devices), and about one run in three, I got the dreaded “ioctl: LOOP_SET_FD: Device or resource busy” error on the mount, even if losetup -a showed plenty of free loop devices.]
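To make the "no locking when selecting a free loop device" failure concrete, here is a toy model in Python. It is not kernel code, and the class and method names are invented for illustration. It replays the bad ordering by hand (two callers both look for a free device before either claims one), then shows that holding a lock across the find-and-claim step avoids the collision.

```python
import threading


class LoopPool:
    """Toy model of a fixed set of loop devices (hypothetical, not kernel code)."""

    def __init__(self, count):
        self.in_use = [False] * count
        self.lock = threading.Lock()

    def find_free(self):
        """Return the index of the first free device, or None if all are busy."""
        for i, busy in enumerate(self.in_use):
            if not busy:
                return i
        return None

    def claim(self, index):
        """Mark a device busy, failing if someone else got there first."""
        if self.in_use[index]:
            raise OSError("LOOP_SET_FD: Device or resource busy")
        self.in_use[index] = True

    def allocate_locked(self):
        """Find and claim as one step, so two callers cannot pick the same device."""
        with self.lock:
            index = self.find_free()
            if index is None:
                raise OSError("no free loop devices")
            self.claim(index)
            return index


def racy_interleaving(pool):
    """Replay the bad ordering: both callers look before either one claims."""
    a = pool.find_free()
    b = pool.find_free()  # same answer as a, because nothing is claimed yet
    pool.claim(a)
    try:
        pool.claim(b)
    except OSError as err:
        return a, str(err)
    return a, None
```

Note that in the racy replay only one device is in use when the second caller fails, which matches the symptom above: a "busy" error while plenty of loop devices are still free.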
Well, at least in the area of configuration, maintenance, and release management, the current version shows its dark roots. Before anyone speaks up, I’ll say that I’m generally happy with using FC5 and RedHat Enterprise on our servers at work, but someone had recommended Ubuntu server as a possible base OS for virtualizing my personal machine with VMware Server.
It installed correctly, but wouldn’t boot. The solution I located required the following steps:
“Fixed in next release,” supposedly, but between that early-warning sign and some of the obvious eccentricities I tripped over, I don’t think I’ll bother with it.
Mark Shuttleworth, Ubuntu guy, on Linux success:
“If we want the world to embrace free software, we have to make it beautiful. I’m not talking about inner beauty, not elegance, not ideological purity… pure, unadulterated, raw, visceral, lustful, shallow, skin deep beauty.
We have to make it gorgeous. We have to make it easy on the eye. We have to make it take your friend’s breath away.”
It should really be called World Domination 050, because it’s providing remedial education that the student should have had before coming to college, but it’s a start:
Linux on the desktop has been a year or two away for over a decade now, and there are reasons it’s not there yet. To attract nontechnical end-users, a Linux desktop must work out of the box, ideally preinstalled by the hardware vendor.
When somebody with a degree in finance or architecture can grab a Linux laptop and watch episodes of The Daily Show off of Comedy Central’s website without a bearded Linux geek walking them through an elaborate hand-configuration process first, maybe we’ll have a prayer.
You can’t win the desktop if you don’t even try. Right now, few in the Linux world are seriously trying. And time is running out.
Unfortunately “good” isn’t the same as “ready to happen”. The geeks of the world would like a moonbase too, and it’s been 30 years without progress on that front. Inevitability doesn’t guarantee that something will happen within our lifetimes. The 64-bit transition is an opportunity to put Linux on the desktop, but right now it’s still not ready. If the decision happened today, Linux would remain on the sidelines.
[Update: as usual, those wacky kids on Slashdot just don’t get it.]
Okay, I’m stumped. We have a ReadyNAS NV+ that holds Important Data, accessed primarily from Windows machines. Generally, it works really well, and we’ve been pretty happy with it for the last few months.
Monday, the Windows application that reads and writes the Important Data locked up on the primary user’s machine. Cryptic error messages that decrypted to “contact service for recovering your corrupted database” were seen.
Nightly backups of the device via the CIFS protocol worked fine. Reading and writing to the NAS from a Mac via CIFS worked fine. A second Windows machine equipped with the application worked fine, without any errors about corrupted data. I left the user working on that machine for the day, and did some after-hours testing that night.
The obvious conclusion was that the crufty old HP on the user’s desk was the problem (it had been moved on Friday), so I yanked it out of the way and temporarily replaced it with the other, working Windows box.
It didn’t work. I checked all the network connections, and everything looked fine. I took the working machine back to its original location, and it didn’t work any more. I took it down to the same switch as the NAS, and it didn’t work. My Mac still worked fine, though, so I used it to copy all of the Important Data from the ReadyNAS to our NetApp.
Mounting the NetApp worked fine on all machines in all locations. I can’t leave the data there long-term (in addition to being Important, it’s also Confidential), but at least we’re back in business.
I’m stumped. Right now, I’ve got a Mac and a Windows machine plugged into the same desktop gigabit switch (gigabit NICs everywhere), and the Mac copies a 50MB folder from the NAS in a few seconds, while the Windows machine gives up after a few minutes with a timeout error. The NAS reports:
smbd.log: write_data: write failure in writing to client 10.66.0.151. Error Broken pipe
The only actual hardware problem I ever found was a loose cable in the office where the working Windows box was located.
[Update: It’s being caused by an as-yet-unidentified device on the network. Consider the results of my latest test: if I run XP under Parallels on my Mac in shared (NAT) networking mode, it works fine; in bridged mode, it fails exactly like a real Windows box. Something on the subnet is passing out bad data that Samba clients ignore but real Windows machines obey. The NetApp works because it uses licensed Microsoft networking code instead of Samba.]
[8/23 Update: A number of recommended fixes have failed to either track down the offending machine or resolve the problem. The fact that it comes and goes is more support for the “single bad host” theory, but it’s hard to diagnose when you can’t run your tools directly on the NAS.
So I reached for a bigger hammer: I grabbed one of my old Shuttles that I’ve been testing OpenBSD configurations on, threw in a second NIC, configured it as an ethernet bridge, and stuck it in front of the NAS. That gave me an invisible network tap that could see all of the traffic going to the NAS, and also the ability to filter any traffic I didn’t like.
Just for fun, the first thing I did was turn on the bridge’s “blocknonip” option, to force Windows to use TCP to connect. And the problem went away. I still need to find the naughty host, but now I can do it without angry users breathing down my neck.]
building 'pycurl' extension
creating build/temp.macosx-10.3-i386-2.5
creating build/temp.macosx-10.3-i386-2.5/src
-DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/opt/local/include -I/opt/local/include/python2.5 -c src/pycurl.c -o build/temp.macosx-10.3-i386-2.5/src/pycurl.o
unable to execute -DNDEBUG: No such file or directory
error: command '-DNDEBUG' failed with exit status 1
After I thought I had a decent script for figuring out what packages Anaconda would install from a Fedora 10 DVD, I decided to test it against reality. Reality made the script cry like a little girl, so it was back to the drawing board.
The problem, simply put, was that I had over-estimated the internal consistency of the data. Here’s what I learned in the process of producing a 100% match between my script and an actual default install of Fedora 10:
At some point, this knowledge will be put to use upgrading my EEE PC from Fedora 9, but now that I can declare victory and stop tinkering with the script for a while, I’m going to go finish the Japanese novel I’m currently working my way through (60 pages down, 200 to go).
What is the result of feeding this expression to a Bash shell?