Sysadmin

Tuning OpenBSD…


Hear my words, and know them to be true:

"Adjust the unlabeled knob to the unmarked setting, then press the unspecified button an undocumented number of times. Fixes everything."

I’m only half-kidding. Sadly, it always seems to be worth the effort. This time, it was to replace the CentOS server that kept locking up in less than 24 hours under the stress of incoming syslog data from 80,000+ hosts. Admittedly, syslog-ng is partially at fault, given that it leaks about 20 file handles per minute, but you wouldn’t expect that to cause scary ext3 errors that require a reboot. The BSD ffs seems to be more mature in that regard, although its performance goes to hell fast unless you turn on soft dependencies.
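To put that leak rate in perspective, here is a back-of-the-envelope sketch in bash. The 20-per-minute figure is from the paragraph above; the 1024-descriptor limit is purely an assumption for illustration, not the setting on the actual server.

```bash
#!/usr/bin/env bash
# Descriptors leaked per day at the rate quoted in the post.
leak_per_minute=20
minutes_per_day=$(( 60 * 24 ))
leaked_per_day=$(( leak_per_minute * minutes_per_day ))
echo "leaked per day: $leaked_per_day"

# Hypothetical per-process limit (assumption; check `ulimit -n` on the real box).
fd_limit=1024
minutes_to_limit=$(( fd_limit / leak_per_minute ))
echo "minutes until $fd_limit descriptors are used up: $minutes_to_limit"
```

At that rate a daemon burns through tens of thousands of descriptors a day, so any realistic limit is reached well inside the 24-hour window.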

[Update: Oh, and to be fair, I should mention the downside of this, which is simply that adjusting the right knob to the wrong setting (or vice-versa) will kill everyone within thirty yards of the server.]

Dear VMware,


Come on, really?

"We'd like to keep you informed via email about product updates, upgrades, special offers and pricing. If you do not wish to be contacted via email, please ensure that the box is not checked."

At least the box is not not unchecked by default, but this is stupid.

Arbitrary limits


As a general rule, office firewalls do not have to be configured to cope with simultaneous incoming syslog traffic from 80,000+ hosts. Mine did. Sadly, the default limit for a particular element was only capable of handling about ¾ of that, leaving our outgoing connections somewhere between unstable and “not” when things got busy.

Fixed now.

PS: syslog can be scary efficient at sending packets when a box is unhappy. Enough unhappy boxes make for a quite impressive DDoS attack, if you haven’t previously discovered that using “no state” in a firewall rule does not, in fact, avoid filling your state table with crap, thus accelerating your approach toward that arbitrary limit.

HexBash


I had a perfectly good reason for doing it this way…

declare id
# Zero-pad a hex string to six uppercase digits; the result lands in $id.
function hexhex () {
  printf -v id '%06X' "0x$1"
}
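For anyone squinting at it, a quick usage sketch. The function is repeated so the snippet runs on its own, and the sample inputs are made up. The reason for doing it this way isn't stated; one thing it does do is let `printf -v` write straight into the variable without a `$(...)` subshell.

```bash
#!/usr/bin/env bash
# Same function as above, repeated so this sketch is self-contained.
declare id
function hexhex () {
  printf -v id '%06X' "0x$1"
}

# Hypothetical inputs: short, lowercase hex strings.
hexhex 1a2b
echo "$id"   # 001A2B
hexhex ff
echo "$id"   # 0000FF
```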

Microsoft, Sidekick, Danger


Reading the emerging story about the T-Mobile/Sidekick data loss, I was surprised to discover that this guy isn’t working there. In fact, he’s not even on the West Coast any more, which makes me feel better about all my data.

I have some friends at what used to be Danger, and I know they’ve been working frantically at damage control, but I can only see blood on the walls in this one. Some people screwed up bigtime, years ago, both procedurally and technically, and if the original culprits are gone, their replacements will get axed for not spotting, and removing, the vulnerabilities.

Microsoft can afford the financial hit it’s going to take from this, but the PR hit is devastating. Any product line that says “trust us with your data” is in big trouble.

[why, yes, I did just update and verify my offsite backups; why do you ask?]

It’s sad that this makes me happy…


SQL interface to Perforce.

It’s been around for quite a while, but I’d never noticed it; most of my data-mining has been at levels that can be satisfied with the usual command-line interface. It will come in handy for my branch-to-branch bugfix-integration report, though.

“Show me on this doll where the bad SQL touched you”


I don’t want a database guru. I want a database ogre, who lives in a dank, dark cave lined with the bones of developers who think they can write their own queries and release them to Production.

During my latest round of load-testing, I discovered that one particular client-driven query degrades rather seriously under load. As in, fifteen minutes to use a unique device ID to look up the matching unique customer ID and a single string related to RMA status. Part of the problem was that the dev was looking in the wrong place, but the main problem was that he didn’t understand the data, so the query was written in a way guaranteed to maximize search time. (rant about poor schema design saved for another day…)

I am a SQL caveman. My formal training in database technology began and ended with a single COBOL class in the mid-Eighties. I rewrote the query and dropped the time to 0.062 seconds under the same heavy load.

Four orders of magnitude? Time to feed another dev to the ogre!
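The arithmetic, sketched in bash using only the two timings quoted above (integer milliseconds, so the result is rounded down):

```bash
#!/usr/bin/env bash
# Ratio of the old query time to the new one.
old_ms=$(( 15 * 60 * 1000 ))   # fifteen minutes
new_ms=62                      # 0.062 seconds
speedup=$(( old_ms / new_ms ))
echo "speedup: ${speedup}x"    # speedup: 14516x
```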

(and, yes, the checkin comment attached to this query begins “optimized the sql query for …”. The sad thing is that, relatively speaking, this is a true statement; his previous code was worse)

My new Monday t-shirt

Proof my email works again


In the past seven hours, I have received 490 pieces of spam. One made it to my inbox. One almost made it to my inbox. The rest were caught by SpamSieve, with no false positives.

So, yes, I’m pretty sure that the catchall mailbox at jgreely.com is working again. :-)

I moved one of my parked domains to a new account at Pair, the hosting side of domain registrar PairNIC. They offer clean multiple-domain support with catchall mailboxes and sophisticated filtering, secure IMAP and SMTP, and a full range of scripting languages and libraries under FreeBSD. Once I’ve tested everything out with that domain, I’ll move jgreely.com over, as well as the J-E dictionary I’m currently hosting on jgreely.net.

It will be a while before I can resume the blog upgrade work I started a while back, so dotclue.org won’t move to the Pair account any time soon, and neither will my old high-volume picture site (which survives because of bandwidth-throttling firewall rules). All of the non-blog CGI will probably end up on jgreely.net once that domain is migrated off of the flaky old Shuttle sitting in my closet.