Sysadmin

Connect:Direct for Dummies


I’ve been roped into supporting a project that requires the use of Connect:Direct to transfer data to an external partner. This product is vastly overcomplicated for the use we’re putting it to, and the documentation feels like it was written as an ad for the vendor’s training courses.

I have no interest in becoming an expert Connect:Direct administrator. I want to do two things: configure the Unix command-line client to connect to our partner’s server, so that we can send a file to them, and configure the Unix server so that the partner can connect to us and send the processed data back.

This is turning out to be surprisingly difficult to do. A lot of it is the documentation, but a disturbing percentage of the problem is the near-total lack of information available from our partner. You’d think that a large company that required its customers to purchase and set up a specific software package (that they had no other use for) would supply a one-page cheat-sheet, but these folks haven’t even managed to cough up the userid and password we’re supposed to connect with. For more fun, they say there’s a guy in their security department who knows all about Connect:Direct, but he’s not allowed to talk to external customers.

So, anybody have a friend who knows something about this stuff? Bonus points if you can guess the name of the company we’re trying to connect to. :-)

Update: After giving up on their documentation and our partner’s knowledge pool, I opened a support case with the vendor. Their tech support called my office at 6 a.m. this morning, not realizing what time zone I was in. Fortunately, my office phone forwards to my cell, and I was actually awake at the time. Five minutes later, I not only had the original error message deciphered (XSMG242I, which can mean “bad permissions”, “config-file syntax error”, “missing remote record for local user”, or several other things), but had an understanding of their security and connection models that could not be obtained from their documentation. Thank you, Moniram. When our contact at the partner woke up a few hours later, we were able to successfully test file transfers in both directions.

I mentioned my intention to clean up my notes into a “Connect:Direct for Dummies” guide that could be used to rebuild our servers if I were unavailable, and our partner has expressed an interest in acquiring a copy. They’d like to help other customers cut down the setup time from weeks to minutes…

It smells like... victory


Last July, I knocked together a small Perl script to monitor my Apache logs for virus probes, rude robots, and other annoyances, and automatically add their IP addresses to my firewall’s block list.
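
The script itself isn't reproduced here. As a rough sketch of the general idea only, written in Python rather than the original language, something like the following scans Apache "combined" format log lines and collects the IP addresses worth blocking. The probe signatures, the hit threshold, and the function name are all made up for illustration, and actually feeding the result to a firewall is left out, since that depends entirely on the firewall in use.

```python
import re
from collections import Counter

# Apache "combined" log format:
# ip identity user [time] "request" status size "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical probe signatures; a real list would be tuned to your own logs.
PROBE_PATTERNS = [
    re.compile(r"cmd\.exe|root\.exe|default\.ida", re.IGNORECASE),
]

# Hypothetical limit on requests per IP within the lines examined.
MAX_HITS = 100


def find_offenders(lines, max_hits=MAX_HITS):
    """Return the set of IPs that sent a probe or exceeded max_hits requests."""
    hits = Counter()
    offenders = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        ip = match.group("ip")
        hits[ip] += 1
        request = match.group("request")
        if any(p.search(request) for p in PROBE_PATTERNS):
            offenders.add(ip)  # known probe: flag on first sight
        elif hits[ip] > max_hits:
            offenders.add(ip)  # too many requests: treat as a rude robot
    return offenders
```

Run against a log file with find_offenders(open(path)), this returns a set of addresses; what happens to them next is up to whatever manages your block list.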

Today I spotted a very unusual entry at the bottom of my referrer report. I was morbidly curious what someone at a commercial web site devoted to she-males would be linking to, but it turns out the answer is “nothing”. Someone in China was running a robot that pretended to be a Windows 98 box and claimed that site as its referrer while recursively downloading my site, no doubt to encourage My Loyal Readers (all six of them) to visit this fascinating site.

Unfortunately for my hopeful new friend, his robot tripped my log monitor and triggered a block, preventing him from getting more than a few hits. Even more unfortunately, I don’t display recent referrers anywhere on this site, so I’m the only person who knows what site he’s being paid to direct traffic to.

And I’m not going to tell. But it’s registered to someone named Dmitri Kukushkin in Delaware, who owns at least one other fetish domain.

Latchkey Zombies in Solaris


A funny thing happened when we upgraded our servers from Solaris 2.5.1 several years ago: when we killed a process, frequently its parent wouldn’t notice. This was annoying, since a lot of our Operations processes were built around killing and restarting services so they’d notice changes in a controlled fashion.
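
For context, the normal way a Unix parent finds out that a child has died is a SIGCHLD signal followed by a call to wait() or waitpid(). The following is a minimal Python sketch of that expected behavior, not a reproduction of the Solaris problem; the function name is made up for illustration.

```python
import os
import signal


def kill_and_reap():
    """Fork a child, kill it, and confirm the parent sees the death."""
    pid = os.fork()
    if pid == 0:
        # Child: do nothing until a signal arrives.
        while True:
            signal.pause()
    # Parent: kill the child, then collect its exit status.
    os.kill(pid, signal.SIGKILL)
    reaped_pid, status = os.waitpid(pid, 0)
    # Until waitpid() returns, the dead child lingers as a zombie.
    return (reaped_pid == pid
            and os.WIFSIGNALED(status)
            and os.WTERMSIG(status) == signal.SIGKILL)
```

A parent that never gets around to the waitpid() step is what leaves zombies behind.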

more...

Today is a good day


New PowerBooks are out. Must wet pants with joy. They all look good, but I’m leaning slightly toward a 15-inch model with an 80GB disk and 1GB of RAM; not sure I’m ready for a 17-inch boat anchor.

Yesterday, on the other hand, was definitely not a good day. For some time now, I’ve been installing Panther betas on my iBook with the Archive & Install option, which preserves almost all of my applications and customizations while completely replacing the OS. I’ve always backed up my home directory first, but haven’t bothered with an extra full backup. Cuts the total upgrade time down to about an hour, most of which is spent watching the disks spin.

On another day, I’d consider including a comparison to my last Windows upgrade horror story. Unfortunately, things went terribly wrong this time. Twelve hours later, my iBook is almost back to normal.

more...

I love this kind of bug...


People often wonder what sysadmins do for a living. It’s a mostly-invisible profession, where you’re only noticed when things aren’t working. Mostly we solve problems, but often we first have to figure out what the problem really is.

I don’t want to know how long it took someone to get from “my password doesn’t work” to this:

If you used Open Firmware Password utility to create a password that contains the capital letter "U", your password will not be recognized during the startup process (when you try to access Startup Manager, for example).

Note that it applies to Mac models going back several years, but wasn’t posted on the support site until this week. No doubt there’s a small pile of bug reports that have been sitting around for all this time, with their status field set to “WTF?”.

"If I could get in there..."


While watching yet another Slashdot thread dissolve into a poor imitation of a Usenet flame-war, the smug arrogance of people who think that running Linux means they’re smarter than Windows users reminded me of something that happened when I was at Synopsys.

A widely-used Unix server had crashed, and the engineers were hanging out near the data center, waiting for us to bring it back up.

"What's taking them so long? We've got work to do! Dammit, if I could get in there, I'd fix it myself!"

"I'm pretty sure that's why you can't get in there."

A perfectly reasonable panic


Once every three months, we sent the whole company home while we tore the computer room apart and did all sorts of maintenance work. During my first quarterly downtime, the top item on my list was installing a new BOSS controller into the Solbourne that was our primary Oracle database server. Like any good database, it needed an occasional disk infusion to keep it happy, and there was no room on the existing SCSI controllers.

So I had a disk tray, a bunch of shiny new disks, a controller card, and media to upgrade the OS with. The BOSS was only supported in the latest version, and this being the server that kept the books, it was upgraded only when necessary.

more...

The Perl Script From Hell


I’ve been working with Perl since about two weeks before version 2.0 was released. Over those fifteen years, I’ve seen a lot of hairy Perl scripts, many of them mine.

None of them can compare to the monster that lurks in the depths of our service, though. Over 8,000 lines of Perl plus an 8,000-line C++ module, written in a style that’s allegedly Object Oriented, but which I would describe as Obscenely Obfuscated (“Hi, Andrew!”).

We have five large servers devoted to running it. Each contributes three CPUs, three gigabytes of memory, and 25 hours of runtime to the task (independently; we need the redundancy if one of them crashes). Five years ago, I swore a mighty oath to never, ever get involved with the damned thing.

Then it broke. In a way that involved tens of thousands of unhappy customers.

more...

“Need a clue, take a clue,
 got a clue, leave a clue”