Sysadmin

Definition of a good time


You’ve spent the past two weeks being yelled at by a user for not getting their external partner’s incoming connection to work

and you’ve had a tcpdump running for an entire week showing that no connection attempts have been made from the IP addresses the partner provided

and they schedule a conference call at a time that’s convenient for the partner’s third-world contractors

and they confirm their IP addresses in the chat but the test fails again

and your tcpdump shows them coming in from a completely different IP address

and they start to wrap up the meeting saying they’ll contact their network team who hadn’t been invited and reschedule for the next day

and you have to yell into the microphone to tell them to try again right now since you’ve just added their real IP address to the firewall

and they confirm that it works but continue talking about who’s going to do what and how they will communicate the results and who will be responsible for the next step and oh fuck who cares you stopped listening two minutes ago

and you close the multiple tickets created by the user who doesn’t understand that CC’ing the helpdesk on every email keeps creating new tickets

and the partner emails a list of 26 possible IP addresses that does not include the two they originally claimed were the only ones they use

and then they try to schedule another meeting anyway and you reject the invite twice

and you go back to bed.

…and reach for earplugs because the neighbor puts his dog out when he goes to work and it barks and whines all day long and sounds remarkably like one of your users.
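
For the curious, the capture-versus-claims comparison is easy to script. This is only a minimal sketch: it assumes the capture was saved as text from `tcpdump -nn` and handles IPv4 only, and the function name, sample lines, and addresses (all from the documentation ranges) are invented for illustration.

```python
import re

# Matches the "IP src.port > dst.port:" part of a `tcpdump -nn` text line (IPv4 only).
LINE_RE = re.compile(r"\bIP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):")

def unexpected_sources(lines, claimed):
    """Return the set of source IPs seen in the capture that are not in `claimed`."""
    seen = set()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            seen.add(match.group(1))
    return seen - set(claimed)

# Hypothetical sample data; real input would be the saved capture, one line per packet.
sample = [
    "09:15:02.123456 IP 203.0.113.77.51514 > 198.51.100.10.443: Flags [S], seq 1, win 64240, length 0",
    "09:15:03.123456 IP 203.0.113.77.51514 > 198.51.100.10.443: Flags [S], seq 1, win 64240, length 0",
]
claimed = ["192.0.2.10", "192.0.2.11"]  # the addresses the partner swore by (made up here)

print(unexpected_sources(sample, claimed))
```

Anything it prints is an address that needs a firewall rule, or at least an awkward question on the next call.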

Retirement Party!


Not mine, sadly, but the ancient NetEngine WebEngine that was dotclue.org for so many years. I pulled it from the co-lo on my way into work this morning, and its reward for fourteen years of faithful service will be a disk scrub and an e-waste bin.

By the way, for all the sometimes-deserved criticism that OpenBSD and its wranglers get, I was still running v3.3 without anyone ever successfully breaking in. I locked it down with a very small set of services, and required non-root logins with ssh keys, and Theo’s Paranoid Army took care of the rest. I applied the various security patches that came out in 2003-2004, but that’s it.

I don’t recommend not updating your server for 14 years, but you can go a lot longer between updates if you start with something designed for security.

Amusingly, I still own the even-older server that hosted munitions.com back in the days when it was shared between folks at WebTV, but I doubt I have anything left that could mount those disks to scrub them, so they’ll just get the sledgehammer treatment, and then go into the e-waste bin.

The Apocralypse


Y2K Apocralypse


We were all-hands-on-deck for Y2K at WebTV, with Operations, devs, and management all waiting for a scramble signal from QA if something went wrong. Since, like most businesses, we’d fixed everything we could think of well in advance, I was hanging out in a conference room with my 4x5 view camera taking pictures of whisky bottles (and a mildly-cute girl from another group who wandered in at some point; portraits only!).

Turned out there was exactly one thing that had been missed: trying to add a credit card that didn’t start being valid until 1/1/2000. This produced a legendary flaming email from Steve Perlman, which was preserved for posterity because it was a reply-all that CC’d the Remedy ticket system.

Confluence is a pig


Someone got a wild hair to move everything to Confluence (and Jira) at work. Much like the move to Git, this seems to be largely because they’re hiring people who don’t know how to work with anything else and refuse to learn.

It’s a Java app, and the minimum heap size they require is 1 GB. This turns out to be barely enough to handle maybe three users and a tiny handful of pages, because the app has now locked up completely twice due to garbage collection. I doubt we have more than 20 MB of data in the damn thing so far, so this is just really badly written.

“Please stop pretending to be customer support”


So, the ISP who hosts jgreely.com has been sending email since February announcing an upcoming transition to a new platform.

On November 2nd, they sent one that said “we may not get to your domain before our November 28th deadline, so if you don’t want it to be shut off, you might want to run our migration tool yourself and do your own testing.”

On November 15th, they sent a friendly reminder.

On November 16th, they said the migration had been completed successfully, and I should now update my registrar with their new name servers.

Not being an idiot, I queried the new servers, and found: no MX record, no A records, and only one lonely little CNAME pointing ftp.jgreely.com to (nonexistent) www.jgreely.com. The new IP address, available only from their web console, did not listen for SMTP, POP, or IMAP, but a manual connection to port 80 showed that my trivial home page was there. The control panel also showed that my mail config had been modified, but that no data had been copied over from the old server (someone clearly doesn’t understand how IMAP works…).
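
The port half of that check can be scripted in a few lines. This is a rough sketch rather than the exact commands used: the function names are made up, the ports are just the standard ones for each service, and the host in the demo is a placeholder to swap for the address from the web console. The DNS half is left to `dig`, since Python's standard library has no MX lookup.

```python
import socket

def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Standard ports for the services mentioned above.
SERVICES = {"SMTP": 25, "POP3": 110, "IMAP": 143, "HTTP": 80}

def check_services(host, services=SERVICES):
    """Map each service name to whether its port accepts connections on `host`."""
    return {name: is_listening(host, port) for name, port in services.items()}

if __name__ == "__main__":
    # Placeholder host; substitute the new server's IP address.
    for name, up in check_services("127.0.0.1").items():
        print(f"{name:5} {'open' if up else 'closed'}")
```

A successful connect only proves something is accepting connections on the port; it says nothing about whether the mail behind it was actually migrated.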

There is no published support email address. Their online chat never connects. I spent 72 minutes on hold waiting for someone to pick up, and ten minutes explaining the problem to an arrogant moron. I demanded he escalate the call, and he put me on hold for another 20 minutes. I explained the problem again, in detail, and this guy understood, and said they’d regenerate the zone file and it would be fine in a little while.

And, oh-by-the-way, since the transition of my domain was marked complete in their system, the old server could be shut down at any time. But if I noticed it and called, they’d be happy to turn it back on for a little while.

Two hours later, dig still shows no MX, no A, and one pointless CNAME.

Oh, and the “obsolete platform” had shell access; the shiny new one does not. It does, however, have a lot of overpriced add-on services, like “backup/restore” (!), SEO optimization, blahblahblah. And while on hold for over an hour, they kept telling me how paid audio and video services would “keep customers on my site longer”, and other bullshit.

If they don’t get their shit together Real Soon Now, I’ll name and shame them. And, of course, move the account elsewhere. Maybe I’ll just host it on Amazon and run it myself.

[Update: DNS and old email finally showed up. I haven’t switched yet, since it takes a while for name server changes to propagate, and I still don’t really trust these clowns. First, I’m going to back up my mail archives, then switch the IMAP config to point to the old IP address, then create a brand new account that points to the new IP addresses, so I don’t lose days of incoming mail.]

IPv6, begone!


I disabled my Hurricane Electric IPv6 tunnel for now, because it was breaking Netflix and Tumblr. Netflix assumes that the existence of any tunnel means you’re trying to bypass regional content restrictions. Tumblr breaks because browsers prefer IPv6 when it’s offered, and the tunnel simply can’t handle the massively-parallel image-loading of a typical endlessly-scrolling cheesecake tumblr.

It was an interesting experiment that came in handy when I went to set up a test IPv6 config at work, and I can get native IPv6 at home now just by asking, but there’s really no point. I have a static /29, so until they pry it out of my cold, dead hands, there’s no benefit to me.

Back from Merge


The Perforce conference was entertaining and educational. San Francisco is a hole, but we were pretty much in the hotel all the time, except for a party/“dinner” and a walk down the street to the training facility for those of us who took classes as well.

When I first mentioned the conference, David commented about how one of his clients who tested it had serious performance issues and felt it “really, really disliked binaries being checked in”. This would come as news to the many companies who talked about checking in all their build artifacts, and to the large game companies who check in hundreds of gigabytes of digital assets for all their projects. (or even the last three companies I’ve worked at…) Git and Mercurial have problems with large files and big repositories, but Perforce? Nah.

(not that there aren’t companies who’ll sell you solutions to speed up Perforce, but that means things like “massive parallel checkouts in seconds” and “scaling to petabyte repos”)

My only complaint about the show was the limited amount of vendor swag. They tied everything into a “social” app where you checked in by scanning QR codes and built up points by posting chatty little updates. This was of course gamed to the point that the only actual prize, an Oculus Rift, was won by the person who relentlessly spammed the app with “social” updates. Most of us were there to actually pay attention to presentations and corner the development team, so it wasn’t much of a contest. I figure I’ll have our new sales rep scrounge up some of the leftover swag when she comes out to meet the team.

For me personally, I brought home a lot of good information about how to improve our current server and integrate our wanna-git devs. Perforce’s current interim gitlab-to-p4d shim is working for a lot of people, but I have to work around some issues to use it in our environment (being a little too git-like, it bypasses some of the security features in Perforce, which I can’t allow).

9/10: would kill bad robots again.

Somebody Else’s DDoS


After weeks of occasional mystery outages on our office network, lasting minutes to hours, always ending as mysteriously as they started, this morning I was able to get into the router and get something that looks an awful lot like a smoking gun: connection attempts to port 80 on a single IP address from 725,000+ machines around the world.

The catch? The destination address wasn’t on our network. It belongs to an ISP in Spain.

So, somehow, our ISP’s global routing table decided to forward this attack to us. Given that their response to the previous outages was “gee, looks fine to us”, I’m looking forward to eavesdropping on our network manager’s conversation with them.
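
Spotting that pattern in a connection dump comes down to counting distinct sources per destination and checking whether the destination is even yours. The sketch below assumes the router's output has already been reduced to (source, destination, port) tuples. The record format, function names, and addresses (documentation ranges) are all hypothetical, not what the router actually emits.

```python
import ipaddress
from collections import defaultdict

def sources_per_target(flows):
    """Count distinct source IPs per (destination IP, destination port)."""
    sources = defaultdict(set)
    for src, dst, dport in flows:
        sources[(dst, dport)].add(src)
    return {target: len(srcs) for target, srcs in sources.items()}

def top_targets(flows, n=5):
    """Return the n targets with the most distinct sources, busiest first."""
    counts = sources_per_target(flows)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]

def is_local(addr, networks):
    """Return True if `addr` falls inside any of the given CIDR networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in ipaddress.ip_network(net) for net in networks)

# Hypothetical flow records: (source IP, destination IP, destination port).
flows = [
    ("203.0.113.1", "198.51.100.80", 80),
    ("203.0.113.2", "198.51.100.80", 80),
    ("203.0.113.3", "198.51.100.80", 80),
    ("203.0.113.1", "192.0.2.25", 25),
    ("203.0.113.1", "198.51.100.80", 80),  # repeat source, counted once
]
our_networks = ["192.0.2.0/24"]  # made-up office range

for (dst, dport), count in top_targets(flows, n=1):
    where = "ours" if is_local(dst, our_networks) else "NOT ours"
    print(f"{dst}:{dport} hit by {count} distinct sources ({where})")
```

A busy target that comes back "NOT ours" is the cue to stop debugging your own gear and start asking the ISP about their routing.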

packets check in but they don't check out

“Need a clue, take a clue,
 got a clue, leave a clue”