Sysadmin

Fun with dotfiles


When booting OpenBSD 6.3 (at least), the /etc/rc startup script reads /root/.profile. This can produce some rather entertaining boot failures, such as syslogd timing out on startup, which keeps you from getting any log data about what might be wrong…

I’m quite certain this wasn’t the case in earlier releases, but I’m not sure when it crept in.

# Simple confirmation:
echo sleep 60 >> /root/.profile
reboot
# Boot will take an extra ~8 minutes (the sleep runs once per daemon that rc.d starts)

It looks like they try to work around this by setting HOME=/ in /etc/rc, and having a separate /.profile, but it doesn’t work; it still reads /root/.profile.

Ah, there it is! /etc/rc.d/rc.subr:

...
rc_start() {
        ${rcexec} "${daemon} ${daemon_flags}"
}

...

[ -z "${daemon_user}" ] && daemon_user=root

...

rcexec="su -l -c ${daemon_class} -s /bin/sh ${daemon_user} -c"

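So, anything started from a proper rc.d start/stop script runs in a fresh su -l session (even if it’s running as root), and su -l resets $HOME to the target user’s home directory. For root that’s /root, so the login shell reads /root/.profile no matter what /etc/rc set HOME to.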

The machine I was upgrading pre-dates the rc.d scripts, so it didn’t have the problem.
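The post doesn’t prescribe a fix, but one common way to keep a stray `sleep` (or anything else slow or interactive) out of daemon startup is to put that part of /root/.profile behind a check of `$-`, which contains `i` only when the shell is interactive. A minimal sketch, assuming the rc.subr shells are non-interactive because they’re handed a command with `-c`; `is_interactive` and `interactive_setup_done` are hypothetical names:

```sh
# Hypothetical /root/.profile layout (not from the original post).
# Assumption: shells started by rc.subr via `su -l ... -c` are
# non-interactive, so they skip the guarded block below.

is_interactive() {
    # $- lists the current shell's option letters; "i" appears only
    # when the shell is interactive.
    case $- in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Settings that every shell, daemons included, can safely see.
export PAGER=less
export EDITOR=vi

if is_interactive; then
    # Interactive-only setup: anything slow, chatty, or waiting on input
    # (prompts, agents, fortune, network lookups, and so on).
    PS1='\u@\h:\w\$ '
    export PS1
    interactive_setup_done=1
fi
```

The guard wraps the interactive part instead of bailing out early: an `exit` in .profile would end the `su -l` shell before it ever ran the daemon command.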

More fun than one man can handle…


Sometime this morning, someone rebooted a KVM server. We don’t know who, yet, but this would only have been a minor problem if it weren’t for the fact that another unknown someone had accidentally deleted the disk images for some of the VMs running on that server.

A month ago. No one noticed because the VMs kept running on the unlinked, still-open image files…
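For anyone wondering how VMs survive their disk images being deleted: `rm` only removes the directory entry, and the kernel keeps the file’s data around until the last open descriptor on it is closed. A minimal shell sketch of the same effect, with a temporary file standing in for a disk image (`read_after_unlink` is a hypothetical name):

```sh
#!/bin/sh
# Sketch: a deleted file stays readable through an already-open descriptor.
# The temp file stands in for a VM disk image; fd 3 stands in for the VM.

read_after_unlink() {
    img=$(mktemp) || return 1
    printf 'disk image contents\n' > "$img"

    exec 3< "$img"   # open the file, as a running VM holds its image open
    rm -f "$img"     # delete it: the directory entry goes, the data stays

    if [ ! -e "$img" ]; then
        echo "path is gone"
    fi
    cat <&3          # the contents are still readable through fd 3
    exec 3<&-        # closing the last descriptor lets the kernel free the data
}

read_after_unlink
```

On Linux, such a file also shows up as a `/proc/<pid>/fd/<n>` symlink whose target ends in `(deleted)`, and while the owning process is still running, the data can be copied back out through that path. A reboot closes the descriptor for good.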

Daily full backups are your friend!

Update

NetworkManager: threat or menace?

Seriously, who even configures a server with it, and who came up with the idea of instantly taking down the interfaces the moment you save ifcfg-eth0 to switch from NM to static config? Fortunately, IPMI meant that I didn’t have to physically plug a monitor into the server to get back in.

(although Neal did have to plug in the IPMI interface for me; the perils of setting up new servers in a hurry…)
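The post doesn’t show the file itself. For reference, a static ifcfg-eth0 on a Red Hat-style system using the network-scripts (initscripts) format looks roughly like this; the addresses are documentation placeholders from 192.0.2.0/24, and `NM_CONTROLLED=no` asks NetworkManager to leave the interface alone. Given how the save went above, make this kind of change from a console (IPMI or otherwise), not over the interface you’re reconfiguring.

```sh
# /etc/sysconfig/network-scripts/ifcfg-eth0 (hypothetical example values)
# Static addressing, with NetworkManager told to leave eth0 alone.
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.0.2.20
PREFIX=24
GATEWAY=192.0.2.1
DNS1=192.0.2.53
NM_CONTROLLED=no
```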

Synology DS918+ quick notes


Just bought a new NAS for home, and decided on a Synology DS918+ with four 10 TB drives ($539 + 4 × $310). Why not another ReadyNAS? A combination of price and vague dissatisfaction with the ones I’ve used in the past; I may write that up sometime.

Why not FreeNAS? Because I didn’t feel like building one from scratch right now (as much as I like the idea of a ZFS-based NAS), and the prebuilt unit we once bought from iXsystems went back because it was a piece of junk. Both Synology and ReadyNAS use Btrfs as their filesystem these days, which offers a lot of what you get with ZFS without the need to occasionally resort to command-line incantations. (“Not That There’s Anything Wrong With That!”)

Drive installation was painless (simple snap-in hot-swap trays), and while I found the “desktop” web GUI a bit overdone, everything works well. The biggest annoyance was figuring out which of the “private cloud” packages to add, because Synology recently changed that lineup, which caused some confusion. (Short version: install only the Drive package and the desktop/mobile clients, and open TCP ports 5000, 5001, and 6690. Also use the built-in Let’s Encrypt support and set everything to require SSL.)

The “EZ-Internet” cloud/firewall config was useless; it’s just a UPnP wrapper, and when it realized that it couldn’t auto-configure my OpenBSD router, the only help it offered was “hey, you should open some ports”, with no indication of which ones were actually required for the installed packages (see above).
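Since EZ-Internet wouldn’t say, here is a sketch of the port forwarding for an OpenBSD router, generated by a small shell function. Assumptions: the NAS lives at 192.0.2.10 (a placeholder), the router’s external interface is in pf’s default `egress` group, and the rest of the ruleset lets the redirected traffic through. As far as I know, 5000 and 5001 are the DSM web interface over HTTP and HTTPS, and 6690 is the Drive client sync port. `nas_pf_rules`, `NAS_ADDR`, and `NAS_PORTS` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helper: print pf.conf rules that forward the Synology ports
# from the router's external interface to the NAS.
# NAS_ADDR is a placeholder; "egress" is OpenBSD's interface group for the
# interface(s) holding the default route.

NAS_ADDR=${NAS_ADDR:-192.0.2.10}
NAS_PORTS="5000 5001 6690"   # DSM HTTP, DSM HTTPS, Drive client sync

nas_pf_rules() {
    for port in $NAS_PORTS; do
        printf 'pass in on egress inet proto tcp to (egress) port %s rdr-to %s\n' \
            "$port" "$NAS_ADDR"
    done
}

nas_pf_rules
```

Check the output, add it to /etc/pf.conf, and let `pfctl -nf /etc/pf.conf` parse the ruleset before loading it for real with `pfctl -f /etc/pf.conf`.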

Side note: I was amused and pleased that Drive, their latest, greatest personal cloud solution, required installing the Perl package. 😜

I went with their ‘hybrid’ RAID config, SHR-1, because it resizes better when you add more drives or swap in larger ones. This gives me 26 TB of usable space (each 10 TB drive is about 9.1 TiB, and 9.1 × 3, minus overhead, comes to roughly 26), which is plenty for now. Down the road, if (well, when) media, disk images, and automated backups start to fill that up, I’ll add the DX517 expansion chassis and five more 10 TB drives and bring it up to 52 TB usable.

If you’re following along at home, you may wonder why adding 5 drives doesn’t give closer to 70 TB, and the answer is paranoia. SHR-1 uses a single parity drive, which means you can only afford to lose one disk. This is generally not a huge problem if you have a spare on-hand and swap it in immediately, but there’s a non-trivial risk that another drive will fail while the first one is rebuilding.

If you think about it, this is even more likely when you buy all your RAID disks at once from the same manufacturing batch. So you really want two parity disks plus a hot spare: the system can start rebuilding as soon as one disk fails, and can survive losing another one during the rebuild. In a four-disk chassis, that would leave only one data disk, which isn’t terribly useful, so for now I’m running a cheaper, less-paranoid configuration. When I’m sure that I like the Synology enough to really rely on it, I’ll buy the expansion and convert the RAID to SHR-2 with a hot spare. And buy a cold spare disk as well.
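The capacity arithmetic above can be reproduced with a rough estimator. It’s a simplification, not how SHR actually lays out data: it assumes equal-sized drives, that each level of redundancy costs one whole drive, and that hot spares hold no data. Drive sizes are vendor TB (10^12 bytes) and results are TiB (2^40 bytes), which is where the 9.1 per drive comes from. `usable_tib` is a hypothetical name.

```sh
#!/bin/sh
# Rough usable capacity of a parity array, before filesystem overhead.
# Arguments: total drives, parity drives, hot spares, drive size in TB.

usable_tib() {
    drives=$1 parity=$2 spares=$3 size_tb=$4
    data=$((drives - parity - spares))
    if [ "$data" -lt 1 ]; then
        echo "no data drives left" >&2
        return 1
    fi
    # Convert vendor TB (10^12 bytes) to TiB (2^40 bytes) for the data drives.
    awk -v d="$data" -v tb="$size_tb" \
        'BEGIN { printf "%.1f\n", d * tb * 1e12 / 2^40 }'
}

usable_tib 4 1 0 10   # SHR-1, four drives
usable_tib 9 1 0 10   # SHR-1, nine drives
usable_tib 9 2 1 10   # SHR-2 plus a hot spare, nine drives
usable_tib 4 2 1 10   # SHR-2 plus a hot spare, four drives
```

That prints 27.3, 72.8, 54.6, and 9.1: 27.3 and 54.6 are the pre-overhead versions of the 26 TB and 52 TB figures, 72.8 is the “closer to 70 TB” that plain SHR-1 across nine drives would give, and 9.1 is the single data disk that SHR-2 plus a spare would leave in a four-bay chassis.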

Additional performance enhancements I can add include bonding the two 1-gigabit ports together, bumping the memory (official max 8 GB, but there are reports that 16 GB works), and adding SSD cache drives. That last one is specifically why I chose the 918+: it has a pair of M.2 slots on the bottom, while some of their other models require you to buy an expansion card first.

Building the volume was quick, but it took ~16.25 hours to run the initial parity consistency check, so performance was sub-optimal until that finished. The GUI was occasionally a bit sluggish during that time.

Next up: setting up dedicated Time Machine volumes for the Macs and testing their Windows backup client.

Oh, and I named it Index.

Update

First Time Machine backup complete. Just because I was curious how well it would work, I backed up 425 GB over wireless, which took about 7.5 hours.
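For scale, here is the average rate that implies. A quick calculation, assuming decimal units (1 GB = 10^9 bytes) and that the 7.5 hours covers the whole transfer; `avg_rate` is a hypothetical name:

```sh
#!/bin/sh
# Average throughput of a transfer, given its size in GB and duration in hours.
avg_rate() {
    gb=$1 hours=$2
    awk -v gb="$gb" -v h="$hours" 'BEGIN {
        bytes_per_sec = gb * 1e9 / (h * 3600)
        printf "%.1f MB/s (%.1f Mbit/s)\n", bytes_per_sec / 1e6, bytes_per_sec * 8 / 1e6
    }'
}

avg_rate 425 7.5   # prints: 15.7 MB/s (125.9 Mbit/s)
```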

Tips for the day


When the HVAC people insist that the maintenance can be done with zero impact on the running server room, don’t believe them.

When they say no one will need to be onsite, don’t believe them.

When they say they’ll bring in adequate portable cooling to keep the temperature down, don’t believe them.

When they say that the additional portable cooler they’re rushing over should be enough, don’t believe them.

When they say that the even bigger unit that’s on the way should finally be enough, don’t believe them.

Surprisingly, the switches and NASes that passed out from heatstroke all recovered.

Moving Tales


So, we’re in the new building. Well, not the “me” part of “we”, yet: we’re still down the street from the old place for another month, until that lease is up, which lets them use our space for storage, staging, and such this week. It’ll shave about 15 minutes off my commute when I do get moved there, though, so that’s nice.

We kicked off the move early Thursday morning, powering down the data center and grabbing some essential servers and gear that we wanted back online as soon as they swung over the Cogent line, leaving the rest for the professional server movers (for the first time, this was Not Our Problem).

Anyone in the Bay Area may recall that it started pouring down rain in the wee hours Thursday, the first real rain of the season. Those of us who were still a bit groggy as we finished the server shutdown were suddenly WIDE AWAKE when the fire alarm went off.

…because the rain was coming into the electrical closet through a conduit, right onto the fire control panel. Smaller quantities were also coming into the server room, including a small amount right into the rack where all of the Really Important Servers we were about to hand-carry were located. Fortunately, we got everything out intact.

To our immense surprise, we could plausibly claim to be fully functional this morning when people showed up. They couldn’t all unpack their offices and cubes because things were still being moved and built, but that was also Not Our Problem this time.

Pro tip: when you have to be out of your old building by date X, get the keys to the new one no later than X - 90. Not X - 20ish.

Kyocera printer drivers in El Capitan


So, if you’re trying to add a shiny new office color laser printer (such as the two Kyocera TASKalfa 5052ci printers that were delivered to our new building), you’re running OS X El Capitan, and you get a spinning beachball of doom no matter which protocol you try to connect with, here’s what’s going on and how to fix it.

Let’s say you try to use the LPD protocol. As you type each character of the host name, the Mac looks it up in DNS and tries to connect via SNMP to figure out what it is. When you click “Add”, it then uses IPP to query for device options.

This is where it goes to hell. The Mac posts a request using HTTP, and the Kyocera says “that shit’s insecure, call me back on HTTPS”. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. Repeat. killall AddPrinter

The same thing happens if you try to use IPP directly, or JetDirect, or pretty much any protocol. Works fine on Sierra or High Sierra, blows chunks on El Capitan.

The only fix is to log into the printer and disable SSL completely. Simply shutting off SSL is not sufficient; you must also disable the “Secure Only” feature for every protocol (and you’ll probably have to log in to the printer again, since you’ll be killing the HTTPS page you’re currently logged in through), or it will keep redirecting you to pages that it knows perfectly well don’t exist.

Not a big fan of shutting off SSL, but redirect-to-broken-SSL is worse.

’nuff said


(original via)

Lesson 22: Blessed Silence


It’s amazing how much less random root and postmaster email I get when 2,000+ servers are down (deliberately, that is; I get a lot more email when they’re down accidentally…).

(they really needed a few more takes for Kate Mulgrew on this scene to make her hand gesture less artificial, but Joel Grey is so perfect that I’m willing to forgive them)

Saturday Update

Doing 85 MPH on the highway with Twinkle Trick blasting on repeat is how I bring over switches from the old building.

“Need a clue, take a clue,
 got a clue, leave a clue”