Sysadmin

Dear Gitlab,


After some unknown action on your server silently deleted most of the repo/wiki directories for a group (~git/git-data/repositories/$group/$project.git), how do I tell it that I've restored the data from my hourly backups?

Currently it shows “The repository for this project does not exist”.

Honestly, it looks like something tried to delete the entire group and aborted 2/3 of the way through.

Update

Ah, the answer is gitlab-rake cache:clear; now, about how they were deleted in the first place…
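
For posterity, the whole recovery dance amounted to something like this (a sketch; the backup location is a placeholder, $group as above):

# restore the bare repos from the hourly backup (backup path is hypothetical)
rsync -a /backup/hourly/repositories/$group/ ~git/git-data/repositories/$group/
chown -R git:git ~git/git-data/repositories/$group
# then tell GitLab to stop believing its stale cache
gitlab-rake cache:clear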

Dear Xoratmusoqxee,


Given the recent news about large dumps of user-account data from various hacked sites, I downloaded the full list of records for my email domain from HaveIBeenPwned, and found nothing new or interesting. Just the Adobe, LinkedIn, Kickstarter, and Dropbox hacks from several years ago.

Oddly, none of the email addresses used by Honor Hacker and friends in attempts to extort bitcoin show up in their DB, even though one of those was actually a legit closed account (I briefly had a LiveJournal account for commenting, with a unique name and strong password, and the “hacker” included the correct password).

The amusing one was that the “Onliner Spambot” collection from 2017 had a confirmed hit for user “xoratmusoqxee” at my domain. That one doesn’t even show up in my spam, despite being at least as plausible as “hand04”, “quinones12”, “bain66”, “Donnell4Stark”, or the ever-popular “ekgknmfylvtl” (seriously, my spam folder gets daily messages directed to that username, all of them in Japanese).

“P4V considered harmful…”


…to my sanity.

My manager set up a Perforce client on his Windows box, then we changed the directory configured as its root. We could not get P4V to use the new directory. Even deleting the workspace, restarting the client, refreshing the workspaces, and creating a brand-new workspace with the same name didn't work. It still thought the files should be located in the non-existent directory from the earlier incarnation of the client.

We had to use a different client name to avoid this over-aggressive local cache of data it had no business caching in the first place.
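
A possible alternative (untested here, and no promises that P4V's cache respects it; “broken_ws” is a made-up name) would have been to edit the spec with the command-line client instead:

# hypothetical sketch; the workspace name is a placeholder
p4 client -o broken_ws    # dump the spec; check what Root: currently says
p4 client broken_ws       # open the spec in $EDITOR and fix Root: by hand
p4 sync -f //...          # force-sync files into the new root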

Also, to make the process more “fun” (read: tedious), the client-editing window kept spontaneously resizing itself to be slightly taller than the screen every time we opened it or tried to resize it to fit.

Incompetence or enemy action?


“Embrace the healing power of ‘and’”.

The latest in shutdown theater is expiring SSL certs for government web sites. Either they didn’t bother to order new certs for all the sites they knew were expiring soon, or they deliberately didn’t install them.
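
(Anyone can verify this from home; the hostname here is just an example:)

# quick cert-expiry check for any site
openssl s_client -connect example.gov:443 -servername example.gov </dev/null 2>/dev/null |
    openssl x509 -noout -enddate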

Reminder: today's paycheck is the first one that's actually delayed for federal employees. Any work they've skipped over the past few weeks has been by choice.

Fun with dotfiles


When booting OpenBSD 6.3 (at least), the /etc/rc startup script reads /root/.profile. This can produce some rather entertaining boot failures, including things like syslogd timing out on startup, preventing you from getting any log data about what might be wrong…

I’m quite certain this wasn’t the case in earlier releases, but I’m not sure when it crept in.

# Simple confirmation:
echo sleep 60 >> /root/.profile
reboot
# It will take an extra ~8 minutes to boot

It looks like they try to work around this by setting HOME=/ in /etc/rc, and having a separate /.profile, but it doesn’t work; it still reads /root/.profile.

Ah, there it is! /etc/rc.d/rc.subr:

...
rc_start() {
        ${rcexec} "${daemon} ${daemon_flags}"
}

...

[ -z "${daemon_user}" ] && daemon_user=root

...

rcexec="su -l -c ${daemon_class} -s /bin/sh ${daemon_user} -c"

So, anything executed from a proper start/stop rc script gets executed in a fresh su -l session (even if it’s running as root), and that resets $HOME.
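
You can watch the reset happen with the same incantation (which would also explain the ~8 minutes above: the .profile gets sourced once per rc.d daemon, not once per boot):

# demo: HOME exported by the caller is discarded by su -l
# (here -c daemon is OpenBSD su's login-class flag, not a command)
HOME=/ su -l -c daemon -s /bin/sh root -c 'echo $HOME'
# prints /root, and sh-as-a-login-shell dutifully sources /root/.profile first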

The machine I was upgrading pre-dates the rc.d scripts, so it didn’t have the problem.

More fun than one man can handle…


Sometime this morning, someone rebooted a KVM server. We don’t know who, yet, but this would only have been a minor problem if it weren’t for the fact that another unknown someone had accidentally deleted the disk images for some of the VMs running on that server.

A month ago. No one noticed because they kept running on the unlinked open files…
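
Had anyone caught it before the reboot, the images could have been copied back out through the still-open file descriptors; a sketch, with made-up PID and fd numbers:

# hypothetical rescue; 12345 is a stand-in for the qemu process ID
ls -l /proc/12345/fd | grep deleted        # spot the unlinked disk image
cp /proc/12345/fd/14 /var/tmp/vm-disk.img  # copy it out while the VM is still up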

Daily full backups are your friend!

Update

NetworkManager: threat or menace?

Seriously, who even configures a server with it, and who came up with the idea of instantly taking down the interfaces the moment you save ifcfg-eth0 to switch from NM to static config? Fortunately, IPMI meant that I didn’t have to physically plug a monitor into the server to get back in.

(although Neal did have to plug in the IPMI interface for me; the perils of setting up new servers in a hurry…)
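
For the next victim: one way to avoid the rug-pull (on RHEL-flavored systems, anyway; device name and addresses below are placeholders) is to disown the interface in the same file before saving:

# /etc/sysconfig/network-scripts/ifcfg-eth0 -- addresses are placeholders
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
NM_CONTROLLED=no    # keep NetworkManager's hands off this interface
IPADDR=192.0.2.10
PREFIX=24
GATEWAY=192.0.2.1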

Synology DS918+ quick notes


Just bought a new NAS for home, and decided on a Synology DS918+ with four 10TB drives ($539 + 4 × $310). Why not another ReadyNAS? A combination of price and vague dissatisfaction with the ones I've used in the past; I may write that up sometime.

Why not FreeNAS? Because I didn’t feel like building one from scratch right now (as much as I like the idea of a ZFS-based NAS), and the prebuilt unit we once bought from iXsystems ended up going back due to being a piece of junk. Both Synology and ReadyNAS use BTRFS as their filesystem format these days, which offers a lot of what you get with ZFS without the need to occasionally resort to command-line incantations. (“Not That There’s Anything Wrong With That!”)

Drive installation was painless (simple snap-in hot-swap trays), and while I found the “desktop” web GUI a bit overdone, everything works well. The biggest annoyance was figuring out which of the “private cloud” packages to add, because they recently changed all that, resulting in some confusion. (short version: only install the Drive package and desktop/mobile clients, and open TCP ports 5000, 5001, and 6690; also use the built-in Let's Encrypt support and set everything to require SSL)

The “EZ-Internet” cloud/firewall config was useless; it’s just a UPnP wrapper, and when it realized that it couldn’t auto-configure my OpenBSD router, the only help it offered was “hey, you should open some ports”, with no indication of which ones were actually required for the installed packages (see above).
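
With EZ-Internet out of the picture, the router side comes down to a few lines of pf.conf (a sketch; the NAS address is an assumption):

# hypothetical pf.conf fragment; adjust the address to taste
synology = "192.168.1.20"
pass in on egress inet proto tcp from any to any port { 5000 5001 6690 } \
    rdr-to $synology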

Side note: I was amused and pleased that Drive, their latest, greatest personal cloud solution, required installing the Perl package. 😜

I went with their “hybrid” RAID config, SHR-1, because it resizes better when you add more drives or swap in larger ones. This gives me 26 TB of usable space (three data drives at ~9.1 TiB each, minus overhead), which is plenty for now. Down the road, if/when media, disk images, and automated backups start to fill that up, I'll add the DX517 expansion chassis and five more 10TB drives and bring it up to 52 TB usable.

If you’re following along at home, you may wonder why adding 5 drives doesn’t give closer to 70 TB, and the answer is paranoia. SHR-1 uses a single parity drive, which means you can only afford to lose one disk. This is generally not a huge problem if you have a spare on-hand and swap it in immediately, but there’s a non-trivial risk that another drive will fail while the first one is rebuilding.

If you think about it, this is even more likely when you buy all your RAID disks at once from the same manufacturing batch. So you really want two parity disks and a hot spare: the system can start rebuilding as soon as one disk fails, and can survive losing another one during the rebuild. Having only one data disk in a four-disk chassis isn't terribly useful, though, so for now I'm running in a cheaper, less-paranoid configuration. When I'm sure that I like the Synology enough to really rely on it, I'll buy the expansion and convert the RAID to SHR-2 with a hot spare. And buy a cold spare disk as well.
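
The back-of-the-envelope version, in case you want to check my work:

# 10 TB (decimal) per drive = 10 * 10^12 / 2^40 ≈ 9.1 TiB
# SHR-1, 4 drives:        3 data + 1 parity           = 3 * 9.1 ≈ 27.3, ~26 usable
# naive SHR-1, 9 drives:  8 data + 1 parity           = 8 * 9.1 ≈ 72.8, ~70 usable
# SHR-2 + hot spare, 9:   6 data + 2 parity + 1 spare = 6 * 9.1 ≈ 54.6, ~52 usable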

Additional performance enhancements I can add include bonding the two 1-gigabit ports together, bumping the memory (official max 8GB, but there are reports that 16GB works), and adding SSD cache drives. That last is specifically why I chose the 918+, since it has a pair of M.2 slots on the bottom, and some of their other models require you to buy an expansion card first.

Building the volume was quick, but it took ~16.25 hours to run the initial parity consistency check, so performance was sub-optimal until that finished. The GUI was occasionally a bit sluggish during that time.

Next up: setting up dedicated Time Machine volumes for the Macs and testing their Windows backup client.

Oh, and I named it Index.

Update

First Time Machine backup complete. Just because I was curious how well it would work, I backed up 425 GB over wireless, which took about 7.5 hours (roughly 16 MB/s).

Tips for the day


When the HVAC people insist that the maintenance can be done with zero impact on the running server room, don’t believe them.

When they say no one will need to be onsite, don’t believe them.

When they say they’ll bring in adequate portable cooling to keep the temperature down, don’t believe them.

When they say that the additional portable cooler they’re rushing over should be enough, don’t believe them.

When they say that the even bigger unit that’s on the way should finally be enough, don’t believe them.

Surprisingly, the switches and NASes that passed out from heatstroke all recovered.

“Need a clue, take a clue,
 got a clue, leave a clue”