After years of filling in as Acting Network Guy (now ending, thankfully), I have decided that there’s really only one thing I’m certain of: IPSec problems are always at the other end.
This was demonstrated yet again this morning when we were trying to change our end of a tunnel that had been up for several years from a /32 to a /24, so that additional machines could route through the tunnel. On my end (OpenBSD), this was a one-line change in ipsec.conf and a one-line change in pf.conf. On their end, which involved Real Networking Hardware, it was days of fumbling that left the old /32 tunnel up while they insisted they’d switched their config.
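For the curious, the OpenBSD side of a change like that really is one line per file. A sketch, using documentation-range placeholder addresses rather than the real networks:

```
# ipsec.conf -- widen the local side of the flow from one host to the /24
# before: ike esp from 192.0.2.10 to 198.51.100.0/24 peer 203.0.113.1
ike esp from 192.0.2.0/24 to 198.51.100.0/24 peer 203.0.113.1

# pf.conf -- let the wider range in from the tunnel interface
pass in on enc0 from 198.51.100.0/24 to 192.0.2.0/24
```

Reload, and you're done; no rebuilding anything from scratch required.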
It took a 45-minute conference call this morning to get it straightened out, which I basically spent watching anime with the sound off while their tech guy cleaned cruft out of his configs and rebuilt their end from scratch.
[unrelated, my co-lo had a power outage, and my ancient beta WebEngine never auto-boots completely; you have to hit the big red button on the front. Sadly, the folks at the co-lo had no success with the big red button, so I had to scrounge around the house for the custom console adapter this thing uses, and stop by on my way to work today to watch it fsck the disks. They’ve had several outages this year, and I think it’s time to move the server to one of the statics on my Comcast Business line and then upgrade it to something more powerful than a 500MHz Pentium 3 with 256MB of RAM.]
Please do not forge scary emails from HR and accounting on April Fools Day. Nobody’s laughing.
With the number of servers that have caught fire or otherwise demanded sudden extra attention at work, I ask, in the words of Lyra Lackwit:
"Will things please stop happening now?"
“auditors” and “Perl script”
Outlook 2013 started breaking for our users last week. Only some of them, and not all at the same time, but the symptom was that the application would no longer start, hanging at the “loading profile…” screen.
The solution is to switch to the “Windows 7 Basic” graphical theme, turning off all the 3D UI decorations.
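Switching the theme is a per-user, point-and-click fix. If you'd rather script something, the commonly cited related workaround is to turn off Office 2013's hardware-accelerated rendering in the registry; this is not exactly what we did, but it targets the same class of graphics-related hangs:

```
:: disable hardware graphics acceleration for Office 2013 (15.0)
reg add "HKCU\Software\Microsoft\Office\15.0\Common\Graphics" ^
    /v DisableHardwareAcceleration /t REG_DWORD /d 1 /f
```

Your mileage may vary; test on one afflicted machine before pushing it everywhere.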
And that’s about four days of sysadmin time that we’d like back, please.
That’s the number of emails sent out this morning by a test service that was getting pummeled by an automated QA script.
[Update: after many eyes explored the logs, the QA test script was found to have done exactly the right thing, and the bug was in the actual service. So, a big huzzah for catching a truly crippling bug before it reached Production, but damn that was a mess.]
We bought a Dell R620 to run VMware ESXi 5.1U1. It was pre-configured to correctly boot the supplied ESXi image from an SD card. Bringing it up on the network was trivial. Downloading the Windows vSphere Client software was trivial. Configuring a datastore so that you could actually use the product was annoying.
Y’see, they shipped it with a Windows GPT partition table, and attempting to use the disk produced a lengthy timeout and disconnect, every time. Occasionally, I’d get a pop-up error message, but its text couldn’t be selected for copy-and-paste, and enabling ssh on the server showed that no errors were being logged.
Typing the error message in by hand (“… HostDatastoreSystem.QueryVmfsDatastoreCreateOptions … failed”) and googling it turned up detailed solutions for the problem, with obsolete commands. So, for the benefit of anyone else who gets into this state on ESXi 5.1:
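Roughly, assuming the stock partedUtil tool that ships with ESXi 5.x (the naa identifier below is a placeholder; yours will differ):

```
# ssh to the host, then find the disk device
ls /vmfs/devices/disks/

# confirm it's carrying the leftover GPT label
partedUtil getptbl /vmfs/devices/disks/naa.XXXXXXXXXXXXXXXX

# overwrite it with an empty msdos label so vSphere can claim the disk
partedUtil mklabel /vmfs/devices/disks/naa.XXXXXXXXXXXXXXXX msdos
```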
Now you can use it as a datastore.
“No, we just moved our office, we didn’t change anything except the external IP address. The VPN problem must be on your end. Did you set the new IP address?”
“Okay, we did install a new NAT router. But the problem must be on your end. Did you set the new IP address?”
“Oh, yes, it’s running a newer version of the OS. But the problem must be on your end. Did you set the new IP address?”
“Here are screenshots of our config. But the problem must be on your end. Did you set the new IP address?”
“Yes, we set it up with IKEv2 instead of v1. But the problem must be on your end. Did you set the new IP address?”
It’s actually been more than eight hours, and they still haven’t fixed their problem, but I at least got some sleep in the middle. We’d still be arguing about what the problem actually is if they hadn’t sent me the screenshots.
Oh, and it was urgent for me to make the change on my end Friday night (which they told me about on Friday afternoon…), but no one at their end actually checked their router for connectivity until this morning. And it’s been nearly an hour since they responded to the message pointing out that they’re using the wrong IKE version, but they still haven’t fixed it.
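The IKE version matters more than they seem to realize: on OpenBSD, IKEv1 and IKEv2 are handled by entirely different daemons (isakmpd via ipsec.conf versus iked via iked.conf), so a peer that silently switches to v2 isn’t going to negotiate with a v1 config no matter how many screenshots they send. If they insist on staying with IKEv2, the matching iked.conf entry would look roughly like this, with placeholder addresses and an obviously fake key:

```
# iked.conf -- IKEv2 equivalent of the old tunnel
ikev2 "office" active esp \
        from 192.0.2.0/24 to 198.51.100.0/24 \
        peer 203.0.113.1 \
        psk "not-the-real-key"
```

But it’s far less work for everyone if they just put their config back the way it was.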
[Update: to add insult to injury, I just got a recruiting email from WalmartLabs. Perhaps the fact that it’s raining in Northern California in late June should have been a clue that the week was going to be a little odd.]