"Raid, raid, go away..."
When a disk fails in a RAID array, the primary risk in replacing it is that another disk will fail before the replacement has been fully rebuilt from parity. At which point you’ve lost all your data.
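For a sense of scale, here’s a quick back-of-envelope sketch (with made-up numbers: five surviving disks, a 3% annualized failure rate, a 24-hour rebuild; none of these are measurements from my hardware) of the odds that a second disk drops dead before the rebuild finishes:

```python
import math

# Illustrative assumptions only, not measurements from any actual NAS.
remaining_disks = 5          # disks still in the array while the new one rebuilds
annual_failure_rate = 0.03   # assumed annualized failure rate per disk (~3%)
rebuild_hours = 24           # assumed time to repopulate the replacement

# Treat failures as independent with a constant hazard rate (exponential model).
hourly_rate = -math.log(1 - annual_failure_rate) / (365 * 24)
p_disk_survives_rebuild = math.exp(-hourly_rate * rebuild_hours)
p_any_second_failure = 1 - p_disk_survives_rebuild ** remaining_disks

print(f"Chance of losing a second disk mid-rebuild: {p_any_second_failure:.4%}")
# ~0.0417% with these numbers: small, but not zero, and it ignores the fact that
# the survivors are usually the same age as the dead disk and get hammered
# by the rebuild itself.
```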

So you can understand my concern yesterday morning when, as I was walking into the computer store to buy a replacement SSD for a machine that had failed unexpectedly, I got email from a NAS reporting a failed RAID5 disk, and discovered that I had two servers to fix.

The good news is that the RAID array finished rebuilding successfully while I was rebuilding the server that needed the SSD replaced.

The bad news is that as soon as I finished the long drive home, I got email that the brand-new disk I’d just installed had failed. Crib death is possible, but this time the GUI wasn’t responding reliably either, and a root shell on the NAS hung when I ran dmesg. Which means it was the 5-year-old NAS itself failing, and the disks were probably fine, if I could get them swapped into an identical chassis. That part will have to wait until Tuesday, since while I could buy something today, Amazon Marketplace can’t get me a ReadyNAS Pro 6 on Labor Day.

I’d be more upset if the NFS mounts weren’t still working, allowing me to copy most of the data off to random free space elsewhere. I haven’t quite come up with 8.3TB of free space yet, but a lot of that is archived logs that may have to wait.

Oh, and the original, unrelated SSD replacement? I’m still babysitting that one, too, since the system involved is a fairly gross hack, held together with twist-ties and bubblegum.

My holiday weekend is going just rosy, thanks. How’s yours?