ubuntu 18.04 hw2018 server raid-lvm
This is the fourth post in the server series. Remember the first post? I installed the server over two RAID arrays, so let’s test if the RAID setup is reliable.
UEFI boot
During the installation, I selected 2 partitions as EFI partitions (/dev/vda1 and /dev/vdb1), but the installer only used /dev/vda1! So the system won't be able to boot if /dev/vda1 fails.
This can be temporarily fixed with the following command:
sudo dd if=/dev/vda1 of=/dev/vdb1
Temporarily, because this won’t survive a grub package update.
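One way to make it survive updates (my own workaround, not something the installer or the grub packages provide): an APT hook that re-copies the primary ESP to the backup one after every package operation. The file name below and the whole approach are assumptions; adjust the devices to your layout.
# hypothetical hook file; re-syncs the backup ESP after each dpkg run
echo 'DPkg::Post-Invoke { "dd if=/dev/vda1 of=/dev/vdb1 status=none || true"; };' \
  | sudo tee /etc/apt/apt.conf.d/99-sync-esp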
UEFI configuration
Let’s list the current boot options using efibootmgr -v:
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0002,0000,000A
Boot0000* ubuntu HD(1,GPT,58a64bdc-78c0-4d4b-b88c-5a71b9770ec9,0x800,0x3c800)/File(\EFI\ubuntu\shimx64.efi)
Boot0002* EFI DVD/CDROM PciRoot(0x0)/Pci(0x1,0x1)/Ata(0,0,0)
Boot000A* EFI Internal Shell MemoryMapped(11,0x900000,0x11fffff)/FvFile(7c04a583-9e3e-4f1c-ad65-e05268d0b4d1)
Add an entry for /dev/vdb1:
sudo efibootmgr -c -d /dev/vdb -p 1 -L "ubuntu" -l \\EFI\\ubuntu\\shimx64.efi -v
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0001,0002,0000,000A
Boot0000* ubuntu
Boot0002* EFI DVD/CDROM
Boot000A* EFI Internal Shell
Boot0001* ubuntu-boot-backup
Fix the boot order:
sudo efibootmgr -o 0,1,2,a -v
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0000,0001,0002,000A
Boot0000* ubuntu HD(1,GPT,58a64bdc-78c0-4d4b-b88c-5a71b9770ec9,0x800,0x3c800)/File(\EFI\ubuntu\shimx64.efi)
Boot0001* ubuntu HD(1,GPT,d5882241-a0bb-4f74-b70c-696df672785d,0x800,0x3c800)/File(\EFI\ubuntu\shimx64.efi)
Boot0002* EFI DVD/CDROM PciRoot(0x0)/Pci(0x1,0x1)/Ata(0,0,0)
Boot000A* EFI Internal Shell MemoryMapped(11,0x900000,0x11fffff)/FvFile(7c04a583-9e3e-4f1c-ad65-e05268d0b4d1)
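For reference, if one of the disks is ever removed for good, its boot entry can be dropped again; efibootmgr selects an entry with -b and deletes it with -B (0001 here is just the backup entry created above):
sudo efibootmgr -b 0001 -B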
Verify
- Power off the machine, remove /dev/vda, and boot. GRUB should load, but the full boot should fail like this:
Gave up waiting for root file system device. Common problems:
- Boot args (cat /proc/cmdline)
- Check rootdelay= (did the system wait long enough?)
- Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/mapper/root-root does not exist. Dropping to a shell!
BusyBox v1.27.2 (Ubuntu 1:1.27.2-2ubuntu3) built-in shell (ash)
Enter 'help' for a list of built-in commands.
(initramfs)
- Power off, remove /dev/vdb, re-insert /dev/vda, and boot. GRUB should load and the system should fail with the same message.
- Power off, try booting with both /dev/vda and /dev/vdb attached; it should just work (a quick sanity check is sketched below).
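Once the machine is back with both disks attached, the sanity check mentioned above is simply the usual md status views:
cat /proc/mdstat
sudo mdadm --detail /dev/md0
sudo mdadm --detail /dev/md1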
Why is it not booting?
It’s not booting because the array has detected the problem, but is read-only at this time, and refuses to let LVM use it.
NB: It seems that Ubuntu 18.04 does boot on a degraded array, whatever the BOOT_DEGRADED value is, but only if the array has been marked as failed while in read-write mode.
Workaround: force booting the degraded array
When the initramfs shell is shown, inspecting the array (with mdadm --detail /dev/md0) will show the degraded array, as expected. Run mdadm --readwrite /dev/md0, then reboot.
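In other words, at the (initramfs) prompt:
mdadm --detail /dev/md0      # confirm the array is there, but degraded
mdadm --readwrite /dev/md0   # switch it back to read-write so LVM can use it
reboot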
The real fix, when your array is really down, is to put in a new disk, copy the partition table from the remaining disk, and add it to the array, like this:
dd if=/dev/used_disk of=/dev/new_disk count=4096
mdadm --manage /dev/mdX --add /dev/new_disk
Wait for the sync to finish, then reboot.
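To watch the resync progress, I keep an eye on /proc/mdstat (the 5-second interval is just a personal preference):
watch -n 5 cat /proc/mdstat   # shows the rebuild progress for each md device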
Hot plug
Most consumer-targeted hardware is not advertised as supporting hot plug, but in fact most SATA controllers support hot plug and hot removal under Linux.
I can’t recommend playing with hot plug/removal on hardware not specifically designed for it, since it may break something.
Here I’m playing with a virtual machine, so I won’t risk any physical damage, but I have also tried it on my real machine.
While running the following commands, I will include what shows up in /var/log/syslog, captured with this command:
tail -f /var/log/syslog
...
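On 18.04 the same kernel messages can also be followed through journald instead of the syslog file, if you prefer:
journalctl -kf   # -k: kernel messages only, -f: follow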
Fail a disk for real
/dev/md0 contains 2 disks in RAID10; in theory, I can remove 1 disk and the system should continue without problems.
/dev/md1 contains 4 disks in RAID10; in theory, I can remove 1 disk and the system should continue without problems. Choosing wisely, I can even remove a second drive (see the mdadm man page to find out which one).
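Assuming /dev/md1 uses the same near=2 layout as /dev/md0, mirrored copies sit on neighbouring RaidDevice numbers, so the second drive to pull must not be the mirror partner of the first one. The set-A/set-B column of the detailed view helps to spot them:
sudo mdadm --detail /dev/md1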
- Physically remove the disk /dev/vda from /dev/md0 (a software-only alternative is sketched after this list):
kernel: [ 5162.651493] md/raid10:md0: Disk failure on vda2, disabling device.
kernel: [ 5162.651493] md/raid10:md0: Operation continuing on 1 devices.
mdadm[713]: sh: 1: /usr/sbin/sendmail: not found
- Uh oh, I've forgotten to set up sendmail, so I won't receive an email when an array fails; this is covered here.
- The kernel and mdadm are aware of the problem, and the system is still working.
- mdadm --detail /dev/md0 will show the state as clean, degraded, with the first disk in the removed state:
/dev/md0:
Version : 1.2
Creation Time : Fri May 18 14:10:27 2018
Raid Level : raid10
Array Size : 4066304 (3.88 GiB 4.16 GB)
Used Dev Size : 4066304 (3.88 GiB 4.16 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Sat May 19 12:55:45 2018
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Name : server-test-setup:0 (local to host server-test-setup)
UUID : 2e89326f:5563e3a6:8c6b552b:e62c4dbf
Events : 52
Number Major Minor RaidDevice State
- 0 0 0 removed
1 252 18 1 active sync set-B /dev/vdb2
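As an aside, the same failure path can be exercised without touching the hardware; mdadm can mark a member as failed and then remove it in software (not what I did above, just a sketch with the same device names):
sudo mdadm --manage /dev/md0 --fail /dev/vda2     # mark the member as faulty
sudo mdadm --manage /dev/md0 --remove /dev/vda2   # then drop it from the array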
Fixing the array
If you removed the disk just for fun, the partition table will be OK and there is no need to rewrite it, but let's imagine the disk had a real failure and we're putting a brand new disk in its place.
- Put the disk into the machine; the kernel should show it (a physical system will show something on the SATA controller, not on the virtio_pci bus):
kernel: [ 5925.696792] pci 0000:00:07.0: [1af4:1001] type 00 class 0x010000
kernel: [ 5925.697148] pci 0000:00:07.0: reg 0x10: [io 0x0000-0x003f]
kernel: [ 5925.697258] pci 0000:00:07.0: reg 0x14: [mem 0x00000000-0x00000fff]
kernel: [ 5925.699231] pci 0000:00:07.0: BAR 1: assigned [mem 0xc800c000-0xc800cfff]
kernel: [ 5925.699284] pci 0000:00:07.0: BAR 0: assigned [io 0x1000-0x103f]
kernel: [ 5925.699483] virtio-pci 0000:00:07.0: enabling device (0000 -> 0003)
kernel: [ 5925.723509] virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy driver
kernel: [ 5925.727548] vdg: vdg1 vdg2
- Since we’re hot-plugging a drive, it is not named /dev/vda anymore, but /dev/vdg in this case.
- Copy the partition table and reload it (see the note on GPT after this procedure):
sudo dd if=/dev/vdb of=/dev/vdg bs=1M count=1
sudo partprobe
#syslog: kernel: [ 6382.874281] vdg: vdg1 vdg2
- Now add the device back to the array:
mdadm --manage /dev/md0 --add /dev/vdg2
#syslog: kernel: [ 6497.331212] md: recovery of RAID array md0
- mdadm --detail /dev/md0 will show the array as clean, degraded, recovering:
/dev/md0:
Version : 1.2
Creation Time : Fri May 18 14:10:27 2018
Raid Level : raid10
Array Size : 4066304 (3.88 GiB 4.16 GB)
Used Dev Size : 4066304 (3.88 GiB 4.16 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Sat May 19 13:14:09 2018
State : clean, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Layout : near=2
Chunk Size : 512K
Rebuild Status : 43% complete
Name : server-test-setup:0 (local to host server-test-setup)
UUID : 2e89326f:5563e3a6:8c6b552b:e62c4dbf
Events : 89
Number Major Minor RaidDevice State
2 252 98 0 spare rebuilding /dev/vdg2
1 252 18 1 active sync set-B /dev/vdb2
- Some time later, syslog will show
kernel: [ 6517.666308] md: md0: recovery done.
- and mdadm --detail /dev/md0 will show the array as clean:
/dev/md0:
Version : 1.2
Creation Time : Fri May 18 14:10:27 2018
Raid Level : raid10
Array Size : 4066304 (3.88 GiB 4.16 GB)
Used Dev Size : 4066304 (3.88 GiB 4.16 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Sat May 19 13:26:43 2018
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Name : server-test-setup:0 (local to host server-test-setup)
UUID : 2e89326f:5563e3a6:8c6b552b:e62c4dbf
Events : 99
Number Major Minor RaidDevice State
2 252 98 0 active sync set-A /dev/vdg2
1 252 18 1 active sync set-B /dev/vdb2
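About the GPT note promised in the partition-copy step: dd only copies the primary GPT at the start of the disk, not the backup table at the end, and it leaves the new disk with duplicated GUIDs. If the gdisk package is installed, sgdisk can do the copy more cleanly; a sketch with the same device names (note that the destination goes to -R and the source comes last):
sudo sgdisk -R=/dev/vdg /dev/vdb   # replicate the partition table of vdb onto vdg
sudo sgdisk -G /dev/vdg            # give the new disk fresh random GUIDs
sudo partprobe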
The next post in the server series is about virtual machines.
~~~
Question, remark, bug? Don't hesitate to contact me or report a bug.