Playing with RAID

ubuntu 18.04 hw2018 server raid-lvm

This is the fourth post in the server series. Remember the first post? I installed the server over two RAID arrays, so let’s test if the RAID setup is reliable.

UEFI boot

During the installation, I selected two partitions as EFI partitions (/dev/vda1 and /dev/vdb1), but the installer only used /dev/vda1! So the system won’t be able to boot if /dev/vda1 fails.
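
A quick way to double-check which EFI partition is actually in use (plain findmnt/lsblk, nothing specific to this setup):

findmnt /boot/efi
lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT /dev/vda1 /dev/vdb1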

This can be temporarily fixed using this command:

sudo dd if=/dev/vda1 of=/dev/vdb1

Temporarily, because this won’t survive a grub package update.
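
To avoid redoing this by hand after every grub update, one option is an apt hook that re-runs the copy after each package operation. This is only a sketch (the file name below is my own choice, and a smarter script would only act when grub was actually updated):

# /etc/apt/apt.conf.d/99-sync-esp (hypothetical file name)
# Crude but effective: re-copy the primary ESP onto the backup one after every dpkg run.
DPkg::Post-Invoke { "dd if=/dev/vda1 of=/dev/vdb1 status=none || true"; };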

UEFI configuration

Let’s list the current boot options using efibootmgr -v:

BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0002,0000,000A
Boot0000* ubuntu	HD(1,GPT,58a64bdc-78c0-4d4b-b88c-5a71b9770ec9,0x800,0x3c800)/File(\EFI\ubuntu\shimx64.efi)
Boot0002* EFI DVD/CDROM	PciRoot(0x0)/Pci(0x1,0x1)/Ata(0,0,0)
Boot000A* EFI Internal Shell	MemoryMapped(11,0x900000,0x11fffff)/FvFile(7c04a583-9e3e-4f1c-ad65-e05268d0b4d1)

Add an entry for /dev/vdb1:

sudo efibootmgr -c -d /dev/vdb -p 1 -L "ubuntu" -l \\EFI\\ubuntu\\shimx64.efi -v
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0001,0002,0000,000A
Boot0000* ubuntu
Boot0002* EFI DVD/CDROM
Boot000A* EFI Internal Shell
Boot0001* ubuntu

Fix the boot order:

sudo efibootmgr -o 0,1,2,a -v
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0000,0001,0002,000A
Boot0000* ubuntu	HD(1,GPT,58a64bdc-78c0-4d4b-b88c-5a71b9770ec9,0x800,0x3c800)/File(\EFI\ubuntu\shimx64.efi)
Boot0001* ubuntu	HD(1,GPT,d5882241-a0bb-4f74-b70c-696df672785d,0x800,0x3c800)/File(\EFI\ubuntu\shimx64.efi)
Boot0002* EFI DVD/CDROM	PciRoot(0x0)/Pci(0x1,0x1)/Ata(0,0,0)
Boot000A* EFI Internal Shell	MemoryMapped(11,0x900000,0x11fffff)/FvFile(7c04a583-9e3e-4f1c-ad65-e05268d0b4d1)

Verify

  • Power off the machine, remove /dev/vda, and boot: grub should work, but the complete boot should fail like this:
Gave up waiting for root file system device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT!  /dev/mapper/root-root does not exist.  Dropping to a shell!


BusyBox v1.27.2 (Ubuntu 1:1.27.2-2ubuntu3) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs) 
  • Power off, remove /dev/vdb and re-insert /dev/vda, then boot: grub should work and the system should fail with the same message.

  • Power off, then try booting with both /dev/vda and /dev/vdb attached: it should just work.
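
Once the machine is back with both disks attached, a quick sanity check that both arrays are healthy again:

cat /proc/mdstat
sudo mdadm --detail /dev/md0 /dev/md1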

Why is it not booting?

It’s not booting because the array has detected the problem, but it is still read-only at this point and refuses to let LVM use it.

NB: It seems that ubuntu 18.04 does boot on a degraded array, whatever the BOOT_DEGRADED value is, but only if the array was marked as failed while in read-write mode.
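
For reference, BOOT_DEGRADED is the old initramfs-tools knob (there is also the bootdegraded=true kernel parameter); it is shown here only so the NB above makes sense, since in this test it did not change the behaviour:

echo "BOOT_DEGRADED=true" | sudo tee /etc/initramfs-tools/conf.d/mdadm
sudo update-initramfs -u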

Workaround: force booting the degraded array

When the initramfs shell is shown, inspecting the array (with mdadm --detail /dev/md0) will show it as degraded, as expected.

Run mdadm --readwrite /dev/md0, then reboot.
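
In other words, the session at the (initramfs) prompt looks roughly like this (use /dev/md1 instead if that is the degraded array):

(initramfs) mdadm --detail /dev/md0      # State : clean, degraded
(initramfs) mdadm --readwrite /dev/md0   # make it writable so LVM can activate it
(initramfs) reboot -f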

The real fix, when a disk is really dead, is to install a new disk, copy the partition table from the remaining disk, and add the new partition to the array, like this:

dd if=/dev/used_disk of=/dev/new_disk count=4096    # copy the partition table
partprobe                                           # make the kernel re-read it
mdadm --manage /dev/mdX --add /dev/new_disk_partition

Wait for the sync to finish, then reboot.
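
The rebuild progress can be followed in /proc/mdstat:

watch -n 5 cat /proc/mdstat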

Hot plug

Most consumer-targeted hardware is not advertised as supporting hot plug, but in fact most SATA controllers support hot plug and hot removal under Linux.

I can’t recommend playing with hot plug/removal on hardware not specifically designed for it, since it may break something.

Here I’m playing with a virtual machine, so I won’t risk any physical damage, but I have also tried it on my real machine.

While running the following commands, I will include what shows up in /var/log/syslog, captured with:

tail -f /var/log/syslog
...

Fail a disk for real

/dev/md0 contains 2 disks in RAID10; in theory, I can remove 1 disk and the system should continue without problems.

/dev/md1 contains 4 disks in RAID10; in theory, I can remove 1 disk and the system should continue without problems. Choosing wisely, I can even remove a second drive (see the mdadm man page to find out which one).
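
A quick way to choose: with the default near=2 layout, consecutive raid devices mirror each other (0 with 1, 2 with 3), so the second drive to remove must not be the partner of the first one. The set-A / set-B column of mdadm --detail shows the two halves:

sudo mdadm --detail /dev/md1 | grep -E 'set-[AB]'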

  • physically remove the disk /dev/vda from /dev/md0
    kernel: [ 5162.651493] md/raid10:md0: Disk failure on vda2, disabling device.
    kernel: [ 5162.651493] md/raid10:md0: Operation continuing on 1 devices.
    mdadm[713]: sh: 1: /usr/sbin/sendmail: not found
    
  • uhoh, I’ve forgotten to set up sendmail, so I won’t receive an email when an array fails; this is covered here (a minimal mdadm-side sketch follows right after this list).

  • the kernel and mdadm are aware of the problem, and the system is still working. mdadm --detail /dev/md0 will show the state clean, degraded, with the first disk in the removed state:
    mdadm --detail /dev/md0
    /dev/md0:
            Version : 1.2
      Creation Time : Fri May 18 14:10:27 2018
         Raid Level : raid10
         Array Size : 4066304 (3.88 GiB 4.16 GB)
      Used Dev Size : 4066304 (3.88 GiB 4.16 GB)
       Raid Devices : 2
      Total Devices : 1
        Persistence : Superblock is persistent

        Update Time : Sat May 19 12:55:45 2018
              State : clean, degraded
     Active Devices : 1
    Working Devices : 1
     Failed Devices : 0
      Spare Devices : 0

             Layout : near=2
         Chunk Size : 512K

               Name : server-test-setup:0  (local to host server-test-setup)
               UUID : 2e89326f:5563e3a6:8c6b552b:e62c4dbf
             Events : 52

        Number   Major   Minor   RaidDevice State
           -       0        0        0      removed
           1     252       18        1      active sync set-B   /dev/vdb2

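As promised above, a minimal sketch of the mdadm side of the email alerts (the MTA setup itself is covered in the linked post, and the address below is a placeholder):

# /etc/mdadm/mdadm.conf
MAILADDR you@example.com

# send a test alert for every array, to verify that mail actually goes out
sudo mdadm --monitor --scan --test --oneshot
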
Fixing the array

If you removed the disk just for fun, the partition table will be OK and there is no need to rewrite it, but let’s imagine the disk had a real failure and we’re putting a brand new disk in its place.

  • put the disk into the machine; the kernel should detect it (a physical system will show something on the SATA controller, not on the virtio_pci bus):
    kernel: [ 5925.696792] pci 0000:00:07.0: [1af4:1001] type 00 class 0x010000
    kernel: [ 5925.697148] pci 0000:00:07.0: reg 0x10: [io  0x0000-0x003f]
    kernel: [ 5925.697258] pci 0000:00:07.0: reg 0x14: [mem 0x00000000-0x00000fff]
    kernel: [ 5925.699231] pci 0000:00:07.0: BAR 1: assigned [mem 0xc800c000-0xc800cfff]
    kernel: [ 5925.699284] pci 0000:00:07.0: BAR 0: assigned [io  0x1000-0x103f]
    kernel: [ 5925.699483] virtio-pci 0000:00:07.0: enabling device (0000 -> 0003)
    kernel: [ 5925.723509] virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy driver
    kernel: [ 5925.727548]  vdg: vdg1 vdg2
    
  • Since we’re hot-plugging a drive, it is not named /dev/vda anymore, but /dev/vdg in this case.
  • copy the partition table, and reload it:
    sudo dd if=/dev/vdb of=/dev/vdg bs=1M count=1
    sudo partprobe
    #syslog:kernel: [ 6382.874281]  vdg: vdg1 vdg2
    
  • Now add the device back to the array:
    sudo mdadm --manage /dev/md0 --add /dev/vdg2
    #syslog: kernel: [ 6497.331212] md: recovery of RAID array md0
    
  • mdadm --detail /dev/md0 will show the array as clean, degraded, recovering:
 
/dev/md0:
        Version : 1.2
  Creation Time : Fri May 18 14:10:27 2018
     Raid Level : raid10
     Array Size : 4066304 (3.88 GiB 4.16 GB)
  Used Dev Size : 4066304 (3.88 GiB 4.16 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sat May 19 13:14:09 2018
          State : clean, degraded, recovering 
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

         Layout : near=2
     Chunk Size : 512K

 Rebuild Status : 43% complete

           Name : server-test-setup:0  (local to host server-test-setup)
           UUID : 2e89326f:5563e3a6:8c6b552b:e62c4dbf
         Events : 89

    Number   Major   Minor   RaidDevice State
       2     252       98        0      spare rebuilding   /dev/vdg2
       1     252       18        1      active sync set-B   /dev/vdb2
  • Some time later, syslog will show
     kernel: [ 6517.666308] md: md0: recovery done.
    
  • and mdadm --detail /dev/md0 will show the array as clean:
    /dev/md0:
            Version : 1.2
      Creation Time : Fri May 18 14:10:27 2018
         Raid Level : raid10
         Array Size : 4066304 (3.88 GiB 4.16 GB)
      Used Dev Size : 4066304 (3.88 GiB 4.16 GB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent

        Update Time : Sat May 19 13:26:43 2018
              State : clean
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0

             Layout : near=2
         Chunk Size : 512K

               Name : server-test-setup:0  (local to host server-test-setup)
               UUID : 2e89326f:5563e3a6:8c6b552b:e62c4dbf
             Events : 99

        Number   Major   Minor   RaidDevice State
           2     252       98        0      active sync set-A   /dev/vdg2
           1     252       18        1      active sync set-B   /dev/vdb2

The next post in the server series is about virtual machines.
