How to re-add a dropped disk from a degraded MDADM array

Posted on: 7 February, 2016     Posted by: Kevin

We had an issue earlier today where a customer’s Ubuntu Server decided to stop working. It simply refused to boot into the OS, instead displaying the following:

[3.958693] md/raid:md0: raid level 5 active with 4 out of 5 devices, algorithm 2

BusyBox v1.18.5 (Ubuntu 1:1.18.5-1ubuntu4.1) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

Note: If you need to get into the OS, you can press CTRL + D at this prompt to proceed with booting.

The first line clearly indicated a degraded RAID array: only 4 of the 5 disks were active, and for one reason or another the fifth had dropped out of the array.

We confirmed this by issuing the following two commands:

mdadm --detail /dev/md0

This command displays a good deal of useful information about the array. In our situation we were interested in the Raid Devices and Total Devices values, as well as the small table at the end of the output, which provides details on each disk.

cat /proc/mdstat

This provides slightly less information, but it is very useful for identifying a degraded array, and it clearly showed a dropped disk.
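As a rough illustration of what to look for in that output, the sketch below scans an mdstat-style file and prints the name of any array whose member map (for example [UUUU_]) contains an underscore, which marks a missing member. The function name degraded_arrays is our own invention for this example, and the parsing assumes the usual layout of an array line followed by a status line.

```shell
# Hypothetical helper: list arrays that /proc/mdstat reports as degraded.
# Optional argument: path to an mdstat-format file (defaults to /proc/mdstat).
degraded_arrays() {
    awk '
        # An array stanza starts with a line such as "md0 : active raid5 ...".
        $2 == ":" && $1 ~ /^md/ { name = $1; next }
        # The status line that follows carries a member map such as [UUUU_].
        name != "" && match($0, /\[[U_]+\]/) {
            # An underscore in the map means a member is missing.
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Running degraded_arrays with no argument reads the live /proc/mdstat; no output means no array was found in a degraded state.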

The dropped disk, however, was still installed in the machine and still appeared to be showing up fine:

ls -l /dev/sd*
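If it is not obvious which partition has dropped out, something along these lines can cross-check candidate partitions against the array's member list in /proc/mdstat. This is only a sketch: not_in_array is a hypothetical helper name, and the partition names passed to it are whatever you expect the array to contain. Note that a member flagged as faulty, shown with (F), still counts as listed here.

```shell
# Hypothetical helper: print each given partition that is NOT listed as a
# member of the named array.
# Usage: not_in_array ARRAY MDSTAT_FILE DEVICE...
not_in_array() {
    array=$1
    mdstat=$2
    shift 2
    # Grab the "md0 : active raid5 sdd1[3] ..." line for this array.
    members=$(awk -v a="$array" '$1 == a && $2 == ":" { print }' "$mdstat")
    for dev in "$@"; do
        name=${dev##*/}              # strip the leading /dev/
        case "$members" in
            *" $name["*) ;;          # listed as a member, nothing to report
            *) echo "$dev" ;;        # not in the member list
        esac
    done
}
```

For example, not_in_array md0 /proc/mdstat /dev/sd[a-e]1 would print only the partitions missing from md0, assuming the members are the first partitions on sda through sde.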

To provide a speedy remediation we attempted to simply re-add the disk to the array:

mdadm --manage /dev/md0 --add /dev/sde1

This worked fine and the drive was re-added to the array, with rebuilding/synchronization commencing immediately. We can confirm this and monitor the rebuild by issuing:

mdadm --detail /dev/md0

Within the output there is a Rebuild Status item which shows the rebuild progress as a percentage.

We prefer to monitor the rebuild using the following:

cat /proc/mdstat

This provides more detailed information on the rebuild, most notably the estimated time remaining until it completes. In our case the estimate was about 1754 minutes (roughly 29 hours) to rebuild the array.
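Since the estimate is reported in minutes, a small helper can pull it out and convert it to hours. The sketch below assumes the recovery line carries a finish=NNNN.Nmin field, which is how the kernel normally reports it; rebuild_eta_hours is a made-up name for this example.

```shell
# Hypothetical helper: print the estimated rebuild time remaining, in hours.
# Optional argument: path to an mdstat-format file (defaults to /proc/mdstat).
rebuild_eta_hours() {
    awk '
        # Recovery lines contain a field such as "finish=1754.0min".
        match($0, /finish=[0-9.]+min/) {
            # Skip the 7 characters of "finish=" and drop the trailing "min".
            minutes = substr($0, RSTART + 7, RLENGTH - 10)
            printf "%.1f\n", minutes / 60
        }
    ' "${1:-/proc/mdstat}"
}
```

With an estimate of 1754 minutes this prints 29.2. To keep an eye on the rebuild without retyping the command, watch -n 60 cat /proc/mdstat refreshes the output once a minute.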

Once the rebuild completed, everything returned to normal: the OS booted and operated fine, and the array once again showed all 5 disks.