This week I had to replace a drive in a Linux-based VoIP system. The drive was a member of a RAID 1 array holding the OS, boot, and swap partitions. Replacing the drive was pretty simple, just a one-for-one swap, but rebuilding the array was a little more involved than what most of us are used to with hardware RAID systems.
Basically, I had to manually copy the partition table from the remaining original drive to the new one, and then tell md (the Linux software RAID driver) to add the new drive to the array and mirror the three partitions onto it.
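The steps above can be sketched as follows. The device names (/dev/hda as the surviving drive, /dev/hdc as the replacement) are from this system and will differ on yours; the function only prints the commands so they can be reviewed before anything destructive is run.

```shell
# A sketch of the rebuild steps. /dev/hda = surviving mirror member,
# /dev/hdc = replacement drive (names from this system; adjust to yours).
# This PRINTS the commands for review; pipe the output to `sh` as root
# only after double-checking the device names.
rebuild_steps() {
    # Clone the partition table from the healthy drive to the new drive
    echo 'sfdisk -d /dev/hda | sfdisk /dev/hdc'
    # Hot-add each new partition to its mirror; md resyncs in the background
    echo 'mdadm /dev/md2 --add /dev/hdc1'
    echo 'mdadm /dev/md1 --add /dev/hdc2'
    echo 'mdadm /dev/md0 --add /dev/hdc3'
}

rebuild_steps
```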
The problem is that because this is software RAID, rebuilding the array consumes a large share of CPU and disk I/O. Since this system handles voice communications, we cannot let the rebuild eat up those resources, or voice services will be severely affected.
To counteract this, there are two variables we can modify to slow down the rebuild process so that critical voice services are not affected: 'speed_limit_min' and 'speed_limit_max'.
These variables are located in '/proc/sys/dev/raid/' and do exactly what you might expect. The 'speed_limit_max' variable caps the rebuild rate at a certain number of KBps, and 'speed_limit_min' sets the minimum rebuild rate in KBps. By default the minimum is set to 1,000 KBps and the maximum to 200,000 KBps, which leaves a lot of room for variation.
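A quick way to check the current values is just to read the two files. The sketch below falls back to the defaults mentioned above when the md driver is not loaded (so the files do not exist):

```shell
# Read a rebuild throttle value from /proc, falling back to the
# documented default (min = 1000, max = 200000 KB/s) when md is not loaded.
show_limit() {
    f="/proc/sys/dev/raid/speed_limit_$1"
    if [ -r "$f" ]; then
        printf '%s = %s KB/s\n' "$1" "$(cat "$f")"
    else
        printf '%s = %s KB/s (default)\n' "$1" "$2"
    fi
}

show_limit min 1000
show_limit max 200000
```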
You can use these variables in two different ways. The first is to issue the following command to turn up the minimum rebuild rate, which increases the rebuild's priority and gets the drive rebuilt faster:
'echo -n 10000 > /proc/sys/dev/raid/speed_limit_min'
But if you are in my situation, you can decrease 'speed_limit_max' so that the rebuild is forced to slow down, freeing up resources for the rest of the system. You can do this by running the following command:
'echo -n 1000 > /proc/sys/dev/raid/speed_limit_max'
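If the throttle should only be temporary, a small helper makes it easy to put the default back afterwards. This is a sketch; the MAXFILE override is just an assumption of mine so the script can be rehearsed against a scratch file instead of the live sysctl:

```shell
# Set the rebuild rate ceiling; call again later to restore the default.
# MAXFILE defaults to the real sysctl; override it to rehearse safely.
MAXFILE=${MAXFILE:-/proc/sys/dev/raid/speed_limit_max}

set_rebuild_ceiling() {
    printf '%s' "$1" > "$MAXFILE"
}

# Usage (as root):
#   set_rebuild_ceiling 1000     # throttle while calls are active
#   set_rebuild_ceiling 200000   # restore the default when done
```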
To check how fast your arrays are rebuilding, you can run 'cat /proc/mdstat'.
With these commands I was able to control the rebuild rate and keep the system operating normally while the drive rebuilt.
Other useful commands:
# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 hdc1 hda1
      104320 blocks [2/2] [UU]
md1 : active raid1 hdc2 hda2
      1052160 blocks [2/2] [UU]
md0 : active raid1 hda3
      244011200 blocks [2/1] [U_]
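While md0 is resyncing, /proc/mdstat also shows a progress line with a percentage, ETA, and current speed. The sample line below is illustrative, not captured from this system; a short awk sketch can pull the interesting figures out of it:

```shell
# Extract the progress figures from an mdstat recovery line.
# The sample string is illustrative; on a live system you would
# read /proc/mdstat instead.
sample='[>....................]  recovery =  2.2% (5387392/244011200) finish=214.0min speed=18576K/sec'

progress=$(printf '%s\n' "$sample" | awk '/recovery|resync/ {
    for (i = 1; i <= NF; i++) {
        if ($i ~ /%$/)       pct = $i
        if ($i ~ /^finish=/) eta = substr($i, 8)
    }
    print pct " done, ETA " eta
}')

echo "$progress"
```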
# mdadm -D /dev/md0
Version : 00.90.01
Creation Time : Mon Sep 26 19:07:32 2011
Raid Level : raid1
Array Size : 244011200 (232.71 GiB 249.87 GB)
Device Size : 244011200 (232.71 GiB 249.87 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri Sep 30 22:34:30 2011
State : dirty, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 3 3 0 active sync /dev/hda3
1 0 0 -1 removed
UUID : 0006e09e:df5fc23a:d4c1439f:aaa4ebab
Events : 0.231531
# fdisk /dev/hda
Command (m for help): p
Disk /dev/hda: 251.0 GB, 251058462208 bytes
255 heads, 63 sectors/track, 30522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot    Start      End      Blocks   Id  System
/dev/hda1   *         1       13      104391   fd  Linux raid autodetect
/dev/hda2            14      144     1052257+  fd  Linux raid autodetect
/dev/hda3           145    30522   244011285   fd  Linux raid autodetect
Command (m for help): q
# mdadm /dev/md0 --add /dev/hdc3
mdadm: hot added /dev/hdc3