In the previous tutorial on LVM we went through the steps of installing and configuring LVM, and of adding and removing PVs. This article moves on to the data protection options available with LVM.
When thinking about data protection and disk drives, the first thing that comes to mind is RAID (Redundant Array of Inexpensive Disks). This technology provides redundancy by combining multiple disk drives and distributing data across them. The different ways of distributing the data are called "RAID levels" (the standard levels are numbered 0 to 6), with RAID 1 and RAID 5 being the most common.
In RAID 1, two or more drives keep mirrored copies of the data. As long as at least one drive is still operational, all of the data remains accessible, no matter how many of the other drives fail. This availability comes at the price of disk capacity: the usable space equals that of a single drive.
With RAID 5 the data is distributed over at least three disk drives. This configuration uses block-level striping with parity data distributed across all member disks. If any one of the drives fails, the data is still available (the parity data is used to reconstruct the missing information). The advantage of RAID 5 is that the total capacity is not cut in half (as with RAID 1); it is reduced only by the capacity of a single drive. For example, with four 1 TB disk drives you can configure two RAID 1 arrays, each with a capacity of 1 TB, for a total of 2 TB of usable space. If you configure a single RAID 5 array instead, you lose only 1 TB of capacity, leaving 3 TB of usable storage.
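The capacity comparison above is simple arithmetic. As a rough sketch, assuming identical disks and whole-terabyte sizes (Bash only does integer arithmetic), it can be written as two shell functions; the function names are our own and not part of any RAID tool:

```shell
#!/usr/bin/env bash
# Usable capacity for N identical disks of SIZE TB each (integers only).

# Disks grouped into two-way RAID 1 mirrors: half the raw capacity is usable.
raid1_pairs_capacity() {
  local n=$1 size=$2
  if (( n < 2 || n % 2 != 0 )); then
    echo "RAID 1 pairs need an even number of disks" >&2
    return 1
  fi
  echo $(( n / 2 * size ))
}

# RAID 5: one disk's worth of space holds parity, so N-1 disks are usable.
raid5_capacity() {
  local n=$1 size=$2
  if (( n < 3 )); then
    echo "RAID 5 needs at least three disks" >&2
    return 1
  fi
  echo $(( (n - 1) * size ))
}

echo "Four 1 TB disks as two RAID 1 pairs: $(raid1_pairs_capacity 4 1) TB"
echo "Four 1 TB disks as one RAID 5 array: $(raid5_capacity 4 1) TB"
```

Running it prints 2 TB for the mirrored pairs and 3 TB for RAID 5, matching the example in the text.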
This increased capacity, however, comes at the price of reduced availability. A RAID 5 array can tolerate only a single drive failure. If two or more disk drives become unavailable, the entire array fails.
To illustrate the usage of RAID within LVM we will change the LVM configuration that we built in the previous tutorial, adding RAID 1 protection to it.
We will put /dev/sdb1 and /dev/sdc1 in RAID 1 (mirroring without parity or striping) and make the array available as /dev/md0. In a similar way we will combine /dev/sdd1 and /dev/sde1 into a second array, accessible as /dev/md1.
To build this configuration we will first evict all data from /dev/sdc1 and /dev/sde1. After that we will remove the /dev/sdc1 and /dev/sde1 partitions from the volume group. Next we will create /dev/md0, making it use a single disk (/dev/sdc1) for the moment, and create /dev/md1 in a similar fashion (using only /dev/sde1). We will then move all data away from /dev/sdb1 and /dev/sdd1. Once these two partitions are freed, we will add them as mirror members to /dev/md0 and /dev/md1 respectively. This lets us build our RAID configuration step by step, without losing any data during the process.
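The plan for one mirror can be summarized as a dry-run helper that only prints the commands, in the order this article runs them. The function name plan_mirror is hypothetical, and the sequence assumes the volume group always has enough free extents on its other PVs for each pvmove to succeed:

```shell
#!/usr/bin/env bash
# Print (never execute) the commands that turn two LVM physical volumes
# into one RAID 1 mirror while the volume group stays online.
# Usage: plan_mirror VG FIRST_PARTITION SECOND_PARTITION MD_DEVICE
plan_mirror() {
  local vg=$1 first=$2 second=$3 md=$4

  # Free the first partition and build a degraded mirror on it.
  echo "pvmove $first"
  echo "vgreduce $vg $first"
  echo "pvremove $first"
  echo "# change the partition type of $first to fd (Linux raid autodetect)"
  echo "mdadm --create $md --auto=yes -l 1 -n 2 $first missing"
  echo "pvcreate $md"
  echo "vgextend $vg $md"

  # Free the second partition and add it as the other half of the mirror.
  echo "pvmove $second"
  echo "vgreduce $vg $second"
  echo "pvremove $second"
  echo "# change the partition type of $second to fd (Linux raid autodetect)"
  echo "mdadm --manage $md --add $second"
}

plan_mirror database /dev/sdc1 /dev/sdb1 /dev/md0
plan_mirror database /dev/sde1 /dev/sdd1 /dev/md1
```

Printing the plan first makes it easy to review device names before anything touches a disk.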
Let's start by freeing /dev/sdc1. We use the pvmove command to evict its data.
[root@el5 ~]# pvmove /dev/sdc1
/dev/sdc1: Moved: 26.5%
/dev/sdc1: Moved: 52.2%
/dev/sdc1: Moved: 77.8%
/dev/sdc1: Moved: 92.7%
/dev/sdc1: Moved: 100.0%
[root@el5 ~]#
Removing /dev/sdc1 from the database group is done via the vgreduce command.
[root@el5 ~]# vgreduce database /dev/sdc1
Removed "/dev/sdc1" from volume group "database"
[root@el5 ~]#
Next we delete the physical volume by using pvremove.
[root@el5 ~]# pvremove /dev/sdc1
Labels on physical volume "/dev/sdc1" successfully wiped
[root@el5 ~]#
Next we change the partition type of /dev/sdc1. A partition that is going to be used in a RAID configuration should have the type "fd" (Linux raid autodetect).
[root@el5 ~]# fdisk /dev/sdc
The number of cylinders for this disk is set to 6527.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@el5 ~]#
Building the RAID array is done with the mdadm command. Its name is derived from Multiple Device (MD) administration, and it is the main Linux command for managing software RAID configurations.
[root@el5 ~]# mdadm --create /dev/md0 --auto=yes -l 1 -n 2 /dev/sdc1 missing
mdadm: array /dev/md0 started.
[root@el5 ~]#
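In the command above, -l 1 is the short form of --level=1 and -n 2 the short form of --raid-devices=2; the keyword missing reserves the second slot, so the array starts degraded with a single member. A small wrapper makes the long-form options explicit. The function name and its DRY_RUN switch are our own additions, so the command can be inspected without touching any disks:

```shell
#!/usr/bin/env bash
# Create a two-slot RAID 1 array with only one member present.
# "missing" leaves the second slot empty; the array runs degraded until a
# device is added later with: mdadm --manage MD --add DEVICE
# Set DRY_RUN=1 to print the command instead of running it.
create_degraded_mirror() {
  local md=$1 member=$2
  local -a cmd=(mdadm --create "$md" --auto=yes --level=1 --raid-devices=2 "$member" missing)
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

DRY_RUN=1 create_degraded_mirror /dev/md0 /dev/sdc1
```

With DRY_RUN unset, the function runs mdadm directly, so it needs root privileges and an unused md device name.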
After /dev/md0 is in place, we can build a physical volume on top of it.
[root@el5 ~]# pvcreate /dev/md0
Physical volume "/dev/md0" successfully created
[root@el5 ~]#
After the PV is created we can extend the database group using vgextend.
[root@el5 ~]# vgextend database /dev/md0
Volume group "database" successfully extended
[root@el5 ~]#
For the time being our RAID 1 array uses a single disk, so it provides no protection at all. Let's move the data off /dev/sdb1 and add it as the second device of /dev/md0.
Evicting the data is done by running the pvmove command.
[root@el5 ~]# pvmove /dev/sdb1
/dev/sdb1: Moved: 7.3%
/dev/sdb1: Moved: 19.1%
/dev/sdb1: Moved: 30.9%
/dev/sdb1: Moved: 42.7%
/dev/sdb1: Moved: 54.5%
/dev/sdb1: Moved: 66.3%
/dev/sdb1: Moved: 78.1%
/dev/sdb1: Moved: 89.9%
/dev/sdb1: Moved: 100.0%
[root@el5 ~]#
Before changing the partition's type we have to take it out of the database group and remove the PV that is built on top of it.
[root@el5 ~]# vgreduce database /dev/sdb1
Removed "/dev/sdb1" from volume group "database"
[root@el5 ~]# pvremove /dev/sdb1
Labels on physical volume "/dev/sdb1" successfully wiped
[root@el5 ~]#
We can now safely change the partition type to "fd".
[root@el5 ~]# fdisk /dev/sdb
The number of cylinders for this disk is set to 6527.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@el5 ~]#
Adding /dev/sdb1 to /dev/md0 is done by running mdadm.
[root@el5 ~]# mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1
[root@el5 ~]#
Running the mdadm command starts a synchronization process, which mirrors the data from /dev/sdc1 to /dev/sdb1. We can check its progress by looking at /proc/mdstat.
[root@el5 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[2] sdc1[0]
52428032 blocks [2/1] [U_]
[>....................] recovery = 3.0% (1600000/52428032) finish=3.7min speed=228571K/sec
unused devices: <none>
[root@el5 ~]#
So far about 3% of the data has been mirrored. When the synchronization completes, the output from /proc/mdstat will look like this.
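Instead of reading /proc/mdstat by eye, its status field can be checked from a script: each "U" marks an active member and each "_" a missing or rebuilding one. The following is a minimal sketch assuming the mdstat layout shown in this article; the function name mdstat_status is hypothetical:

```shell
#!/usr/bin/env bash
# Summarize md arrays from /proc/mdstat (or a saved copy of it).
# In the status field, "U" is an active member and "_" a missing or
# rebuilding one, so any "_" means the array is not fully redundant.
mdstat_status() {
  local file=${1:-/proc/mdstat}
  awk '
    /^md[0-9]+ :/ { name = $1 }
    /blocks/ && name != "" {
      status = $NF                       # e.g. [UU] or [U_]
      state = index(status, "_") ? "degraded" : "clean"
      print name, status, state
      name = ""
    }
    /(recovery|resync) =/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /%$/) print "  rebuilding:", $i
    }
  ' "$file"
}

# Only read the live file if the md driver is loaded on this machine.
if [[ -r /proc/mdstat ]]; then
  mdstat_status
fi
```

For the first snapshot above this would report md0 as degraded ([U_]) with 3.0% rebuilt; once synchronization finishes, it reports [UU] and clean.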
[root@el5 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sdc1[0]
52428032 blocks [2/2] [UU]
unused devices: <none>
[root@el5 ~]#
Our first RAID 1 configuration /dev/md0 is now operational. Let's move on with building the second one - /dev/md1.
We start by evicting the data from /dev/sde1.
[root@el5 ~]# pvmove /dev/sde1
/dev/sde1: Moved: 6.0%
/dev/sde1: Moved: 17.8%
/dev/sde1: Moved: 29.6%
/dev/sde1: Moved: 41.4%
/dev/sde1: Moved: 53.2%
/dev/sde1: Moved: 65.0%
/dev/sde1: Moved: 76.8%
/dev/sde1: Moved: 88.6%
/dev/sde1: Moved: 100.0%
[root@el5 ~]#
Next we take /dev/sde1 out of the group and remove the PV.
[root@el5 ~]# vgreduce database /dev/sde1
Removed "/dev/sde1" from volume group "database"
[root@el5 ~]# pvremove /dev/sde1
Labels on physical volume "/dev/sde1" successfully wiped
[root@el5 ~]#
Changing the partition type to "fd" is done via fdisk.
[root@el5 ~]# fdisk /dev/sde
The number of cylinders for this disk is set to 6527.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@el5 ~]#
As with /dev/md0 we build the /dev/md1 RAID configuration with /dev/sde1 and a missing second disk.
[root@el5 ~]# mdadm --create /dev/md1 --auto=yes -l 1 -n 2 /dev/sde1 missing
mdadm: array /dev/md1 started.
[root@el5 ~]#
We create a PV on top of /dev/md1 and extend the database group with it.
[root@el5 ~]# pvcreate /dev/md1
Physical volume "/dev/md1" successfully created
[root@el5 ~]# vgextend database /dev/md1
Volume group "database" successfully extended
[root@el5 ~]#
Let's inspect the PVs, just to be sure that everything is correct.
[root@el5 ~]# pvdisplay
--- Physical volume ---
PV Name               /dev/sdd1
VG Name               database
PV Size               50.00 GB / not usable 3.31 MB
Allocatable           yes
PE Size (KByte)       4096
Total PE              12799
Free PE               5119
Allocated PE          7680
PV UUID               RF37f3-SfBR-aozU-7l01-Rqh9-4YV2-1o7GdK
--- Physical volume ---
PV Name               /dev/md0
VG Name               database
PV Size               50.00 GB / not usable 3.25 MB
Allocatable           yes
PE Size (KByte)       4096
Total PE              12799
Free PE               7679
Allocated PE          5120
PV UUID               PAEniI-bCth-w430-3gsV-ToeS-JARL-wsJ2g6
--- Physical volume ---
PV Name               /dev/md1
VG Name               database
PV Size               50.00 GB / not usable 3.25 MB
Allocatable           yes
PE Size (KByte)       4096
Total PE              12799
Free PE               12799
Allocated PE          0
PV UUID               I9Hrbo-ehjq-vn9b-PgNa-PhAY-57tj-j8f8Vx
[root@el5 ~]#
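A pvmove without a destination needs enough free extents on the group's other PVs to hold everything allocated on the source. With the figures above, /dev/sdd1 has 7680 allocated extents, while /dev/md0 and /dev/md1 have 7679 + 12799 = 20478 free ones, so the move fits. A hypothetical check follows; it only compares extent counts and ignores any extra constraints an allocation policy might add:

```shell
#!/usr/bin/env bash
# Succeed if ALLOCATED extents on the source PV fit into the free extents
# of the other PVs in the group. Only compares counts; LVM allocation
# policies (e.g. contiguous) may impose further restrictions.
# Usage: can_evacuate ALLOCATED_PE FREE_PE [FREE_PE ...]
# Extent counts can be read with:
#   pvs -o pv_name,pv_pe_count,pv_pe_alloc_count
can_evacuate() {
  local allocated=$1 free_total=0 f
  shift
  for f in "$@"; do
    free_total=$(( free_total + f ))
  done
  (( allocated <= free_total ))
}

# /dev/sdd1 -> /dev/md0 and /dev/md1, figures from the pvdisplay output above
if can_evacuate 7680 7679 12799; then
  echo "pvmove /dev/sdd1 has enough room"
fi
```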
Our final task is to move the data off /dev/sdd1 and add it as the second disk of /dev/md1.
[root@el5 ~]# pvmove /dev/sdd1
/dev/sdd1: Moved: 6.0%
/dev/sdd1: Moved: 17.8%
/dev/sdd1: Moved: 29.6%
/dev/sdd1: Moved: 41.4%
/dev/sdd1: Moved: 53.2%
/dev/sdd1: Moved: 65.0%
/dev/sdd1: Moved: 76.8%
/dev/sdd1: Moved: 88.6%
/dev/sdd1: Moved: 100.0%
[root@el5 ~]# vgreduce database /dev/sdd1
Removed "/dev/sdd1" from volume group "database"
[root@el5 ~]# pvremove /dev/sdd1
Labels on physical volume "/dev/sdd1" successfully wiped
[root@el5 ~]#
We change the partition type to Linux raid autodetect (fd).
[root@el5 ~]# fdisk /dev/sdd
The number of cylinders for this disk is set to 6527.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@el5 ~]#
Adding /dev/sdd1 to /dev/md1 is done via mdadm.
[root@el5 ~]# mdadm --manage /dev/md1 --add /dev/sdd1
mdadm: added /dev/sdd1
This starts the mirroring process that we already saw for /dev/md0.
[root@el5 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdd1[2] sde1[0]
52428032 blocks [2/1] [U_]
[=>...................] recovery = 5.3% (2800000/52428032) finish=3.8min speed=215384K/sec
md0 : active raid1 sdb1[1] sdc1[0]
52428032 blocks [2/2] [UU]
unused devices: <none>
[root@el5 ~]#
When the process completes we will notice that /proc/mdstat shows two RAID 1 groups (md0 and md1).
[root@el5 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdd1[1] sde1[0]
52428032 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sdc1[0]
52428032 blocks [2/2] [UU]
unused devices: <none>
[root@el5 ~]#
Looking at our physical volumes, we can see that from LVM's perspective there are only two PVs: /dev/md0 and /dev/md1.
[root@el5 ~]# pvdisplay
--- Physical volume ---
PV Name               /dev/md0
VG Name               database
PV Size               50.00 GB / not usable 3.25 MB
Allocatable           yes
PE Size (KByte)       4096
Total PE              12799
Free PE               7679
Allocated PE          5120
PV UUID               PAEniI-bCth-w430-3gsV-ToeS-JARL-wsJ2g6
--- Physical volume ---
PV Name               /dev/md1
VG Name               database
PV Size               50.00 GB / not usable 3.25 MB
Allocatable           yes
PE Size (KByte)       4096
Total PE              12799
Free PE               5119
Allocated PE          7680
PV UUID               I9Hrbo-ehjq-vn9b-PgNa-PhAY-57tj-j8f8Vx
[root@el5 ~]#
Let's take a look at the database group as well.
[root@el5 ~]# vgdisplay
--- Volume group ---
VG Name               database
System ID
Format                lvm2
Metadata Areas        2
Metadata Sequence No  31
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                2
Open LV               2
Max PV                0
Cur PV                2
Act PV                2
VG Size               99.99 GB
PE Size               4.00 MB
Total PE              25598
Alloc PE / Size       12800 / 50.00 GB
Free  PE / Size       12798 / 49.99 GB
VG UUID               DcuO3X-HFRV-ZVyx-MVlD-gY3c-jyo0-FwR3uO
[root@el5 ~]#
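The sizes that vgdisplay reports are extent counts multiplied by the 4 MB extent size and divided by 1024. A quick awk check reproduces the figures above; the helper name pe_to_gib is our own:

```shell
#!/usr/bin/env bash
# Convert an LVM extent count to binary gigabytes:
#   extents * extent size (MiB) / 1024
# (the vgdisplay output above labels these units "GB").
# Usage: pe_to_gib EXTENTS [EXTENT_SIZE_MIB]   (extent size defaults to 4)
pe_to_gib() {
  awk -v pe="$1" -v size="${2:-4}" 'BEGIN { printf "%.2f\n", pe * size / 1024 }'
}

echo "VG size:   $(pe_to_gib 25598) GB"   # Total PE
echo "Allocated: $(pe_to_gib 12800) GB"   # Alloc PE
echo "Free:      $(pe_to_gib 12798) GB"   # Free PE
```

It prints 99.99, 50.00 and 49.99 GB, matching vgdisplay; the few megabytes short of a round 100 GB are the "not usable" remainders of each PV.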
The group consists of two PVs and its capacity is 100 GB, half of the 200 GB of raw disk space. This is the price we pay for keeping mirrored copies of all the volumes.
You probably noticed that all the changes we made while configuring the RAID groups were done while the logical volumes stayed mounted. This is one of the key advantages of LVM: the ability to rearrange physical disks without disrupting the system's availability.
Increasing the RAID's capacity
Sooner or later, the available free space runs out. There are two ways to increase the capacity of a RAID configuration like the one we built. The first approach is to add more physical disks of the same size. The md0 group, for instance, consists of two 50 GB drives and provides a total of 50 GB of disk space. If we add two more 50 GB disks as another mirrored pair, we increase the capacity to 100 GB (4 x 50 GB divided by two, because of the mirroring).
Another approach is to take one disk out of md0 and one out of md1 and replace them with bigger disks. We can then use the new disks to build a third, bigger RAID 1 group (md2) and move all data from md0 and md1 to it. Once no data is left in the old RAID 1 groups, we can destroy them, replace the two remaining disks with bigger ones and build another RAID 1 group. In the end we will again have two RAID 1 groups, but with much more capacity, since they use bigger disk drives.
We are already familiar with adding new disks to increase capacity, so the first approach holds nothing new for us. It is more interesting to see how to implement the second approach, replacing the current drives with bigger ones.
Let's start by inspecting the PVs that are in use (at the moment we have two RAID 1 groups).
[root@el5 ~]# pvdisplay
--- Physical volume ---
PV Name               /dev/md0
VG Name               database
PV Size               50.00 GB / not usable 3.06 MB
Allocatable           yes
PE Size (KByte)       4096
Total PE              12799
Free PE               7679
Allocated PE          5120
PV UUID               PAEniI-bCth-w430-3gsV-ToeS-JARL-wsJ2g6
--- Physical volume ---
PV Name               /dev/md1
VG Name               database
PV Size               50.00 GB / not usable 3.06 MB
Allocatable           yes
PE Size (KByte)       4096
Total PE              12799
Free PE               5119
Allocated PE          7680
PV UUID               I9Hrbo-ehjq-vn9b-PgNa-PhAY-57tj-j8f8Vx
[root@el5 ~]#
Let's remove the second and fourth disks of the machine. First we have to mark /dev/sdb1 and /dev/sdd1 as failed and remove them from their RAID groups (md0 and md1).
[root@el5 ~]# mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
[root@el5 ~]# mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
[root@el5 ~]# mdadm --manage /dev/md1 --fail /dev/sdd1
mdadm: set /dev/sdd1 faulty in /dev/md1
[root@el5 ~]# mdadm --manage /dev/md1 --remove /dev/sdd1
mdadm: hot removed /dev/sdd1
[root@el5 ~]#
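Each member is retired with the same two-step pattern: mark it failed, then hot-remove it. mdadm also accepts both operations in a single invocation (mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1). The helper below is a hypothetical wrapper around the two separate commands used above, with a DRY_RUN switch we added so it can be tried safely:

```shell
#!/usr/bin/env bash
# Mark a RAID member as failed, then hot-remove it from the array.
# Set DRY_RUN=1 to print the commands instead of running them.
# Usage: retire_member MD_DEVICE MEMBER
retire_member() {
  local md=$1 member=$2 op
  for op in --fail --remove; do
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo "mdadm --manage $md $op $member"
    else
      # Stop at the first error so we never remove a member that was not failed.
      mdadm --manage "$md" "$op" "$member" || return 1
    fi
  done
}

DRY_RUN=1 retire_member /dev/md0 /dev/sdb1
DRY_RUN=1 retire_member /dev/md1 /dev/sdd1
```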
We turn off the virtual machine, remove the two disks and replace them with two new drives, 200 GB each.
Once the system is back up, our first task is to create a partition of type "fd" on each of the two new disks (/dev/sdb and /dev/sdd).
[root@el5 ~]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
The number of cylinders for this disk is set to 26108.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-26108, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-26108, default 26108):
Using default value 26108
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@el5 ~]# fdisk /dev/sdd
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
The number of cylinders for this disk is set to 26108.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-26108, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-26108, default 26108):
Using default value 26108
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@el5 ~]#
Next, we use mdadm to build the third RAID 1 group (md2). This time there will be no missing disks in the initial setup.
[root@el5 ~]# mdadm --create /dev/md2 --auto=yes -l 1 -n 2 /dev/sdb1 /dev/sdd1
mdadm: array /dev/md2 started.
[root@el5 ~]#
Looking at /proc/mdstat will show us the three RAID groups.
[root@el5 ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdd1[1] sdb1[0]
209712384 blocks [2/2] [UU]
md1 : active raid1 sde1[0]
52428032 blocks [2/1] [U_]
md0 : active raid1 sdc1[0]
52428032 blocks [2/1] [U_]
unused devices: <none>
[root@el5 ~]#
Moving the data from md0 to md2 is done by running pvmove.
[root@el5 ~]# pvmove /dev/md0 /dev/md2
/dev/md0: Moved: 10.3%
/dev/md0: Moved: 20.6%
/dev/md0: Moved: 30.9%
/dev/md0: Moved: 41.5%
/dev/md0: Moved: 51.9%
/dev/md0: Moved: 61.7%
/dev/md0: Moved: 72.2%
/dev/md0: Moved: 82.7%
/dev/md0: Moved: 93.1%
/dev/md0: Moved: 100.0%
[root@el5 ~]#
We do the same for md1.
[root@el5 ~]# pvmove /dev/md1 /dev/md2
/dev/md1: Moved: 3.5%
/dev/md1: Moved: 21.0%
/dev/md1: Moved: 35.0%
/dev/md1: Moved: 45.5%
/dev/md1: Moved: 59.4%
/dev/md1: Moved: 73.3%
/dev/md1: Moved: 87.2%
/dev/md1: Moved: 100.0%
[root@el5 ~]#
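Before taking md0 and md1 out of the group, it can be worth confirming that pvmove really left no extents on them. vgreduce should refuse to remove a PV that is still in use, but an explicit check makes a script's intent clear. The sketch below parses pvs-style output (PV name, allocated extent count); the function name and the sample figures are illustrative, the latter chosen to match the state after both moves:

```shell
#!/usr/bin/env bash
# Succeed only if PV appears in pvs-style input with zero allocated extents.
# Expected input columns: pv_name pv_pe_alloc_count, e.g. from
#   pvs --noheadings -o pv_name,pv_pe_alloc_count
# Usage: ... | pv_is_empty /dev/md0
pv_is_empty() {
  awk -v pv="$1" '
    $1 == pv { found = 1; empty = ($2 == 0) }
    END { exit !(found && empty) }
  '
}

# Illustrative input: md0 and md1 drained, md2 holding all 12800 extents.
sample='  /dev/md0       0
  /dev/md1       0
  /dev/md2   12800'
for pv in /dev/md0 /dev/md1 /dev/md2; do
  if printf '%s\n' "$sample" | pv_is_empty "$pv"; then
    echo "$pv is empty, safe to vgreduce"
  else
    echo "$pv still holds data"
  fi
done
```

A PV missing from the input is treated as "not empty", so a typo in the device name fails safe.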
After md0 and md1 are freed, we can safely remove them from the database group. This is done by running vgreduce.
[root@el5 ~]# vgreduce database /dev/md0
Removed "/dev/md0" from volume group "database"
[root@el5 ~]# vgreduce database /dev/md1
Removed "/dev/md1" from volume group "database"
[root@el5 ~]#
Let's remove the PVs as well.
[root@el5 ~]# pvremove /dev/md0
Labels on physical volume "/dev/md0" successfully wiped
[root@el5 ~]# pvremove /dev/md1
Labels on physical volume "/dev/md1" successfully wiped
[root@el5 ~]#
If we run pvdisplay, we will see that the only remaining physical volume is the newly created /dev/md2, with a capacity of 200 GB.
[root@el5 ~]# pvdisplay
--- Physical volume ---
PV Name               /dev/md2
VG Name               database
PV Size               200.00 GB / not usable 1.25 MB
Allocatable           yes
PE Size (KByte)       4096
Total PE              51199
Free PE               38399
Allocated PE          12800
PV UUID               hmTRBJ-gjMJ-Dqnm-QhFL-6187-Zhby-vHL9ko
[root@el5 ~]#
This is also the only physical volume used by the database group.
[root@el5 ~]# vgdisplay
--- Volume group ---
VG Name               database
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  44
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                2
Open LV               2
Max PV                0
Cur PV                1
Act PV                1
VG Size               200.00 GB
PE Size               4.00 MB
Total PE              51199
Alloc PE / Size       12800 / 50.00 GB
Free  PE / Size       38399 / 150.00 GB
VG UUID               DcuO3X-HFRV-ZVyx-MVlD-gY3c-jyo0-FwR3uO
[root@el5 ~]#
All that remains is to remove /dev/md0 and /dev/md1, replace the two remaining 50 GB disks with 200 GB ones, build a new RAID 1 group on them and extend the database group with it. These steps should be familiar by now, so let's end the tutorial here.
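For reference, a dry-run sketch of those remaining steps follows. The old arrays no longer carry PVs, so removing them amounts to stopping them with mdadm --stop. The sketch only prints commands; the device names of the replacement disks and the array name /dev/md3 are assumptions, since the actual names depend on how the VM presents the new drives:

```shell
#!/usr/bin/env bash
# Print (never execute) the steps left over at the end of this tutorial.
# Assumptions: the two replacement disks show up as /dev/sdc and /dev/sde,
# and /dev/md3 is a free, hypothetical name for the new mirror.
plan_final_steps() {
  local vg=database new_md=/dev/md3
  local old_arrays=(/dev/md0 /dev/md1)
  local new_disks=(/dev/sdc /dev/sde)
  local md disk

  # The old arrays no longer carry PVs, so they can simply be stopped.
  for md in "${old_arrays[@]}"; do
    echo "mdadm --stop $md"
  done

  echo "# power off, replace ${new_disks[*]} with 200 GB disks, boot"
  for disk in "${new_disks[@]}"; do
    echo "# fdisk $disk: one primary partition across the disk, type fd"
  done

  echo "mdadm --create $new_md --auto=yes -l 1 -n 2 ${new_disks[0]}1 ${new_disks[1]}1"
  echo "pvcreate $new_md"
  echo "vgextend $vg $new_md"
}

plan_final_steps
```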



