I’ve just had my most pleasant move from a smaller to a larger disk ever thanks to LVM, with vastly reduced downtime, and I thought I’d share my happy experience.
LVM is Linux’s “Logical Volume Manager”. You take a bunch of physical disks and partitions – the “physical volumes” – and bundle them together to make a “volume group”. Then you split the “volume group” up into partitions – or “logical volumes” – in whatever way you like. It works by breaking the disk up into 4MB chunks called “extents”, and maintaining a map between the logical and physical indexes of the extents. The advantage is that you don’t have to care about the physical layout when you create the logical layout – you can stick together several physical disks into one big virtual disk, for example, or grow whatever logical volume you like without worrying about what might be using the disk sectors in front of it. You can also do things like move a logical volume from one disk to another while it’s still in use, which is the very exciting subject of this post.
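To make the layering concrete, here is a sketch of building the stack from scratch – the device names and the “vg”/“root” names are illustrative, and these commands are destructive and need root, so treat this as an example to read rather than run:

```shell
pvcreate /dev/sda5 /dev/sdb5      # mark partitions as LVM "physical volumes"
vgcreate vg /dev/sda5 /dev/sdb5   # pool them into one "volume group"
lvcreate -L 20G -n root vg        # carve a 20G "logical volume" out of the pool
mkfs.ext3 /dev/vg/root            # a logical volume is used just like a partition
pvdisplay --maps /dev/sda5        # show which extents live on this disk
```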
I’ve been setting up LVM on every machine I’ve installed Linux on for a while now, on the assumption that this sort of flexibility would come in handy one day, but I’d never actually used it until now. At install time, I set up a 100MB physical partition at the start of the disk for the “/boot” directory – I’ll talk about this later when I discuss GRUB – then dedicate the rest of the disk to LVM. You can do all this in the Ubuntu installer. When I needed to extend my disk space, here’s what I did:
- switched my machine off (not sure if I have SATA hot-swap, and thought it best not to find out the hard way)
- fitted the new disk
- brought it back up; the new disk arrived as /dev/sdb
- applied the same partitioning to the new disk – /dev/sdb1 is 100MB and /dev/sdb5 is the rest of the disk
- “pvcreate /dev/sdb5” makes (most of) the disk into a “physical volume” that LVM can use
- “vgextend lvmvolume /dev/sdb5” adds the new physical volume to the volume group. Now my volume group is three times bigger!
- I needed to make use of some of that extra space right away to get some work done, so next came “lvextend -L +50G /dev/lvmvolume/root” to give my root partition an extra 50 gig.
- Now the partition is bigger than the filesystem, so I tell the filesystem to make use of the extra space: “resize2fs /dev/lvmvolume/root”. “resize2fs” is happy to make the filesystem bigger even while it’s mounted and in use, but you have to unmount it to make it smaller, so I tend to resize upwards in small increments as the extra space is needed.
- And that’s it, I had the space I needed. I started to use it right away.
- But I want to take the old disk out – otherwise they start to pile up – so there’s more to do. To tell LVM to stop using the old disk, I issued “pvmove /dev/sda1”. Over the course of several hours, this copied my root filesystem – which, in case I hadn’t mentioned this enough, was mounted and in use – from the old disk to the new.
- I was disturbed to see that pvmove finished with an error; I’m afraid I didn’t keep a copy of it. However, issuing the command a second time said there was no work to do, and “pvdisplay” showed that /dev/sda1 was no longer in use.
- “vgreduce -a lvmvolume” removed /dev/sda1 from the list of physical volumes in the volume group, and
- “pvremove /dev/sda1” marked it as no longer a physical volume.
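Putting the steps above together, the whole LVM side of the move comes down to a handful of commands (using my volume-group name and device names from above; all need root, and “pvmove” is the slow one):

```shell
pvcreate /dev/sdb5                    # turn the new partition into a PV
vgextend lvmvolume /dev/sdb5          # add it to the existing volume group
lvextend -L +50G /dev/lvmvolume/root  # grow the root logical volume by 50G
resize2fs /dev/lvmvolume/root         # grow the filesystem, while mounted
pvmove /dev/sda1                      # migrate every extent off the old PV
vgreduce -a lvmvolume                 # drop the now-empty PV from the group
pvremove /dev/sda1                    # wipe the LVM label off the old PV
```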
These days I call my volume group “vg” rather than “lvmvolume”. I suspect that the system that needs more than one volume group is rare, so a short name is handy.
That’s the LVM part of things moved. However, there’s still “/boot” to take care of. To explain what’s going on there, I need to talk a little about how a modern Linux system boots; I got this from this mailing list post and discussions on the #grub IRC channel.
If the BIOS decides that it’s going to boot from a hard drive, it loads a very small program from the very first sector of the disk (known as the MBR). That program then loads the next thing it needs, “core.img”, which lives in sectors 1 to 62 – the gap before the first partition, which is traditionally left unused. It finds two things in there:
- GRUB modules – these tell it how to do things like read partition tables, or read Linux filesystems
- basic config information – which tells it where “/boot” may be found
In other words “core.img” contains exactly enough for GRUB to find and read the filesystem with “/boot” in it. From there, it can read any other modules it may need. It can also find the “grub.cfg” configuration file that tells it what to do next. “grub.cfg” will usually instruct it to display a menu; when you select a system to boot, “grub.cfg” will tell it where to find the kernel image it’s going to boot. It loads that image into memory and runs it, with parameters from “grub.cfg” that tell it where to look for the root filesystem it’s eventually going to use.
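To give a flavour of what GRUB reads at that point, a menu entry in “grub.cfg” looks roughly like this – the kernel version and UUID here are made up, and the file paths are relative to the /boot filesystem, since on my setup that’s a separate partition:

```
menuentry "Ubuntu, Linux 2.6.32-generic" {
    set root=(hd0,1)    # the partition that holds /boot
    linux /vmlinuz-2.6.32-generic root=UUID=1234abcd-0000-0000-0000-000000000000 ro quiet
    initrd /initrd.img-2.6.32-generic
}
```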
What I’ve described here is GRUB 2, and as it happens GRUB 2 is happy to read /boot from an LVM-based filesystem, given the right modules. However, the old GRUB couldn’t do that, and since Ubuntu still prefers to use the old GRUB at install time and upgrade later, I prefer to put it on a separate filesystem. Possibly I could have changed that while moving to the new disk, but I didn’t want to change too much at once, and it only takes up 0.1% of the disk.
So, after this transition I’ve got most of my filesystem onto the new disk, but the boot process still relies on the old disk – only the old disk has the MBR, or core.img, or the /boot partition. Copying over /boot is easy: “mkfs.ext3 /dev/sdb1” to create the new filesystem, then mount it and copy over the files. After that, I unmounted both, changed “/etc/fstab” to point at the new drive (to avoid problems when devices move about, /etc/fstab names the partition by UUID – you can get the UUIDs of all your partitions with “blkid”), and re-mounted /boot using the new one.
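Spelled out as commands, the /boot copy looks like this (the mount point is whatever temporary directory you like, and as ever the device names are specific to my machine):

```shell
mkfs.ext3 /dev/sdb1          # create the new /boot filesystem
mkdir -p /mnt/newboot
mount /dev/sdb1 /mnt/newboot
cp -a /boot/. /mnt/newboot/  # copy everything, preserving attributes
umount /mnt/newboot /boot
blkid /dev/sdb1              # read off the new UUID for /etc/fstab
# edit /etc/fstab to use the new UUID, then:
mount /boot
```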
After this, I went wrong. I had hoped that “grub-install --recheck /dev/sdb” would be enough to get the new drive booting OK, but it turns out that this doesn’t re-write “grub.cfg”, and that file still contained references to an old UUID. I’m not sure what the Right Way to fix this file was; I just used “grub-mkconfig > /boot/grub/grub.cfg” to rewrite it, but I suspect there’s some higher-level tool I should have called.
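For completeness, here are the two steps together – I’ve since learned that on Ubuntu the higher-level tool is “update-grub”, which is just a wrapper that runs “grub-mkconfig -o /boot/grub/grub.cfg”:

```shell
grub-install --recheck /dev/sdb  # write the MBR and core.img to the new disk
update-grub                      # regenerate /boot/grub/grub.cfg
```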
That’s it – with that done, I could remove the drive from the machine. However, I plugged it in one last time, in order to zero it out. Not so much for security, though that is a consideration too: while there is some theoretical possibility of recovering data from a zeroed drive, in practice zeroing makes recovering anything vastly more expensive. The main reason for zeroing the drive is to unambiguously mark it as not in use, so that neither I nor anyone else hesitates to use it for another purpose. Otherwise, it’s easy to acquire stacks of “limbo” drives which have been replaced, but which you’re keeping because you can’t quite remember whether there’s anything on them you might want. Zeroing the drive eliminates that concern.
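The zeroing itself is one command; be very sure of the device name, because dd will destroy whichever disk it’s pointed at (“/dev/sdX” below is a placeholder):

```shell
# Overwrite the whole device with zeros, flushing to disk at the end.
dd if=/dev/zero of=/dev/sdX bs=1M conv=fsync
```

With a reasonably recent GNU coreutils you can add “status=progress” to get a running byte count, which is reassuring on a job that takes hours.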
This could have been easier – the LVM stage was as straightforward as I could ask, but the GRUB side of things was harder, at least in part thanks to a lack of documentation. Still, I’m now happily using my new disk, and I was very happy not to have to leave the machine out of use while it copied half a gigabyte of data from the old drive to the new.