Wednesday, August 8, 2012

Large disks in CentOS

I hardly ever have to deal with truly large RAIDs or disks in a typical server - that's what NFS and Isilon and suchlike are for, amirite? But for the rare occasions when you do have to, here's how you do it.

Dealing with a large boot disk/RAID during install

If you've got a true RAID, you should be able to create a small RAID (e.g. 10GB, or whatever you judge is appropriate) just for the system disk, and another for the data. Then you can specify that Linux will be installed on the small disk, and the installer won't choke. Hooray!

Note that if you're dealing with HP products, "ORCA" (the RAID utility that is available during boot) is not sophisticated enough to carve out a small RAID in this way. You will need to download the HP Offline Array Configuration Utility (the "ACU") and burn it to a CD or a bootable USB key. If you're going the USB key route, note that the key has to be written using the HP USB Key utility, which for some reason will not write this particular utility from an image, but only from a CD (or an image mounted as a CD). Have fun!

Or, you may have a single 3TB hard drive, or a couple of 3TB hard drives that you'd like to spread the system across, in your refurbished desktop tower. I don't have great notes on this and can't go into much detail - apologies. I can tell you that this is much less of a pain with the CentOS 6 installer than with the CentOS 5 installer. With CentOS 5, you want to use a utility like parted to manually create a boot partition and a system disk partition. Any disks holding large data partitions should be labelled "gpt". (I will go into a few more details about how to use parted in the next section.) Note that the installer will refuse to use more than 2TB of the boot disk, no matter what. Oh well. With CentOS 6, you have similar restrictions, but you don't have to prelabel the disks.

Adding large disks/RAIDs post-install
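
Before touching anything with parted, it's worth being sure which device is the new one. Here is a minimal sketch of one way to narrow it down: it reads /proc/partitions-style text and lists whole sd* disks that have no partitions yet. The function name unpartitioned_disks is made up for this example, and it assumes plain sdX device names. Treat the result as a hint only, since a disk that is used whole (with no partition table) will show up too.

```shell
# List whole sd* disks that have no partitions, reading
# /proc/partitions-style text on stdin (4th column is the device name).
# Hypothetical helper; assumes plain sdX naming.
unpartitioned_disks() {
    awk '
        $4 ~ /^sd[a-z]+$/       { disk[$4] = 1 }
        $4 ~ /^sd[a-z]+[0-9]+$/ { p = $4; sub(/[0-9]+$/, "", p); used[p] = 1 }
        END { for (d in disk) if (!(d in used)) print d }
    ' | sort
}

# Usage on a live system:
#   unpartitioned_disks < /proc/partitions
```

Cross-check whatever it prints against dmesg before you go anywhere near mklabel.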

Run parted on the new RAID, like so:

parted /dev/sdc

... where sdc is the new device that you haven't seen before. If you are baffled about what the new device is called, check dmesg and/or just ls /dev and look for new, unused devices. Next, make the label:


(parted) mklabel gpt
(parted) print
Model: HP LOGICAL VOLUME (scsi)
Disk /dev/sdc: 30.0TB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start  End  Size  File system  Name  Flags


And now make the first partition. With disks up to a certain size, the trick to avoiding the dreaded "Warning: The resulting partition is not properly aligned for best performance" is to use "1" for the start and "-1" for the end. With more massive disks, however, this is apparently insufficient, due to the size of the metadata needed on the disk. You can laboriously calculate by hand how big 34 sectors will be for your size of disk, call it X, and set the start to X and the end to -X. Or you can make parted do the work for you by using "0%" for the start and "100%" for the end, and then running align-check to confirm. Here is the failed attempt, followed by the one that works:


(parted) mkpart primary 1 -1                                            
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? cancel                                                     
(parted) mkpart                                                           
Partition name?  []? primary                                            
File system type?  [ext2]? ext4                                           
Start? 0%                                                                 
End? 100%                                                                 
(parted) align-check opt 1                                              
(parted) print                                                          
Model: HP LOGICAL VOLUME (scsi)
Disk /dev/sdc: 30.0TB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  30.0TB  30.0TB               primary

(parted)
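
To see why the 1049kB start that parted picked is a good one, here is a small sketch of the arithmetic. The is_aligned function is made up for this example, and it assumes that "aligned" simply means the partition starts on a multiple of the 4096-byte physical sector size shown in the print output. parted's own idea of "optimal" may be stricter than that.

```shell
# Succeeds when a partition starting at the given sector begins on a
# multiple of the given boundary. Hypothetical helper; sizes are in bytes.
is_aligned() {
    start_sector=$1
    logical_sector_bytes=$2
    boundary_bytes=$3
    [ $(( (start_sector * logical_sector_bytes) % boundary_bytes )) -eq 0 ]
}

# Sector 34 is the first usable sector after a standard GPT on 512B sectors.
is_aligned 34 512 4096   && echo "sector 34: aligned"   || echo "sector 34: not aligned"
# 1049kB in the print output is 1MiB, i.e. sector 2048 with 512B sectors.
is_aligned 2048 512 4096 && echo "sector 2048: aligned" || echo "sector 2048: not aligned"
```

Sector 34 lands 1024 bytes past a 4096-byte boundary, so it fails the check, while the 1MiB start passes.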


Next you'll expect to run mkfs, right? However:


# mkfs.ext4 /dev/sdc1
mke2fs 1.41.12 (17-May-2010)
mkfs.ext4: Size of device /dev/sdc1 too big to be expressed in 32 bits
using a blocksize of 4096.
# mkfs.ext4 -b 8192 /dev/sdc1
Warning: blocksize 8192 not usable on most systems.
mke2fs 1.41.12 (17-May-2010)
mkfs.ext4: 8192-byte blocks too big for system (max 4096)
Proceed anyway? (y,n) n


Oops! Even though ext4 on a 64-bit system should theoretically be able to support more than 16TB (the limit when block numbers are 32 bits and blocks are 4096 bytes), there is no stable version of ext4 that does yet, although there are hacks available. Also, CentOS, at least, does not support larger block sizes. So you will need to use XFS instead. If you're on CentOS 5 and don't have mkfs.xfs available, set enabled=1 in the [centosplus] section of /etc/yum.repos.d/CentOS-Base.repo, then yum install kmod-xfs xfsprogs.

In case you missed it: if you're still running a 32-bit system for some horrible reason, do not try to have a single filesystem greater than 16TB. It may get created without errors, but you will start experiencing data loss after you fill up the first 16TB, since silent errors that result in data loss are awesome.
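
The 16TB figure falls straight out of the mkfs.ext4 error message: 32-bit block numbers times 4096-byte blocks. Here is a quick sketch of that arithmetic. The function name max_fs_bytes_32bit is made up for this example, and it assumes your shell does 64-bit arithmetic, as bash does.

```shell
# Largest filesystem addressable with 32-bit block numbers, in bytes.
# Hypothetical helper; needs a shell with 64-bit arithmetic.
max_fs_bytes_32bit() {
    block_size=$1
    echo $(( 4294967296 * block_size ))   # 4294967296 = 2^32 blocks
}

limit=$(max_fs_bytes_32bit 4096)
echo "limit with 4096-byte blocks: $limit bytes ($(( limit / 1099511627776 )) TiB)"
```

That prints a limit of 17592186044416 bytes (16 TiB), well short of the 30.0TB volume. With 8192-byte blocks the limit would double and the volume would fit, which is why -b 8192 was worth a try.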

# mkfs.xfs /dev/sdc1
meta-data=/dev/sdc1              isize=256    agcount=32, agsize=228924472 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=7325583104, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
#

You can now mount your new filesystem!

# mkdir /bigdisk
# mount /dev/sdc1 /bigdisk
# df -h /bigdisk
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdc1              28T  34M   28T   1% /bigdisk

And add the appropriate line to /etc/fstab:

/dev/sdc1      /bigdisk       xfs     defaults   0 0
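
One caveat with the line above: sdX names can shuffle around when disks are added or removed, so you may prefer to refer to the filesystem by UUID instead. A minimal sketch, assuming blkid is installed; the fstab_line helper is made up for this example, and it only prints the line, so you still paste it into /etc/fstab yourself.

```shell
# Build an /etc/fstab line that refers to the filesystem by UUID.
# Hypothetical helper; prints the line, does not edit /etc/fstab.
fstab_line() {
    uuid=$1
    mount_dir=$2
    fstype=$3
    printf 'UUID=%s\t%s\t%s\tdefaults\t0 0\n' "$uuid" "$mount_dir" "$fstype"
}

# Usage on the real system (as root):
#   fstab_line "$(blkid -s UUID -o value /dev/sdc1)" /bigdisk xfs
```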

Hooray!
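
If you have to do this more than once, the interactive steps above can be sketched as a script. This is only a sketch, with several assumptions: the run and setup_big_disk helpers and the DRY_RUN variable are made up for this example, /dev/sdX is a placeholder, your parted must be new enough to have -a and align-check, and the first partition is assumed to show up as the device name plus "1". It defaults to printing the commands rather than running them.

```shell
# Run a command, or just print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Label, partition, format and mount one whole disk as a single XFS
# filesystem. Hypothetical helper.
# WARNING: with DRY_RUN=0 this destroys whatever is on the device.
setup_big_disk() {
    dev=$1
    mnt=$2
    run parted -s -a optimal "$dev" mklabel gpt
    run parted -s -a optimal "$dev" mkpart primary 0% 100%
    run parted -s "$dev" align-check optimal 1
    run mkfs.xfs "${dev}1"
    run mkdir -p "$mnt"
    run mount "${dev}1" "$mnt"
}

setup_big_disk /dev/sdX /bigdisk   # prints the commands; nothing is executed
```

Read the printed commands carefully, and only then consider setting DRY_RUN=0 with the real device name.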