Thursday, September 15, 2011

Xen "blocked for 120 seconds" I/O issues on G7 blades

The issue manifests as crazy load during times of high I/O, with a very high amount of "dirty" memory in /proc/meminfo, and messages like "INFO: task syslogd:1500 blocked for more than 120 seconds" in dmesg that reference fsync further down the stack. It seems to primarily affect HP CCISS disk arrays. I'm mostly concerned about it in RedHat/CentOS, but it may be a bug for users of other distros too. When I wrote about this issue before, I was using G6 HP BL460c Blades - now I'm using G7 blades as well.

RedHat and others claim that you just need to update the driver to the latest from HP, but this emphatically did not work for me. What does seem to work, for both the G6s and G7s, is a band-aid solution that puts a ceiling on I/O throughput and uses the noop scheduler. Additionally, for G7s, the cciss kernel module available from EL Repo seems to help (though the HP driver is still ineffective, as far as I can tell). With a 50MB/sec max in place, xen guests seem to consistently be able to write at about 47-49MB/sec. Without the max in place, xen guest I/O can be as low as 2MB/sec.

As of this moment, the stable configuration for Blade G7s seems to be:

Blade
CentOS 5.7
kernel - 2.6.18-238.12.1.el5xen (Tue May 31 2011! So old!)
with added cciss module (explained below)
using noop for scheduler (explained below)
/proc/sys/dev/raid/speed_limit_max set to 50000

Xen Guest
CentOS 5.7
kernel - 2.6.18-274.el5xen
using noop for scheduler (explained below)
No additional kernel modules, no need to set /proc/sys/dev/raid/speed_limit_max

How to... Add cciss module from EL Repo
Previously, changing the driver never seemed to help anything, but a particular version of the cciss driver that ultimately derives from this project, available at EL Repo, seems to help with the G7 blades.

modinfo cciss  # see what you have now
rpm --import http://elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://elrepo.org/elrepo-release-5-3.el5.elrepo.noarch.rpm
yum --enablerepo=elrepo-testing install kmod-cciss-xen.x86_64
modinfo cciss
 # info should have changed

My G6s are currently running without this module added and no I/O issues, so I have no advice one way or the other whether to use this on your G6s.

How to... set /proc/sys/dev/raid/speed_limit_max
To set it temporarily, just until you reboot:
echo "50000" > /proc/sys/dev/raid/speed_limit_max

To set it permanently, taking effect after you reboot, add the line
dev.raid.speed_limit_max = 50000
to /etc/sysctl.conf.

How to... Use noop for the scheduler on both blade and Xen guests
You can temporarily set the I/O scheduler on your machine with:
echo noop > /sys/block/[your-block-device-name]/queue/schedule

Long-term, edit /etc/grub.conf with elevator=noop so that the scheduler is always set on startup:
        title CentOS (2.6.18-274.el5xen)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-274.el5xen ro root=/dev/VolGroup00/LogVol00 elevator=noop console=xvc0
        initrd /initrd-2.6.18-274.el5xen.img

What's the deal with the scheduler? Noop performs fewer transactions per second in exchange for being less of a burden on the system. Wikipedia lovingly calls it "the simplest"I/O scheduler. It's not clear to me if the reason this works is that it has fewer moving parts, as it were, to foul up with the driver, or if it's just slower, so it's working like the ceiling on raid/speed_limit_max.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.