Intro
Recently I was involved in moving a large website from Slicehost to an ASP (Application Service Provider) closer to the client. Everything went well until DNS was updated. After that, load on the server was constantly high and would climb to the point where web pages were no longer served.
Comparison
Slicehost and the ASP provided essentially the same server setup, as per our request:
| | Slicehost | ASP |
|---|---|---|
| # CPUs | 4 | 4 |
| RAM | 3 GB | 3 GB |
| OS | Ubuntu Lucid | Debian 6.0 |
| VM | Xen | VMware ESX |
The website contains over 50 GB of images, with sizes up to 1.5 MB. All the pages are dynamically created with PHP connected to MySQL. There is a lightly used Ruby app also running via mod_passenger. On an hourly basis a PHP script is run to fetch an XML file, which is then parsed and imported into the database. None of this is very complicated, and there are numerous places where it could be optimized.
Statistics
After the move, the server load would range between 6 and 25. Once the load hit 15, the websites would start to become unavailable. But ‘top’ was only showing around 20% overall CPU usage!
So I started toying with the standard stat programs:
# mpstat 1 20
Linux 2.6.32-5-686 (pmm) 07/13/2011 _i686_ (4 CPU)
01:21:45 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
01:21:46 PM all 0.27 0.00 0.53 72.27 0.00 0.00 0.00 0.00 26.93
01:21:47 PM all 0.73 0.00 0.49 72.37 0.00 0.00 0.00 0.00 26.41
01:21:48 PM all 4.49 0.00 0.47 31.68 0.00 0.00 0.00 0.00 63.36
01:21:49 PM all 8.15 0.00 0.99 42.22 0.00 0.25 0.00 0.00 48.40
01:21:50 PM all 11.86 0.00 0.77 41.49 0.00 0.26 0.00 0.00 45.62
01:21:51 PM all 7.07 0.00 0.25 38.38 0.00 0.00 0.00 0.00 54.29
01:21:52 PM all 6.93 0.00 0.80 51.20 0.00 0.00 0.00 0.00 41.07
01:21:53 PM all 6.83 0.00 0.68 35.54 0.00 0.00 0.00 0.00 56.95
Average: all 8.94 0.00 0.57 41.86 0.01 0.04 0.00 0.00 48.57
# w
13:22:05 up 48 days, 12:43, 2 users, load average: 12.00, 11.50, 10.10
The CPU usage from mpstat matched what top was indicating, but %iowait was very high. To me, this means the processors are sitting idle while they wait for disk I/O to complete.
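This also explains how the load average can sit at 12 while the CPUs look mostly idle: Linux counts processes in uninterruptible sleep (state D, usually blocked on disk I/O) toward the load average, along with the runnable ones. A rough way to see how many processes are in each of those states is sketched below. The helper name count_by_state is made up for illustration, and it only tallies whatever ‘ps’ reports at that instant.

```shell
# count_by_state: hypothetical helper, not part of any package.
# Reads one process state per line on stdin (as printed by `ps -eo state=`)
# and prints how many are runnable (R) and how many are in
# uninterruptible sleep (D), the two states that feed the load average.
count_by_state() {
    awk '
        $1 ~ /^R/ { r++ }
        $1 ~ /^D/ { d++ }
        END       { printf "R=%d D=%d\n", r, d }
    '
}

# Typical use on a live system:
#   ps -eo state= | count_by_state
```

A D count that stays well above the number of CPUs would be consistent with the high %iowait shown by mpstat.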
# iostat (edited)
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 25.00 0.00 10.00 0.00 280.00 28.00 0.04 4.40 1.20 1.20
sda 0.00 0.00 1.00 0.00 8.00 0.00 8.00 0.00 4.00 4.00 0.40
sda 0.00 20.00 3.00 9.00 72.00 232.00 25.33 0.06 5.33 2.00 2.40
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 1.00 0.00 8.00 0.00 8.00 0.01 8.00 8.00 0.80
--- import starts ---
sda 1.00 1216.00 11.00 135.00 984.00 10808.00 80.77 20.06 137.37 2.93 42.80
sda 0.00 137.00 65.00 973.00 1168.00 8880.00 9.68 0.95 0.92 0.76 79.20
sda 0.00 148.00 81.00 1414.00 704.00 12496.00 8.83 0.93 0.62 0.55 82.00
sda 1.00 165.00 42.00 1468.00 432.00 13064.00 8.94 1.04 0.69 0.55 83.20
sda 0.00 160.00 24.00 1125.00 192.00 13136.00 11.60 0.84 0.73 0.69 78.80
sda 0.00 148.00 56.00 1571.00 448.00 13752.00 8.73 0.71 0.44 0.44 70.80
sda 0.00 172.00 43.00 1327.00 376.00 11992.00 9.03 0.79 0.58 0.56 76.80
sda 0.00 142.00 55.00 1306.00 464.00 11584.00 8.85 0.82 0.61 0.53 72.00
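Wow! Once the import starts, writes jump from around 10 per second at most to well over 1,000, and %util climbs from under 3% to around 70-83%. That is a lot of disk I/O. What it really means, I’m not sure.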
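One way to read a long run of iostat samples is to pick out only the intervals where the device was busy most of the time. The sketch below assumes the extended iostat layout shown above, with the device name in the first column, r/s and w/s in columns 4 and 5, and %util last; newer sysstat versions arrange the columns differently. The name busy_samples and the 50% threshold are illustrative choices, not anything provided by sysstat.

```shell
# busy_samples THRESHOLD
# Hypothetical helper: reads extended iostat rows on stdin and prints the
# ones whose %util (last column) is above THRESHOLD.
busy_samples() {
    threshold="$1"
    awk -v t="$threshold" '
        # Header and separator lines have a non-numeric last field; skip them.
        NF > 1 && $NF ~ /^[0-9.]+$/ && $NF + 0 > t + 0 {
            printf "%s r/s=%s w/s=%s util=%s\n", $1, $4, $5, $NF
        }
    '
}

# Typical use on a live system:
#   iostat -x 1 | busy_samples 50
```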
# iotop (recreated since I don't have a historical version)
Total DISK READ: 267.11 K/s | Total DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
20876 be/4 mysql 0.00 B/s 15.71 K/s 0.00 % 68.00 % mysqld --basedir=/usr --da~ld/mysqld.sock --port=3306
28051 be/4 www-data 129.63 K/s 0.00 B/s 0.00 % 40.44 % apache2 -k start
28061 be/4 www-data 137.48 K/s 0.00 B/s 0.00 % 31.12 % apache2 -k start
28059 be/4 www-data 0.00 B/s 3.93 K/s 0.00 % 23.00 % apache2 -k start
Once again, lots of I/O going on. It is interesting that Apache requires so much I/O for reading. I expected MySQL to require much more I/O, since it is both reading from AND writing to the database.
# apache2ctl status (edited)
Apache Server Status for localhost
--------------------------------------------------------------------------
CPU Usage: u319.69 s97.34 cu.05 cs0 - 4.88% CPU load
36 requests currently being processed, 14 idle workers
_WWC_W_W_WWWW_WLW_CWCW_W_WW______WC.............................
................................................................
................................................................
................................................................
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
Apache doesn’t think it is using much CPU (4.88%), but 36 of its 50 workers are tied up with requests.
Watching ‘top’ over time, I noted that the import process was consuming 99% of one of the 4 CPUs. The other CPUs were generally 98% idle, but Apache requests would occasionally use 5-12% of each. I knew the import behaved this way beforehand, and it wasn’t a problem on the Slicehost server, so that wasn’t the root of the problem.
Actions Taken
Apache
I set ‘KeepAlive Off’, thinking each client browser was hogging an Apache process and that eventually all the available forked processes would be tied up. My thinking was that the high I/O was caused by images not being read from disk and sent out fast enough, so the open Apache processes were constantly waiting for data to send.
I set a fixed number of Apache processes, sized to the memory left over after MySQL. The effect is that the forked processes stay resident, so the memory assigned to them is claimed once and stays claimed. I also set the number of Ruby threads very low to help control memory usage.
These changes gave a little breathing room when load increased above 15, but the problem was by no means solved.
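The process-count sizing mentioned above comes down to simple arithmetic: take total RAM, subtract what MySQL and the OS are expected to need, and divide by the memory of one Apache child. The figures in this sketch are placeholders rather than the values used on this server, and max_clients is a made-up helper name.

```shell
# max_clients TOTAL_MB RESERVED_MB PER_CHILD_MB
# Hypothetical helper: prints how many Apache children fit in the memory
# left after the reserve, rounded down.
max_clients() {
    total_mb="$1"
    reserved_mb="$2"
    per_child_mb="$3"
    echo $(( (total_mb - reserved_mb) / per_child_mb ))
}

# Example with assumed numbers: 3072 MB of RAM, 1024 MB reserved for MySQL
# and the OS, 40 MB per Apache child.
max_clients 3072 1024 40    # prints 51
```

The result would be a starting point for prefork’s MaxClients directive, with the per-child figure measured on the running server rather than guessed.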
MySQL
I tweaked MySQL to better utilize all the remaining memory, thinking the import process was making MySQL work harder than the default settings could handle. I switched the db engine from MyISAM to InnoDB. That seemed to make the web pages load faster, but the import was taking 20% longer!
I ran some benchmark tests that I found in the document titled Virtualization for MySQL on VMware.
This document uses sysbench to run a test using MySQL and a temporary database. I didn’t delve into the details, but wanted to see a rudimentary comparison of the engines and how the test would affect server load.
An example command-line usage:
# ./sysbench \
    --num-threads=4 \
    --max-time=900 \
    --max-requests=50000 \
    --test=oltp \
    --mysql-user=root \
    --mysql-host=localhost \
    --mysql-port=3306 \
    --mysql-table-engine=innodb \
    --oltp-test-mode=complex \
    --oltp-table-size=8000000 \
    run > MYRESULTS.txt
I tested MyISAM and InnoDB. The test results were interesting by themselves. When I ran the tests, each of which took 5 minutes, the load was around 2-4, but the load did NOT increase while the tests were running!
Even though the tests indicated that InnoDB is faster by almost a factor of 2, it was slowing down the import process, so we switched back to MyISAM.
System
Someone pointed out that a 32-bit OS was installed. Ugh, not much can be done about that, and I don’t believe it would cause problems of this proportion.
Then someone found an article that suggests changing the I/O scheduler from cfq to noop. Huh?
It turns out to be very easy. All you have to do is run the following command:
# echo noop > /sys/block/sda/queue/scheduler
And… load has stabilized and been where we expect it to be.
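To confirm the change took effect, the same sysfs file can be read back: the kernel lists the available schedulers and marks the active one in square brackets. A small sketch that extracts it follows. The name active_scheduler is a hypothetical helper, and the optional file argument exists only so it can be tried on a saved copy.

```shell
# active_scheduler [FILE]
# Hypothetical helper: prints the scheduler shown in [brackets] in a sysfs
# scheduler file. FILE defaults to the sda queue used above.
active_scheduler() {
    file="${1:-/sys/block/sda/queue/scheduler}"
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$file"
}

# Typical use on a live system:
#   active_scheduler
```

The change made with echo applies only to that one device and is lost at reboot.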
To make this persist after a reboot in Debian, just modify the ‘GRUB_CMDLINE_LINUX’ line in /etc/default/grub to be:
GRUB_CMDLINE_LINUX="elevator=noop"
Then run:
# update-grub
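After the next reboot, one quick check that the parameter was picked up is to look for it on the running kernel’s command line in /proc/cmdline. The helper below is an illustrative sketch; cmdline_elevator is not a standard tool.

```shell
# cmdline_elevator [FILE]
# Hypothetical helper: prints the value of elevator= from a kernel command
# line, or nothing if the parameter is absent. FILE defaults to /proc/cmdline.
cmdline_elevator() {
    file="${1:-/proc/cmdline}"
    tr ' ' '\n' < "$file" | sed -n 's/^elevator=//p'
}

# Typical use on a live system:
#   cmdline_elevator
```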
To make this persist after a reboot with RedHat/CentOS, add “elevator=noop” to the end of the Linux kernel line in /boot/grub/menu.lst. In a Xen dom0 entry like the one below, ‘kernel’ loads the hypervisor, so the Linux kernel is the first ‘module’ line:
title CentOS (2.6.18-238.12.1.el5xen)
root (hd0,0)
kernel /xen.gz-2.6.18-238.12.1.el5 dom0_mem=256M
module /vmlinuz-2.6.18-238.12.1.el5xen ro root=/dev/VolGroup00/LogVol00 elevator=noop
module /initrd-2.6.18-238.12.1.el5xen.img
Resolution
It turns out the OS has a scheduler that manages low-level I/O operations. The default scheduler, Completely Fair Queuing (CFQ), tries to distribute the available I/O bandwidth equally among all the processes issuing I/O requests. This is great if all those requests are being sent to one piece of hardware in the local machine that handles all I/O. In our case this server is using a Network Appliance connected via Fibre Channel, which should be fast enough to handle anything we throw at it. I didn’t find this out until late in the troubleshooting phase.
The NOOP scheduler assumes whatever data is sent to the I/O device will be handled by that device in the most efficient manner, so it passes requests along in roughly the order they arrive, doing little more than merging adjacent ones. Most of the links I’ve found reference NOOP with SSDs, but it also makes complete sense once you factor in our particular setup, because the Network Appliance is much faster than any local drive could be.
Conclusion
The statistics pointed to an I/O issue, and that was in fact the root of the problem. As for knowing about such a setting, I can only say that it is something you learn from experience. In the troubleshooting process, we also found other things that needed to be tweaked, and it pointed out things to consider as this site grows.