Who/what is doing all this file I/O? (aka stupid userspace)
So I had a little spare time the other day and decided to do some experimentation with SystemTap again. Previously I’ve looked at what processes were doing at boot time; that needs revisiting to see how things have changed since FC5, but I decided to leave the task for later. Instead I decided to look at what’s going on in my ‘idle’ GNOME desktop session over a 5 minute period. The SystemTap script I wrote for bootprobes was trivially adapted to serve this new task; most of the work was in writing a new Perl routine to post-process & analyse the data.
The SystemTap script simply emitted a line every time a process closed a file, detailing how many bytes it read & how many it wrote. A very short sample is shown below; the first field is milliseconds since the start of the script, then the pid, the command name, the filename & the bytes read & written:
60 2224 (pcscd) access /dev/bus/usb/005/001 read: 0 write: 0
60 2224 (pcscd) access /dev/bus/usb/005/006 read: 0 write: 0
112 2588 (hald-addon-stor) access /dev/hda read: 0 write: 0
120 1888 (ipw3945d) access /sys/bus/pci/drivers/ipw3945/0000:03:00.0/cmd read: 3008 write: 0
223 1888 (ipw3945d) access /sys/bus/pci/drivers/ipw3945/0000:03:00.0/cmd read: 3008 write: 0
324 1888 (ipw3945d) access /sys/bus/pci/drivers/ipw3945/0000:03:00.0/cmd read: 3008 write: 0
428 2837 (python) access /proc/xen/privcmd read: 0 write: 0
428 1888 (ipw3945d) access /sys/bus/pci/drivers/ipw3945/0000:03:00.0/cmd read: 3008 write: 0
428 2831 (xenstored) access /var/lib/xenstored/tdb.0x9e1e9b8 read: 0 write: 73728
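I haven’t reproduced the actual script here, but a minimal sketch of the idea, using probes from SystemTap’s standard syscall tapset to associate filenames with file descriptors and accumulate byte counts until close(), might look something like this (a hypothetical reconstruction; the real bootprobes adaptation may differ in detail):

# files.stp: print one record each time a process closes a file,
# giving the bytes read & written while it was open.
global start, pathname, filenames, curfd, reads, writes

probe begin { start = gettimeofday_ms() }

# Remember the filename passed to open(), keyed by thread, then
# associate it with the fd that open() returns.
probe syscall.open { pathname[tid()] = filename }
probe syscall.open.return {
    if ($return >= 0)
        filenames[pid(), $return] = pathname[tid()]
    delete pathname[tid()]
}

# Accumulate bytes transferred per (pid, fd).
probe syscall.read, syscall.write { curfd[tid()] = fd }
probe syscall.read.return {
    if ($return > 0)
        reads[pid(), curfd[tid()]] += $return
}
probe syscall.write.return {
    if ($return > 0)
        writes[pid(), curfd[tid()]] += $return
}

# On close(), emit one record & forget the fd.
probe syscall.close {
    if ([pid(), fd] in filenames) {
        printf("%d %d (%s) access %s read: %d write: %d\n",
               gettimeofday_ms() - start, pid(), execname(),
               filenames[pid(), fd], reads[pid(), fd], writes[pid(), fd])
        delete filenames[pid(), fd]
        delete reads[pid(), fd]
        delete writes[pid(), fd]
    }
}

Running something along those lines with ‘stap files.stp > trace.log’ for the desired sampling period produces a log ready for the post-processing described next.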
Processing this, I correlated parent & child processes to see who is spawning whom. More importantly, for each unique (pid, filename) pair I tracked how many times the file was opened, the time of each open & the bytes read/written. For files opened more than once, a crude frequency of opening (the average interval between opens) & the average I/O size were calculated. This produces output like:
Pid 2060 irqbalance run 292374+ ms...
  reads /proc/interrupts 30 times, once every 9667 ms avg 8192 bytes
  writes /proc/irq/12/smp_affinity once only avg 1 bytes
  writes /proc/irq/23/smp_affinity 10 times, once every 27002 ms avg 1 bytes
Hall of Shame
Anyway, looking through the output, a few examples really stand out for awards in the hall of shame.
1. pcscd – smart card daemon
Top of the hall of shame is pcscd, a daemon for dealing with smart card readers. Once per second it scans all the USB devices on the system, checking to see if any of them are newly plugged in smartcard readers. For the 99.99999% of people who will never see a smartcard reader this is top kwalitee crack. Perhaps someone might like to help pcscd learn to love HAL, which is perfectly capable of notifying it when a device is plugged in…
Pid 2224 pcscd run 301599+ ms...
  reads /dev/bus/usb/001/001 301 times, once every 1000 ms avg 43 bytes
  reads /dev/bus/usb/002/001 301 times, once every 1000 ms avg 43 bytes
  reads /dev/bus/usb/003/001 301 times, once every 1000 ms avg 43 bytes
  reads /dev/bus/usb/004/001 301 times, once every 1000 ms avg 43 bytes
  reads /dev/bus/usb/004/002 301 times, once every 1000 ms avg 57 bytes
  reads /dev/bus/usb/005/001 301 times, once every 1000 ms avg 43 bytes
  reads /dev/bus/usb/005/002 301 times, once every 1000 ms avg 43 bytes
  reads /dev/bus/usb/005/005 301 times, once every 1000 ms avg 52 bytes
  reads /dev/bus/usb/005/006 301 times, once every 1000 ms avg 52 bytes
  reads /dev/bus/usb/005/008 301 times, once every 1000 ms avg 50 bytes
2. ipw3945d – binary regulatory daemon for ipw3945 wifi
Next up in the hall of shame is Intel’s closed source, binary regulatory daemon for the ipw3945 series of wifi cards, which reads data from the sysfs node corresponding to the wifi PCI device on average once every 100 ms:
Pid 1888 ipw3945d run 301538+ ms...
  reads /sys/bus/pci/drivers/ipw3945/0000:03:00.0/cmd 2954 times, once every 102 ms avg 3008 bytes
  writes /sys/bus/pci/drivers/ipw3945/0000:03:00.0/cmd 9 times, once every 29654 ms avg 20.3333333333333 bytes
This activity doesn’t appear to be related to the level of network I/O at all; the rate of access is pretty much constant regardless of what the system is doing. Please, pretty please, just move this stuff back into the kernel module or firmware & kill this stupid daemon.
3. multiload-applet – GNOME system monitor applet
It is perhaps a little unfair to put this one in the hall of shame, because I did make an explicit choice to turn the applet on, and its entire raison d’etre is to sample system state every ‘n’ seconds. The reason it’s here, however, is that its default out-of-the-box sampling interval is a mere 500 ms, and because of the sheer number of files it feels the need to poll:
Pid 3290 multiload-apple run 301191+ ms...
  reads /etc/mtab 603 times, once every 499 ms avg 8192 bytes
  reads /proc/loadavg 603 times, once every 499 ms avg 8191 bytes
  reads /proc/meminfo 1206 times, once every 249 ms avg 8191 bytes
  reads /proc/mounts 3618 times, once every 83 ms avg 4096 bytes
  reads /proc/net/dev 603 times, once every 499 ms avg 8192 bytes
  reads /proc/partitions 3618 times, once every 83 ms avg 5461.33333333333 bytes
  reads /proc/stat 603 times, once every 499 ms avg 8191 bytes
  reads /proc/vmstat 603 times, once every 499 ms avg 8191 bytes
  reads /sys/block/dm-0/stat 603 times, once every 499 ms avg 8191 bytes
  reads /sys/block/dm-1/stat 603 times, once every 499 ms avg 8191 bytes
  reads /sys/block/dm-5/stat 603 times, once every 499 ms avg 8191 bytes
  reads /sys/block/sda/sda1/stat 603 times, once every 499 ms avg 8191 bytes
  reads /sys/class/net/eth0/statistics/collisions 603 times, once every 499 ms avg 8191 bytes
  reads /sys/class/net/eth0/statistics/rx_bytes 603 times, once every 499 ms avg 8191 bytes
  reads /sys/class/net/eth0/statistics/rx_errors 603 times, once every 499 ms avg 8191 bytes
  reads /sys/class/net/eth0/statistics/rx_packets 603 times, once every 499 ms avg 8191 bytes
  reads /sys/class/net/eth0/statistics/tx_bytes 603 times, once every 499 ms avg 8191 bytes
  reads /sys/class/net/eth0/statistics/tx_errors 603 times, once every 499 ms avg 8191 bytes
  reads /sys/class/net/eth0/statistics/tx_packets 603 times, once every 499 ms avg 8191 bytes
  [snip another 100 files for other network devices vif0, vif1, etc]
This machine is running Xen so there are a number of virtual network interfaces; most of them are not active, but this doesn’t stop the applet from reading their tx & rx statistics once every 500 ms. It is not entirely clear why it has to read the tx & rx data for each individual network device, since it only ever displays the aggregate total network traffic for the machine, which it has already obtained from /proc/net/dev.
4. xenstored – Xen metadata database
The xenstored process is a ‘special’ program – and I don’t mean that in a good way.
A few weeks back I was wondering why merely monitoring the runtime state of a handful of domains was causing xenstored to consume so much CPU time. The long answer is archived, but the short story is that its implementation of transactions requires it to take a complete copy of the database backend file every time it starts & ends a transaction. With XenD performing about 16 transactions every time you ask for info about a domain, monitoring 20 guest domains once a second works out at around 320 transactions per second; with each transaction reading & re-writing a database file a couple of hundred KB in size, this translates into xenstored doing 80 MB of disk I/O every second.
Pid 2831 xenstored run 301230+ ms...
  reads /var/lib/xenstored/tdb.0x9e1e9b8 404 times, once every 742 ms avg 196884 bytes
  reads /var/lib/xenstored/tdb.0x9e1f708 403 times, once every 744 ms avg 196884 bytes
  writes /var/lib/xenstored/tdb.0x9e1e9b8 404 times, once every 742 ms avg 73728 bytes
  writes /var/lib/xenstored/tdb.0x9e1f708 404 times, once every 742 ms avg 73728 bytes
Future…
The next task is to get this setup running at boot again, so we can see just what files are contributing to I/O during startup. It would also be useful to extend this to track ioctl() calls too, because just tracking read/write is certainly missing some interesting stuff; a possible starting point is sketched below.
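As a rough idea of what that extension might look like (hypothetical & untested), the syscall tapset already exposes ioctl(), so counting calls per (pid, fd) is just another probe & array:

# Hypothetical extension: count ioctl() invocations per (pid, fd)
# alongside the existing read/write accounting.
global ioctls

probe syscall.ioctl { ioctls[pid(), fd] += 1 }

# Dump the totals when the script is stopped.
probe end {
    foreach ([p, f] in ioctls)
        printf("pid %d fd %d: %d ioctls\n", p, f, ioctls[p, f])
}

That only counts the calls of course; decoding what data actually moves through each ioctl() would take rather more work.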