Using SystemTAP for dynamically instrumented the Linux kernel
It has been just over a year & 1/2 since I first blogged about DTrace suggesting that a similar tool would be very valuable to the Linux community. Well after a few long email threads, it turned out that a significant number of people within Red Hat agreed with this assessment and so in partnership with IBM and Intel the SystemTAP project came into life at the start of 2005. Starting with the previously developed KProbes dynamic instrumentation capability a huge amount of work has been done building out a high level language and runtime for safely, efficiently & reliably probing the kernel. It has seen a limited ‘technology preview’ in RHEL-4, and with its inclusion in the forthcoming Fedora Core 5 it will be exposed to a much wider community of users & potential developers.
On the very same day as Dave Jones was looking at the Fedora boot process via static kernel instrumentation, I was (completely co-incidentally ) playing around using SystemTAP to instrument the boot process. The probe I wrote looked at file opens, process fork/execve to enable a hierarchical view of startup to be pieced together. A simplified version of the script looked like:
global indent function timestamp() { return string(gettimeofday_ms()) . indent[pid()] . " " } function proc() { return string(pid()) . " (" . execname() . ")" } function push(pid, ppid) { indent[pid] = indent[ppid] . " " } function pop(pid) { delete indent[pid] } probe kernel.function("sys_clone").return { print(timestamp() . proc() . " forks " . string(retval()). "\n") push(retval(), pid()) } probe kernel.function("do_execve") { print(timestamp() . proc() . " execs " . kernel_string($filename) . "\n") } probe kernel.function("sys_open") { if ($flags & 1) { print(timestamp() . proc() . " writes " . user_string($filename) . "\n") } else { print(timestamp() . proc() . " reads " . user_string($filename) . "\n") } } probe kernel.function("do_exit") { print(timestamp() . proc() . " exit\n") pop(pid()) }
A few tricks later it was running during boot, and having analysed the results with Perl one can display a summary of how many files each init script opened
1 init read 90 write 30 running... 251 init read 29 write 0 run 23.08s 252 rc.sysinit read 1035 write 45 run 22.91s 274 start_udev read 355 write 128 run 15.10s 286 start_udev read 91 write 0 run 1.90s 287 MAKEDEV read 91 write 0 run 1.88s 291 udevstart read 177 write 124 run 3.95s 614 usleep read 2 write 0 run 1.05s 649 udev-stw.modules read 84 write 5 run 1.23s 701 dmraid read 111 write 0 run 1.07s 748 rc read 235 write 13 run 14.57s 753 S10network read 111 write 16 run 2.85s 833 S12syslog read 44 write 3 run 0.43s 844 S25netfs read 87 write 1 run 1.51s 861 S55cups read 31 write 2 run 1.70s 878 S55sshd read 52 write 1 run 0.86s 892 S97messagebus read 31 write 2 run 0.44s 900 S98NetworkManager read 96 write 10 run 0.58s 910 S98NetworkManagerDispatcher read 92 write 3 run 0.67s 921 S98avahi-daemon read 29 write 0 run 0.41s 929 S98haldaemon read 31 write 2 run 4.20s 955 S99local read 17 write 1 run 0.16s
There are so many other interesting ways to analyse the data collected at boot which I don’t have space for in this blog, so I’ve put all the information (including how to run SystemTAP during boot) up on my Red Hat homepage