DTrace

Posted: July 9th, 2004 | Filed under: Uncategorized | 1 Comment »

There has been a lot of noise of late around a new ‘killer’ feature in Solaris 10 called DTrace. Initially I viewed it with some scepticism, but increasily I’m coming to see the huge step forward this is for Solaris. Linux / UNIX has always had a plethora of debugging tools, available to admins

  • strace – trace system calls made by a process
  • ltrace – trace library function calls made by a process
  • mcheck – trace memory allocations
  • sar – monitor system resource instrumentation counters
  • ps – list processes
  • top – monitor processes
  • lsof – list open file handles per process
  • tcpdump – monitor network traffic
  • gdb – trace program execution
  • oprofile – monitor program execution via hardware CPU counters
  • iostat – monitor filesystem I/O
  • vmstat – monitor VM system operation

…but for any typical debugging task, a combination of one-or-more of these tools is required. Tieing together the datasets generated by each tool is possible (most stats are correlated by process ID), but alot of work for the programmer. When threads come into play correlating the data gets even harder. Another aspect of these tools is that they are intrusive to the operation of program being debugged. While a performance hit is acceptable in some cases, the way ptrace() syscall works on Linux can actually cause the programs to crash or behave in abnormal ways.

Enter DTrace. In a nutshell, DTrace provides a single tool that can monitor any aspect of the system, from across both kernel and userspace, system wide, per process, or somewhere in between. This provides an immediate win by removing the need for manual cross-correlation of data from a disperate set of tools. DTrace runtime defines a standard set of many thousands of probes covering all common system events. A simple C-like programming language (with addition of associative arrays and garbage collection) is provided for collecting and processing data when probes are fired. The language is highly optimized & written with safety as a top priority, such that running probes incurrs no overhead or risk to the system being monitored. This allows admins to be confident of using it to diagnose problems on critical production servers.

Enter Linux. So what can Linux provide to answer to DTrace ? In the event it turns out that there are a number of options available, although none quite as comprehensive as DTrace (yet!):

  • KProbes – provides an API for inserting probes at arbitrary machine instructions. When the instruction is hit a callback is invoked to perform any required processing & then control returns to the original instruction. An inserted, but inactive probe is a merely 2 instruction overhead.
  • DProbes – is a level higher up, using the KProbes functionality. It provides a simple RPN (Reverse Polish Notation) interpreter for creating probe callbacks, the key being safety and efficiency of the callbacks. At a higher level there is a mini-C language which compiles down to RHN. This is close to but not quite as powerful as D, since it lacks garbage collection & associative arrays.
  • LTT – is a general purpose Linux tracing toolkit. It can pull together data from a number of sources. One such source is a simple kernel module defining 40 or so events, another source can be DProbes.

There are a number of other tools available with varying degrees of coverage, but nothing really comes close to the polished-perfection of DTrace. Out of all of them, DProbes is probably the open source tool with the most potential, but even that needs more work:

  • Safety. Protection is needed to make it impossible for a probe to crash the program being traced
  • Ease-of-use. The mini-C language could do with a few higher level features like garbage collectioon, associative arrays, thread local storage
  • Documentation. Never underestimate the importance of through documentation (conversely the difficulty of creating good documentation!)
  • Probe library. A large library of pre-defined probe points that with stable API across kernel upgrades.

I for one hope that Linux community (& vendors supporting it) realize the value of a polished tool like DTrace and take prompt steps to close the gap to Solaris. For the interested here are more links

One Response to “DTrace”

Leave a Reply





Spam protection: Sum of n1ne plus t3n ?: