Flame Graphs in Rust on Linux

I’ve been working on a few toy Rust programs and libraries of late. One of these is dolby, an implementation of Adaptive Count-Min Sketch as detailed in this paper from Microsoft Research. CMS data structures are designed to deal with the problem of counting things at scale - when your incoming data stream is (essentially) infinite, how can you reason about the data using only finite resources? CMS algorithms and structures get around that by using probability - they can give you approximate answers.

Running an algorithm like this at scale is likely to amplify any inefficiencies in the implementation, and knocking off a few seconds, milliseconds, or even microseconds off the running time of a loop or other section of the program can yield massive rewards. And a great way to identify potential improvements or discovering problematic issues is with flame graphs.

(The following was all done on a Ubuntu 16.04 machine, but don’t worry! I’m not using any fancy BPF tricks or anything - as long as you can run a recent-ish perf on your machine (mine is 4.4.15), you should be able to replicate this fairly easily.)

In order to get useful information out of our profiling tools, we need to tell the Rust compiler (rustc) to include DWARF debugging symbols in the binary it is going to create. To do this, we add a small section to our cargo.toml file:

    [profile.release]
    debug = true

Once we have built the binary with cargo build, we can do a system-wide profile using perf. Here I’m running a test against 10 million entries being added to a dolby data structure:

    root# perf record -ag  ~/dolby/target/debug/dolby

(the -a flag ensures profiling capture on all CPUs, the -g flag switches on stack-chain recording, which is needed to build the flame graph structure)

After that has finished, we can use Brendan Gregg’s flame graph tools to convert the output into graphical form.

    perf script | ~/FlameGraph/stackcollapse-perf.pl | ~/FlameGraph/flamegraph.pl > rust.svg

Opening rust.svg in a browser gives us a fancy flame graph!

Rust.svg

Looking at the graph, it seems that the majority of the time dolby is on-CPU, it is running the insert method. Which makes sense, seeing as the test harness is firing 10m numbers into the structure! We can also see by walking up the chain that insert spends most of its time in the hash_index method, and most of the samples inside hash_index are actually while Rust is doing hashing calculations within the murmurhash3 crate. Changing the hash function to something less expensive than murmurhash3 may therefore result in improved insert performance.

Anyhow - flame graphs in Rust! They’re easy! Use them!