Flame Graphs With Spark

I couldn’t sleep last night, so after I gave up lying in bed at 6:58am, I decided to write something that I’ve been meaning to create for a while.

Hence: spark-flame. A way of quickly obtaining flame graphs on Apache Spark worker nodes. It’s a short Ansible playbook that takes a YARN application id, does some terrible things with ps to get the Java process ids of the executors, runs perf, gets the JVM symbols using perf-map-agent, generates the flame graphs and copies them back to your local machine.
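To give a flavour of the "terrible things with ps" step, here is a minimal Python sketch of the same idea. It is not the playbook's actual code. It assumes the input is the text produced by something like `ps -eo pid,args`, that executors run the class `org.apache.spark.executor.CoarseGrainedExecutorBackend`, and that the application id appears as its own argument (after `--app-id`); the function name `executor_pids` is made up for illustration.

```python
import os

# Assumption: this is the main class of a Spark executor JVM on YARN.
EXECUTOR_CLASS = "org.apache.spark.executor.CoarseGrainedExecutorBackend"


def executor_pids(ps_output, app_id):
    """Return the pids of executor JVMs belonging to app_id.

    ps_output is expected to look like the output of `ps -eo pid,args`:
    one process per line, pid first, then the full command line.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that does not start with a pid.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        pid, argv = int(fields[0]), fields[1:]
        # Ignore wrapper shells that merely mention the executor command.
        if os.path.basename(argv[0]) != "java":
            continue
        if EXECUTOR_CLASS in argv and app_id in argv:
            pids.append(pid)
    return pids
```

Matching on whole arguments rather than substrings keeps log directory paths that happen to contain the application id from producing false hits.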

The steps the playbook takes are detailed in Brendan Gregg’s Java In Flames blog post. All it really does is follow those steps whilst making sure that the workers have a copy of the perf-map-agent libraries, and finally copy the resulting flame graphs back down to your local box.
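For reference, the post-recording part of those steps boils down to two shell commands per executor. The sketch below only builds the command strings; it does not run anything. The directory names `perf-map-agent` and `FlameGraph`, the file names, and the function `flame_graph_steps` are placeholders I've picked for illustration, so adjust them to wherever the tools actually live on your workers.

```python
import shlex


def flame_graph_steps(pid, perf_data="perf.data", svg="flame.svg",
                      perf_map_agent_dir="perf-map-agent",
                      flamegraph_dir="FlameGraph"):
    """Return the shell commands that turn a perf recording into an SVG.

    All paths are illustrative defaults, not the playbook's real layout.
    """
    data = shlex.quote(perf_data)
    out = shlex.quote(svg)
    return [
        # Dump the JVM's symbol map for this pid so perf can resolve
        # JIT-compiled frames.
        f"{perf_map_agent_dir}/bin/create-java-perf-map.sh {int(pid)}",
        # Fold the recorded stacks and render them as a flame graph.
        f"perf script -i {data}"
        f" | {flamegraph_dir}/stackcollapse-perf.pl"
        f" | {flamegraph_dir}/flamegraph.pl > {out}",
    ]
```

Each string could then be handed to a shell on the worker, for example via `subprocess.run(cmd, shell=True, check=True)`.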

There are a couple of options to tweak: you can alter the length of the perf sample recording, and you can change the options passed to perf itself (currently -ag -F 997, so it samples at 997Hz across all CPUs and records call-graph stacks). But that’s about it!
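As a rough illustration of how those two knobs combine into a single perf invocation, here is a small sketch. The function name, the 30-second default and the `perf.data` output file are my assumptions, not values taken from the playbook; only the `-ag -F 997` default comes from the text above.

```python
import shlex


def perf_record_command(duration_s=30, perf_options="-ag -F 997",
                        output="perf.data"):
    """Build a `perf record` argument list.

    duration_s and output are hypothetical defaults; perf_options
    mirrors the playbook's current setting.
    """
    return [
        "perf", "record",
        *shlex.split(perf_options),    # e.g. -a (all CPUs), -g (stacks), -F (Hz)
        "-o", output,                  # where the samples are written
        "--", "sleep", str(duration_s),  # record for as long as sleep runs
    ]
```

Using `sleep` as the profiled command is the usual way to bound a system-wide recording: perf samples every CPU until `sleep` exits.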

And as a bonus, here’s a sample SVG from a worker doing a map over 10m doubles: