Flame graph shows computer system performance in a whole new light


The person who rose to prominence on the internet for shouting at servers is now becoming known for another, somewhat related feat: creating a new kind of data visualization to characterize system performance.

Brendan Gregg, lead performance engineer at cloud provider Joyent, has developed a visualization technique called the flame graph that can be effective in charting how system resources such as processors and memory are being used. It has since been picked up by a number of engineers, who have used it to enhance popular diagnostic tools such as DTrace and Windows XPerf.

Gregg explained how the flame graph works Thursday at the USENIX LISA (Large Installation System Administration) conference in Washington, D.C.

“We’ve had stack traces for a long time, but what Brendan did gave us a really quick way to see aspects that weren’t easily visible before,” said one attendee of the presentation, noting that flame graphs would have come in handy for him at work during a recent dispute with a software company over a performance issue.

Using a flame graph, the vendor could perhaps have solved the problem in a matter of hours rather than three weeks, he said.

Skill

Gregg’s expertise lies in the area of system performance measurement. His book on the subject was published this year by Prentice Hall.

In 2008, Gregg, then an employee at Sun Microsystems, gained attention for showing how disk I/O could be slowed by loud, sudden noises, a fact he demonstrated by shouting, quite loudly, at a server. The resulting vibrations slowed the disks.

[Photo: Brendan Gregg. Credit: Joab Jackson]

Gregg created a YouTube video to demonstrate latency heat maps, a new type of visualization he had created to chart system latency. The video went viral in the IT community.

The flame graph came about “under duress,” said Gregg. A customer had raised concerns about an application that was running around 40% slower than expected. To investigate the problem, Gregg had to sort through 500,000 lines of diagnostic data. He quickly realized that it was too much data to be easily understood.

Inspired by visualization guru Edward Tufte, Gregg thought about ways to visualize all of the data on one screen. What he came up with “merged and collapsed the common elements,” while preserving the relationship among the elements in terms of how many resources they consumed.
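To make the idea of merging and collapsing common elements concrete, here is a minimal Python sketch rather than Gregg’s own code. It assumes each sample is a list of function names ordered from the outermost caller to the function on the CPU. The sample data, the function name collapse_stacks and the semicolon-joined key are illustrative choices.

```python
from collections import Counter

def collapse_stacks(samples):
    """Merge identical stack traces and count how often each one occurs.

    `samples` is an iterable of stacks, each a list of function names
    ordered from the outermost caller to the function on the CPU.
    """
    counts = Counter()
    for stack in samples:
        # Identical stacks produce identical keys, so they merge into one entry.
        counts[";".join(stack)] += 1
    return counts

# Hypothetical sample data, invented for illustration.
samples = [
    ["main", "read_request", "parse"],
    ["main", "read_request", "parse"],
    ["main", "send_reply"],
    ["main", "read_request", "parse"],
]

folded = collapse_stacks(samples)
for stack, count in sorted(folded.items()):
    print(stack, count)
```

Four raw samples shrink to two distinct stacks with counts, which is the kind of reduction that lets hundreds of thousands of lines fit on one screen.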

What is a flame graph?

Flame graphs, like the one displayed at the top of the story, are made up of multiple stacks of horizontal bars. Each row of bars represents one level of the call stack, with the bottom rows holding the earliest calls and the top rows the most recent ones. Each row can have multiple bars, with each bar representing a different function and the width of each bar representing the share of the measured resource attributed to that function.

For a flame graph representing CPU usage, the top bars show which software functions were running on the processor at the time the data was captured.

CPU flame graphs are built from stack traces, which list all of the functions being executed by the processor at any given time. The hierarchical presentation of the flame graph, however, also captures the flow of actions on the processor.
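One way to turn counted stack traces into a hierarchy of bars is sketched below in Python. This is an illustration under stated assumptions, not the layout code of any particular tool: the input is a dictionary of semicolon-joined stacks and sample counts, siblings are ordered alphabetically, and the names build_tree and layout are invented for the example.

```python
def build_tree(folded):
    """Build a call tree: each node is {"count": n, "children": {name: node}}."""
    root = {"count": 0, "children": {}}
    for stack, count in folded.items():
        root["count"] += count
        node = root
        for name in stack.split(";"):
            node = node["children"].setdefault(name, {"count": 0, "children": {}})
            node["count"] += count
    return root

def layout(node, depth=0, start=0.0, total=None, bars=None):
    """Return (depth, name, start_fraction, width_fraction) for every bar."""
    if total is None:
        total = node["count"]
    if bars is None:
        bars = []
    offset = start
    for name in sorted(node["children"]):
        child = node["children"][name]
        width = child["count"] / total
        bars.append((depth, name, offset, width))
        # Callees are placed one row up, inside the span of their caller.
        layout(child, depth + 1, offset, total, bars)
        offset += width
    return bars

# Hypothetical counted stacks, invented for illustration.
folded = {"main;read_request;parse": 3, "main;send_reply": 1}
bars = layout(build_tree(folded))
for depth, name, start, width in bars:
    print(depth, name, f"{start:.2f}", f"{width:.2f}")
```

Each bar sits directly above the function that called it, so the caller-to-callee relationship can be read straight up the graph.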

By examining a graph, an administrator can visually trace which functions are called by other functions. Analysis of the different rows can reveal which functions of an individual program, or at a higher level, which of a number of programs running simultaneously on a machine, are consuming a disproportionate amount of processor time.
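The same question, which functions take a disproportionate share, can also be answered numerically. The Python sketch below assumes counted, semicolon-joined stacks as input and reports the fraction of samples in which each function appears anywhere on the stack. The data and the name inclusive_share are hypothetical.

```python
from collections import Counter

def inclusive_share(folded):
    """Fraction of samples in which each function appears anywhere on the stack."""
    total = sum(folded.values())
    hits = Counter()
    for stack, count in folded.items():
        # Use a set so a recursive function is counted once per sample.
        for name in set(stack.split(";")):
            hits[name] += count
    return {name: n / total for name, n in hits.items()}

# Hypothetical counted stacks, invented for illustration.
folded = {"main;read_request;parse": 3, "main;send_reply": 1}
shares = inclusive_share(folded)
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:15s} {share:6.1%}")
```

In a flame graph these fractions correspond to bar widths, so the widest bars are the first candidates to investigate.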

Other flame graphs can be constructed to show how resources are being used in memory or with disk I/O.

The program Gregg created to render flame graphs consists of around 300 lines of Perl code to interpret the source data, plus a few Scalable Vector Graphics (SVG) functions for rendering the graphics and JavaScript for adding mouse-over capabilities to the web interface.
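As a rough illustration of the rendering step, the following sketch is written in Python rather than Perl and is not taken from Gregg’s program. It assumes the bars have already been computed as (depth, name, start fraction, width fraction) tuples, and it stands in for the JavaScript mouse-over behavior with SVG’s built-in title element, which most browsers show as a tooltip. The function name render_svg, the colors and the dimensions are arbitrary.

```python
from xml.sax.saxutils import escape

def render_svg(bars, width=600, row_height=16):
    """Render (depth, name, start_fraction, width_fraction) bars as SVG text."""
    rows = max(depth for depth, _, _, _ in bars) + 1
    height = rows * row_height
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for depth, name, start, frac in bars:
        x = start * width
        y = height - (depth + 1) * row_height  # depth 0 sits on the bottom row
        w = frac * width
        parts.append(
            f'<rect x="{x:.1f}" y="{y}" width="{w:.1f}" height="{row_height - 1}" '
            f'fill="orange"><title>{escape(name)} ({frac:.0%})</title></rect>'
        )
    parts.append("</svg>")
    return "\n".join(parts)

# Hypothetical bars, invented for illustration.
bars = [
    (0, "main", 0.0, 1.0),
    (1, "read_request", 0.0, 0.75),
    (2, "parse", 0.0, 0.75),
    (1, "send_reply", 0.75, 0.25),
]
svg = render_svg(bars)
print(svg)
```

Saving the output to a file with an .svg extension and opening it in a browser should show the stacked bars, with a tooltip on hover.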

Others have built programs that use flame graphs to visualize data created by popular performance tools, such as DTrace, Windows XPerf, OS X Instruments, Perl performance tools, and Google Chrome developer tools.

Gregg said that Dave Pacheco’s Node.js implementation for DTrace could even become the canonical flame graph application, since it is more advanced than Gregg’s own program.

Beyond flame graphs, Gregg is working on another visualization called frequency trails, an R-based data renderer that shows the characteristics of outliers in a data set, which can be useful in pinpointing serious performance issues in cloud computing operations, he said.

Gregg is not a visual person by nature, he said in an interview after his presentation. He is more comfortable with the Unix command line. But the very nature of today’s large distributed systems requires visual aids.

“On a cloud, I need to understand 1,000 servers and I need to understand them right away. Visualization is necessary to do our job these days,” he said.


Gordon K. Morehouse