Visualizing Latency Part 2: What is Binning?

As mentioned in Brendan Gregg’s Latency Heat Maps page, a latency heat map is a visualization where each column of data is a histogram of the observations for that time interval. Using Brendan Gregg’s visualization:

As with histograms, the key decision that needs to be made when using a latency heat map is how to bin the data. Binning is the process of dividing the entire range of values into a series of intervals and then counting how many values fall into each interval. That said, there is no “best” number of bins, and different bin sizes can reveal different features of the data. Ultimately, the number of bins depends on the distribution of your data set as well as the size of the rendering area.

With latency heatmaps, binning often must be performed twice: once for the x-axis (time) and once for the y-axis (interval of observed values).

Allow me to demonstrate this visually.  Here is an Excel file with the historical daily stock price of GE, courtesy of Yahoo! Finance. I have rendered the close prices in D3 Latency Heatmap with four different binning strategies:

12 vertical bins 30 vertical bins
Bin by year-month
Bin by year

As you can see, each chart shows a slightly different perspective.

You may find you need to experiment with multiple binning strategies until you arrive at a latency heatmap chart with the appropriate level of detail for your use case.

Advertisements

Visualizing Latency Part 1: D3 Latency Heatmap

This blog post is the first in a series about how to visualize latency, which is very useful for debugging certain classes of performance problems.

A latency heatmap is a particularly useful tool for visualizing latency. For a great treatment of latency heatmaps, please read Brendan Gregg’s Latency Heat Maps page and the ACM Queue article Visualizing System Latency.

On the right, you can see a latency heatmap generated from a job queueing system which shows a number of interesting properties, not least of which is that the system appears to be getting slower over time.

In order to make creating latency heatmaps easier, I decided to create a reusable D3 latency heatmap chart component. The goal of this component is to handle all the hard work of chart rendering on behalf of the user, so that a user needs to do little more than combine the chart component with their raw data on a web page. Additionally, animating the chart is quite straightforward (see github.com/sengelha/d3-latency-heatmap/samples/example2.html for an example).

My D3 latency heatmap chart component is open source and available on GitHub at https://github.com/sengelha/d3-latency-heatmap.

Creating this chart required me to overcome a number of interesting challenges, such as:

  • How to create a reusable D3 chart component? (Ultimately I based my code on Mike Bostock’s Towards Reusable Charts proposal)
  • How to effectively use D3 scales for rendering non-points
  • How to correctly use D3’s .data(), .enter(), and .exit() to support in-place updates (required for animation)

Feel free to reach out to me with any questions or suggestions!