Calculating Percentiles on Streaming Data Part 7: CormodeKornMuthukrishnanSrivastava
March 29, 2018 Leave a comment
This is part 7 of my series on calculating percentiles on streaming data.
In 2005, Graham Cormode, Flip Korn, S. Muthukrishnan, and Divesh Srivastava published a paper called Effective Computation of Biased Quantiles over Data Streams [CKMS05]. This paper took the GreenwaldKhanna algorithm [GK01] and made the following notable changes:

Generalized the algorithm to work with arbitrary targeted quantiles, where a targeted quantile is a combination of a quantile and a maximum allowable error . The algorithm will maintain the summary data structure such that it will maintain each targeted quantile with the appropriate error rate.

Simplified the
Compress()
operation by removing the concept of banding.
The paper also demonstrated that the new, more generalized algorithm could handle the following cases with ease:
 The uniform quantile case, where each quantile is maintained with identical allowable error .
 The lowbiased quantile case, where lower quantiles (i.e. as approaches 0) are maintained with higher precision (lower maximum allowable error) than higher quantiles.
 The highbiased quantile case, where higher quantiles (i.e. as approaches 1) are maintained with higher precision (lower maximum allowable error) than lower quantiles.
Visualizing the Behavior of CKMS
Similar to my post about Visualizing GreenwaldKhanna, let’s visualize the above three cases with CKMS.
Uniform Quantiles
LowBiased Quantiles
HighBiased Quantiles
It’s quite visually intuitive how the lowbiased quantiles case keeps more, finergrained measurements at low percentile values, and the highbiased quantiles case keeps more, finergrained measurements at high percentile values.
Understanding the CKMS Algorithm
Much of the below will only make sense if you have read the paper, so I suggest trying to read [CKMS05] first, and then returning here.
The key to understanding the CKMS algorithm is understanding the invariant function . I found that the best way to understand is to understand the targeted quantiles invariant function (Definition 5 in [CKMS05]) and the error function charts (Figure 2 in [CKMS05]).
Uniform Quantiles Invariant Function
Consider the uniform quantile case, where we want all quantiles to be maintained with identical allowable error . In this case, the targeted quantile tuples set will become .
Given the above definition, below is a visualization of the error function computed for = 0.1 as increases from 3 to 10:
Visually, the solid polyline can be interpreted as the amount of error allowed at a given value of ; the higher the line, the more error allowed at that particular percentile.
Notice how the solid polyline is converging towards a horizontal line at , which exactly corresponds to the GK invariant function , as found in [GK01].
LowBiased Quantiles Invariant Function
Now consider the lowbiased quantile case, where we want lower quantiles to have higher precision than higher quantiles. In this case, the targeted quantile tuples set will become .
Given the above definition, below is a visualization of the error function computed for = 0.1 as increases from 3 to 10:
Notice how the solid polyline allows less error at lower percentiles and is converging towards the line , which exactly corresponds to the lowbiased quantiles invariant function , as found in Definition 3 of [CKMS05].
HighBiased Quantiles Invariant Function
Now consider the highbiased quantile case, where we want higher quantiles to have higher precision than lower quantiles. In this case, the targeted quantile tuples set will become .
Given the above definition, below is a visualization of the error function computed for = 0.1 as increases from 3 to 10:
Notice how the solid polyline allows less error at higher percentiles and is converging towards the line , which corresponds to the highbiased quantiles invariant function (not found in the [CKMS05] paper).
Availability in Streaming Percentiles Library
All four variants (uniform quantile, lowbiased quantile, highbiased quantile, and targeted quantile) of the CKMS algorithm are available in my streaming percentiles library. Here’s an example of what it’s like to use:
#include <stmpct/ckms_hbq.hpp> // Highbiased quantiles ckms_hbq c(0.01); for (int i = 0; i < 1000; ++i) c.insert(rand()); double p50 = c.quantile(0.5); // Approx. median double p95 = c.quantile(0.95); // Approx. 95th percentile
Or, from JavaScript:
var sp = require('streamingPercentiles.v1.min.js'); var c = new sp.CKMS_HBQ(0.1); for (var i = 0; i < 1000; ++i) c.insert(Math.random()); var p50 = c.quantile(0.5); // Approx. median var p95 = c.quantile(0.95); // Approx. 95th percentile
For more information, visit the streaming percentiles library GitHub page.
References
 [GK01] M. Greenwald and S. Khanna. Spaceefficient online computation of quantile summaries. In Proceedings of ACM SIGMOD, pages 58–66, 2001.
 [CKMS05] G. Cormode, F. Korn, S. Muthukrishnan and D. Srivastava. Effective Computation of Biased Quantiles over Data Streams. In Proceedings of the 21st International Conference on Data Engineering, pages 2031, 2005.