Creating UMD Module from Emscripten using CMake

By default, Emscripten creates a module which can be used from both Node.js and the browser, but it has the following issues:

  1. The module pollutes the global namespace
  2. The module is created with the fixed name Module (in my case, I need it to be named streamingPercentiles)
  3. The module cannot be loaded by some module loaders such as require.js

While the above issues can (mostly) be corrected by using -s MODULARIZE=1, that option changes the semantics of the resulting JavaScript file: the module now returns a function rather than an object. For example, code which previously read var x = new Module.Klass() would become var x = new Module().Klass(). I found this semantic change unacceptable, so I decided to abandon Emscripten's -s MODULARIZE=1 option in favor of hand-crafting a UMD module.
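To make the difference concrete, here is roughly what consuming code looks like in each case (Klass is a hypothetical exported class, used only for illustration):

// Without -s MODULARIZE=1: the generated script defines a global object named Module
var x = new Module.Klass();

// With -s MODULARIZE=1: the generated script defines a factory function instead,
// which must be invoked to obtain the module instance
var x = new Module().Klass();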

I determined that the most appropriate pattern for my use case was the no dependencies pattern from UMD’s templates/returnExports.js. Applied to an Emscripten module, and using the default module name streamingPercentiles, the stanzas look like the following:

umdprefix.js:

(function (root, factory) {
    if (typeof define === 'function' && define.amd) {
        // AMD.  Register as an anonymous module.
        define([], factory);
    } else if (typeof module === 'object' && module.exports) {
        // Node-style CommonJS environments that support module.exports.
        module.exports = factory();
    } else {
        // streamingPercentiles is the 'default' name of the module
        root.streamingPercentiles = factory();
    }
}(typeof self !== 'undefined' ? self : this, function () {

umdsuffix.js:

    return Module;
}));
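With the prefix and suffix in place, the generated streamingPercentiles.v1.js behaves like any other UMD module. For example (Klass is again a hypothetical export, and the AMD module path is just an illustration):

// Node.js / CommonJS
var sp = require('./streamingPercentiles.v1.js');
var a = new sp.Klass();

// Browser, via a plain <script> tag: the library is exposed as a global
var b = new streamingPercentiles.Klass();

// AMD loaders such as require.js
require(['streamingPercentiles.v1'], function (sp) {
    var c = new sp.Klass();
});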

While I might have been able to use Emscripten's --pre-js and --post-js options to prepend and append the above JavaScript files, these options do not guarantee in all cases that the files will appear strictly first and last in the generated output. Therefore, I decided to prepend and append the JavaScript manually.

As my build system is CMake-based, I needed to change the compilation process to generate an intermediate file streamingPercentiles-unwrapped.v1.js, and then use some CMake magic to prepend and append the above JavaScript files:

add_executable(streamingPercentiles-unwrapped.v1.js ${STMPCT_JS_SRC})

file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/concat.cmake "
file(WRITE \${DST} \"\")

file(READ \${SRC1} S1)
file(APPEND \${DST} \"\${S1}\")

file(READ \${SRC2} S2)
file(APPEND \${DST} \"\${S2}\")

file(READ \${SRC3} S3)
file(APPEND \${DST} \"\${S3}\")
")
add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/streamingPercentiles.v1.js
                   COMMAND ${CMAKE_COMMAND} -D SRC1=${CMAKE_CURRENT_SOURCE_DIR}/umdprefix.js
                                            -D SRC2=${CMAKE_CURRENT_BINARY_DIR}/streamingPercentiles-unwrapped.v1.js
                                            -D SRC3=${CMAKE_CURRENT_SOURCE_DIR}/umdsuffix.js
                                            -D DST=${CMAKE_CURRENT_BINARY_DIR}/streamingPercentiles.v1.js
                                            -P ${CMAKE_CURRENT_BINARY_DIR}/concat.cmake
                   DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/umdprefix.js ${CMAKE_CURRENT_BINARY_DIR}/streamingPercentiles-unwrapped.v1.js ${CMAKE_CURRENT_SOURCE_DIR}/umdsuffix.js)

With the above code, all of the original three issues are fixed without any semantic changes for users.

Calculating Percentiles on Streaming Data Part 7: Cormode-Korn-Muthukrishnan-Srivastava

This is part 7 of my series on calculating percentiles on streaming data.

In 2005, Graham Cormode, Flip Korn, S. Muthukrishnan, and Divesh Srivastava published a paper called Effective Computation of Biased Quantiles over Data Streams [CKMS05].  This paper took the Greenwald-Khanna algorithm [GK01] and made the following notable changes:

  1. Generalized the algorithm to work with arbitrary targeted quantiles, where a targeted quantile is a combination of a quantile \phi and a maximum allowable error \epsilon.  The algorithm maintains the summary data structure \mathsf{S} so that each targeted quantile can be answered within its allowable error (a concrete example follows this list).
  2. Simplified the Compress() operation by removing the concept of banding.
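For example (my own illustration, not an example taken from the paper), an application that needs the median to within 1% error and the 99th percentile to within 0.1% error would use the targeted quantile set:

\mathsf{T} = \{(0.50, 0.01), (0.99, 0.001)\}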

The paper also demonstrated that the new, more generalized algorithm could handle the following cases with ease:

  1. The uniform quantile case, where each quantile is maintained with identical allowable error \epsilon.
  2. The low-biased quantile case, where lower quantiles (i.e. as \phi approaches 0) are maintained with higher precision (lower maximum allowable error) than higher quantiles.
  3. The high-biased quantile case, where higher quantiles (i.e. as \phi approaches 1) are maintained with higher precision (lower maximum allowable error) than lower quantiles.

Visualizing the Behavior of CKMS

Similar to my post about Visualizing Greenwald-Khanna, let’s visualize the above three cases with CKMS.

Uniform Quantiles

[Figure: CKMS summary for uniform quantiles]

Low-Biased Quantiles

[Figure: CKMS summary for low-biased quantiles]

High-Biased Quantiles

[Figure: CKMS summary for high-biased quantiles]

It’s quite visually intuitive how the low-biased quantiles case keeps more, finer-grained measurements at low percentile values, and the high-biased quantiles case keeps more, finer-grained measurements at high percentile values.

Understanding the CKMS Algorithm

Much of the below will only make sense if you have read the paper, so I suggest trying to read [CKMS05] first, and then returning here.

The key to understanding the CKMS algorithm is understanding the invariant function f(r_i, n).  I found that the best way to understand f(r_i, n) is to understand the targeted quantiles invariant function (Definition 5 in [CKMS05]) and the error function f(\phi n, n) / n charts (Figure 2 in [CKMS05]).
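Paraphrasing Definition 5 as I understand it (check the paper for the authoritative statement): given a set of targeted quantile tuples \mathsf{T} = \{(\phi_j, \epsilon_j)\}, each tuple contributes a piecewise-linear error allowance, and the invariant function is the minimum allowance over all tuples:

f_j(r, n) = 2 \epsilon_j r / \phi_j for \phi_j n \le r \le n
f_j(r, n) = 2 \epsilon_j (n - r) / (1 - \phi_j) for 0 \le r \le \phi_j n
f(r, n) = \min_j f_j(r, n)

Intuitively, each target allows exactly 2 \epsilon_j n of error at its own rank \phi_j n and progressively more error further away, and the summary must respect the tightest allowance at every rank. The solid polylines in the figures below trace out this minimum.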

Uniform Quantiles Invariant Function

Consider the uniform quantile case, where we want all quantiles to be maintained with identical allowable error \epsilon.  In this case, the targeted quantile tuples set will become \mathsf{T} = \{(\frac{1}{n}, \epsilon), (\frac{2}{n}, \epsilon), \ldots, (\frac{n-1}{n}, \epsilon), (1, \epsilon)\}.

Given the above definition, below is a visualization of the error function f(\phi n, n)/n computed for \epsilon = 0.1 as n increases from 3 to 10:

[Figure: animation of the uniform-quantiles error function as n increases from 3 to 10]

Visually, the solid polyline can be interpreted as the amount of error allowed at a given value of \phi; the higher the line, the more error allowed at that particular percentile.

Notice how the solid polyline is converging towards a horizontal line at f(\phi n, n)/n = 0.2, which exactly corresponds to the GK invariant function f(r_i, n) = 2 \epsilon n, as found in [GK01].
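For concreteness, here is a small JavaScript sketch (my own illustration, not code from the paper or from my library) that computes the plotted quantity f(\phi n, n)/n for an arbitrary set of targeted quantile tuples, assuming the min-of-allowances form of the invariant function sketched above:

// Computes f(phi * n, n) / n, where targets is an array of [phi_j, eps_j] pairs
// and f(r, n) = min over j of the piecewise allowance for target j.
function errorFunction(targets, n, phi) {
    var r = phi * n;
    var best = Infinity;
    for (var j = 0; j < targets.length; ++j) {
        var phi_j = targets[j][0], eps_j = targets[j][1];
        var f_j = (r >= phi_j * n)
            ? 2 * eps_j * r / phi_j
            : 2 * eps_j * (n - r) / (1 - phi_j);
        best = Math.min(best, f_j);
    }
    return best / n;
}

// Uniform-quantile targets (j/n, eps) for j = 1..n, with eps = 0.1
var n = 10, eps = 0.1, targets = [];
for (var j = 1; j <= n; ++j)
    targets.push([j / n, eps]);
console.log(errorFunction(targets, n, 0.5)); // ~0.2, matching the horizontal line above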

Low-Biased Quantiles Invariant Function

Now consider the low-biased quantile case, where we want lower quantiles to have higher precision than higher quantiles.  In this case, the targeted quantile tuples set will become \mathsf{T} = \{(\frac{1}{n}, \frac{\epsilon}{n}), (\frac{2}{n}, \frac{2\epsilon}{n}), \ldots, (\frac{n-1}{n}, \frac{(n-1)\epsilon}{n}), (1, \epsilon)\}.

Given the above definition, below is a visualization of the error function f(\phi n, n)/n computed for \epsilon = 0.1 as n increases from 3 to 10:

[Figure: low-biased quantiles error function as n increases from 3 to 10]

Notice how the solid polyline allows less error at lower percentiles and is converging towards the line f(\phi n, n)/n = 0.2\phi, which exactly corresponds to the low-biased quantiles invariant function f(r_i, n) = 2 \epsilon r_i, as found in Definition 3 of [CKMS05].
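As a quick sanity check (my own arithmetic, not from the paper): the target nearest rank r_i = \phi n in this set is approximately (\phi, \epsilon \phi), and its allowance at its own rank is 2 \epsilon_j n, so

f(\phi n, n) \approx 2 (\epsilon \phi) n = 2 \epsilon r_i, i.e. f(\phi n, n)/n \approx 2 \epsilon \phi = 0.2 \phi for \epsilon = 0.1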

High-Biased Quantiles Invariant Function

Now consider the high-biased quantile case, where we want higher quantiles to have higher precision than lower quantiles.  In this case, the targeted quantile tuples set will become \mathsf{T} = \{(\frac{1}{n}, \epsilon), (\frac{2}{n}, \frac{(n-1)\epsilon}{n}), \ldots, (\frac{n-1}{n}, \frac{2\epsilon}{n}), (1, \frac{\epsilon}{n})\}.

Given the above definition, below is a visualization of the error function f(\phi n, n)/n computed for \epsilon = 0.1 as n increases from 3 to 10:

[Figure: high-biased quantiles error function as n increases from 3 to 10]

Notice how the solid polyline allows less error at higher percentiles and is converging towards the line f(\phi n, n)/n = 0.2(1 - \phi), which corresponds to the high-biased quantiles invariant function f(r_i, n) = 2 \epsilon (n - r_i) (not found in the [CKMS05] paper).
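The same sanity check works here: the target nearest rank r_i = \phi n has \epsilon_j \approx \epsilon (1 - \phi), so

f(\phi n, n) \approx 2 \epsilon (1 - \phi) n = 2 \epsilon (n - r_i), i.e. f(\phi n, n)/n \approx 2 \epsilon (1 - \phi) = 0.2 (1 - \phi) for \epsilon = 0.1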

Availability in Streaming Percentiles Library

All four variants (uniform quantile, low-biased quantile, high-biased quantile, and targeted quantile) of the CKMS algorithm are available in my streaming percentiles library.  Here's an example of what it's like to use from C++:

#include <cstdlib>             // for rand()
#include <stmpct/ckms_hbq.hpp> // High-biased quantiles

using namespace stmpct; // assumes the library's classes live in the stmpct namespace

int main() {
    ckms_hbq c(0.01);
    for (int i = 0; i < 1000; ++i)
        c.insert(rand());
    double p50 = c.quantile(0.5);  // Approx. median
    double p95 = c.quantile(0.95); // Approx. 95th percentile
}

Or, from JavaScript:

// Assumes the minified UMD bundle has been placed in the current directory
var sp = require('./streamingPercentiles.v1.min.js');
var c = new sp.CKMS_HBQ(0.1);
for (var i = 0; i < 1000; ++i)
    c.insert(Math.random());
var p50 = c.quantile(0.5); // Approx. median
var p95 = c.quantile(0.95); // Approx. 95th percentile

For more information, visit the streaming percentiles library GitHub page.

References

  • [GK01] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proceedings of ACM SIGMOD, pages 58–66, 2001.
  • [CKMS05] G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava. Effective Computation of Biased Quantiles over Data Streams. In Proceedings of the 21st International Conference on Data Engineering, pages 20–31, 2005.