# Moved Link Aggregation to Bitly

I have decided to move my link aggregation efforts, such as they are, to Bitly, and to leave this blog for original content.  Follow me on Twitter if you’re interested in tracking these links in the future.

# Unit Testing Emscripten Library in Browser Using CMake and Nightwatch.JS

In a previous blog post, I described how I took Emscripten-created JS and turned it into an UMD module.  One of the reasons I did this was because I wanted more control over the generated JavaScript and for it to be usable in more contexts, such as with the RequireJS module loader.

As I am a responsible developer, I desired to create a number of automated unit tests to ensure that the client-visible API for my Emscripten module works as I intended.  I began by searching for a browser automated test framework and settled upon Nightwatch.js.  Now I just had to figure out how to get Nightwatch.js tests running in my existing, CMake-based build system.  Here’s how I did it.

### Configurig Nightwatch.JS

In order to use Nightwatch.JS, you must first configure it by creating a file called nightwatch.json. The first major decision you need to make is which WebDriver-implementing system you wish to use. Most users use Selenium, but you can also run an individual browser driver directly.

As I was not concerned with cross-browser compatibility — I assume that if the test works on one browser it will work on all major browsers — and I was looking for a system with a minimum number of build-time dependencies, I decided to use ChromeDriver automatically as my WebDriver implementation.

To make everything work, I did the following:

1. To automatically download chromedriver, add the following to CMakeLists.txt:

# Install chromedriver
OUTPUT node_modules/chromedriver/package.json
COMMAND npm install chromedriver
)
chromedriver ALL
DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/node_modules/chromedriver/package.json )  2. To configure Nightwatch.JS to use chromedriver, create a nightwatch.json which looks like this (the purpose of nightwatch-globals.js will become clear shortly): { "globals_path": "nightwatch-globals.js", "selenium" : { "start_process" : false }, "test_settings" : { "default" : { "selenium_host": "localhost", "selenium_port": 9515, "default_path_prefix": "", "desiredCapabilities": { "browserName": "chrome", "chromeOptions" : { "args" : ["--no-sandbox"] }, "acceptSslCerts": true } } } }  3. To start and stop chromedriver when running tests, create a nightwatch-globals.js which looks like this: var chromedriver = require('chromedriver'); module.exports = { before: function(done) { chromedriver.start(); done(); }, after: function(done) { chromedriver.stop(); done(); } };  4. CMake will run the unit tests from ${CMAKE_CURRENT_BINARY_DIR}, so we’ll need to copy the above config files to ${CMAKE_CURRENT_BINARY_DIR}. Here’s how to do that: # Copy nightwatch config files to target directory add_custom_command( OUTPUT${CMAKE_CURRENT_BINARY_DIR}/nightwatch.json
COMMAND ${CMAKE_COMMAND} -E copy${CMAKE_CURRENT_SOURCE_DIR}/nightwatch.json ${CMAKE_CURRENT_BINARY_DIR}/nightwatch.json DEPENDS${CMAKE_CURRENT_SOURCE_DIR}/nightwatch.json
)
nightwatch.json ALL
DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/nightwatch.json ) add_custom_command( OUTPUT${CMAKE_CURRENT_BINARY_DIR}/nightwatch-globals.js
COMMAND ${CMAKE_COMMAND} -E copy${CMAKE_CURRENT_SOURCE_DIR}/nightwatch-globals.js ${CMAKE_CURRENT_BINARY_DIR}/nightwatch-globals.js DEPENDS${CMAKE_CURRENT_SOURCE_DIR}/nightwatch-globals.js
)
nightwatch-globals.js ALL
DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/nightwatch-globals.js )  ### Automatically Download Nightwatch.JS and Run Unit Tests 1. First, we need to create a Nightwatch.JS unit test. Here’s a sample test case from the Nightwatch.JS home page: // google.js module.exports = { 'Demo test Google' : function (browser) { browser .url('http://www.google.com') .waitForElementVisible('body', 1000) .setValue('input[type=text]', 'nightwatch') .waitForElementVisible('button[name=btnG]', 1000) .click('button[name=btnG]') .pause(1000) .assert.containsText('#main', 'Night Watch') .end(); } };  2. To automatically download the Nightwatch.JS library, add the following lines to CMakeLists.txt: # Install nightwatch add_custom_command( OUTPUT node_modules/nightwatch/package.json COMMAND npm install nightwatch ) add_custom_target( nightwatch ALL DEPENDS${CMAKE_CURRENT_BINARY_DIR}/node_modules/nightwatch/package.json
)


3. To run the above unit test as a CMake unit test, add the following lines to CMakeLists.txt:

add_test(
NAME nightwatch_test
COMMAND ./node_modules/nightwatch/bin/nightwatch -t ${CMAKE_CURRENT_SOURCE_DIR}/google.js )  You may want to separate your tests into multiple JavaScript files and execute them independently. Here’s one way to do that from CMake: file(GLOB TESTCASE_SRC tests/*.js) foreach (testPath${TESTCASE_SRC})
get_filename_component(testName ${testPath} NAME_WE) # Test all unit tests add_test( NAME browser_${testName}
COMMAND ./node_modules/nightwatch/bin/nightwatch -t ${testPath} ) endforeach()  ### Using Local Web Server when Running Test Cases In certain cases, your unit tests be able to refer to local file: URLs, but things tend to be a lot easier if your unit tests reference URLs from a local web server. It’s really easy to get one up and running: 1. Download Node’s http-server module by adding the following to your CMakeLists.txt # Install http-server add_custom_command( OUTPUT node_modules/http-server/package.json COMMAND npm install http-server ) add_custom_target( http-server ALL DEPENDS${CMAKE_CURRENT_BINARY_DIR}/node_modules/http-server/package.json
)


2. Modify nightwatch-globals.js to start and stop the web server as part of the tests:

var chromedriver = require('chromedriver');
var http = require('http-server');

module.exports = {
before: function(done) {
this.server = http.createServer();
this.server.listen(8080);
chromedriver.start();
done();
},
after: function(done) {
this.server.close();
chromedriver.stop();
done();
}
};


Once this is done, your tests can refer to http://localhost:8080.

Note that http-server reads files from the current working directory, and CMake runs unit tests from ${CMAKE_CURRENT_BINARY_DIR}, so you may need to copy your test HTML and JavaScript to ${CMAKE_CURRENT_BINARY_DIR}. Here’s some CMake code which you might find helpful:

# Copy all .HTML files to binary directory
file(GLOB HTML_SRC *.html)
foreach (htmlPath ${HTML_SRC}) get_filename_component(htmlFileName${htmlPath} NAME)

# Copy HTML to binary folder so they can be referred to by the test
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${htmlFileName}
COMMAND ${CMAKE_COMMAND} -E copy${CMAKE_CURRENT_SOURCE_DIR}/${htmlFileName}${CMAKE_CURRENT_BINARY_DIR}/${htmlFileName} DEPENDS${CMAKE_CURRENT_SOURCE_DIR}/${htmlFileName} ) add_custom_target( browser_copy_${htmlFileName} ALL
DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${htmlFileName}
)
endforeach()


For a real-life, working example of all this in action, see the source code for my streaming percentiles library.

# Creating UMD Module from Emscripten using CMake

By default, Emscripten creates a module which can be used from both Node.JS and the browser, but it has the following issues:

1. The module pollutes the global namespace
2. The module is created with the name Module (in my case, I require streamingPercentiles)
3. The module cannot be loaded by some module loaders such as require.js

While the above issues can (mostly) be corrected by using –s MODULARIZE=1, it changes the semantics of the resulting JavaScript file, as the module now returns a function rather than an object. For example, code which previously read var x = new Module.Klass() would become var x = new Module().Klass(). I found this semantic change unacceptable, so I decided to abandon Emscripten’s -s MODULARIZE=1 option in favor of hand-crafting a UMD module.

I determined that the most appropriate pattern for my use case was the no dependencies pattern from UMD’s templates/returnExports.js. Applied to an Emscripten module, and using the default module name streamingPercentiles, the stanzas look like the following:

umdprefix.js:

(function (root, factory) {
if (typeof define === 'function' && define.amd) {
// AMD.  Register as an anonymous module.
define([], factory);
} else if (typeof module === 'object' && module.exports) {
module.exports = factory();
} else {
// streamingPercentiles is the 'default' name of the module
root.streamingPercentiles = factory();
}
}(typeof self !== 'undefined' ? self : this, function () {


umdsuffix.js:

    return Module;
}));


While I might be able to use Emscripten’s --pre-js and --post-js‘s options to prepend and append the above JavaScript files, these options do not guarantee in all cases that the above JavaScript files will be first and last. Therefore, I decided to prepend and append the JavaScript manually.

As my build system is CMake based, I needed to change change the compilation process to generate an intermediate file streamingPercentiles-unwrapped.v1.js, and then use some CMake magic to prepend and append the above JavaScript files:

add_executable(streamingPercentiles-unwrapped.v1.js ${STMPCT_JS_SRC}) file(WRITE${CMAKE_CURRENT_BINARY_DIR}/concat.cmake "
file(WRITE \${DST} \"\") file(READ \${SRC1} S1)
file(APPEND \${DST} \"\${S1}\")

file(READ \${SRC2} S2) file(APPEND \${DST} \"\${S2}\") file(READ \${SRC3} S3)
file(APPEND \${DST} \"\${S3}\")
")
add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/streamingPercentiles.v1.js COMMAND${CMAKE_COMMAND} -D SRC1=${CMAKE_CURRENT_SOURCE_DIR}/umdprefix.js -D SRC2=${CMAKE_CURRENT_BINARY_DIR}/streamingPercentiles-unwrapped.v1.js
-D SRC3=${CMAKE_CURRENT_SOURCE_DIR}/umdsuffix.js -D DST=${CMAKE_CURRENT_BINARY_DIR}/streamingPercentiles.v1.js
-P ${CMAKE_CURRENT_BINARY_DIR}/concat.cmake DEPENDS${CMAKE_CURRENT_SOURCE_DIR}/umdprefix.js ${CMAKE_CURRENT_BINARY_DIR}/streamingPercentiles-unwrapped.v1.js${CMAKE_CURRENT_SOURCE_DIR}/umdsuffix.js)


With the above code, all of the original three issues are fixed without any semantic changes for users.

# Board Games for Family Night

Playing board games is a major hobby of mine.  Every year I look forward to attending Gen Con, where I aggressively shop the dings and dents section of CoolStuffInc and play as many new games as I can.

We enjoy playing board games as a family.  I’ve been playing board games with my daughter since she was four; she’s eight now, and quite the gamer.  One of the things we’ve learned is that the recommended age range on a board game is merely a recommendation: the only real challenge for my daughter in playing advanced games was how well she was able to read.

I thought I’d share some of the games that we like to play together as a family.

#### Strategy Game

Winner: Tzolk’in: The Mayan Calendar
Runners-Up: The Castles of Burgundy, 7 Wonders, Kingsburg, Tikal
Want to try soon: Hansa Teutonica

In most strategy games, the theme is far less important than the mechanics, and Tzolk’in has my favorite game mechanic of all time:  Workers placed on six interconnected gears rotate to take the workers to different action spots.  Winning the game is all about placing and removing your workers at optimal times.

The game is beautiful, and there are many different ways to win the game.  My daughter, in her first time out and without any hints, soundly beat my wife and I with the Chichen Itza strategy.

#### Cooperative Game

Winner: Ghost Stories
Want to try soon: Pandemic Legacy: Season 1, Robinson Crusoe: Adventures on the Cursed Island

This is my wife’s favorite category.  We recently played Ghost Stories for the first time as a family, and while we lost (of course), we had a great time and we can’t wait to play again … once my daughter gets over her frustration of losing.  Now that I think about it, it might take a while before we can bring this game back to the table.

#### One vs. Everyone Else Game

In any one-vs-everyone game, I tend to be the “one”.  When we played Fury of Dracula, it was no different – I was the famous vampire, trying to hide across Europe.  It was fascinating watching my daughter trying to track me down and seeing through my feints.  I thought I had her for a while, but she caught on eventually.  Once she did, I became cornered without any chance for escape.

I find it curious that all three examples here are from the horror genre…

#### Kid’s Choice

Winner: Steam Park
Runners-Up: Mysterium, Concept, Tokaido, The Downfall of Pompeii

My least-favorite category!

Every time my daughter gets free choice as to which game we’ll play, she invariably chooses Steam Park.  She absolutely loves the theme and the idea of building rollercoasters for people to ride.  Also, she tends to win – by a lot!

I don’t personally enjoy Steam Park or Mysterium all that much, so when it’s kid’s choice night, I cross my fingers and hope my daughter ends up picking Concept, Tokaido, or The Downfall of Pompeii.  Especially the latter: I enjoy watching my daughter getting excited about throwing my meeples into the volcano.

#### Duds

Winner: All racing games (e.g. Formula D)
Runners-Up: Galaxy Trucker, Blokus

These games, for whatever reason, just didn’t work with us.  We tend to find racing games like Formula D boring; there’s just not enough going on there to keep our interest.  My daughter was completely turned off by the theme of Galaxy Trucker, and Blokus had absolutely no replayability whatsoever.

# Calculating Percentiles on Streaming Data Part 7: Cormode-Korn-Muthukrishnan-Srivastava

This is part 7 of my series on calculating percentiles on streaming data.

In 2005, Graham Cormode, Flip Korn, S. Muthukrishnan, and Divesh Srivastava published a paper called Effective Computation of Biased Quantiles over Data Streams [CKMS05].  This paper took the Greenwald-Khanna algorithm [GK01] and made the following notable changes:

1. Generalized the algorithm to work with arbitrary targeted quantiles, where a targeted quantile is a combination of a quantile $\phi$ and a maximum allowable error $\epsilon$.  The algorithm will maintain the summary data structure $\mathsf{S}$ such that it will maintain each targeted quantile with the appropriate error rate.
2. Simplified the Compress() operation by removing the concept of banding.

The paper also demonstrated that the new, more generalized algorithm could handle the following cases with ease:

1. The uniform quantile case, where each quantile is maintained with identical allowable error $\epsilon$.
2. The low-biased quantile case, where lower quantiles (i.e. as $\phi$ approaches 0) are maintained with higher precision (lower maximum allowable error) than higher quantiles.
3. The high-biased quantile case, where higher quantiles (i.e. as $\phi$ approaches 1) are maintained with higher precision (lower maximum allowable error) than lower quantiles.

#### Visualizing the Behavior of CKMS

Similar to my post about Visualizing Greenwald-Khanna, let’s visualize the above three cases with CKMS.

##### High-Biased Quantiles

It’s quite visually intuitive how the low-biased quantiles case keeps more, finer-grained measurements at low percentile values, and the high-biased quantiles case keeps more, finer-grained measurements at high percentile values.

#### Understanding the CKMS Algorithm

Much of the below will only make sense if you have read the paper, so I suggest trying to read [CKMS05] first, and then returning here.

The key to understanding the CKMS algorithm is understanding the invariant function $f(r_i, n)$.  I found that the best way to understand $f(r_i, n)$ is to understand the targeted quantiles invariant function (Definition 5 in [CKMS05]) and the error function $f(\phi n, n) / n$ charts (Figure 2 in [CKMS05]).

#### Uniform Quantiles Invariant Function

Consider the uniform quantile case, where we want all quantiles to be maintained with identical allowable error $\epsilon$.  In this case, the targeted quantile tuples set will become $\mathsf{T} = \{(\frac{1}{n}, \epsilon), (\frac{2}{n}, \epsilon), \ldots, (\frac{n-1}{n}, \epsilon), (1, \epsilon)\}$.

Given the above definition, below is a visualization of the error function $f(\phi n)/n)$ computed for $\epsilon$ = 0.1 as $n$ increases from 3 to 10:

Visually, the solid polyline can be interpreted as the amount of error allowed at a given value of $\phi$; the higher the line, the more error allowed at that particular percentile.

Notice how the solid polyline is converging towards a horizontal line at $f(\phi n, n)/n = 0.2$, which exactly corresponds to the GK invariant function $f(r_i, n) = 2 \epsilon n$, as found in [GK01].

#### Low-Biased Quantiles Invariant Function

Now consider the low-biased quantile case, where we want lower quantiles to have higher precision than higher quantiles.  In this case, the targeted quantile tuples set will become $\mathsf{T} = \{(\frac{1}{n}, \frac{\epsilon}{n}), (\frac{2}{n}, \frac{2\epsilon}{n}), \ldots, (\frac{n-1}{n}, \frac{(n-1)\epsilon}{n}), (1, \epsilon)\}$.

Given the above definition, below is a visualization of the error function $f(\phi n)/n)$ computed for $\epsilon$ = 0.1 as $n$ increases from 3 to 10:

Notice how the solid polyline allows less error at lower percentiles and is converging towards the line $f(\phi n, n)/n = 0.2\phi$, which exactly corresponds to the low-biased quantiles invariant function $f(r_i, n) = 2 \epsilon r_i$, as found in Definition 3 of [CKMS05].

#### High-Biased Quantiles Invariant Function

Now consider the high-biased quantile case, where we want higher quantiles to have higher precision than lower quantiles.  In this case, the targeted quantile tuples set will become $\mathsf{T} = \{(\frac{1}{n}, \epsilon), (\frac{2}{n}, \frac{(n-1)\epsilon}{n}), \ldots, (\frac{n-1}{n}, \frac{2\epsilon}{n}), (1, \frac{\epsilon}{n})\}$.

Given the above definition, below is a visualization of the error function $f(\phi n)/n)$ computed for $\epsilon$ = 0.1 as $n$ increases from 3 to 10:

Notice how the solid polyline allows less error at higher percentiles and is converging towards the line $f(\phi n, n)/n = 0.2(1 - \phi)$, which corresponds to the high-biased quantiles invariant function $f(r_i, n) = 2 \epsilon (n - r_i)$ (not found in the [CKMS05] paper).

#### Availability in Streaming Percentiles Library

All four variants (uniform quantile, low-biased quantile, high-biased quantile, and targeted quantile) of the CKMS algorithm are available in my streaming percentiles library.  Here’s an example of what it’s like to use:

#include <stmpct/ckms_hbq.hpp> // High-biased quantiles

ckms_hbq c(0.01);
for (int i = 0; i < 1000; ++i)
c.insert(rand());
double p50 = c.quantile(0.5); // Approx. median
double p95 = c.quantile(0.95); // Approx. 95th percentile


Or, from JavaScript:

var sp = require('streamingPercentiles.v1.min.js');
var c = new sp.CKMS_HBQ(0.1);
for (var i = 0; i < 1000; ++i)
c.insert(Math.random());
var p50 = c.quantile(0.5); // Approx. median
var p95 = c.quantile(0.95); // Approx. 95th percentile


#### References

• [GK01] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proceedings of ACM SIGMOD, pages 58–66, 2001.
• [CKMS05] G. Cormode, F. Korn, S. Muthukrishnan and D. Srivastava. Effective Computation of Biased Quantiles over Data Streams. In Proceedings of the 21st International Conference on Data Engineering, pages 20-31, 2005.

# Calculating Percentiles on Streaming Data Part 6: Building a C++ and JavaScript Library from a Single Codebase

This is part 6 of my series on calculating percentiles on streaming data.

For the past 10 days or so, I’ve been working on the build process of my C++ and JavaScript streaming analytics libraries. Using the magic of Emscripten, I have been able to combine both libraries into a single, C++ codebase, from which I can compile both the C++ and JavaScript versions of the library. Furthermore, I was able to do this without breaking backwards compatibility of the JavaScript library. I also made a number of other improvements to the compilation process, such as providing precompiled binaries and supporting both shared and static libraries.

This blog post will discuss how the consolidated build process works. If you’d like to follow along using the source code, its available at https://github.com/sengelha/streaming-percentiles-cpp.

#### Overview of Build System

As the streaming analytics library is intended to work across multiple operating systems, I use CMake for the core of the build system. There are plenty of good blog articles about using CMake for generating cross-platform C++ libraries, so I don’t need to say much more about that here.

As invoking the full build process properly requires multiple steps, I created the scripts build.sh (for Linux / Mac OS X) and build.bat (for Windows) to simplify the process. These scripts are responsible for performing the build, running the unit tests, and creating the binary packages. The entire build process is performed into a directory outside of the source tree, which is considered a CMake best practice. The scripts support various command-line options (e.g. build.sh --release for a release build, build.sh --clean for a clean build) to allow a developer fine-grained control on the build process; use --help to view information on how to use them.

#### Shared and Static Libraries

The CMake project supports building both static and shared versions of the streaming analytics library. For Linux and Mac OS X, it’s as simple as having two CMake targets: add_library(xxx_shared SHARED ${SRC}) and add_library(xxx_static STATIC${SRC}).

However, for Windows, this isn’t quite enough. In order to properly build a Windows shared library, all exposed classes, functions, etc. must be marked with __declspec(dllexport). There’s an entire CMake Wiki page on how to do it. I haven’t gotten around to doing that yet, so shared library support on Windows is currently disabled. To avoid a future file name conflict for Windows builds, I gave the static version of the library a different name than the shared version of the library (stmpcts.lib instead of stmpct.lib).

I also found that on non-x86 systems, even static versions of the library need to be built with position-independent code, which I enabled with set (CMAKE_POSITION_INDEPENDENT_CODE TRUE).

#### C++ Unit Testing

For C++ unit tests, I decided upon the Boost.Test library. All C++ unit tests are compiled into a single executable which is then tested using CTest.

In order to test both the static and shared versions of the library, this executable is actually built twice: once linked against the static version of the library, the other linked against the shared version.

#### Using Emscripten from CMake to Cross-Compile C++ to JavaScript

Emscripten is some incredibly powerful wizardry. Basically you can point nearly any C++ at it, and it will cross-compile it to JavaScript.

Connecting Emscripten to CMake is quite easy. First you need a script to detect the emscripten compiler:

# FindEmscripten.cmake
# - try to find emscripten binary
#
# Variables you might use in your CMakeLists.txt:
#  EMSCRIPTEN_FOUND

find_program(EMSCRIPTEN_CPP_BINARY
NAMES em++)

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(emscripten
DEFAULT_MSG
EMSCRIPTEN_CPP_BINARY)


Then it’s as simple as the following:

set(CMAKE_CXX_COMPILER em++)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} --bind -std=c++11 -O3 --memory-init-file 0") add_executable(xxx.js${SRC})


I added a few more niceties, such as:

• Used embind to naturally expose the classes in JavaScript
• Added JavaScript unit tests using Mocha and unit.js
• Added an uglify-js step to the build process to create a minified version of the built JavaScript

For more details, see the source code, particularly the js/ directory.

#### Packaging

For distribution convenience, I use CPack for creating the final packages of compiled code. Using CPack makes doing this nice and easy; simply set some variables (e.g. set(CPACK_PACKAGE_NAME "streaming_percentiles")), add include(CPack), and mark the files which should be included in the package with install(...).

For Windows, I prefer generating a ZIP package instead of a MSI; enabling this is as easy as set(CPACK_GENERATOR "ZIP").

#### Wrapping Up

There are a few more tricks in the codebase, like combining add_custom_command(OUTPUT xyz) with add_custom_target(xxx ALL DEPENDS xyz) to avoid excessive rebuilds, that you can discover for yourself. Just check out the source code for the project at https://github.com/sengelha/streaming-percentiles-cpp/.

If you’re only interested in the library itself, you can download pre-built binaries for Mac OS X / Linux / Windows / JavaScript from https://github.com/sengelha/streaming-percentiles-cpp/releases.

# Calculating Percentiles on Streaming Data Part 5: C++ Library

This is part 5 of my series on calculating percentiles on streaming data.

I have created a reusable C++ library which contains my implementation of the streaming percentile algorithms found within this blog post and published it to GitHub. Here’s what using it looks like:

#include <stmpct/gk.hpp>

using namespace stmpct;

double epsilon = 0.1;
gk g(epsilon);
for (int i = 0; i < 1000; ++i)
g.insert(rand());
double p50 = g.quantile(0.5); // Approx. median
double p95 = g.quantile(0.95); // Approx. 95th percentile


You can find it here: