Unit Testing Emscripten Library in Browser Using CMake and Nightwatch.JS

logo-nightwatchIn a previous blog post, I described how I took Emscripten-created JS and turned it into an UMD module.  One of the reasons I did this was because I wanted more control over the generated JavaScript and for it to be usable in more contexts, such as with the RequireJS module loader.

As I am a responsible developer, I desired to create a number of automated unit tests to ensure that the client-visible API for my Emscripten module works as I intended.  I began by searching for a browser automated test framework and settled upon Nightwatch.js.  Now I just had to figure out how to get Nightwatch.js tests running in my existing, CMake-based build system.  Here’s how I did it.

Configurig Nightwatch.JS

In order to use Nightwatch.JS, you must first configure it by creating a file called nightwatch.json. The first major decision you need to make is which WebDriver-implementing system you wish to use. Most users use Selenium, but you can also run an individual browser driver directly.

As I was not concerned with cross-browser compatibility — I assume that if the test works on one browser it will work on all major browsers — and I was looking for a system with a minimum number of build-time dependencies, I decided to use ChromeDriver automatically as my WebDriver implementation.

To make everything work, I did the following:

1. To automatically download chromedriver, add the following to CMakeLists.txt:

# Install chromedriver
  OUTPUT node_modules/chromedriver/package.json
  COMMAND npm install chromedriver
  chromedriver ALL
  DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/node_modules/chromedriver/package.json

2. To configure Nightwatch.JS to use chromedriver, create a nightwatch.json which looks like this (the purpose of nightwatch-globals.js will become clear shortly):

    "globals_path": "nightwatch-globals.js",
    "selenium" : {
        "start_process" : false
    "test_settings" : {
        "default" : {
            "selenium_host": "localhost",
            "selenium_port": 9515,
            "default_path_prefix": "",
            "desiredCapabilities": {
                "browserName": "chrome",
                "chromeOptions" : {
                    "args" : ["--no-sandbox"]
                "acceptSslCerts": true

3. To start and stop chromedriver when running tests, create a nightwatch-globals.js which looks like this:

var chromedriver = require('chromedriver');

module.exports = {
    before: function(done) {
    after: function(done) {

4. CMake will run the unit tests from ${CMAKE_CURRENT_BINARY_DIR}, so we’ll need to copy the above config files to ${CMAKE_CURRENT_BINARY_DIR}. Here’s how to do that:

# Copy nightwatch config files to target directory
  nightwatch.json ALL

  OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/nightwatch-globals.js
  COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_SOURCE_DIR}/nightwatch-globals.js ${CMAKE_CURRENT_BINARY_DIR}/nightwatch-globals.js
  DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/nightwatch-globals.js
  nightwatch-globals.js ALL
  DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/nightwatch-globals.js

Automatically Download Nightwatch.JS and Run Unit Tests

1. First, we need to create a Nightwatch.JS unit test. Here’s a sample test case from the Nightwatch.JS home page:

// google.js
module.exports = {
  'Demo test Google' : function (browser) {
      .waitForElementVisible('body', 1000)
      .setValue('input[type=text]', 'nightwatch')
      .waitForElementVisible('button[name=btnG]', 1000)
      .assert.containsText('#main', 'Night Watch')

2. To automatically download the Nightwatch.JS library, add the following lines to CMakeLists.txt:

# Install nightwatch
  OUTPUT node_modules/nightwatch/package.json
  COMMAND npm install nightwatch
  nightwatch ALL
  DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/node_modules/nightwatch/package.json

3. To run the above unit test as a CMake unit test, add the following lines to CMakeLists.txt:

    NAME nightwatch_test
    COMMAND ./node_modules/nightwatch/bin/nightwatch -t ${CMAKE_CURRENT_SOURCE_DIR}/google.js

You may want to separate your tests into multiple JavaScript files and execute them independently. Here’s one way to do that from CMake:

file(GLOB TESTCASE_SRC tests/*.js)
foreach (testPath ${TESTCASE_SRC})
  get_filename_component(testName ${testPath} NAME_WE)

  # Test all unit tests
    NAME browser_${testName}
    COMMAND ./node_modules/nightwatch/bin/nightwatch -t ${testPath}

Using Local Web Server when Running Test Cases

In certain cases, your unit tests be able to refer to local file: URLs, but things tend to be a lot easier if your unit tests reference URLs from a local web server. It’s really easy to get one up and running:

1. Download Node’s http-server module by adding the following to your CMakeLists.txt

# Install http-server
  OUTPUT node_modules/http-server/package.json
  COMMAND npm install http-server
  http-server ALL
  DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/node_modules/http-server/package.json

2. Modify nightwatch-globals.js to start and stop the web server as part of the tests:

var chromedriver = require('chromedriver');
var http = require('http-server');

module.exports = {
    before: function(done) {
        this.server = http.createServer();
    after: function(done) {

Once this is done, your tests can refer to http://localhost:8080.

Note that http-server reads files from the current working directory, and CMake runs unit tests from ${CMAKE_CURRENT_BINARY_DIR}, so you may need to copy your test HTML and JavaScript to ${CMAKE_CURRENT_BINARY_DIR}. Here’s some CMake code which you might find helpful:

# Copy all .HTML files to binary directory
file(GLOB HTML_SRC *.html)
foreach (htmlPath ${HTML_SRC})
  get_filename_component(htmlFileName ${htmlPath} NAME)

  # Copy HTML to binary folder so they can be referred to by the test
    browser_copy_${htmlFileName} ALL

For a real-life, working example of all this in action, see the source code for my streaming percentiles library.

Creating UMD Module from Emscripten using CMake

By default, Emscripten creates a module which can be used from both Node.JS and the browser, but it has the following issues:

  1. The module pollutes the global namespace
  2. The module is created with the name Module (in my case, I require streamingPercentiles)
  3. The module cannot be loaded by some module loaders such as require.js

While the above issues can (mostly) be corrected by using –s MODULARIZE=1, it changes the semantics of the resulting JavaScript file, as the module now returns a function rather than an object. For example, code which previously read var x = new Module.Klass() would become var x = new Module().Klass(). I found this semantic change unacceptable, so I decided to abandon Emscripten’s -s MODULARIZE=1 option in favor of hand-crafting a UMD module.

I determined that the most appropriate pattern for my use case was the no dependencies pattern from UMD’s templates/returnExports.js. Applied to an Emscripten module, and using the default module name streamingPercentiles, the stanzas look like the following:


(function (root, factory) {
    if (typeof define === 'function' && define.amd) {
        // AMD.  Register as an anonymous module.
        define([], factory);
    } else if (typeof module === 'object' && module.exports) {
        module.exports = factory();
    } else {
        // streamingPercentiles is the 'default' name of the module
        root.streamingPercentiles = factory();
}(typeof self !== 'undefined' ? self : this, function () {


    return Module;

While I might be able to use Emscripten’s --pre-js and --post-js‘s options to prepend and append the above JavaScript files, these options do not guarantee in all cases that the above JavaScript files will be first and last. Therefore, I decided to prepend and append the JavaScript manually.

As my build system is CMake based, I needed to change change the compilation process to generate an intermediate file streamingPercentiles-unwrapped.v1.js, and then use some CMake magic to prepend and append the above JavaScript files:

add_executable(streamingPercentiles-unwrapped.v1.js ${STMPCT_JS_SRC})

file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/concat.cmake "
file(WRITE \${DST} \"\")

file(READ \${SRC1} S1)
file(APPEND \${DST} \"\${S1}\")

file(READ \${SRC2} S2)
file(APPEND \${DST} \"\${S2}\")

file(READ \${SRC3} S3)
file(APPEND \${DST} \"\${S3}\")
add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/streamingPercentiles.v1.js
                   COMMAND ${CMAKE_COMMAND} -D SRC1=${CMAKE_CURRENT_SOURCE_DIR}/umdprefix.js
                                            -D SRC2=${CMAKE_CURRENT_BINARY_DIR}/streamingPercentiles-unwrapped.v1.js
                                            -D SRC3=${CMAKE_CURRENT_SOURCE_DIR}/umdsuffix.js
                                            -D DST=${CMAKE_CURRENT_BINARY_DIR}/streamingPercentiles.v1.js
                                            -P ${CMAKE_CURRENT_BINARY_DIR}/concat.cmake
                   DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/umdprefix.js ${CMAKE_CURRENT_BINARY_DIR}/streamingPercentiles-unwrapped.v1.js ${CMAKE_CURRENT_SOURCE_DIR}/umdsuffix.js)

With the above code, all of the original three issues are fixed without any semantic changes for users.

Visualizing Latency Part 4: Official D3 Latency Heatmap Page

This post is part 4 of my series about visualizing latency, which is very useful for debugging certain classes of performance problems.

Allow me to wrap up my visualizing latency post series by noting that my official D3 latency heatmap repository is at https://github.com/sengelha/d3-latency-heatmap/. Monitor this repository for future developments to the D3 latency heatmap chart.

Visualizing Latency Part 3: Rendering Event Data

This post is part 3 of my series about visualizing latency, which is very useful for debugging certain classes of performance problems.

Now that I have introduced the D3 latency heatmap chart component and explained what binning is, I can discuss the primary use case of the chart: rendering event data.

What is event data?

First, I must explain what I mean by event data. For a fuller treatment, please read Analytics For Hackers: How To Think About Event Data, but allow me to summarize: Event data describes actions performed by entities. It has three key pieces of information: action, timestamp, and state. It is typically rich, denormalized, nested, schemaless, append-only, and frequently extremely large. Some examples of event data include system log records, financial market trades, crime records, or user activities within an application.

When I created the D3 latency heatmap chart component, my primary use case was to be able to visualize the latency of a queue-based report generation system. This system logs every single report generation event, along with important data such as start time and duration, to a table inside a SQL server. I imagine there are thousands (or millions) of systems storing event data into SQL tables — and there’s absolutely nothing wrong with that — but event data is also frequently stored in append-only files on filesystems, in stream processing systems like Apache Kafka, or in distributed databases like Apache Cassandra.

When rendering event data, the key decisions are around binning:

  1. What sizes should the bins have?
  2. How should binning be implemented?
  3. Where should binning be performed?

What sizes should the bins have?

This question was discussed in my blog post where I explained what binning is. The short answer it “it depends on your chart size and data distribution.” As you create your chart, be prepared to try out multiple different binning sizes.

How should binning be implemented?

Let’s explore a few common alternatives for implementing binning.


The process of binning is conceptually equivalent to the process of a SQL GROUP BY, which leads to one possible implementation: in SQL itself.

Consider a table of events with the following (slightly unrealistic) definition:

CREATE TABLE dbo.events(
    dt datetime NOT NULL,
    val float NOT NULL

Let’s say we want to bin the x-axis (time) by day, and the y-axis (value) into buckets of 10 (i.e. a bin for values from 0-10, another from 10-20, etc.) Here’s one way to write that query in SQL:

       FLOOR(val / 10) * 10 AS yval,
       COUNT(*) AS count
FROM dbo.events
GROUP BY CAST(dt AS DATE), FLOOR(val / 10) * 10

Given the above query, changing the size of the bins is simple and straightforward.

How fast is this query? I measured the above query on my development laptop in Microsoft SQL Server 2016 with a table with 1 billion records on it, as well as various candidate optimizations, below:

Test Case Duration (seconds) Throughput (records/second) Comments
Base case 10.23 97.7 million
Use of COUNT(1) instead of COUNT(*) (statistically indistinguishable) (statistically indistinguishable) As expected, COUNT(1) has no performance impact.
Data sorted by dt then val (CREATE CLUSTERED INDEX IX_events_dt_val ON dbo.events(dt, val))) (statistically indistinguishable) (statistically indistinguishable) I was hoping to eliminate the need for sorts in the query, but perhaps the query optimizer can’t tell that the group by expressions are order-preserving.
Columnstore table (CREATE CLUSTERED COLUMNSTORE INDEX) 3.238 308.8 million Reads are much faster but writes might be slower.
Memory-optimized table (CREATE TABLE … WITH (MEMORY_OPTIMIZED=ON)) (insufficient memory) N/A Apparently 16GB allocated to SQL Server isn’t enough.
Memory-optimized table with 10,000,000 records 1.052 9.5 million Throughput is highly suspicious but memory-optimized tables are generally optimized for OLTP not analytics.

Other possibilities which I did not test include:

  • Different GROUP BY expressions to see if they are more efficient or the optimizer can determine that they are order-preserving
  • An event table that stores date and time in separate columns, as a data warehouse might
  • Memory optimized tables with columnstore indexes
  • Memory optimized tables with natively compiled queries

Conclusion: Don’t spend too much time on query optimization, but consider storing your events in a columnstore table rather than a rowstore table.

Imperative (Traditional) Programming Languages

Another approach to binning is to do it in an imperative programming language (e.g. JavaScript, C#, Java, etc.) Here are a few potential avenues to explore:

  1. Accumulate counts in a dictionary/map/hash table which is keyed by (xval, yval).
  2. Using something like LINQ and expressing the query in a form that’s similar to the SQL query above.
  3. If the data is already properly ordered, and your group by expressions are order-preserving, use this information to be able to process the data piece-by-piece rather than having to keep everything all in RAM.

I’m personally intrigued by #3, as I doubt most general-purpose analytical systems or libraries use this optimization technique.

Stream Processing Systems

Another possible place where binning could be performed would be within a stream processing system. Imagine a near-realtime D3 latency chart rendering binned data from Apache Flink cluster which is continuously aggregating and binning raw event data from a Kafka event bus, or from an Amazon Kinesis Data Analytics SQL query.

Where should binning be performed?

My recommendations are as follows:

  1. If you’re dealing with streaming data, do it in the streaming analytics system (Apache Flink, etc.)
  2. If your data set is small, and you want to support responsive dynamic rebinning of the data (e.g. a dropdown where the user can select whether they want to bin the x-axis by day, month, quarter, or year), pull the raw event data into the web browser and perform the binning there. Pay careful attention to the amount of data you are transferring over the network and the amount of RAM you require. Consider presorting the events by date when retrieving the data from your data store.
  3. If your data set is medium-sized, or you want to use third-party libraries, pull the raw event data into the web server and perform the binning there. Network bandwidth and RAM usage are far less of a concern here than within a web browser, but you must still mind them. Consider presorting the events by date when retrieving the data from your data store.
  4. If your data set is very large, consider binning inside the data storage system (e.g. the SQL server) as part of data retrieval (a.k.a. move compute to data not data to compute).

What do I mean by “small”? The answer is, of course, “it depends”, but here’s how I try to break it down. Let’s say we want to be able to render the chart in 5 seconds or less, and that we’ll allocate 2.5 seconds to downloading the data and 2.5 seconds to the JavaScript execution and rendering. If we estimate the target (95th percentile) user has an Internet connection of 1MB/sec, we can transfer no more than 2.5MB of data. If we transfer the data compressed, and the compression achieves a 10:1 ratio, this is 25MB of raw data, which I imagine shouldn’t be a problem to store in RAM. If a single event is 100 bytes uncompressed, we can transfer no more than 250,000 events to the browser.

Naturally, the above recommendations do not take into account all possible considerations. For example, if you pull the raw event data into a web browser, you may now have to solve a new problem: How to keep the data in the web browser up-to-date? This problem doesn’t exist if you perform the binning where the data resides. On the other hand, if you bin inside your data storage system, what additional I/O, CPU, or cache pressure will you add to the server, and how will this interact with the existing utilization profile of the storage system? As with everything, it’s all about tradeoffs.

Visualizing Latency Part 2: What is Binning?

This post is part 2 of my series about visualizing latency, which is very useful for debugging certain classes of performance problems.

As mentioned in Brendan Gregg’s Latency Heat Maps page, a latency heat map is a visualization where each column of data is a histogram of the observations for that time interval. Using Brendan Gregg’s visualization:

As with histograms, the key decision that needs to be made when using a latency heat map is how to bin the data. Binning is the process of dividing the entire range of values into a series of intervals and then counting how many values fall into each interval. That said, there is no “best” number of bins, and different bin sizes can reveal different features of the data. Ultimately, the number of bins depends on the distribution of your data set as well as the size of the rendering area.

With latency heatmaps, binning often must be performed twice: once for the x-axis (time) and once for the y-axis (interval of observed values).

Allow me to demonstrate this visually.  Here is an Excel file with the historical daily stock price of GE, courtesy of Yahoo! Finance. I have rendered the close prices in D3 Latency Heatmap with four different binning strategies:

12 vertical bins 30 vertical bins
Bin by year-month
Bin by year

As you can see, each chart shows a slightly different perspective.

You may find you need to experiment with multiple binning strategies until you arrive at a latency heatmap chart with the appropriate level of detail for your use case.

Visualizing Latency Part 1: D3 Latency Heatmap

This post is part 1 of my series about visualizing latency, which is very useful for debugging certain classes of performance problems.

A latency heatmap is a particularly useful tool for visualizing latency. For a great treatment of latency heatmaps, please read Brendan Gregg’s Latency Heat Maps page and the ACM Queue article Visualizing System Latency.

On the right, you can see a latency heatmap generated from a job queueing system which shows a number of interesting properties, not least of which is that the system appears to be getting slower over time.

In order to make creating latency heatmaps easier, I decided to create a reusable D3 latency heatmap chart component. The goal of this component is to handle all the hard work of chart rendering on behalf of the user, so that a user needs to do little more than combine the chart component with their raw data on a web page. Additionally, animating the chart is quite straightforward (see github.com/sengelha/d3-latency-heatmap/samples/animated-heatmap.html for an example).

My D3 latency heatmap chart component is open source and available on GitHub at https://github.com/sengelha/d3-latency-heatmap.

Creating this chart required me to overcome a number of interesting challenges, such as:

  • How to create a reusable D3 chart component? (Ultimately I based my code on Mike Bostock’s Towards Reusable Charts proposal)
  • How to effectively use D3 scales for rendering non-points
  • How to correctly use D3’s .data(), .enter(), and .exit() to support in-place updates (required for animation)

Feel free to reach out to me with any questions or suggestions!

Data-Driven Code Generation of Unit Tests Part 5: Closing Thoughts

This post is part 5 of my series about data-driven code generation of unit tests.

In the previous posts in this series, I walked through the idea of performing data-driven code generation for unit tests, as well as how I implemented it in three different programming languages and build systems.  This post contains some final thoughts about the effort.

Was it worth it?
Almost certainly.  Although it required substantial up-front effort to set up the unit test generators, this approach found numerous, previously-undetected bugs both within my implementation of the calculation library as well as with legacy implementations. It is straightforward to write code generators that test all possible combinations of parameters to the calculations, ensuring that the resulting code coverage is excellent. Adding tests for a new calculation is as straightforward as adding a line to a single file.

Which build system was easiest for integrating code generation?

  1. Visual Studio/MSBuild (it is basically out of the box)
  2. Maven
  3. CMake

Which templating language was the best?

  1. Jinja2/T4 (tied)
  2. StringTemplate (a distant 3rd; I would strongly consider evaluating alternative templating languages for generating Java code)

What’s next?
Code generation opens up a vast number of possibilities for future enhancements. The existing code generators could be improved to only generate code when something changes in order to improve compilation times. More unit tests could be defined within the code generator templates to test invalid parameters, NaNs, etc. Binding libraries (e.g. wrapping the Java calculation library in a set of Spark SQL user-defined aggregates, or the C++ library into a set of PostgreSQL user-defined aggregates) can all be code generated from the same metadata.csv (more on this later).