Calculating Percentiles on Streaming Data Part 6: Building a C++ and JavaScript Library from a Single Codebase

This is part 6 of my series on calculating percentiles on streaming data.

For the past 10 days or so, I’ve been working on the build process of my C++ and JavaScript streaming analytics libraries. Using the magic of Emscripten, I have been able to combine both libraries into a single C++ codebase, from which I can compile both the C++ and JavaScript versions of the library. Furthermore, I was able to do this without breaking backwards compatibility of the JavaScript library. I also made a number of other improvements to the build process, such as providing precompiled binaries and supporting both shared and static libraries.

This blog post will discuss how the consolidated build process works. If you’d like to follow along using the source code, it’s available at https://github.com/sengelha/streaming-percentiles-cpp.

Overview of Build System

As the streaming analytics library is intended to work across multiple operating systems, I use CMake for the core of the build system. There are plenty of good blog articles about using CMake for generating cross-platform C++ libraries, so I don’t need to say much more about that here.

Because invoking the full build process properly requires multiple steps, I created the scripts build.sh (for Linux / Mac OS X) and build.bat (for Windows) to simplify it. These scripts perform the build, run the unit tests, and create the binary packages. The entire build happens in a directory outside of the source tree, which is considered a CMake best practice. The scripts support various command-line options (e.g. build.sh --release for a release build, build.sh --clean for a clean build) to give a developer fine-grained control over the build process; use --help to view information on how to use them.

Shared and Static Libraries

The CMake project supports building both static and shared versions of the streaming analytics library. For Linux and Mac OS X, it’s as simple as having two CMake targets: add_library(xxx_shared SHARED ${SRC}) and add_library(xxx_static STATIC ${SRC}).

However, for Windows, this isn’t quite enough. In order to properly build a Windows shared library, all exposed classes, functions, etc. must be marked with __declspec(dllexport). There’s an entire CMake Wiki page on how to do it. I haven’t gotten around to doing that yet, so shared library support on Windows is currently disabled. To avoid a future file name conflict for Windows builds, I gave the static version of the library a different name than the shared version of the library (stmpcts.lib instead of stmpct.lib).
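For readers who have not seen this kind of export marking before, here is a minimal sketch of the usual pattern. The macro names (STMPCT_API, STMPCT_SHARED, STMPCT_BUILDING) and the running_count class are hypothetical, chosen only for illustration; they are not taken from the library's source.

```cpp
#include <cstddef>

// Hypothetical export macro; none of these names come from the library itself.
#if defined(_WIN32) && defined(STMPCT_SHARED)
#  if defined(STMPCT_BUILDING)
#    define STMPCT_API __declspec(dllexport)  // compiling the DLL itself
#  else
#    define STMPCT_API __declspec(dllimport)  // compiling code that uses the DLL
#  endif
#else
#  define STMPCT_API                          // static builds and non-Windows platforms
#endif

// Hypothetical stand-in for a public library class.
class STMPCT_API running_count {
public:
    // Record one observation (the value itself is ignored in this sketch).
    void insert(double) { ++n_; }

    // Number of observations recorded so far.
    std::size_t size() const { return n_; }

private:
    std::size_t n_ = 0;
};
```

Under this scheme the shared-library target would define both STMPCT_SHARED and STMPCT_BUILDING when compiling, consumers of the DLL would define only STMPCT_SHARED, and the static target would define neither, so the macro expands to nothing.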

I also found that on non-x86 systems, even static versions of the library need to be built with position-independent code, which I enabled with set (CMAKE_POSITION_INDEPENDENT_CODE TRUE).

C++ Unit Testing

For C++ unit tests, I decided upon the Boost.Test library. All C++ unit tests are compiled into a single executable which is then tested using CTest.

In order to test both the static and shared versions of the library, this executable is actually built twice: once linked against the static version of the library and once linked against the shared version.

Using Emscripten from CMake to Cross-Compile C++ to JavaScript

Emscripten is some incredibly powerful wizardry. Basically you can point nearly any C++ at it, and it will cross-compile it to JavaScript.

Connecting Emscripten to CMake is quite easy. First you need a script to detect the Emscripten compiler:

# FindEmscripten.cmake
# - try to find emscripten binary
#
# Variables you might use in your CMakeLists.txt:
#  EMSCRIPTEN_FOUND

find_program(EMSCRIPTEN_CPP_BINARY
             NAMES em++)
mark_as_advanced(EMSCRIPTEN_CPP_BINARY)

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(emscripten
    DEFAULT_MSG
    EMSCRIPTEN_CPP_BINARY)

Then it’s as simple as the following:

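# Compile with the Emscripten C++ compiler instead of the native one
set(CMAKE_CXX_COMPILER em++)
# --bind enables embind; --memory-init-file 0 keeps the memory initializer inside the .js file
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} --bind -std=c++11 -O3 --memory-init-file 0")
# The .js suffix on the target name tells em++ to emit JavaScript
add_executable(xxx.js ${SRC})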

I added a few more niceties, such as:

  • Used embind to naturally expose the classes in JavaScript
  • Added JavaScript unit tests using Mocha and unit.js
  • Added an uglify-js step to the build process to create a minified version of the built JavaScript

For more details, see the source code, particularly the js/ directory.
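To give a flavor of what exposing a class through embind looks like, here is a minimal sketch. The sample_counter class and the JavaScript-facing name SampleCounter are hypothetical and stand in for the library's real classes; the `#ifdef __EMSCRIPTEN__` guards are one assumed way of letting the same file compile both natively and through em++.

```cpp
#include <cstddef>

#ifdef __EMSCRIPTEN__
#include <emscripten/bind.h>  // only available when compiling with em++
#endif

// Hypothetical class standing in for one of the library's public types.
class sample_counter {
public:
    sample_counter() : n_(0), sum_(0.0) {}

    // Record one observation.
    void insert(double v) { ++n_; sum_ += v; }

    // Arithmetic mean of all observations, or 0 if none were inserted.
    double mean() const { return n_ ? sum_ / static_cast<double>(n_) : 0.0; }

private:
    std::size_t n_;
    double sum_;
};

#ifdef __EMSCRIPTEN__
// Bindings are compiled only for the JavaScript build (requires --bind).
EMSCRIPTEN_BINDINGS(sample_module) {
    emscripten::class_<sample_counter>("SampleCounter")
        .constructor<>()
        .function("insert", &sample_counter::insert)
        .function("mean", &sample_counter::mean);
}
#endif
```

From JavaScript, a class bound this way would be created with new Module.SampleCounter() and its methods called as ordinary member functions; embind objects should be released with .delete() when no longer needed.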

Packaging

For distribution convenience, I use CPack for creating the final packages of compiled code. Using CPack makes doing this nice and easy; simply set some variables (e.g. set(CPACK_PACKAGE_NAME "streaming_percentiles")), add include(CPack), and mark the files which should be included in the package with install(...).

For Windows, I prefer generating a ZIP package instead of an MSI; enabling this is as easy as set(CPACK_GENERATOR "ZIP").

Wrapping Up

There are a few more tricks in the codebase, like combining add_custom_command(OUTPUT xyz) with add_custom_target(xxx ALL DEPENDS xyz) to avoid excessive rebuilds, that you can discover for yourself. Just check out the source code for the project at https://github.com/sengelha/streaming-percentiles-cpp/.

If you’re only interested in the library itself, you can download pre-built binaries for Mac OS X / Linux / Windows / JavaScript from https://github.com/sengelha/streaming-percentiles-cpp/releases.

About Steven Engelhardt, CFA, AIF
Adjunct Professor of Software Engineering at DePaul University • Software Engineering, Data & Analytics in FinTech • Lives in Chicago, IL
