Data-Driven Code Generation of Unit Tests Part 2: C++, CMake, Jinja2, Boost

This blog post explains how I used CMake, Jinja2, and the Boost Unit Test Framework to perform data-driven code generation of unit tests for a financial performance analytics library.  If you haven’t read it already, I recommend starting with Part 1: Background.

All performance analytics metadata is stored in a single metadata file called metadata.csv.  This file contains the complete list of calculations, and for each calculation, its settings (i.e. how it differs from other calculations), including properties like:

  1. How many parameters does the calculation take (1, 2, or 3)?
  2. Does the calculation have an online (streaming) implementation?
  3. Does the calculation support annualization?
  4. What is the default annualization mode?
  5. Given a predefined set of inputs, what are the expected values of the calculation for various combinations of time period, annualization, etc.?

The file looks something like:

algorithm_type,function_name,num_parameters,minimum_arr_size,supports_streaming,supports_annualization,default_annualization,expected_value_unannualized,expected_value_annualized_daily,expected_value_annualized_weekly,expected_value_annualized_monthly,expected_value_annualized_quarterly,expected_value_annualized_semiannually,expected_value_annualized_daily_200_day_year
absolute_statistics,calculation1,1,1,true,false,never,7.283238516,-999,-999,-999,-999,-999,-999
...
relative_statistics,calculation2,3,1,true,true,always,0.189846006,69.34125385,9.871992334,2.278152077,0.759384026,0.379692013,37.96920129
...

I use CSV rather than JSON or YAML because it can be easily read by CMake during the build process (more below).

A Jinja2 template defines all unit tests for a given calculation.  It uses the attributes found in metadata.csv to determine how to generate the appropriate source code.  For example, if the calculation does not support annualization per the supports_annualization flag, the Jinja2 template will ignore (not generate) the unit tests which test annualization support.
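
For example, a guard of that kind might look something like the following sketch (I’m assuming the template variables are named after the metadata.csv columns, and that their values arrive as the strings produced by csv.DictReader):

{% if supports_annualization == 'true' %}
BOOST_AUTO_TEST_CASE(test_{{ function_name }}_annualized)
{
    ....
}
{% endif %}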

Each calculation has a number of possible combinations to test for, such as:

  1. Test the online vs. offline versions of the calculation
  2. Test the various annualization settings (always, never, calculation default)
  3. Test the various pre-defined annualization periods (daily, weekly, monthly, etc.)
  4. etc.

The Jinja2 template uses for loops extensively to make sure that it tests all possible combinations of all of the above parameters. It looks something like:

{% for calc_type in calc_types %}
{% for annualize in annualizes %}
{% for frequency in frequencies %}

BOOST_AUTO_TEST_CASE(test_{{ function_name }}_{{calc_type}}_annualize_{{annualize}}_frequency_{{frequency}})
{
    ....
}

{% endfor %}
{% endfor %}
{% endfor %}
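
Rendered with the metadata row for calculation1 above, one of the generated test cases might look roughly like the sketch below.  The calculate_calculation1 and security_returns declarations and the tolerance are illustrative assumptions, not the library’s actual interface; only the expected value (7.283238516, the expected_value_unannualized column) comes from metadata.csv.

#define BOOST_TEST_MODULE calculation1_unit_test
#include <boost/test/unit_test.hpp>
#include <vector>

// Hypothetical declarations standing in for the library under test and the
// shared test data; the real generated file includes the library's headers.
std::vector<double> security_returns();
double calculate_calculation1(const std::vector<double>& returns);

BOOST_AUTO_TEST_CASE(test_calculation1_offline_annualize_never_frequency_daily)
{
    const std::vector<double> returns = security_returns();
    const double actual = calculate_calculation1(returns);

    // Compare against the expected_value_unannualized column for calculation1.
    // BOOST_CHECK_CLOSE takes its tolerance as a percentage.
    BOOST_CHECK_CLOSE(actual, 7.283238516, 1e-6);
}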

As you can imagine, the resulting code coverage of the unit tests is excellent.

A Python script, render_jinja.py, knows how to read metadata.csv and pass the appropriate values to Jinja2 in order to generate the unit tests for a given function.  The meat of the Python script looks like:

import csv
import jinja2

function_name = ...
output_file = ...
template_file = ...

# Find this calculation's row in metadata.csv, skipping comment lines
with open('../../metadata.csv', 'r') as f:
    mr = csv.DictReader(row for row in f if not row.startswith('#'))
    for row in mr:
        if row['function_name'] == function_name:
            fn_metadata = row
            break

# Render the unit test template with the calculation's metadata
env = jinja2.Environment(loader=jinja2.FileSystemLoader('.'), trim_blocks=True)
template = env.get_template(template_file)
result = template.render(fn_metadata)
output_file.write(result)

The build system uses CMake.  It too reads metadata.csv to get the list of calculations, calls render_jinja.py on each calculation to generate its C++ unit test source file, and then compiles and executes the unit tests. Here’s a sample of the CMake build file:

cmake_minimum_required(VERSION 2.8)
project(perfanalytics-cpp-test)

enable_testing()

if (WIN32)
  add_definitions(-DBOOST_ALL_NO_LIB)
  set(Boost_USE_STATIC_LIBS ON)
else()
  add_definitions(-DBOOST_TEST_DYN_LINK)
endif()
find_package(Boost COMPONENTS unit_test_framework REQUIRED)

set(TEST_COMMON_SRC memory_stream.cpp)

# Populate CALC_NAMES from metadata.csv
file(STRINGS ${CMAKE_CURRENT_SOURCE_DIR}/metadata.csv CALC_METADATA)
set(index 1)  # start at 1 to skip the CSV header row
list(LENGTH CALC_METADATA COUNT)
while(index LESS COUNT)
  list(GET CALC_METADATA ${index} line)

  if (NOT "${line}" MATCHES "^#")
    # convert line to a CMake list
    string(REPLACE "," ";" l ${line})
    list(GET l 1 calc_name)
    list(GET l 4 supports_streaming)
    list(APPEND CALC_NAMES ${calc_name})
    list(APPEND CALC_SUPPORTS_STREAMING ${supports_streaming})
  endif()

  math(EXPR index "${index}+1")
endwhile(index LESS COUNT)

# Note how we generate source into the binary directory.  This
# is important -- generated source is *output*, not source,
# and should not be checked into source control.
foreach(fn ${CALC_NAMES})
  add_custom_command(
    OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/generated/${fn}_unit_test.cpp
    COMMAND python ${CMAKE_CURRENT_SOURCE_DIR}/render_jinja.py -o ${CMAKE_CURRENT_BINARY_DIR}/generated/${fn}_unit_test.cpp -f ${fn} -t unit_test_template.cpp.j2
    DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/render_jinja.py ${CMAKE_CURRENT_SOURCE_DIR}/unit_test_template.cpp.j2 ${CMAKE_CURRENT_SOURCE_DIR}/../../metadata.csv
    WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
    COMMENT "Generating C++ unit test ${CMAKE_CURRENT_BINARY_DIR}/generated/${fn}_unit_test.cpp"
  )
  list(APPEND TESTCASE_SRC ${CMAKE_CURRENT_BINARY_DIR}/generated/${fn}_unit_test.cpp)
endforeach()

foreach (testSrc ${TESTCASE_SRC})
  get_filename_component(testName ${testSrc} NAME_WE)

  # Test against static library
  add_executable(cpp_static_${testName} ${testSrc} ${TEST_COMMON_SRC})
  target_link_libraries(cpp_static_${testName} perfanalytics_cpp_static ${Boost_LIBRARIES})
  add_test(NAME cpp_static_${testName} COMMAND cpp_static_${testName})

  # Test against shared library
  if (BUILD_SHARED_LIBRARY)
    add_executable(cpp_shared_${testName} ${testSrc} ${TEST_COMMON_SRC})
    target_link_libraries(cpp_shared_${testName} perfanalytics_cpp_shared ${Boost_LIBRARIES})
    add_test(NAME cpp_shared_${testName} COMMAND cpp_shared_${testName})
  endif()
endforeach(testSrc)

A single script, build.sh, ties everything together.  While the full build.sh supports a number of command-line options (e.g. -c, --clean for a clean build; -d, --debug for a debug build; -r, --release for a release build), the core of the script looks like:

BUILD_TYPE=Debug  # or Release
if [ ! -d $BUILD_TYPE ]; then mkdir $BUILD_TYPE; fi
cd $BUILD_TYPE
cmake .. -DCMAKE_BUILD_TYPE=$BUILD_TYPE
cmake --build . --config $BUILD_TYPE
env CTEST_OUTPUT_ON_FAILURE=1 ctest -C $BUILD_TYPE
cpack -C $BUILD_TYPE

Windows uses an equivalent script called build.cmd.

I am quite happy with the results.  Adding a new calculation is almost as simple as writing the implementation of the calculation and adding a single line to metadata.csv. The unit tests are comprehensive and provide great code coverage.  New test patterns (e.g. what should happen if you pass in NULL to a calculation?) can be added to all calculations at once, simply by editing the Jinja2 template file. Everything works across Windows, Mac OS, and Linux.
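
For instance, a NULL-input pattern added once to the template might look something like the sketch below.  The calculate_{{ function_name }} call and the choice of std::invalid_argument are purely illustrative assumptions about the library’s error handling:

BOOST_AUTO_TEST_CASE(test_{{ function_name }}_null_input)
{
    // Illustrative expectation: the calculation rejects a null input stream.
    BOOST_CHECK_THROW(calculate_{{ function_name }}(nullptr), std::invalid_argument);
}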

The only remaining frustration that I have is that the build system will often re-generate the unit test source code, and recompile the unit tests, even though nothing has changed. This notably slows down build times.  I’m hopeful this can be solved with some further work on the CMake build file, but I’ll leave that for another time.

Data-Driven Code Generation of Unit Tests Part 1: Background

At Morningstar, I created a multi-language, cross-platform performance analytics library which provides both online and offline implementations of a number of common financial analytics such as Alpha, Beta, R-Squared, Sharpe Ratio, Sortino Ratio, and Treynor Ratio (more on this library later).  The library relies almost exclusively on a comprehensive suite of automated unit tests to validate its correctness.  I quickly found that maintaining a nearly-identical battery of unit tests in three different programming languages was a chore, and I had a hunch that I could use a common technique to deal with this problem: code generation.

The basic ideas behind the approach are quite straightforward. The first idea is one of language independence — a given calculation, given a known set of inputs, must produce the same output (allowing for rounding error), regardless of programming language. Therefore, a unit test for the implementation of Alpha in C# should be nearly identical in function (and remarkably similar in form) to a unit test for the implementation of Alpha in Java. Perhaps this means that we don’t need to write the unit test twice; we can have the computer perform the translation for us.

The second idea is one of calculation similarity.  Financial performance analytics tend to follow a common pattern: they all take in one to three streams of returns (security, benchmark, risk-free rate); they are almost all aggregate functions; most (but not all) can be implemented in both online and offline forms; and many support annualization.  The code for the unit test for Beta looks remarkably like the code for the unit test for Alpha; the only significant difference is the expected result. Therefore, if we can encode only the differences among the calculations (e.g. their expected results) in some sort of data file, perhaps we can use code generation for the vast majority of the unit tests for the calculation library.

My hunch paid off. In the end, I had a single CSV file which contained all the important differences among the calculations (e.g. their expected values). The build process uses this CSV file to code generate the entire unit test framework in C++ (using CMake, Jinja2, and the Boost Unit Test Framework), Java (using Apache Maven, StringTemplate, and JUnit), and C# (using MSBuild, T4 Text Templates, and the Microsoft Unit Test Framework for Managed Code). I was guaranteed that every single calculation in every language produced the same result given the same input. I found language-specific bugs (typically typos) in the performance analytics library. I found language-specific bugs in pre-existing libraries at Morningstar (fortunately these were in niche languages that weren’t actively used in products). I learned a lot about differences in templating systems for code generation (Jinja2 and T4 were pleasant; StringTemplate was much less so) and about using code generation in build systems (Maven is a real pain; SBT is probably a lot nicer).  Furthermore, I was able to use the same metadata file and code generation tools to power binding and wrapper libraries around the core performance analytics library (more on this later).

Future posts in this series will explain how I implemented data-driven code generation of unit tests in each of the above programming languages.

I’d love to hear feedback from you if you found this useful, or other places where you’ve applied similar techniques!