Data-Driven Code Generation of Unit Tests Part 5: Closing Thoughts

Published: 2017-07-05

This post is part 5/5 of my Data-Driven Code Generation of Unit Tests series.

In the previous posts in this series, I walked through the idea of performing data-driven code generation for unit tests, as well as how I implemented it in three different programming languages and build systems.  This post contains some final thoughts about the effort.

Was it worth it?

Almost certainly.  Although it required substantial up-front effort to set up the unit test generators, this approach found numerous previously undetected bugs, both in my implementation of the calculation library and in legacy implementations. It is straightforward to write code generators that test all possible combinations of parameters to the calculations, ensuring excellent code coverage. Adding tests for a new calculation is as simple as adding a line to a single file.
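
For illustration, a new row in metadata.csv might look something like the following; the column names and values here are hypothetical, not the exact schema from the earlier posts:

    calculation_name,window_size,expected_result
    simple_moving_average,3,2.0
    exponential_moving_average,3,2.25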

Which build system was easiest for integrating code generation?

  1. Visual Studio/MSBuild (code generation is essentially supported out of the box)
  2. Maven
  3. CMake (the generation step must be wired in by hand; see the sketch below)
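
CMake has no built-in notion of a code generation step, so the generator has to be attached via add_custom_command. A minimal sketch, assuming a Python generator script; the file and target names are illustrative, not the exact setup from the earlier posts:

    # Rerun the generator whenever the script or the metadata changes.
    add_custom_command(
      OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/generated_tests.cpp
      COMMAND python ${CMAKE_CURRENT_SOURCE_DIR}/generate_tests.py
              ${CMAKE_CURRENT_SOURCE_DIR}/metadata.csv
              ${CMAKE_CURRENT_BINARY_DIR}/generated_tests.cpp
      DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/generate_tests.py
              ${CMAKE_CURRENT_SOURCE_DIR}/metadata.csv
      COMMENT "Generating unit tests from metadata.csv")

    # The generated file participates in the build like any other source.
    add_executable(unit_tests ${CMAKE_CURRENT_BINARY_DIR}/generated_tests.cpp)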

Which templating language was the best?

  1. Jinja2/T4 (tied; see the Jinja2 sketch below)
  3. StringTemplate (a distant third; I would strongly consider evaluating alternative templating languages for generating Java code)
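
To make the comparison concrete, here is a minimal sketch of the Jinja2 approach: a generator script that reads metadata.csv and emits one test per row. The template, column names, and file names are illustrative, not the actual code from the earlier posts:

    import csv
    from jinja2 import Template

    # Hypothetical template: emit one GoogleTest case per calculation row.
    TEMPLATE = Template("""\
    {% for row in rows %}
    TEST({{ row['calculation_name'] }}, MatchesExpectedValue) {
        EXPECT_NEAR({{ row['calculation_name'] }}(kSampleInput),
                    {{ row['expected_result'] }}, 1e-9);
    }
    {% endfor %}
    """)

    with open('metadata.csv', newline='') as f:
        rows = list(csv.DictReader(f))

    with open('generated_tests.cpp', 'w') as f:
        f.write(TEMPLATE.render(rows=rows))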

What’s next?

Code generation opens up a vast number of possibilities for future enhancements. The existing code generators could be improved to regenerate code only when the metadata or templates actually change, which would improve incremental build times. More unit tests could be defined within the code generator templates to cover invalid parameters, NaNs, etc. Binding libraries (e.g. wrapping the Java calculation library in a set of Spark SQL user-defined aggregates, or the C++ library in a set of PostgreSQL user-defined aggregates) could all be code-generated from the same metadata.csv (more on this later).
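
As a sketch of the "only generate when something changes" idea, the generator could compare its freshly rendered output against what is already on disk and skip the write when nothing changed, so the build system never sees a new timestamp. The helper below is hypothetical, not code from the series:

    import os

    def write_if_changed(path, new_contents):
        """Write the generated file only if its contents actually changed,
        leaving the timestamp alone so dependent code is not recompiled."""
        if os.path.exists(path):
            with open(path) as f:
                if f.read() == new_contents:
                    return False  # already up to date
        with open(path, 'w') as f:
            f.write(new_contents)
        return True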