When writing Bazel tests using sh_test(), I often find myself needing to compare
two collections for equivalence. For example, I might compare a directory listing
against a set of expected files or directories, or the list of files and directories
in a .tar file against a set of expected items. This blog post provides some tips
and tricks for doing so.
The basic approach to solving this problem is simple: put the expected and the actual
sets into Bash arrays and then compare the contents. The trick is in the details.
Defining Expected Values
The trivial case is to define your set of expected values, which you can do by
writing the Bash array inline in the test script.
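As a minimal sketch, an inline definition might look like the following. The file names are the ones used in the full test script at the end of this post, and the final printf is only there to show what the array holds:

```shell
#!/bin/bash
# Expected values, written one array element per line.
EXPECTED_VALUES=(
  file1.txt
  file2.txt
  dir1/file3.txt
)

# Print one element per line to confirm the contents.
printf '%s\n' "${EXPECTED_VALUES[@]}"
```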
To read in the actual values, use the Bash builtin readarray with the stdout
from a program. Be sure to trim off any extraneous or unwanted output.
For example:
To read in a set of values from a separate text file, one value per line:
readarray -t ACTUAL_VALUES < source_file.txt
To read in a set of files in a directory recursively:
readarray -t ACTUAL_VALUES < <(cd "$DIR" && find . -type f | sed -e 's#^\./##')
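The same pattern covers the .tar case mentioned at the start of this post. The following is only a sketch: it builds a throwaway archive so that it can run on its own, and the names WORK_DIR and TARBALL are made up for illustration. It assumes that tar lists directory members with a trailing slash, which is what the grep filter relies on to keep only files.

```shell
#!/bin/bash
# Build a small throwaway archive so the example is self-contained.
WORK_DIR=$(mktemp -d)
mkdir -p "$WORK_DIR/src/dir1"
touch "$WORK_DIR/src/file1.txt" "$WORK_DIR/src/dir1/file3.txt"
TARBALL="$WORK_DIR/example.tar"
tar -cf "$TARBALL" -C "$WORK_DIR/src" .

# List the archive members, drop directory entries (they end in "/"),
# and strip the leading "./" so the names match the expected list.
readarray -t ACTUAL_VALUES < <(tar -tf "$TARBALL" | grep -v '/$' | sed -e 's#^\./##' | sort)

printf '%s\n' "${ACTUAL_VALUES[@]}"
rm -rf "$WORK_DIR"
```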
The program comm can be used to compare the actual and expected arrays
together. The main thing to remember is that the input to comm must be
sorted, which is trivial to handle using sort.
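To make the column flags concrete, here is a small sketch with made-up values. comm prints three columns: lines only in the first input, lines only in the second input, and lines in both. The flag -13 suppresses the first and third columns, and -23 suppresses the second and third.

```shell
#!/bin/bash
# Illustrative values only; the order is deliberately unsorted.
ACTUAL_VALUES=(c.txt a.txt extra.txt)
EXPECTED_VALUES=(a.txt b.txt c.txt)

# Lines found only in the second input (expected but not actual).
ONLY_EXPECTED=$(comm -13 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))

# Lines found only in the first input (actual but not expected).
ONLY_ACTUAL=$(comm -23 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))

echo "only in expected: $ONLY_EXPECTED"   # b.txt
echo "only in actual: $ONLY_ACTUAL"       # extra.txt
```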
Typically I’ll report “missing” and “extra” values separately, as follows:
ERR=0

MISSING_VALUES=$(comm -13 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))
if [[ ! -z "$MISSING_VALUES" ]]; then
  echo "ERROR: Following expected files are missing: $MISSING_VALUES" 1>&2
  ERR=1
fi

EXTRA_VALUES=$(comm -23 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))
if [[ ! -z "$EXTRA_VALUES" ]]; then
  echo "ERROR: Following actual files are not in expected list: $EXTRA_VALUES" 1>&2
  ERR=1
fi

if [[ $ERR -ne 0 ]]; then
  exit $ERR
fi
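When several tests need the same check, the comparison can be folded into a function. The sketch below is one possible shape, not something from the original scripts: the name compare_values is hypothetical, and it reads the global arrays ACTUAL_VALUES and EXPECTED_VALUES rather than taking arguments, which keeps it usable on Bash versions without namerefs.

```shell
#!/bin/bash
# Hypothetical helper: compares the global arrays ACTUAL_VALUES and
# EXPECTED_VALUES, reports differences on stderr, and returns 1 on mismatch.
compare_values() {
  local missing extra err=0
  missing=$(comm -13 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))
  extra=$(comm -23 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))
  if [[ -n "$missing" ]]; then
    echo "ERROR: Following expected files are missing: $missing" 1>&2
    err=1
  fi
  if [[ -n "$extra" ]]; then
    echo "ERROR: Following actual files are not in expected list: $extra" 1>&2
    err=1
  fi
  return $err
}

# Same members in a different order: the sets are treated as equal.
EXPECTED_VALUES=(file1.txt file2.txt)
ACTUAL_VALUES=(file2.txt file1.txt)
if compare_values; then
  echo "sets match"
fi
```

Calling the function inside an if (or with `|| exit 1`) matters when the script runs under set -e, since a bare call that returns non-zero would end the script before any cleanup.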
Dealing With Bash Shells That Don’t Support readarray
Some very old versions of Bash, like the one shipped with OS X, don’t include
the Bash readarray builtin (it arrived in Bash 4.0, while OS X ships Bash 3.2).
For these systems, I use the following script:
# //tools/bash/readarray:readarray.bash
#
# Provide a simplified implementation of readarray for Bash shells that don't
# have the readarray builtin.

if ! type -t readarray >/dev/null; then
  # Very minimal readarray implementation using read. Does NOT work with lines that contain double-quotes due to eval()
  readarray() {
    local cmd
    local opt=""
    local t
    local v=MAPFILE
    while [ $# -gt 0 ]; do
      case "$1" in
      -h|--help) echo "minimal substitute readarray for older bash"; exit; ;;
      -r) shift; opt="$opt -r"; ;;
      -t) shift; t=1; ;;
      -u) shift;
        if [ -n "$1" ]; then
          opt="$opt -u $1";
          shift
        fi ;;
      *)
        if [[ "$1" =~ ^[A-Za-z_]+$ ]]; then
          v="$1"
          shift
        else
          echo -en "${C_BOLD}${C_RED}Error: ${C_RESET}Unknown option: '$1'\n" 1>&2
          exit
        fi ;;
      esac
    done
    cmd="read $opt"
    eval "$v=()"
    while IFS= eval "$cmd line"; do
      line=$(echo "$line" | sed -e "s#\([\"\`]\)#\\\\\1#g")
      eval "${v}+=(\"$line\")"
    done
  }
fi
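If the eval-based fallback is more than a test needs, a plain while/read loop also runs on old Bash versions and sidesteps the double-quote limitation that the script's own comment mentions. This is a sketch of a different technique, not a drop-in replacement for readarray: the array name is hard-coded, and the input is a made-up list chosen to include a name with double quotes.

```shell
#!/bin/bash
# Populate ACTUAL_VALUES line by line without readarray or eval.
ACTUAL_VALUES=()
while IFS= read -r line; do
  ACTUAL_VALUES+=("$line")
done < <(printf '%s\n' 'file1.txt' 'dir1/file "3".txt' 'file2.txt')

# Lines containing double quotes are stored unchanged.
printf '%s\n' "${ACTUAL_VALUES[@]}"
```

Here `IFS=` preserves leading and trailing whitespace and `-r` keeps backslashes literal, so each line lands in the array exactly as it was read.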
I then wrap this script in a sh_library(), which can be used as a dep from my sh_test()s. The test script sources it before calling readarray:
#!/bin/bash
#
# my_test.sh: Implement the test case

set -euo pipefail

# Pull in readarray script to handle Bash shells that don't have the readarray builtin
source ./tools/bash/readarray/readarray.bash

# Populate expected values
EXPECTED_VALUES=(
  file1.txt
  file2.txt
  dir1/file3.txt
)

# Populate actual values
readarray -t ACTUAL_VALUES < <(cd "$DIR" && find . -type f | sed -e 's#^\./##')

# Compare expected to actual, exiting with non-zero code if they are different
ERR=0

MISSING_VALUES=$(comm -13 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))
if [[ ! -z "$MISSING_VALUES" ]]; then
  echo "ERROR: Following expected files are missing: $MISSING_VALUES" 1>&2
  ERR=1
fi

EXTRA_VALUES=$(comm -23 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))
if [[ ! -z "$EXTRA_VALUES" ]]; then
  echo "ERROR: Following actual files are not in expected list: $EXTRA_VALUES" 1>&2
  ERR=1
fi

if [[ $ERR -ne 0 ]]; then
  exit $ERR
fi