Testing Dakota Code
Unit Tests
Unit tests are intended for testing specific units, classes and functions when they can be readily constructed (and/or provided mocks) as needed. Unit testing also serves as a mechanism for Test Driven Development (TDD) which represents a best practice for implementing new capability. Just a few of the benefits of TDD include the following:
Enforces (and measures) modularity of code and functionality
Encourages incrementally correct development
Ensures correct behavior
Documents API and software contract
Promotes code coverage and other Software Quality Assurance (SQA) metrics
Historically, Dakota has used both Boost.Test and Teuchos Unit Test features but has recently officially adopted the former. For minimal examples of unit testing, see:
src/unit/min_unit_test.cpp (Boost.Test unit_test.hpp)
src/unit/leja_sampling.cpp (Boost.Test minimal.hpp)
Some more recent / modern examples include:
src/util/unit/MathToolsTest.cpp
src/util/unit/LinearSolverTest.cpp
src/surrogates/unit/PolynomialRegressionTest.cpp
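For orientation, a minimal Boost.Test unit test has roughly the following shape. This is an illustrative sketch, not one of the Dakota files listed above; it assumes the header-only variant of Boost.Test (boost/test/included/unit_test.hpp), and the file and test names are hypothetical.
// min_example.cpp -- illustrative minimal Boost.Test unit test (hypothetical file)
#define BOOST_TEST_MODULE min_example
#include <boost/test/included/unit_test.hpp>

BOOST_AUTO_TEST_CASE(basic_checks)
{
  int sum = 2 + 2;
  BOOST_CHECK(sum == 4);        // simple boolean condition
  BOOST_CHECK_EQUAL(sum, 4);    // reports both values on failure
}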
To add a test with a TDD mindset:
1. Call out the new unit test source file, e.g. my_test.cpp, in the relevant CMakeLists.txt, e.g. src/surrogates/unit/CMakeLists.txt. See the helper functions dakota_add_unit_test (adds the test, links libraries, registers it with CTest) and dakota_copy_test_file (copies a file with a dependency). The build will fail because the file does not yet exist.
2. Add a new file my_test.cpp with a failing test macro, e.g. BOOST_CHECK(false), to verify that it builds but the test fails (a minimal skeleton is sketched after this list). Name files and associated data directories in a helpful and consistent manner.
3. Use Boost.Test utilities/macros to assess individual test conditions (PASS/FAIL) as needed.
4. Compile and run with ctest -L UnitTest or ctest -R my_test.
5. Iteratively add and refine tests and modify Dakota core source code as the capability evolves.
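As a sketch of the failing-first step above, the hypothetical my_test.cpp might initially contain only a placeholder test (again assuming the header-only variant of Boost.Test; Dakota's build may instead link a compiled Boost.Test library):
// my_test.cpp -- initial TDD skeleton that builds but deliberately fails
#define BOOST_TEST_MODULE my_test
#include <boost/test/included/unit_test.hpp>

BOOST_AUTO_TEST_CASE(capability_not_yet_implemented)
{
  // Replace with real checks as the new capability is developed
  BOOST_CHECK(false);
}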
To run all unit tests:
cd dakota/build/
# Run all unit tests:
ctest -L (--label-regex) UnitTest
# With detailed output at top-level:
ctest -L (--label-regex) UnitTest -VV (--extra-verbose)
# To run a single test, via regular expression:
ctest -L UnitTest -R surrogate_unit_tests
A failing CTest unit test can be diagnosed using the following as a starting point:
cd build/src/unit (or other directory containing the test executable)
# First, manually run the failing test to see what information is provided related to the failure(s):
./surrogate_unit_tests
# To see available Boost Test options:
./surrogate_unit_tests --help
# To get detailed debugging info from a unit test:
./surrogate_unit_tests --log_level all
Note
A Google search can also turn up current best practices with Boost.Test and specifics related to the details of the test failure(s).
Regression Tests
Regression tests compare the output of complete Dakota studies against baseline behavior to ensure that changes to the code do not cause unexpected changes to output. Ideally they are fast running and use models with known behavior such as polynomials or other canonical problems.
The following are a few key concepts in Dakota’s regression test system:
In the source tree, most important test-related content is located in the test/ directory. Test files are named dakota_*.in. Each test file has a baseline file named dakota_*.base. Some tests have other associated data files and drivers.
Configuring Dakota causes test files and associated content to be copied to subfolders within the test/ folder of the build tree. This is where they will be run.
A single test file can contain multiple numbered serial and parallel subtests. Each subtest, after extraction from the test file, is a valid Dakota input file.
Tests usually should be run using the ctest command. CTest uses the script dakota_test.perl, located in the test/ directory, to do most of the heavy lifting. This script can also be run from the command line; run it with the argument --man for documentation of its options.
Subtests can be categorized and described using CTest labels (use ctest --print-labels in a build tree to view the labels of existing tests). One purpose of labels is to state whether an optional component of Dakota is needed to run the test.
Running Regression Tests
Dakota's full regression test suite contains approximately 300 test files and more than a thousand subtests. It typically takes between several tens of minutes and a few hours to complete, depending on available computing resources. The test system executes Dakota for each subtest, collects the output, and compares it to a baseline. There are three possible results for a subtest:
PASS: Dakota output matched the baseline to within a numerical tolerance
DIFF: Dakota ran to completion, but its output did not match the baseline
FAIL: Dakota did not run to completion (it failed to run altogether or returned nonzero)
In a Dakota build tree, the ctest command is the best way to run Dakota tests, including regression tests. Running the command with no options runs all the tests sequentially. A few helpful options:
-j N: Run N tests concurrently. Be aware that some of Dakota's regression tests may make use of local or MPI parallelism and may use multiple cores.
-L <label>: Run only those tests whose label matches the regex <label>. To run only regression tests (and not, e.g., unit tests), use the label Regression.
-R <name>: Run only those tests whose name matches the regex <name>.
It currently is not possible to run specific subtests; all subtests of a test selected by label or name will be run.
During configuration, test files, baselines, and auxiliary content are copied
from the source tree to the build tree, where tests will be run. For each test
file, a subdirectory is created in the test/ directory. The subdirectories have the
names of their test files, minus the .in
extension. If Dakota was built with parallel
support, an additional subfolder is created for any parallel subtests in a test file.
Its name is the test name with the letter p
prepended.
The results of each test are located in their subfolders. For serial subtests, the results
are in the file dakota_diffs.out
. For parallel subtests, the results file is dakota_pdiffs.out
.
These files state whether each subtest PASSed, DIFFed, or FAILed. If the test DIFFed, a diff
of the Dakota console output and baseline is listed.
The make target dakota-diffs
causes all the dakota_diffs.out
files from individual tests
to be concatenated into a single dakota_diffs.out
in the test/ directory, and similarly for the
dakota_pdiffs.out
files.
Subsequent runs of ctest
will cause test results to be appended to existing dakota_diffs.out
files. The make target dakota-diffs-clean removes these accumulated diff files and related test artifacts, restoring a clean test/ folder.
Note
While Dakota’s test system has three possible test results (PASS, DIFF, FAIL), CTest has only two (PASS or FAIL) and reports Dakota DIFFs as failures. Quite often tests that CTest reports as failing are exhibiting only minor numerical differences from baseline and are no cause for concern. Check dakota_diffs.out/dakota_pdiffs.out for “failing” tests before concluding that there’s a problem with your Dakota build.
The ctest
command uses the script dakota_test.perl
and its helper dakota_diff.perl
to extract subtests, run Dakota to produce test output, and diff the results against
the baseline. It is possible to run dakota_test.perl
from the command line. Use the argument
--man
to see its options. (The -e
option to extract a subtest is particularly useful.)
If a regression test fails, steps to diagnose the failure include the following, performed in the Dakota build directory:
1. Remove previous test artifacts related to detailed differences and failures via make dakota-diffs-clean.
2. By default, dakota_test.perl overwrites Dakota output after each subtest. Set the DAKOTA_TEST_SAVE_OUTPUT environment variable to 1 to save it.
3. Rerun the failing CTest: ctest -R test_name. (This regex will catch both the serial and parallel subtests. Add a caret (^) at the beginning of the pattern to exclude the parallel subtests.)
4. Generate details for how the test output differs from the corresponding baseline: make dakota-diffs.
5. Go into the specific regression test directory and examine the dakota_diffs.out file to see which subtest(s) failed.
6. Compare the .tst file contents with the .base file contents to determine which values have changed, whether there was a catastrophic failure of the executable, etc.
Creating a New Regression Test
A complete regression test includes a test file, which can contain multiple subtests, a baseline, and any auxiliary files (such as data files or drivers) needed by the subtests.
Writing Subtests
Including multiple subtests within a single test file reduces maintenance burden by allowing related test cases to share Dakota specifications that they have in common.
A test file is just a Dakota input file that has been annotated to indicate the lines that belong to each subtest. Subtests are numbered, beginning with 0, and serial and parallel subtests have independent numbering. The rules for annotating the lines of a test file are:
Lines required for all test cases should be left uncommented.
A line that should only be activated for specific subtests should be commented out, and the label #sN or #pN (for serial and parallel tests, respectively) should be added to the end. N is the integer associated with a subtest. When dakota_test.perl extracts a subtest from the test file, it will keep all uncommented lines and also lines that end with the corresponding subtest tag.
The tag #s0 has a special meaning. Serial subtest 0 is considered to include all uncommented lines in the original test input file. Lines with the #s0 tag should not be commented out, and they will not be extracted for other subtests. This convention allows the 0th serial subtest to be runnable without extraction.
If a line in the input file is used in multiple subtests (but not all), the tags should appear in a comma-separated list. For example, if a line in the input file belongs to serial subtests 1 and 2, the line should end in #s1,#s2.
Test Directives
The test creator can (and in some cases must) provide additional information to Dakota’s test system through the use of directives. Directives must appear at the top of the file, but can be in any order. They have the format:
#@ <subtest specifier> = <directive>
The subtest specifier indicates which of the subtests the directive applies to. Its format is similar to the subtest tag: sN or pN, without a pound sign. N can be a subtest number or, if the directive applies to all serial or parallel subtests in the file, *.
Directive | Meaning
---|---
Label=FastTest | A quick-running test.
Label=Experimental | Experimental capability. For information only.
Label=AcceptanceTest | Acceptance tests rarely diff on any platform.
DakotaConfig=FLAG | Only run this test if the CMake variable FLAG is true.
MPIProcs=N | Execute Dakota in parallel with N MPI tasks.
TimeoutDelay=N | Terminate Dakota if console output is unchanged for N seconds (default is 60 s).
TimeoutAbsolute=N | Terminate Dakota if the subtest takes longer than N seconds (default is 1200 s).
CheckOutput='FILENAME' | Dakota output for the test will appear in FILENAME instead of the default.
Restart=read | Read the restart file written by the previous subtest. Implies Restart=write.
Restart=write | Write a restart file (usually to be used by a later subtest).
Restart=none | No restart option.
DependsOn=N | This subtest depends on another. Currently for information only.
ExecCmd='CMD' | Run this command instead of 'dakota'.
ExecArgs='ARGS' | Arguments passed to the command.
InputFile='INPUT' | Use the specified input file instead of the extracted one.
UserMan='FILENAME' | Extract the subtest to FILENAME for use in the User's Manual.
A few notes on directives:
Labels can be used to filter tests using the -L option to ctest or the --label-regex option to dakota_test.perl.
MPIProcs is required for parallel tests.
Example Test file
An example input test file demonstrating a few of these features is below.
#@ s*: Label=FastTest
#@ s0: DakotaConfig=HAVE_QUESO
method
bayes_calibration queso #s0
chain_samples = 100 seed = 100 #s0
# sampling #s1,#s2
# sample_type lhs #s1
# sample_type random #s2
# samples = 100 #s1,#s2
# seed = 17 #s1,#s2
variables
uniform_uncertain 2
lower_bounds -2. -2.
upper_bounds 2. 2.
interface
analysis_driver = 'rosenbrock'
direct
responses
objective_functions = 1
no_gradients
no_hessians
This input file has three test cases: the first (s0) is Bayesian
calibration using QUESO, the second (s1) is LHS sampling, and the
third (s2) is random sampling. All the input file lines that are
shared between the test cases are uncommented. Note that the lines specific to subtest s0, which should not appear in the extracted input files for subtests s1 and s2, are left uncommented but have #s0 appended to them.
The test has the label FastTest
and will only be run when
Dakota is built with the optional QUESO component.
To create a new baseline dakota_*.base file for serial regression tests, call
dakota_test.perl --base name_of_new_input_file.in
This will create a file with extension .base.new with the same basename as the input file. Check the results, then change the extension to .base to incorporate it into the test suite.
More advanced options for generating baseline
files (e.g., for parallel tests) and more details about creating
baselines are available in dakota_test.perl --man
.
Unit Test-driven System Tests
These hybrid tests can be useful when it is difficult to mock up all the objects needed for testing (e.g., Dakota Model, Variables, Interface, Responses), yet finer-grained control over results verification is desired than regression tests provide. One way to view these tests is as unit tests that construct most of a complete Dakota study as a mock and then do fine-grained testing of selected functionality on the instantiated objects. In brief, these tests:
Are registered as unit tests
Operate at the level of constructing a Dakota Environment from an input file and running a whole study to populate needed class data
Test criteria that are more fine-grained and controllable than regression tests
An illustrative example is described next and in
src/unit_test/opt_tpl_rol_test_textbook.cpp
.
The following provides a walkthrough for developers who wish to add a Test-driven System unit test that includes an end-to-end Dakota analysis. The procedure relies on setting up a problem description database using a Dakota input string and subsequently executing the environment. The last step involves extracting the quantities of interest (results) to be tested using unit test macros.
Test environment definition
The developer defines a testing environment by constructing a problem description database from a Dakota input string, e.g.
// Dakota input string for serial case (cyl_head):
static const char dakota_input[] =
" method,"
" output silent"
" max_function_evaluations 300"
" mesh_adaptive_search"
" threshold_delta = 1.e-10"
" variables,"
" continuous_design = 2"
" initial_point 1.51 0.01"
" upper_bounds 2.164 4.0"
" lower_bounds 1.5 0.0"
" descriptors 'intake_dia' 'flatness'"
" interface,"
" direct"
" analysis_driver = 'cyl_head'"
" responses,"
" num_objective_functions = 1"
" nonlinear_inequality_constraints = 3"
" no_gradients"
" no_hessians";
The input string is then used to create a Dakota environment:
// No input file set --> no parsing:
Dakota::ProgramOptions opts;
opts.echo_input(false);
opts.input_string(dakota_input);
// delay validation/sync of the Dakota database and iterator
// construction to allow update after all data is populated
bool check_bcast_construct = false;
// set up a Dakota instance
Dakota::LibraryEnvironment * p_env = new Dakota::LibraryEnvironment(MPI_COMM_WORLD, opts, check_bcast_construct);
Dakota::LibraryEnvironment & env = *p_env;
Dakota::ParallelLibrary& parallel_lib = env.parallel_library();
// configure Dakota to throw a std::runtime_error instead of calling exit
env.exit_mode("throw");
// once done with changes: check database, broadcast, and construct iterators
env.done_modifying_db();
Executing the environment
Once an environment is defined, instantiation of Dakota objects and population of class data is achieved by executing the study:
// Execute the environment
env.execute();
Extracting results and test assertions
Following execution, the pertinent results are extracted and used to test correctness criteria. This is performed using the Boost unit test capabilities, e.g.
// retrieve the final parameter values
const Variables& vars = env.variables_results();
// retrieve the final response values
const Response& resp = env.response_results();
// Convergence test: check that first continuous variable
// has reached optimal value within given tolerance
double target = 2.1224215765;
double max_tol = 1.e-5;
double rel_err = fabs((vars.continuous_variable(0) - target)/target);
BOOST_CHECK(rel_err < max_tol);
// Convergence test: check that second continuous variable
// has reached optimal value within given tolerance
target = 1.7659069377;
max_tol = 1.e-2;
rel_err = fabs((vars.continuous_variable(1) - target)/target);
BOOST_CHECK(rel_err < max_tol);
// Convergence test: check that the final response value
// has reached the corresponding minimum within given tolerance
target = -2.4614299775;
max_tol = 1.e-3;
rel_err = fabs((resp.function_value(0) - target)/target);
BOOST_CHECK(rel_err < max_tol);
Unit test macros
There are several unit test macros to support various comparisons, assertions, exceptions, etc. See https://www.boost.org/doc/libs/1_69_0/libs/test/doc/html/boost_test/utf_reference/testing_tool_ref.html for details and examples.
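As an illustrative sketch (not drawn from the Dakota sources), the following shows a few commonly used Boost.Test assertion macros; note that BOOST_CHECK_CLOSE takes a relative tolerance in percent, while BOOST_CHECK_SMALL takes an absolute bound.
// macro_examples.cpp -- hypothetical example of common Boost.Test assertions
#define BOOST_TEST_MODULE macro_examples
#include <boost/test/included/unit_test.hpp>

#include <cmath>
#include <stdexcept>
#include <vector>

BOOST_AUTO_TEST_CASE(common_assertions)
{
  // Exact comparison (integers, flags)
  BOOST_CHECK_EQUAL(2 + 2, 4);

  // Relative floating-point comparison; the third argument is a tolerance in percent
  BOOST_CHECK_CLOSE(std::sqrt(2.0), 1.4142135624, 1.e-6);

  // Absolute check against an expected value of zero
  BOOST_CHECK_SMALL(std::sin(0.0), 1.e-12);

  // Verify that the expected exception type is thrown
  std::vector<int> v;
  BOOST_CHECK_THROW(v.at(3), std::out_of_range);
}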