Simulation File Management

This section describes some management features used for files that transfer data between Dakota and simulation codes (i.e., when the system call or fork interfaces are used). These features can generate unique filenames when Dakota executes programs in parallel and can help you debug the interface between Dakota and a simulation code.

File Saving

The file_save option in the interface specification controls whether parameters and results files are retained or removed from the working directory after the analysis completes. Dakota’s default behavior is to remove files once their use is complete to reduce clutter. If the method output setting is verbose, a file removal notification will follow the function evaluation echo, e.g.,

driver /usr/tmp/aaaa20305 /usr/tmp/baaa20305
Removing /usr/tmp/aaaa20305 and /usr/tmp/baaa20305

However, if file_save appears in the interface specification, these files will not be removed. This latter behavior is often useful for debugging communication between Dakota and simulator programs. An example of a file_save specification is shown in the file tagging example below.

Note

Immediately before executing the analysis driver, Dakota removes any existing results file. This behavior addresses a previously common problem resulting from users starting Dakota with stale results files in the run directory. To override this default behavior and preserve any existing results files, you must specify allow_existing_results.

File Tagging for Evaluations

When a user provides parameters_file and results_file specifications, the file_tag option in the interface specification causes Dakota to make the names of these files unique by appending the function evaluation number to the root file names. Default behavior is to not tag these files, which has the advantage of allowing the user to ignore command line argument passing and always read from and write to the same file names. However, it has the disadvantage that files may be overwritten from one function evaluation to the next. Because tagged names change with every evaluation, the user’s interface must get the names of these files from the command line. The file tagging feature is most often used when concurrent simulations are running in a common disk space, since it can prevent conflicts between the simulations. An example specification of file_tag and file_save is shown below:

interface
    system
        analysis_driver = 'text_book'
        parameters_file = 'text_book.in'
        results_file    = 'text_book.out'
        file_tag
        file_save
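
Because tagged file names change with every evaluation, a driver cannot hard-code them. The following is a minimal sketch, in Python, of a driver that takes both names from its command-line arguments. The function name run_driver and the placeholder "simulation" (counting lines in the parameters file) are assumptions made for illustration; they are not part of Dakota.

```python
from pathlib import Path


def run_driver(argv):
    """Read the parameters file and write the results file named in argv."""
    if len(argv) != 3:
        raise SystemExit("usage: driver <parameters_file> <results_file>")
    params_path, results_path = Path(argv[1]), Path(argv[2])

    # With file_tag, the names arrive with the evaluation number appended,
    # so they must come from argv rather than being hard-coded.
    params_text = params_path.read_text()

    # Placeholder "simulation" (hypothetical): report the number of lines
    # in the parameters file. A real driver would compute responses here.
    results_path.write_text(f"{len(params_text.splitlines())}\n")
    return results_path
```

In a standalone script, the entry point would call run_driver(sys.argv) after importing sys.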

Note

When a user specifies names for the parameters and results files and file_save is used without file_tag, untagged files are used in the function evaluation but are then moved to tagged files after the function evaluation is complete, to prevent overwriting files for which a file_save request has been given. If the output control is set to verbose, then a notification similar to the following will follow the function evaluation echo:

driver params.in results.out
Files with non-unique names will be tagged to enable file_save:
Moving params.in to params.in.1
Moving results.out to results.out.1
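
The renaming reported in this notification can be mimicked with a few lines of Python. This is only an illustrative sketch of the naming pattern shown above (root name, period, evaluation number); the helper name tag_and_move is hypothetical, and the code is not Dakota's implementation.

```python
import shutil
from pathlib import Path


def tag_and_move(path, eval_id):
    """Rename e.g. params.in to params.in.<eval_id>; return the new path."""
    src = Path(path)
    # Append the evaluation number to the full file name, keeping the directory.
    dst = src.with_name(f"{src.name}.{eval_id}")
    shutil.move(str(src), str(dst))
    return dst
```

Calling tag_and_move('params.in', 1) and tag_and_move('results.out', 1) after an evaluation would leave params.in.1 and results.out.1 on disk, so the next evaluation can reuse the untagged names without overwriting saved files.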

Hierarchical Tagging

When a model’s specification includes the hierarchical_tagging keyword, the tag applied to parameter and results file names of any subordinate interfaces will reflect any model hierarchy present. This option is useful for studies involving multiple models with a nested or hierarchical relationship. For example, a nested model has a sub-method, which itself likely operates on a sub-model, and a hierarchical approximation involves coordination of low- and high-fidelity models. Specifying hierarchical_tagging will yield function evaluation identifiers (“tags”) composed of the evaluation IDs of the models involved, e.g., outermodel.innermodel.interfaceid = 4.9.2. This communicates the outer contexts to the analysis driver when performing a function evaluation.

For an example of using hierarchical tagging in a nested model context, see dakota/share/dakota/test/dakota_uq_timeseries_*_optinterf.in.
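
As a small illustration of the tag format described above, the following Python sketch joins evaluation IDs into a dotted tag and splits such a tag back into its parts, as a driver might do to recover the outer context. The function names are hypothetical and chosen only for this example.

```python
def hierarchical_tag(eval_ids):
    """Join evaluation IDs, outermost first, into a dotted tag."""
    return ".".join(str(i) for i in eval_ids)


def split_tag(tag):
    """Recover the individual evaluation IDs from a dotted tag."""
    return [int(part) for part in tag.split(".")]


print(hierarchical_tag([4, 9, 2]))  # prints 4.9.2
```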

Temporary Files

If parameters_file and results_file are not specified by the user, temporary files having generated names are used. For example, a system call to a single analysis driver might appear as:

driver /tmp/dakota_params_aaaa2035 /tmp/dakota_results_baaa2030

and a system call to an analysis driver with filter programs might appear as:

ifilter /tmp/dakota_params_aaaa2490 /tmp/dakota_results_baaa2490;
     driver /tmp/dakota_params_aaaa2490 /tmp/dakota_results_baaa2490;
     ofilter /tmp/dakota_params_aaaa2490 /tmp/dakota_results_baaa2490

These files have unique names created by Boost filesystem utilities. This uniqueness requires the user’s interface to get the names of these files from the command line. File tagging with evaluation number is unnecessary with temporary files, but can be helpful for the user workflow to identify the evaluation number. Thus file_tag requests will be honored. A file_save request will be honored, but it should be used with care since the temporary file directory could easily become cluttered without the user noticing.

File Tagging for Analysis Drivers

When multiple analysis drivers are involved in performing a function evaluation with either the system call or fork simulation interface, a secondary file tagging is automatically used to distinguish the results files used for the individual analyses. This applies to both the case of user-specified names for the parameters and results files and the default temporary file case. Examples for the former case were shown previously in the sections on multiple analysis drivers without filters and with filters.

The following examples demonstrate the latter temporary file case. Even though Unix temporary files have unique names for a particular function evaluation, tagging is still needed to manage the individual contributions of the different analysis drivers to the response results, since the same root results filename is used for each component. For the system call interface, the syntax would be similar to the following:

ifilter /var/tmp/aaawkaOKZ /var/tmp/baaxkaOKZ;
     driver1 /var/tmp/aaawkaOKZ /var/tmp/baaxkaOKZ.1;
     driver2 /var/tmp/aaawkaOKZ /var/tmp/baaxkaOKZ.2;
     driver3 /var/tmp/aaawkaOKZ /var/tmp/baaxkaOKZ.3;
     ofilter /var/tmp/aaawkaOKZ /var/tmp/baaxkaOKZ

and, for the fork interface, similar to:

blocking fork:
     ifilter /var/tmp/aaawkaOKZ /var/tmp/baaxkaOKZ;
     driver1 /var/tmp/aaawkaOKZ /var/tmp/baaxkaOKZ.1;
     driver2 /var/tmp/aaawkaOKZ /var/tmp/baaxkaOKZ.2;
     driver3 /var/tmp/aaawkaOKZ /var/tmp/baaxkaOKZ.3;
     ofilter /var/tmp/aaawkaOKZ /var/tmp/baaxkaOKZ

Tagging of results files with an analysis identifier is needed since each analysis driver must contribute a user-defined subset of the total response results for the evaluation. If an output filter is not supplied, Dakota will combine these portions through a simple overlaying of the individual contributions (i.e., summing the results in /var/tmp/baaxkaOKZ.1, /var/tmp/baaxkaOKZ.2, and /var/tmp/baaxkaOKZ.3).

If this simple approach is inadequate, then an output filter should be supplied to perform the combination. This is the reason why the results file for the output filter does not use analysis tagging; it is responsible for the results combination (i.e., combining /var/tmp/baaxkaOKZ.1, /var/tmp/baaxkaOKZ.2, and /var/tmp/baaxkaOKZ.3 into /var/tmp/baaxkaOKZ). In this case, Dakota will read only the results file from the output filter (i.e., /var/tmp/baaxkaOKZ) and interpret it as the total response set for the evaluation.

Parameters files are not currently tagged with an analysis identifier. This reflects the fact that Dakota does not attempt to subdivide the requests in the active set vector for different analysis portions. Rather, the total active set vector is passed to each analysis driver and the appropriate subdivision of work must be defined by the user. This allows the division of labor to be very flexible. In some cases, this division might occur across response functions, with different analysis drivers managing the data requests for different response functions. And in other cases, the subdivision might occur within response functions, with different analysis drivers contributing portions to each of the response functions. The only restriction is that each of the analysis drivers must follow the response format dictated by the total active set vector. For response data for which an analysis driver has no contribution, 0’s must be used as placeholders.
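
The overlay of per-driver contributions, with zeros as placeholders, can be sketched in Python as an element-by-element sum. This is an illustration only: it assumes each contribution has already been read into a list of numbers of equal length, and the function name overlay_results is hypothetical rather than part of Dakota.

```python
def overlay_results(contributions):
    """Sum per-driver contributions element by element."""
    if not contributions:
        raise ValueError("at least one contribution is required")
    length = len(contributions[0])
    # Every driver must follow the format of the total response set.
    if any(len(c) != length for c in contributions):
        raise ValueError("every driver must report the full response set")
    return [sum(values) for values in zip(*contributions)]


# Three drivers, three response functions: each driver fills in its own
# entry and uses 0 as a placeholder where it has no contribution.
driver1 = [1.5, 0.0, 0.0]
driver2 = [0.0, -2.0, 0.0]
driver3 = [0.0, 0.0, 4.25]
total = overlay_results([driver1, driver2, driver3])
print(total)  # prints [1.5, -2.0, 4.25]
```

The same function covers the case where drivers contribute portions to the same response function, since overlapping nonzero entries are simply added.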

Work Directories

Sometimes it is convenient for simulators and filters to run in a directory different from the one where Dakota is invoked. For instance, when performing concurrent evaluations and/or analyses, it is often necessary to cloister input and output files in separate directories to avoid conflicts. A simulator script used as an analysis driver can, of course, include commands to change to a different directory if desired (while still arranging to write a results file in the original directory), but Dakota has facilities that may simplify the creation of simulator scripts.
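
A driver script that changes directory on its own might look like the following Python sketch. The key point is to resolve the parameters and results file names against the starting directory before changing directory, so the results file still lands where Dakota expects it. The function name, the scratch.log file, and the placeholder result value are all assumptions made for illustration.

```python
import os
from pathlib import Path


def run_in_directory(params_file, results_file, run_dir):
    """Run a placeholder simulation in run_dir, writing results back home."""
    # Resolve both names against the starting directory *before* chdir.
    params_path = Path(params_file).resolve()
    results_path = Path(results_file).resolve()
    start_dir = Path.cwd()

    Path(run_dir).mkdir(parents=True, exist_ok=True)
    os.chdir(run_dir)
    try:
        # Hypothetical simulation: scratch output stays inside run_dir.
        Path("scratch.log").write_text(params_path.read_text())
        # The results file is written via its absolute path, so it is
        # created in the original directory, not in run_dir.
        results_path.write_text("0.0\n")
    finally:
        os.chdir(start_dir)
    return results_path
```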

When the work directory feature is enabled, Dakota will create a directory for each evaluation/analysis (with optional tagging and saving as with files). To enable this feature, include the keyword work_directory in the interface specification. Dakota will then arrange for the simulator and any filters to start in the work directory, with $PATH adjusted (if necessary) so that programs that could be invoked from Dakota’s directory without a relative path to them (i.e., by a name not involving any slashes) can also be invoked from the simulator’s (and filter’s) directory.

On occasion, it is convenient for the simulator to have various files, e.g., data files, available in the directory where it runs. If, say, my/special/directory/ is such a directory (as seen from Dakota’s directory), the interface specification

work_directory
    named 'my/special/directory'

would cause Dakota to start the simulator and any filters in that directory. If the directory did not already exist, Dakota would create it and would remove it after the simulator (or output filter, if specified) finished, unless instructed not to do so by the appearance of directory_save in the interface specification. If named does not appear, then directory_save cannot appear either, and Dakota creates a temporary directory (using the tmpnam function to determine its name) for use by the simulator and any filters. If you specify directory_tag, Dakota causes each invocation of the simulator and any filters to start in a subdirectory of the work directory with a name composed of the work directory’s name followed by a period and the invocation number (1, 2, ...); this might be useful in debugging.

Sometimes it can be helpful for the simulator and filters to start in a new directory populated with some files. Adding

link_files 'templatedir/*'

to the work directory specification would cause the contents of directory templatedir/ to be linked into the work directory. Linking makes sense if files are large, but when practical, it is far more reliable to have copies of the files; adding copy_files to the specification would cause the contents of the template directory to be copied to the work directory. The linking or copying does not overwrite existing files unless replace also appears in the specification.
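
The difference between linking and copying, and the effect of replace, can be illustrated with the following Python sketch. It is not Dakota's implementation: the function name populate and its arguments are hypothetical, it handles only regular files at the top level of the template directory, and it assumes a platform where symbolic links are available.

```python
import os
import shutil
from pathlib import Path


def populate(template_dir, work_dir, copy=False, replace=False):
    """Link (default) or copy each template file into work_dir."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    placed = []
    for src in sorted(Path(template_dir).iterdir()):
        if not src.is_file():
            continue  # this sketch ignores subdirectories
        dst = work / src.name
        if dst.exists() or dst.is_symlink():
            if not replace:
                continue  # existing files are kept unless replace is set
            dst.unlink()
        if copy:
            shutil.copy2(src, dst)  # independent copy of the file
        else:
            os.symlink(src.resolve(), dst)  # link back to the template
        placed.append(dst.name)
    return placed
```

A linked file still depends on the template directory remaining in place and unchanged, which is one reason copies tend to be more robust when the files are small enough to duplicate.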

Here is a summary of possibilities for a work directory specification, with [...] denoting that ... is optional:

work_directory
[ named '...' ]
[ directory_tag ]
[ directory_save ]
[ link_files '...' '...' ]
[ copy_files '...' '...' ]
[ replace ]

Listing 19 contains an example of these specifications in a Dakota input file for constrained optimization.

Listing 19 The workdir_textbook.in input file.

# Minimal example with common work directory specifications

method
  rol
    max_iterations = 60,
    variable_tolerance = 1e-6
    constraint_tolerance = 1e-6

variables
  continuous_design = 2
    initial_point    0.9    1.1
    upper_bounds     5.8    2.9
    lower_bounds     0.5   -2.9
    descriptors      'x1'   'x2'

interface
  # text_book driver must be in run directory or on PATH
  fork analysis_driver = 'text_book'
    parameters_file = 'params.in'
    results_file    = 'results.out'
  work_directory named 'tb_work'
    directory_tag directory_save file_save

responses
  objective_functions = 1
  nonlinear_inequality_constraints = 2
  analytic_gradients
  no_hessians