Variables

Overview

The variables specification in a Dakota input file defines the parameter set to be iterated by a particular method. In the case of an optimization study, these variables are adjusted in order to locate an optimal design; in the case of parameter studies/sensitivity analysis/design of experiments, these parameters are perturbed to explore the parameter space; and in the case of uncertainty analysis, the variables are associated with distribution/interval characterizations which are used to compute corresponding distribution/interval characterizations for response functions. To accommodate these and other types of studies, Dakota supports design, uncertain, and state variable types for continuous and discrete variable domains. Uncertain types can be further categorized as either aleatory or epistemic, and discrete domains can include discrete range, discrete integer set, discrete string set, and discrete real set.

This chapter surveys key variables concepts, categories, and specific types, and addresses variable-related file formats and the active set vector. See the variables keyword for additional specification details.

Note

In several contexts, Dakota inputs must express variable specifications in what is referred to as “input specification order.” This refers to the ordering of variable types given in the primary variables table.

Key Dakota variable concepts include:

  • Category: design, uncertain (aleatory or epistemic), or state; groups variables by their primary use.

  • Active View: the subset of variables (categories) being explored in a particular study.

  • Type: a specific named variable type within a category, e.g., lognormal_uncertain or discrete_design_set.

  • Domain: continuous vs. discrete (integer-, string-, or real-valued). Discrete variables span categories and are specified via ranges, admissible sets, and integer-valued discrete probability distributions.

Note

Characterizing the properties of a specific type of variable, e.g., discrete_design_set or lognormal_uncertain, often requires providing arrays of data, for example, a list of means or set elements_per_variable. The ordering of these arrays must match the ordering of the descriptors for that variable type.
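
For instance, a minimal sketch of a lognormal specification (the descriptors and values here are hypothetical), in which the i-th mean, standard deviation, and descriptor all belong to the same variable:

variables
  lognormal_uncertain = 2
    means          = 2.0  5.0          # entry i corresponds to descriptor i
    std_deviations = 0.5  1.0
    descriptors    = 'coat_thick'  'weld_depth'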

Design Variables

Design variables are adjusted in the course of determining an optimal design or an optimal set of deterministic calibration parameters. These variables may be continuous (real-valued between bounds), discrete range (integer-valued between bounds), discrete set of integers (integer-valued from a finite set), discrete set of strings (string-valued from a finite set), or discrete set of reals (real-valued from a finite set).

Continuous Design Variables

The most common type of design variable encountered in engineering applications is the continuous type. These variables may assume any real value (e.g., 12.34, -1.735e+07) within their bounds. All but a handful of the optimization algorithms in Dakota support continuous design variables exclusively.
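
For example, a minimal sketch of a two-variable continuous design specification (descriptors and values are hypothetical):

variables
  continuous_design = 2
    initial_point    1.0     2.0
    lower_bounds     0.5     0.5
    upper_bounds    10.0    10.0
    descriptors     'width' 'thickness'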

Discrete Design Variables

Engineering design problems may contain discrete variables such as material types, feature counts, stock gauge selections, etc. These variables may assume only a fixed number of values, as compared to a continuous variable which has an uncountable number of possible values within its range. Discrete variables may involve a range of consecutive integers (\(x\) can be any integer between 1 and 10), a set of integer values (\(x\) can be 101, 212, or 355), a set of string values (\(x\) can be 'direct', 'gmres', or 'jacobi'), or a set of real values (e.g., \(x\) can be identically 4.2, 6.4, or 8.5).

Discrete variables may be classified as either “categorical” or “noncategorical.” In the latter noncategorical case, the discrete requirement can be relaxed during the solution process since the model can still compute meaningful response functions for values outside the allowed discrete range or set. For example, a discrete variable representing the thickness of a structure is generally a noncategorical variable since it can assume a continuous range of values during the algorithm iterations, even if it is desired to have a stock gauge thickness in the end. In the former categorical case, the discrete requirement cannot be relaxed since the model cannot obtain a solution for values outside the range or set. For example, feature counts are generally categorical discrete variables, since most computational models will not support a non-integer value for the number of instances of some feature (e.g., number of support brackets). An optional categorical specification indicates which discrete real and discrete integer variables are restricted vs. relaxable. String variables cannot be relaxed.

Gradient-based optimization methods cannot be directly applied to problems with discrete variables since derivatives only exist for a variable continuum. For problems with noncategorical variables, the experimental branch and bound capability (branch_and_bound) can be used to relax the discrete requirements and apply gradient-based methods to a series of generated subproblems. For problems with categorical variables, nongradient-based methods (e.g., coliny_ea) are commonly used; however, most of those methods do not take advantage of any structure that may be associated with the categorical variables. The exception is mesh_adaptive_search. If it is possible to define a subjective relationship between the different values a given categorical variable can take on, that relationship can be expressed via a variables adjacency_matrix option. The method will take that relationship into consideration, together with any expressed neighbor_order. Branch and bound techniques are expanded on in Mixed Integer Nonlinear Programming (MINLP) and nongradient-based methods are further described in Optimization.

In addition to engineering applications, many non-engineering applications in the fields of scheduling, logistics, and resource allocation contain discrete design parameters. Within the Department of Energy, solution techniques for these problems impact programs in stockpile evaluation and management, production planning, nonproliferation, transportation (routing, packing, logistics), infrastructure analysis and design, energy production, environmental remediation, and tools for massively parallel computing such as domain decomposition and meshing.

Discrete Design Variable Types:

  • The discrete_design_range type supports a range of consecutive integers between specified lower_bounds and upper_bounds.

  • The discrete_design_set type admits a set of enumerated integer, string, or real values through an elements specification. The values must be specified as an ordered, unique set and are stored internally the same way, with a corresponding set of indices running from 0 to one less than the number of set values. These indices are used by some iterative algorithms (e.g., parameter studies, SCOLIB methods) to simplify discrete value enumeration when the actual corresponding set values are immaterial. In the case of parameter studies, this index representation is required in certain step and partition controls. A specification sketch follows this list.

    Each string element value must be quoted in the Dakota input file and may contain alphanumeric characters, dashes, underscores, and colons. White space, quote characters, and backslash/meta-characters are not permitted.
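
A minimal sketch combining a discrete range and two discrete sets (descriptors, bounds, and elements are hypothetical; the exact nesting of the set sub-specifications should be checked against the variables keyword reference):

variables
  discrete_design_range = 1
    lower_bounds   1
    upper_bounds  10
    descriptors  'num_brackets'
  discrete_design_set
    integer = 1
      elements     101 212 355             # indices 0, 1, 2
      descriptors 'stock_gauge'
    string = 1
      elements    'direct' 'gmres' 'jacobi'
      descriptors 'solver'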

Uncertain Variables

Deterministic variables (i.e., those with a single known value) do not capture the behavior of the input variables in all situations. In many cases, the exact value of a model parameter is not precisely known. An example of such an input variable is the thickness of a heat treatment coating on a structural steel I-beam used in building construction. Due to variability and tolerances in the coating process, the thickness of the layer is known to follow a normal distribution with a certain mean and standard deviation as determined from experimental data. The inclusion of the uncertainty in the coating thickness is essential to accurately represent the resulting uncertainty in the response of the building.

Uncertain variables directly support the use of probabilistic uncertainty quantification methods such as sampling, reliability, and stochastic expansion methods. They also admit lower and upper distribution bounds (whether explicitly defined, implicitly defined, or inferred), which permits their use in methods that rely on a bounded region to define a set of function evaluations (i.e., design of experiments and some parameter study methods).

Aleatory Uncertain Variables

Aleatory uncertainty is also known as inherent variability, irreducible uncertainty, or randomness. It is typically modeled using probability distributions, and probabilistic methods are commonly used for propagating input aleatory uncertainties described by probability distribution specifications. The two following sections describe the continuous and discrete aleatory uncertain variables supported by Dakota.

Continuous Aleatory Uncertain Variables

  • Normal: a probability distribution characterized by a mean and standard deviation. Also referred to as Gaussian. Bounded normal is also supported by some methods with an additional specification of lower and upper bounds.

  • Lognormal: a probability distribution characterized by a mean and either a standard deviation or an error factor. The natural logarithm of a lognormal variable has a normal distribution. Bounded lognormal is also supported by some methods with an additional specification of lower and upper bounds.

  • Uniform: a probability distribution characterized by a lower bound and an upper bound. Probability is constant between the bounds.

  • Loguniform: a probability distribution characterized by a lower bound and an upper bound. The natural logarithm of a loguniform variable has a uniform distribution.

  • Triangular: a probability distribution characterized by a mode, a lower bound, and an upper bound.

  • Exponential: a probability distribution characterized by a beta parameter.

  • Beta: a flexible probability distribution characterized by a lower bound and an upper bound and alpha and beta parameters. The uniform distribution is a special case.

  • Gamma: a flexible probability distribution characterized by alpha and beta parameters. The exponential distribution is a special case.

  • Gumbel: the Type I Largest Extreme Value probability distribution. Characterized by alpha and beta parameters.

  • Frechet: the Type II Largest Extreme Value probability distribution. Characterized by alpha and beta parameters.

  • Weibull: the Type III Smallest Extreme Value probability distribution. Characterized by alpha and beta parameters.

  • Histogram Bin: an empirically-based probability distribution characterized by a set of \((x,y)\) pairs that map out histogram bins (a continuous interval with associated bin count).
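
For example, a minimal sketch specifying one normal and one uniform aleatory variable (descriptors and values are hypothetical):

variables
  normal_uncertain = 1
    means           12.0
    std_deviations   0.5
    descriptors    'coat_thickness'
  uniform_uncertain = 1
    lower_bounds     0.9
    upper_bounds     1.1
    descriptors    'load_scale'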

Discrete Aleatory Uncertain Variables

The following types of discrete aleatory uncertain variables are available:

  • Poisson: integer-valued distribution used to predict the number of discrete events that happen in a given time interval.

  • Binomial: integer-valued distribution used to predict the number of failures in a number of independent tests or trials.

  • Negative Binomial: integer-valued distribution used to predict the number of times to perform a test to have a target number of successes.

  • Geometric: integer-valued distribution used to model the number of successful trials that might occur before a failure is observed.

  • Hypergeometric: integer-valued distribution used to model the number of failures observed in a set of tests that has a known proportion of failures.

  • Histogram Point (integer, string, real): an empirically-based probability distribution characterized by a set of integer-valued \((i,c)\), string-valued \((s,c)\), and/or real-valued \((r,c)\) pairs that map out histogram points (each a discrete point value \(i\), \(s\), or \(r\), with associated count \(c\)).

For aleatory random variables, Dakota admits an uncertain_correlation_matrix that specifies correlations among the input variables. The correlation matrix defaults to the identity matrix, i.e., no correlation among the uncertain variables.
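
A minimal sketch of a correlation specification for two normal variables (descriptors and values are hypothetical; the matrix is listed row by row over the aleatory uncertain variables in input specification order):

variables
  normal_uncertain = 2
    means           10.0  2.0
    std_deviations   1.0  0.2
    descriptors     'E'  'rho'
  uncertain_correlation_matrix
    1.0  0.3
    0.3  1.0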

For additional information on random variable probability distributions, refer to [HM00] and [SW04]. Refer to variables for more detail on the uncertain variable specifications and to Uncertainty Quantification for available methods to quantify the uncertainty in the response.

Epistemic Uncertain Variables

Epistemic uncertainty is reducible uncertainty due to lack of knowledge. Characterization of epistemic uncertainties is often based on subjective prior knowledge rather than objective data.

In Dakota, epistemic uncertainty can be characterized by interval- or set-valued variables (see relevant keywords below) that are propagated to calculate bounding intervals on simulation output using interval analysis methods. These epistemic variable types can optionally include belief structures or basic probability assignments for use in Dempster-Shafer theory of evidence methods. Epistemic uncertainty can alternately be modeled with probability density functions, although results from UQ studies are then typically interpreted as possibilities or bounds, as opposed to a probability distribution of responses.

Dakota supports the following epistemic uncertain variable types:

  • Continuous Interval: a real-valued interval-based specification characterized by sets of lower and upper bounds and Basic Probability Assignments (BPAs) associated with each interval. The intervals may be overlapping, contiguous, or disjoint, and a single interval (with probability = 1) per variable is an important special case. The interval distribution is not a probability distribution, as the exact structure of the probabilities within each interval is not known. It is commonly used with epistemic uncertainty methods.

  • Discrete Interval: an integer-valued variant of the Continuous Interval variable.

  • Discrete Set (integer, string, and real): Similar to discrete design set variables, these epistemic variables admit a finite number of values (elements) for type integer, string, or real, each with an associated probability.

In the discrete case, interval variables may be used to specify categorical choices which are epistemic. For example, if there are three possible forms for a physics model (model 1, 2, or 3) and there is epistemic uncertainty about which one is correct, a discrete uncertain interval or a discrete set could represent this type of uncertainty.
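
A minimal sketch of such epistemic specifications follows (descriptors, bounds, and probabilities are hypothetical; keyword spellings such as interval_probabilities and set_probabilities should be verified against the variables keyword reference):

variables
  continuous_interval_uncertain = 1
    num_intervals           2
    interval_probabilities  0.4  0.6
    lower_bounds            1.0  3.0
    upper_bounds            4.0  6.0
    descriptors            'conductivity'
  discrete_uncertain_set
    integer = 1
      elements           1 2 3            # e.g., three candidate model forms
      set_probabilities  0.2 0.5 0.3
      descriptors       'model_form'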

Through nested models, Dakota can perform combined aleatory / epistemic analyses such as second-order probability or probability of frequency. For example, a variable can be assumed to have a lognormal distribution with specified variance, with its mean expressed as an epistemic uncertainty lying in an expert-specified interval. See examples in Advanced Model Recursions.

State Variables

State variables are auxiliary variables that are mapped through the simulation interface but are neither design variables nor modeled as uncertain. State variables provide a means to parameterize additional model inputs which, in the case of a numerical simulator, might include solver convergence tolerances, time step controls, or mesh fidelity parameters.

Note

The term “state variable” is overloaded in math, science, and engineering. For Dakota it typically means a fixed parameter and does not refer to, e.g., the solution variables of a differential equation.

State variable configuration mirrors that of design variables. They can be specified via continuous_state (real-valued between bounds), discrete_state_range (integer-valued between bounds), or discrete_state_set (a discrete integer-, string-, or real-valued set). Model parameterizations with strings (e.g., “mesh1.exo”) are also possible using an interface analysis_components specification (see also Parameters file format (standard)).

State variables, as with other types of variables, are viewed differently depending on the method in use. By default, only parameter studies, design of experiments, and verification methods will vary state variables. This can be overridden as discussed in Active Variables View.

Since these variables are neither design nor uncertain variables, algorithms for optimization, least squares, and uncertainty quantification do not iterate on these variables by default. They are inactive and hidden from the algorithm. However, Dakota still maps these variables through the user’s interface where they affect the computational model in use. This allows optimization, least squares, and uncertainty quantification studies to be executed under different simulation conditions (which will, in general, produce different results). Parameter studies and design of experiments methods, on the other hand, are general-purpose iterative techniques which do not by default draw a distinction between variable types. They include state variables in the set of variables to be studied, which permits them to explore the effect of state variable values on the responses of interest.

When a state variable is held fixed, the specified initial_state is used as its sole value. If the state variable is defined only by its bounds, then the initial_state will be inferred from the variable bounds or valid set values. If a method iterates on a state variable, the variable is treated as a design variable with the given bounds, or as a uniform uncertain variable with the given bounds.
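
A minimal sketch of a state specification follows (descriptors, values, and file names are hypothetical):

variables
  continuous_state = 1
    initial_state   1.0e-6
    lower_bounds    1.0e-8
    upper_bounds    1.0e-4
    descriptors    'solver_tol'
  discrete_state_set
    string = 1
      elements     'mesh_coarse.exo' 'mesh_fine.exo'
      descriptors  'mesh_file'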

In some cases, state variables are used in direct coordination with an optimization, least squares, or uncertainty quantification algorithm. For example, state variables could be used to enact model adaptivity through the use of a coarse mesh or loose solver tolerances in the initial stages of an optimization, with continuous model refinement as the algorithm nears the optimal solution. They are also used to control model fidelity in some UQ approaches.

Management of Mixed Variables by Method

Active Variables View

As alluded to in the previous section, the iterative method selected for use in Dakota partially determines what subset, or view, of the variables data is active in the study. In general, a mixture of various different types of variables is supported within all methods, though by default certain methods will only modify certain types of variables. For example, by default, optimizers and least squares methods only modify design variables, and uncertainty quantification methods typically only utilize uncertain variables. This implies that variables which are not directly controlled by a particular method will be mapped through the interface unmodified. This allows for parameterizations within the model beyond those used by the method, which can provide the convenience of consolidating control over various modeling parameters in a single file (the Dakota input file). An important related point is that the active variable set dictates over which continuous variables derivatives are typically computed (see Active Variables for Derivatives).

Default Variables View: The default active variables view is determined from a combination of the response function type and method. If objective_functions or calibration_terms is given in the response specification block, the design variables will be active.

General response_functions do not have a specific interpretation the way objective functions or calibration terms do. For these, the active view is inferred from the method.

  • For parameter studies, or any of the dace, psuade, or fsu methods, the active view is set to all variables.

  • For sampling uncertainty quantification methods, the view is set to aleatory if only aleatory variables are present, epistemic if only epistemic variables are present, or uncertain (covering both aleatory and epistemic) if both are present.

  • For interval estimation or evidence calculations, the view is set to epistemic.

  • For other uncertainty quantification, e.g., reliability methods or stochastic expansion methods, the view is set to aleatory.

  • Finally, for verification studies using richardson_extrap, the active view is set to state.

Note

For surrogate-based optimization, where the surrogate is built over points generated by a dace_method_pointer, the point generation is only over the design variables unless otherwise specified, i.e., state variables will not be sampled for surrogate construction.

Explicit View Control: The subset of active variables for a Dakota method can be explicitly controlled by specifying the variables keyword active, together with one of all, design, uncertain, aleatory, epistemic, or state. This causes the Dakota method to operate on the specified variable types, overriding the defaults. For example, the default behavior for a nondeterministic sampling method is to sample the uncertain variables. However, if the user specifies active all in the variables block, the sampling is performed over all variables (e.g., design and state variables in addition to uncertain variables). This may be desired in situations such as surrogate-based optimization under uncertainty, where a surrogate may be built over both design and uncertain variables. Another situation where one may want the fine-grained control available by specifying one of these variable types is when one has state variables but only wants to sample over the design variables when constructing a surrogate model. Finally, more sophisticated uncertainty studies may involve various combinations of epistemic vs. aleatory variables being active in nested models.
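
For example, a minimal sketch in which a sampling study is forced to vary design and state variables in addition to the uncertain ones (descriptors and values are hypothetical):

variables
  active all
  continuous_design = 1
    lower_bounds   0.0
    upper_bounds   1.0
    descriptors   'x1'
  normal_uncertain = 1
    means           0.5
    std_deviations  0.1
    descriptors    'u1'
  continuous_state = 1
    lower_bounds   0.0
    upper_bounds   2.0
    descriptors   's1'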

Variable Domain

The variable domain setting controls how discrete variables (whether design, uncertain, or state) are treated. If mixed is specified, continuous and discrete variables are treated separately, each retaining its own domain. When relaxed, the noncategorical discrete variables are relaxed and treated as continuous variables.

Domain control can be useful in optimization problems involving both continuous and discrete variables in order to apply a continuous optimizer to a mixed variable problem. All methods default to a mixed domain except for the experimental branch-and-bound method, which defaults to relaxed.
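
A minimal sketch of requesting domain relaxation (hypothetical variables; useful only with methods that support relaxation):

variables
  relaxed
  continuous_design = 1
    lower_bounds   0.0
    upper_bounds   1.0
    descriptors   'x1'
  discrete_design_range = 1
    lower_bounds   1
    upper_bounds  10
    descriptors  'n_plies'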

Usage Notes

Specifying set variables: Sets of integers, reals, and strings have similar specifications, though different value types. The variables are specified using three keywords:

  • Variable declaration keyword, e.g., discrete_design_set: specifies the number of variables being defined.

  • elements_per_variable: a list of positive integers specifying how many set members each variable admits

    • Length: # of variables

    • Default: equal apportionment of elements among variables

  • elements: a list of the permissible values (integer, string, or real, as appropriate) for ALL of the variables, concatenated together.

    • Length: sum of elements_per_variable, or an integer multiple of number of variables

    • Ordering matters: the list is partitioned according to the values of elements_per_variable, and each partition is assigned, in order, to a variable.

  • The ordering of elements_per_variable and the partitions of elements must match the ordering of the strings given in descriptors (see the sketch below).
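
The following sketch (hypothetical values) illustrates the partitioning: with elements_per_variable = 2 3, the first two entries of elements belong to the first descriptor and the remaining three to the second:

variables
  discrete_design_set
    integer = 2
      elements_per_variable  2  3
      elements               4 7    1 3 5   # 4, 7 -> 'x1'; 1, 3, 5 -> 'x2'
      descriptors           'x1'   'x2'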

Dakota Parameters File Data Formats

Simulation interfaces employ forks or system calls to run simulation workflows via user-developed drivers. Dakota communicates variable or parameter values to the driver and receives back response values using the file system, through the writing of parameters files and reading of results files.

Prior to invoking an analysis driver (or optional input or output filter), Dakota creates a parameters file that contains the current parameter values and a set of function requests. See the interfacing portion of the manual for full details.

Dakota supports three parameters file formats: standard, APREPRO, and JSON. They are explained in full detail in the following sections. Briefly, the standard format is unique to Dakota and uses a simple value tag syntax to communicate information about the evaluation. In the APREPRO format, which is intended for use with the APREPRO template processing utility [Sja92], information is contained in a series of statements of the form { tag = value }. The JSON format uses JSON (JavaScript Object Notation), a common format for data interchange.

Parameters file format (standard)

The standard parameters file format for a single evaluation is shown in Listing 14.

Listing 14 Parameters file data format - standard option
<int>    variables
<double> <label_cdv_i>         (i = 1 to n_cdv)
<int>    <label_ddiv_i>        (i = 1 to n_ddiv)
<string> <label_ddsv_i>        (i = 1 to n_ddsv)
<double> <label_ddrv_i>        (i = 1 to n_ddrv)
<double> <label_cauv_i>        (i = 1 to n_cauv)
<int>    <label_dauiv_i>       (i = 1 to n_dauiv)
<string> <label_dausv_i>       (i = 1 to n_dausv)
<double> <label_daurv_i>       (i = 1 to n_daurv)
<double> <label_ceuv_i>        (i = 1 to n_ceuv)
<int>    <label_deuiv_i>       (i = 1 to n_deuiv)
<string> <label_deusv_i>       (i = 1 to n_deusv)
<double> <label_deurv_i>       (i = 1 to n_deurv)
<double> <label_csv_i>         (i = 1 to n_csv)
<int>    <label_dsiv_i>        (i = 1 to n_dsiv)
<string> <label_dssv_i>        (i = 1 to n_dssv)
<double> <label_dsrv_i>        (i = 1 to n_dsrv)
<int>    functions
<int>    ASV_i:label_response_i       (i = 1 to m)
<int>    derivative_variables
<int>    DVV_i:label_cdv_i            (i = 1 to p)
<int>    analysis_components
<string> AC_i:analysis_driver_name_i  (i = 1 to q)
<string> eval_id
<int>    metadata
<string> MD_i                         (i = 1 to r)

Integer values are denoted by <int>, <double> denotes a double precision value, and <string> denotes a string value. Each of the major blocks denotes an array which begins with an array length and a descriptive tag. These array lengths can be useful for dynamic memory allocation within a simulator or filter program.

When using Dakota’s batch interface with the standard format, information for multiple evaluations is written in a concatenated fashion to a single batch parameters file. The format for each evaluation is as shown in Listing 14.

The first array for variables begins with the total number of variables (n) with its identifier string variables. The next n lines specify the current values and descriptors of all of the variables within the parameter set in input specification order: continuous design, discrete integer design (integer range, integer set), discrete string design (string set), discrete real design (real set), continuous aleatory uncertain (normal, lognormal, uniform, loguniform, triangular, exponential, beta, gamma, gumbel, frechet, weibull, histogram bin), discrete integer aleatory uncertain (poisson, binomial, negative binomial, geometric, hypergeometric, histogram point integer), discrete string aleatory uncertain (histogram point string), discrete real aleatory uncertain (histogram point real), continuous epistemic uncertain (real interval), discrete integer epistemic uncertain (interval, then set), discrete string epistemic uncertain (set), discrete real epistemic uncertain (set), continuous state, discrete integer state (integer range, integer set), discrete string state, and discrete real state (real set) variables.

Note

The authoritative variable ordering (as noted above in Overview) is given by the primary table in variables.

The lengths of these vectors add to a total of \(n\), i.e.,

\[n = n_{cdv} + n_{ddiv} + n_{ddsv} + n_{ddrv} + n_{cauv} + n_{dauiv} + n_{dausv} + n_{daurv} + n_{ceuv} + n_{deuiv} + n_{deusv} + n_{deurv} + n_{csv} + n_{dsiv} + n_{dssv} + n_{dsrv}.\]

If a variable type is not present in the problem, then its block is omitted entirely from the parameters file. The labels come from the variable descriptors specified in the Dakota input file, or default descriptors based on variable type if not specified.

The second array for the active set vector (ASV) begins with the total number of functions (m) and its identifier string functions. The next m lines specify the request vector for each of the m functions in the response data set followed by the tags ASV_i:label_response, where the label is either a user-provided response descriptor or a default-generated one. These integer codes indicate what data is required on the current function evaluation and are described further in The Active Set Vector.

The third array for the derivative variables vector (DVV) begins with the number of derivative variables (p) and its identifier string derivative_variables. The next p lines specify integer variable identifiers followed by the tags DVV_i:label_cdv. These integer identifiers are used to identify the subset of variables that are active for the calculation of derivatives (gradient vectors and Hessian matrices), and correspond to the list of variables in the first array (e.g., an identifier of 2 indicates that the second variable in the list is active for derivatives). The labels are again taken from user-provided or default variable descriptors.

The fourth array for the analysis components (AC) begins with the number of analysis components (q) and its identifier string analysis_components. The next q lines provide additional strings for use in specializing a simulation interface followed by the tags AC_i:analysis_driver_name, where analysis_driver_name indicates the driver associated with this component. These strings are specified in the input file for a set of analysis_drivers using the analysis_components specification. The subset of the analysis components used for a particular analysis driver is the set passed in a particular parameters file.

The next entry eval_id in the parameters file is the evaluation ID, by default an integer indicating the interface evaluation number. When hierarchical tagging is enabled as described in File Tagging for Evaluations, the identifier will be a colon-separated string, e.g., 4:9:2.

The final array for the metadata (MD) begins with the number of metadata fields requested (r) and its identifier string metadata. The next r lines provide the names of each metadata field followed by the tags MD_i.
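
Putting these pieces together, a hypothetical standard-format parameters file for two continuous design variables, one response with value and gradient requested (ASV code 3), derivatives with respect to both variables, and no analysis components or metadata might look like the following (values, labels, and column alignment are illustrative):

2 variables
1.500000000000000e+00 x1
7.200000000000000e-01 x2
1 functions
3 ASV_1:response_fn_1
2 derivative_variables
1 DVV_1:x1
2 DVV_2:x2
0 analysis_components
1 eval_id
0 metadata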

Note

Several standard-format parameters file examples are shown in Parameter to Response Mapping Examples.

Parameters file format (APREPRO)

For the APREPRO format option, the same data is present in the same order as the standard format. The only difference is that values are associated with their tags using { tag = value } markup as shown in Listing 15. An APREPRO-format parameters file example is shown in Parameter to Response Mapping Examples. The APREPRO format allows direct usage of Dakota parameters files by the APREPRO utility and Dakota’s DPrePro, which are file pre-processors that can significantly simplify model parameterization.

Note

APREPRO [Sja92] is a Sandia-developed pre-processor that is not distributed with Dakota.

DPrePro is a Python script distributed with Dakota that performs many of the same functions as APREPRO, as well as general template processing, and is optimized for use with Dakota parameters files in either format.

BPREPRO and JPrePost are Perl and Java tools, respectively, in use at other sites.

When a parameters file in APREPRO format is included within a template file (using an include directive), APREPRO recognizes these constructs as variable definitions which can then be used to populate targets throughout the template file. DPrePro, conversely, does not require the use of includes since it processes the Dakota parameters file and template simulation file separately to create a simulation input file populated with the variables data.
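
For illustration, a hypothetical template fragment and DPrePro invocation (file names and variable descriptors are placeholders; see the DPrePro documentation for the full command syntax) might look like:

# beam.template -- references Dakota variable descriptors directly
thickness = {x1}
width     = {x2}

# in the analysis driver script:
#   dprepro params.in beam.template beam.i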

Listing 15 Parameters file data format - APREPRO option
{ DAKOTA_VARS = <int> }
{ <label_cdv_i> = <double> }         (i = 1 to n_cdv)
{ <label_ddiv_i> = <int> }           (i = 1 to n_ddiv)
{ <label_ddsv_i> = <string> }        (i = 1 to n_ddsv)
{ <label_ddrv_i> = <double> }        (i = 1 to n_ddrv)
{ <label_cauv_i> = <double> }        (i = 1 to n_cauv)
{ <label_dauiv_i> = <int> }          (i = 1 to n_dauiv)
{ <label_dausv_i> = <string> }       (i = 1 to n_dausv)
{ <label_daurv_i> = <double> }       (i = 1 to n_daurv)
{ <label_ceuv_i> = <double> }        (i = 1 to n_ceuv)
{ <label_deuiv_i> = <int> }          (i = 1 to n_deuiv)
{ <label_deusv_i> = <string> }       (i = 1 to n_deusv)
{ <label_deurv_i> = <double> }       (i = 1 to n_deurv)
{ <label_csv_i> = <double> }         (i = 1 to n_csv)
{ <label_dsiv_i> = <int> }           (i = 1 to n_dsiv)
{ <label_dssv_i> = <string> }        (i = 1 to n_dssv)
{ <label_dsrv_i> = <double> }        (i = 1 to n_dsrv)
{ DAKOTA_FNS = <int> }
{ ASV_i:label_response_i = <int> }              (i = 1 to m)
{ DAKOTA_DER_VARS = <int> }
{ DVV_i:label_cdv_i = <int> }                   (i = 1 to p)
{ DAKOTA_AN_COMPS = <int> }
{ AC_i:analysis_driver_name_i = <string> }      (i = 1 to q)
{ DAKOTA_EVAL_ID = <string> }
{ DAKOTA_METADATA = <int> }
{ MD_i = <string> }                            (i = 1 to r)

As with the standard format, batch parameters files are simply a concatenation of the information for the evaluations in the batch in APREPRO format.

Parameters file format (JSON)

The JSON format encodes information using two structures, objects and arrays. An object is a collection of name/value pairs. In many programming languages it may be known as a dictionary, associative array, map, or hash table. An array is an ordered list of values, and is commonly known as an array, vector, or list. Objects and arrays may contain other objects or arrays, or scalar values that have “primitive” types such as strings, numbers, or booleans.

In Dakota’s JSON format, information about each evaluation is stored in a top-level object. The object contains the names (also known as keys):

Listing 16 Top-level organization of an evaluation in JSON
{
  "variables": [],
  "responses": [],
  "derivative_variables": []
  "analysis_components": [],
  "eval_id": "",
  "metadata": []
}

Unlike in the standard and APREPRO formats, the numbers of variables, responses, derivative variables, etc., are not explicitly included in the JSON parameters file. They are unnecessary for parsing the file and are simply the lengths of the arrays in question. Another difference between the JSON format and the standard and APREPRO formats arises when using Dakota’s batch interface. The top-level data structure of a JSON format batch parameters file is an array, which contains evaluation objects.

Variables

Variable labels and values are stored within objects that are elements of the variables array. Each object resembles the following, where the variable value is an integer, double, or string, as appropriate.

Listing 17 Array element of variables object
{
  "label": "<label_var_i>",
  "value": <variable value>
}

The order of the variables in the array is the same as for the standard and APREPRO format files, described in the previous two sections.

Responses

The responses name is associated with an array of objects that store the label and active set for each expected response.

Listing 18 Array element of responses object
{
  "label": "<label_response_i>",
  "active_set": <int>
}

Derivative Variables

Gradients and Hessians, if requested, are expected to be computed with respect to the derivative_variables. The array associated with this key contains 1-based indices into the variables array.

Analysis Components

The analysis_components name is associated with an array of analysis component objects of the form:

Listing 19 Array element of analysis_components object
{
  "driver": "<driver_string>",
  "component": "<an_comp_i>"
}

Evaluation ID and Metadata

Finally, the evaluation ID is a string associated with the eval_id key, and the metadata name refers to an array of strings, a list of the expected metadata responses.
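
Assembling these pieces, a hypothetical JSON parameters file for a single evaluation with two variables, one response with only its value requested, and no analysis components or metadata might look like (labels and values are illustrative):

{
  "variables": [
    { "label": "x1", "value": 1.5 },
    { "label": "x2", "value": 0.72 }
  ],
  "responses": [
    { "label": "response_fn_1", "active_set": 1 }
  ],
  "derivative_variables": [1, 2],
  "analysis_components": [],
  "eval_id": "1",
  "metadata": []
}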

The Active Set Vector

The active set vector (ASV) specifies the function value or derivative response data needed for a particular interface evaluation. Dakota’s ASV gets its name from managing the active set, i.e., the set of functions that are required by a method on a particular function evaluation. However, it also indicates the derivative data needed for active functions, so has an extended meaning beyond that typically used in the optimization literature.

Note

By default a simulation interface is expected to parse the ASV and only return the requested functions, gradients, and Hessians. To alleviate this requirement, see deactivating below.

The active set vector is a vector of integer codes 0–7, one per response function. The integer values 0 through 7 denote a 3-bit binary representation of all possible combinations of value (1), gradient (2), and Hessian (4) requests for a particular function, with the most significant bit denoting the Hessian, the middle bit denoting the gradient, and the least significant bit denoting the value. The specific translations are shown in Table 2.

Table 2 Active set vector integer codes.

Integer Code   Binary Representation   Meaning
7              111                     Get Hessian, gradient, and value
6              110                     Get Hessian and gradient
5              101                     Get Hessian and value
4              100                     Get Hessian
3              011                     Get gradient and value
2              010                     Get gradient
1              001                     Get value
0              000                     No data required, function is inactive

Disabling the ASV: Active set vector control may be turned off to obviate the need for the interface script to check and respond to its contents. When deactivate active_set_vector is specified, the interface is expected to return all function, gradient, and Hessian information enabled in the responses block on every function evaluation.

This option affords a simpler interface implementation, but trades away some efficiency. Disabling is most appropriate for cases in which only a relatively small penalty occurs when computing and returning more data than needed on a particular function evaluation.
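
A minimal sketch of an interface block that disables the ASV (the driver name is hypothetical):

interface
  fork
    analysis_drivers = 'my_driver'
  deactivate active_set_vector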