Variables
Overview
The variables
specification in a Dakota input file specifies the
parameter set to be iterated by a particular method. In the case of an
optimization study, these variables are adjusted in order to locate an
optimal design; in the case of parameter studies/sensitivity
analysis/design of experiments, these parameters are perturbed to
explore the parameter space; and in the case of uncertainty analysis,
the variables are associated with distribution/interval
characterizations which are used to compute corresponding
distribution/interval characterizations for response functions. To
accommodate these and other types of studies, Dakota supports design,
uncertain, and state variable types for continuous and discrete variable
domains. Uncertain types can be further categorized as either aleatory
or epistemic, and discrete domains can include discrete range, discrete
integer set, discrete string set, and discrete real set.
This chapter surveys key variables concepts, categories, and specific
types, and addresses variable-related file formats and the active set
vector. See the variables
keyword for additional
specification details.
Note
In several contexts, Dakota inputs must express variable
specifications in what is referred to as “input specification
order.” This means the ordering of variable types given in the
primary variables
table.
Key Dakota variable concepts include:
Category (design, uncertain (aleatory/epistemic), state) which groups variables by their primary use.
Active View: the subset of variables (categories) being explored in a particular study.
Type: a specific named variable type, e.g., discrete_design_set or lognormal_uncertain.
Domain: continuous vs. discrete (integer-, string-, or real-valued). Discrete variables span categories and are specified via ranges, admissible sets, and integer-valued discrete probability distributions.
Note
Characterizing a specific variable type, e.g., discrete_design_set or lognormal_uncertain, often requires providing arrays of data, for example a list of means or of set elements_per_variable. The ordering of these arrays must match the ordering of the descriptors for that variable type.
Design Variables
Design variables are adjusted in the course of determining an optimal design or an optimal set of deterministic calibration parameters. These variables may be continuous (real-valued between bounds), discrete range (integer-valued between bounds), discrete set of integers (integer-valued from a finite set), discrete set of strings (string-valued from a finite set), or discrete set of reals (real-valued from a finite set).
Continuous Design Variables
The most common type of design variable encountered in engineering applications is the continuous type. These variables may assume any real value (e.g., 12.34, -1.735e+07) within their bounds. All but a handful of the optimization algorithms in Dakota support continuous design variables exclusively.
Discrete Design Variables
Engineering design problems may contain discrete variables such as
material types, feature counts, stock gauge selections, etc. These
variables may assume only a fixed number of values, as compared to a
continuous variable which has an uncountable number of possible values
within its range. Discrete variables may involve a range of consecutive
integers (\(x\) can be any integer between 1 and 10), a set of integer values (\(x\) can be 101, 212, or 355), a set of string values (\(x\) can be 'direct', 'gmres', or 'jacobi'), or a set of real values (e.g., \(x\) can be identically 4.2, 6.4, or 8.5).
Discrete variables may be classified as either “categorical” or
“noncategorical.” In the latter noncategorical case, the discrete
requirement can be relaxed during the solution process since the model
can still compute meaningful response functions for values outside the
allowed discrete range or set. For example, a discrete variable
representing the thickness of a structure is generally a noncategorical
variable since it can assume a continuous range of values during the
algorithm iterations, even if it is desired to have a stock gauge
thickness in the end. In the former categorical case, the discrete
requirement cannot be relaxed since the model cannot obtain a solution
for values outside the range or set. For example, feature counts are
generally categorical discrete variables, since most computational
models will not support a non-integer value for the number of instances
of some feature (e.g., number of support brackets). An optional
categorical
specification indicates which discrete real and
discrete integer variables are restricted vs. relaxable. String
variables cannot be relaxed.
Gradient-based optimization methods cannot be directly applied to
problems with discrete variables since derivatives only exist for a
variable continuum. For problems with noncategorical variables, the
experimental branch and bound capability
(branch_and_bound
) can be
used to relax the discrete requirements and apply gradient-based methods
to a series of generated subproblems. For problems with categorical
variables, nongradient-based methods (e.g., coliny_ea
)
are commonly
used; however, most of those methods do not take advantage of any
structure that may be associated with the categorical variables. The
exception is mesh_adaptive_search.
If it is possible to define a
subjective relationship between the different values a given categorical
variable can take on, that relationship can be expressed via a
variables adjacency_matrix option. The mesh_adaptive_search method will take that relationship into consideration, together with any expressed neighbor_order. Branch and bound
techniques are expanded on in Mixed Integer Nonlinear Programming (MINLP) and
nongradient-based methods are further described in Optimization.
In addition to engineering applications, many non-engineering applications in the fields of scheduling, logistics, and resource allocation contain discrete design parameters. Within the Department of Energy, solution techniques for these problems impact programs in stockpile evaluation and management, production planning, nonproliferation, transportation (routing, packing, logistics), infrastructure analysis and design, energy production, environmental remediation, and tools for massively parallel computing such as domain decomposition and meshing.
Discrete Design Variable Types:
The discrete_design_range type supports a range of consecutive integers between specified lower_bounds and upper_bounds.
The discrete_design_set type admits a set of enumerated integer, string, or real values through an elements specification. The set of values must be specified as an ordered, unique set and is stored internally the same way, with a corresponding set of indices running from 0 to one less than the number of set values. These indices are used by some iterative algorithms (e.g., parameter studies, SCOLIB methods) for simplicity in discrete value enumeration when the actual corresponding set values are immaterial. In the case of parameter studies, this index representation is required in certain step and partition controls.
Each string element value must be quoted in the Dakota input file and may contain alphanumerics, dashes, underscores, and colons. White space, quote characters, and backslashes/meta-characters are not permitted.
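For illustration, a minimal variables block mixing these discrete design types might look like the following sketch (descriptors and element values are made up; see the variables keyword reference for the full specification):

variables
  discrete_design_range = 1
    lower_bounds = 1
    upper_bounds = 10
    descriptors  = 'num_brackets'
  discrete_design_set
    integer = 1
      elements    = 101 212 355
      descriptors = 'stock_gauge'
    string = 1
      elements    = 'direct' 'gmres' 'jacobi'
      descriptors = 'solver_type'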
Uncertain Variables
Deterministic variables (i.e., those with a single known value) do not capture the behavior of the input variables in all situations. In many cases, the exact value of a model parameter is not precisely known. An example of such an input variable is the thickness of a heat treatment coating on a structural steel I-beam used in building construction. Due to variability and tolerances in the coating process, the thickness of the layer is known to follow a normal distribution with a certain mean and standard deviation as determined from experimental data. The inclusion of the uncertainty in the coating thickness is essential to accurately represent the resulting uncertainty in the response of the building.
Uncertain variables directly support the use of probabilistic uncertainty quantification methods such as sampling, reliability, and stochastic expansion methods. They also admit lower and upper distribution bounds (whether explicitly defined, implicitly defined, or inferred), which permits their use in methods that rely on a bounded region to define a set of function evaluations (i.e., design of experiments and some parameter study methods).
Aleatory Uncertain Variables
Aleatory uncertainty is also known as inherent variability, irreducible uncertainty, or randomness. It is typically modeled using probability distributions, and probabilistic methods are commonly used for propagating input aleatory uncertainties described by probability distribution specifications. The two following sections describe the continuous and discrete aleatory uncertain variables supported by Dakota.
Continuous Aleatory Uncertain Variables
Normal: a probability distribution characterized by a mean and standard deviation. Also referred to as Gaussian. Bounded normal is also supported by some methods with an additional specification of lower and upper bounds.
Lognormal: a probability distribution characterized by a mean and either a standard deviation or an error factor. The natural logarithm of a lognormal variable has a normal distribution. Bounded lognormal is also supported by some methods with an additional specification of lower and upper bounds.
Uniform: a probability distribution characterized by a lower bound and an upper bound. Probability is constant between the bounds.
Loguniform: a probability distribution characterized by a lower bound and an upper bound. The natural logarithm of a loguniform variable has a uniform distribution.
Triangular: a probability distribution characterized by a mode, a lower bound, and an upper bound.
Exponential: a probability distribution characterized by a beta parameter.
Beta: a flexible probability distribution characterized by a lower bound and an upper bound and alpha and beta parameters. The uniform distribution is a special case.
Gamma: a flexible probability distribution characterized by alpha and beta parameters. The exponential distribution is a special case.
Gumbel: the Type I Largest Extreme Value probability distribution. Characterized by alpha and beta parameters.
Frechet: the Type II Largest Extreme Value probability distribution. Characterized by alpha and beta parameters.
Weibull: the Type III Smallest Extreme Value probability distribution. Characterized by alpha and beta parameters.
Histogram Bin: an empirically-based probability distribution characterized by a set of \((x,y)\) pairs that map out histogram bins (a continuous interval with associated bin count).
Discrete Aleatory Uncertain Variables
The following types of discrete aleatory uncertain variables are available:
Poisson: integer-valued distribution used to predict the number of discrete events that happen in a given time interval.
Binomial: integer-valued distribution used to predict the number of failures in a number of independent tests or trials.
Negative Binomial: integer-valued distribution used to predict the number of times to perform a test to have a target number of successes.
Geometric: integer-valued distribution used to model the number of successful trials that might occur before a failure is observed.
Hypergeometric: integer-valued distribution used to model the number of failures observed in a set of tests that has a known proportion of failures.
Histogram Point (integer, string, real): an empirically-based probability distribution characterized by a set of integer-valued \((i,c)\), string-valued \((s,c)\), and/or real-valued \((r,c)\) pairs that map out histogram points (each a discrete point value \(i\), \(s\), or \(r\), with associated count \(c\)).
For aleatory random variables, Dakota admits an
uncertain_correlation_matrix
that specifies
correlations among the input variables. The correlation matrix
defaults to the identity matrix, i.e., no correlation among the
uncertain variables.
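As a sketch, a pair of correlated normal uncertain variables might be specified as follows (means, deviations, and the correlation value are illustrative only):

variables
  normal_uncertain = 2
    means          = 1.0  2.5
    std_deviations = 0.1  0.25
    descriptors    = 'coating_thk' 'flange_thk'
  uncertain_correlation_matrix = 1.0  0.3
                                 0.3  1.0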
For additional information on random variable probability
distributions, refer to [HM00] and [SW04]. Refer to
variables
for more detail on the uncertain variable
specifications and to Uncertainty Quantification for available methods to quantify the
uncertainty in the response.
Epistemic Uncertain Variables
Epistemic uncertainty is reducible uncertainty due to lack of knowledge. Characterization of epistemic uncertainties is often based on subjective prior knowledge rather than objective data.
In Dakota, epistemic uncertainty can be characterized by interval- or set-valued variables (see relevant keywords below) that are propagated to calculate bounding intervals on simulation output using interval analysis methods. These epistemic variable types can optionally include belief structures or basic probability assignments for use in Dempster-Shafer theory of evidence methods. Epistemic uncertainty can alternately be modeled with probability density functions, although results from UQ studies are then typically interpreted as possibilities or bounds, as opposed to a probability distribution of responses.
Dakota supports the following epistemic uncertain variable types:
Continuous Interval: a real-valued interval-based specification characterized by sets of lower and upper bounds and Basic Probability Assignments (BPAs) associated with each interval. The intervals may be overlapping, contiguous, or disjoint, and a single interval (with probability = 1) per variable is an important special case. The interval distribution is not a probability distribution, as the exact structure of the probabilities within each interval is not known. It is commonly used with epistemic uncertainty methods.
Discrete Interval: an integer-valued variant of the Continuous Interval variable.
Discrete Set (integer, string, and real): Similar to discrete design set variables, these epistemic variables admit a finite number of values (elements) for type integer, string, or real, each with an associated probability.
In the discrete case, interval variables may be used to specify categorical choices which are epistemic. For example, if there are three possible forms for a physics model (model 1, 2, or 3) and there is epistemic uncertainty about which one is correct, a discrete uncertain interval or a discrete set could represent this type of uncertainty.
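For example, a single epistemic variable described by two basic probability assignments could be sketched as follows (values are illustrative; keyword spellings should be checked against the variables reference for the Dakota version in use):

variables
  continuous_interval_uncertain = 1
    num_intervals          = 2
    interval_probabilities = 0.4  0.6
    lower_bounds           = 1.0  3.0
    upper_bounds           = 2.0  5.0
    descriptors            = 'modulus'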
Through nested models, Dakota can perform combined aleatory /
epistemic analyses such as second-order probability or probability of
frequency. For example, a variable can be assumed to have a lognormal
distribution with specified variance, with its mean expressed as an
epistemic uncertainty lying in an expert-specified interval. See
examples in Advanced Model Recursions.
State Variables
State variables consist of auxiliary variables to be mapped through the simulation interface, but are not to be designed nor modeled as uncertain. State variables provide a means to parameterize additional model inputs which, in the case of a numerical simulator, might include solver convergence tolerances, time step controls, or mesh fidelity parameters.
Note
The term “state variable” is overloaded in math, science, and engineering. For Dakota it typically means a fixed parameter and does not refer to, e.g., the solution variables of a differential equation.
State variable configuration mirrors that of design variables. They can be specified via
continuous_state
(real-valued between bounds),
discrete_state_range
(integer-valued between
bounds), or discrete_state_set
(a discrete
integer-, string-, or real-valued set). Model parameterizations with
strings (e.g., “mesh1.exo”) are also possible using an interface analysis_components specification (see also Parameters file format (standard)).
State variables, as with other types of variables, are viewed differently depending on the method in use. By default, only parameter studies, design of experiments, and verification methods will vary state variables. This can be overridden as discussed in Active Variables View.
Since these variables are neither design nor uncertain variables, algorithms for optimization, least squares, and uncertainty quantification do not iterate on these variables by default. They are inactive and hidden from the algorithm. However, Dakota still maps these variables through the user’s interface where they affect the computational model in use. This allows optimization, least squares, and uncertainty quantification studies to be executed under different simulation conditions (which will, in general, yield different results). Parameter studies and design of experiments methods, on the other hand, are general-purpose iterative techniques which do not by default draw a distinction between variable types. They include state variables in the set of variables to be studied, which permits them to explore the effect of state variable values on the responses of interest.
When a state variable is held fixed, the specified initial_state
is used as its sole value. If the state variable is defined only by
its bounds, then the initial_state will be inferred from the variable
bounds or valid set values. If a method iterates on a state variable,
the variable is treated as a design variable with the given bounds, or
as a uniform uncertain variable with the given bounds.
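A representative (illustrative) state variable specification for a solver tolerance might be:

variables
  continuous_state = 1
    initial_state = 1.0e-4
    lower_bounds  = 1.0e-6
    upper_bounds  = 1.0e-2
    descriptors   = 'solver_tol'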
In some cases, state variables are used in direct coordination with an optimization, least squares, or uncertainty quantification algorithm. For example, state variables could be used to enact model adaptivity through the use of a coarse mesh or loose solver tolerances in the initial stages of an optimization, with continuous model refinement as the algorithm nears the optimal solution. They are also used to control model fidelity in some UQ approaches.
Management of Mixed Variables by Method
Active Variables View
As alluded to in the previous section, the iterative method selected for use in Dakota partially determines what subset, or view, of the variables data is active in the study. In general, a mixture of different variable types is supported within all methods, though by default certain methods will only modify certain types of variables. For example, by default, optimizers and least squares methods only modify design variables, and uncertainty quantification methods typically only utilize uncertain variables. This implies that variables which are not directly controlled by a particular method will be mapped through the interface unmodified. This allows for parameterizations within the model beyond those used by the method, which can provide the convenience of consolidating the control over various modeling parameters in a single file (the Dakota input file). An important related point is that the active variable set dictates over which continuous variables derivatives are typically computed (see Active Variables for Derivatives).
Default Variables View: The default active variables view is
determined from a combination of the response function type and
method. If objective_functions
or
calibration_terms
is given in the response
specification block, the design variables will be active.
General response_functions
do not have a specific
interpretation the way objective functions or calibration terms
do. For these, the active view is inferred from the method.
For parameter studies, or any of the dace, psuade, or fsu methods, the active view is set to all variables.
For sampling uncertainty quantification methods, the view is set to aleatory if only aleatory variables are present, epistemic if only epistemic variables are present, or uncertain (covering both aleatory and epistemic) if both are present.
For interval estimation or evidence calculations, the view is set to epistemic.
For other uncertainty quantification, e.g., reliability methods or stochastic expansion methods, the view is set to aleatory.
Finally, for verification studies using richardson_extrap, the active view is set to state.
Note
For surrogate-based optimization, where the surrogate is built over
points generated by a dace_method_pointer
, the point generation
is only over the design variables unless otherwise specified, i.e.,
state variables will not be sampled for surrogate construction.
Explicit View Control: The subset of active variables for a Dakota
method can be explicitly controlled by specifying the variables
keyword active, together with one of all, design, uncertain, aleatory, epistemic, or state. This causes the Dakota method to operate on the specified variable types, overriding the defaults. For example, the default behavior for a
nondeterministic sampling method is to sample the uncertain
variables. However, if the user specified active all
in the
variables block, the sampling would be performed over all variables
(e.g. design and state variables in addition to uncertain
variables). This may be desired in situations such as surrogate based
optimization under uncertainty, where a surrogate may be built over
both design and uncertain variables. Another situation where one may
want the fine-grained control available by specifying one of these
variable types is when one has state variables but only wants to
sample over the design variables when constructing a surrogate
model. Finally, more sophisticated uncertainty studies may involve
various combinations of epistemic vs. aleatory variables being active
in nested models.
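As a sketch, the following variables block (with illustrative values) forces a sampling method to vary both the design and uncertain variables:

variables
  active all
  continuous_design = 2
    lower_bounds = 0.0  0.0
    upper_bounds = 1.0  1.0
    descriptors  = 'w' 't'
  normal_uncertain = 1
    means          = 500.0
    std_deviations = 50.0
    descriptors    = 'load'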
Variable Domain
The variable domain setting controls how discrete variables (whether
design, uncertain, or state) are treated. If mixed
is specified, the continuous and non-categorical discrete variables
are treated separately. When relaxed
, the discrete
variables are relaxed and treated as continuous variables.
Domain control can be useful in optimization problems involving both continuous and discrete variables in order to apply a continuous optimizer to a mixed variable problem. All methods default to a mixed domain except for the experimental branch-and-bound method, which defaults to relaxed.
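A minimal sketch of a relaxed-domain specification (illustrative values) is:

variables
  relaxed
  continuous_design = 1
    lower_bounds = 0.0
    upper_bounds = 1.0
    descriptors  = 'length'
  discrete_design_range = 1
    lower_bounds = 1
    upper_bounds = 10
    descriptors  = 'gauge'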
Usage Notes
Specifying set variables: Sets of integers, reals, and strings have similar specifications, though different value types. The variables are specified using three keywords:
Variable declaration keyword, e.g., discrete_design_set: specifies the number of variables being defined.
elements_per_variable: a list of positive integers specifying how many set members each variable admits.
Length: # of variables
Default: equal apportionment of elements among variables
elements: a list of the permissible values (of the appropriate type) for ALL set variables, concatenated together.
Length: sum of elements_per_variable, or an integer multiple of the number of variables
The order is very important here. The list is partitioned according to the values of elements_per_variable, and each partition is assigned to a variable.
The ordering of elements_per_variable and the partitions of elements must match the ordering of the corresponding descriptors, as illustrated in the sketch below.
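For example, the following sketch (with illustrative descriptors and values) defines two integer set variables: 'd1' takes its value from {101, 212, 355} and 'd2' from {4, 8}, because elements_per_variable partitions the elements list as three values followed by two values:

variables
  discrete_design_set
    integer = 2
      elements_per_variable = 3  2
      elements              = 101 212 355   4 8
      descriptors           = 'd1' 'd2'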
Dakota Parameters File Data Formats
Simulation interfaces employ forks
or system
calls to run simulation
workflows via user-developed drivers. Dakota communicates variable or parameter
values to the driver and receives back response values using the file system,
through the writing of parameters files and reading of results files.
Prior to invoking an analysis driver (or optional input or output filter), Dakota creates a parameters file that contains the current parameter values and a set of function requests. See the interfacing portion of the manual for full details.
Dakota supports three parameters file formats: standard, APREPRO, and
JSON. They are explained in full detail in the following sections.
Briefly, the standard format is unique to Dakota and uses a simple
value tag
syntax to communicate information about the evaluation.
In the APREPRO format, which is intended for use with the APREPRO
template processing utility [Sja92], information is contained
in a series of statements of the form { tag = value }. JSON (JavaScript Object Notation) is a common format for data interchange.
Parameters file format (standard)
The standard parameters file format for a single evaluation is shown in Listing 14.
<int> variables
<double> <label_cdv_i> (i = 1 to n_cdv)
<int> <label_ddiv_i> (i = 1 to n_ddiv)
<string> <label_ddsv_i> (i = 1 to n_ddsv)
<double> <label_ddrv_i> (i = 1 to n_ddrv)
<double> <label_cauv_i> (i = 1 to n_cauv)
<int> <label_dauiv_i> (i = 1 to n_dauiv)
<string> <label_dausv_i> (i = 1 to n_dausv)
<double> <label_daurv_i> (i = 1 to n_daurv)
<double> <label_ceuv_i> (i = 1 to n_ceuv)
<int> <label_deuiv_i> (i = 1 to n_deuiv)
<string> <label_deusv_i> (i = 1 to n_deusv)
<double> <label_deurv_i> (i = 1 to n_deurv)
<double> <label_csv_i> (i = 1 to n_csv)
<int> <label_dsiv_i> (i = 1 to n_dsiv)
<string> <label_dssv_i> (i = 1 to n_dssv)
<double> <label_dsrv_i> (i = 1 to n_dsrv)
<int> functions
<int> ASV_i:label_response_i (i = 1 to m)
<int> derivative_variables
<int> DVV_i:label_cdv_i (i = 1 to p)
<int> analysis_components
<string> AC_i:analysis_driver_name_i (i = 1 to q)
<string> eval_id
<int> metadata
<string> MD_i (i = 1 to r)
Integer values are denoted by <int>
, <double>
denotes a double
precision value, and <string>
denotes a string value. Each of the
major blocks denotes an array which begins with an array length and a
descriptive tag. These array lengths can be useful for dynamic memory
allocation within a simulator or filter program.
When using Dakota’s batch
interface with the standard
format, information for multiple evaluations is written in a concatenated
fashion to a single batch parameters file. The format for each evaluation
is as shown in Listing 14.
The first array for variables begins with the total number of variables
(n
) with its identifier string variables
. The next n
lines
specify the current values and descriptors of all of the variables
within the parameter set in input specification order: continuous design,
discrete integer design (integer range, integer set), discrete string
design (string set), discrete real design (real set), continuous
aleatory uncertain (normal, lognormal, uniform, loguniform, triangular,
exponential, beta, gamma, gumbel, frechet, weibull, histogram bin),
discrete integer aleatory uncertain (poisson, binomial, negative
binomial, geometric, hypergeometric, histogram point integer), discrete
string aleatory uncertain (histogram point string), discrete real
aleatory uncertain (histogram point real), continuous epistemic
uncertain (real interval), discrete integer epistemic uncertain
(interval, then set), discrete string epistemic uncertain (set),
discrete real epistemic uncertain (set), continuous state, discrete
integer state (integer range, integer set), discrete string state, and
discrete real state (real set) variables.
Note
The authoritative variable ordering (as noted above in
Overview) is given by the primary table in
variables
.
The lengths of these vectors add to a total of \(n\), i.e.,
\(n = n_{cdv} + n_{ddiv} + n_{ddsv} + n_{ddrv} + n_{cauv} + n_{dauiv} + n_{dausv} + n_{daurv} + n_{ceuv} + n_{deuiv} + n_{deusv} + n_{deurv} + n_{csv} + n_{dsiv} + n_{dssv} + n_{dsrv}\).
If any of the variable types are not present in the problem, then its block is omitted entirely from the parameters file. The labels come from the variable descriptors specified in the Dakota input file, or default descriptors based on variable type if not specified.
The second array for the active set vector (ASV) begins with the total
number of functions (m
) and its identifier string functions
.
The next m
lines specify the request vector for each of the m
functions in the response data set followed by the tags
ASV_i:label_response
, where the label is either a user-provided
response descriptor or a default-generated one. These integer codes
indicate what data is required on the current function evaluation and
are described further in The Active Set Vector.
The third array for the derivative variables vector (DVV) begins with
the number of derivative variables (p
) and its identifier string
derivative_variables
. The next p
lines specify integer
variable identifiers followed by the tags DVV_i:label_cdv
. These
integer identifiers are used to identify the subset of variables that
are active for the calculation of derivatives (gradient vectors and
Hessian matrices), and correspond to the list of variables in the first
array (e.g., an identifier of 2 indicates that the second variable in
the list is active for derivatives). The labels are again taken from
user-provided or default variable descriptors.
The fourth array for the analysis components (AC) begins with the number
of analysis components (q
) and its identifier string
analysis_components
. The next q
lines provide additional
strings for use in specializing a simulation interface followed by the
tags AC_i:analysis_driver_name
, where analysis_driver_name
indicates the driver associated with this component. These strings are
specified in the input file for a set of analysis_drivers
using
the analysis_components
specification. The subset of the analysis
components used for a particular analysis driver is the set passed in a
particular parameters file.
The next entry eval_id
in the parameters file is the evaluation
ID, by default an integer indicating interface evaluation ID
number. When hierarchical tagging is enabled as described in
File Tagging for Evaluations, the identifier will be a
colon-separated string, e.g., 4:9:2
.
The final array for the metadata (MD) begins with the number of
metadata fields requested (r
) and its identifier string
metadata
. The next r
lines provide the names of
each metadata field followed by the tags MD_i
.
Note
Several standard-format parameters file examples are shown in Parameter to Response Mapping Examples.
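As a driver-side illustration, the following minimal Python sketch extracts the variables and ASV blocks from a standard-format parameters file. It assumes the parameters file path is passed as the first command-line argument (as with Dakota's fork and system interfaces); the remaining blocks are skipped for brevity and no error handling is shown.

import sys

def read_standard_params(path):
    # Every line of the standard format is a "<value> <tag>" pair.
    with open(path) as f:
        lines = [line.split() for line in f if line.split()]
    variables, asv = {}, {}
    i = 0
    while i < len(lines):
        value, tag = lines[i][0], lines[i][1]
        if tag == "variables":
            n = int(value)  # each block begins with its array length
            for val, label in lines[i + 1:i + 1 + n]:
                variables[label] = val  # values are strings; cast as needed
            i += n + 1
        elif tag == "functions":
            m = int(value)
            for code, label in lines[i + 1:i + 1 + m]:
                # tag looks like "ASV_i:response_label"
                asv[label.split(":", 1)[1]] = int(code)
            i += m + 1
        else:
            i += 1  # skip DVV, AC, eval_id, and metadata entries
    return variables, asv

if __name__ == "__main__":
    variables, asv = read_standard_params(sys.argv[1])
    print(variables)
    print(asv)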
Parameters file format (APREPRO)
For the APREPRO format option, the same data is present in the same
order as the standard format. The only difference is that values are
associated with their tags using { tag = value }
markup as shown
in Listing 15. An APREPRO-format
parameters file example is shown in Parameter to Response Mapping Examples. The
APREPRO format allows direct usage of Dakota parameters files by the
APREPRO utility and Dakota’s DPrePro, which are file pre-processors
that can significantly simplify model parameterization.
Note
APREPRO [Sja92] is a Sandia-developed pre-processor that is not distributed with Dakota.
DPrePro is a Python script distributed with Dakota that performs many of the same functions as APREPRO, as well as general template processing, and is optimized for use with Dakota parameters files in either format.
BPREPRO and JPrePost are Perl and Java tools, respectively, in use at other sites.
When a parameters file in APREPRO format is included within a template
file (using an include
directive), APREPRO recognizes these
constructs as variable definitions which can then be used to populate
targets throughout the template file. DPrePro, conversely, does not
require the use of includes since it processes the Dakota parameters
file and template simulation file separately to create a simulation
input file populated with the variables data.
{ DAKOTA_VARS = <int> }
{ <label_cdv_i> = <double> } (i = 1 to n_cdv)
{ <label_ddiv_i> = <int> } (i = 1 to n_ddiv)
{ <label_ddsv_i> = <string> } (i = 1 to n_ddsv)
{ <label_ddrv_i> = <double> } (i = 1 to n_ddrv)
{ <label_cauv_i> = <double> } (i = 1 to n_cauv)
{ <label_dauiv_i> = <int> } (i = 1 to n_dauiv)
{ <label_dausv_i> = <string> } (i = 1 to n_dausv)
{ <label_daurv_i> = <double> } (i = 1 to n_daurv)
{ <label_ceuv_i> = <double> } (i = 1 to n_ceuv)
{ <label_deuiv_i> = <int> } (i = 1 to n_deuiv)
{ <label_deusv_i> = <string> } (i = 1 to n_deusv)
{ <label_deurv_i> = <double> } (i = 1 to n_deurv)
{ <label_csv_i> = <double> } (i = 1 to n_csv)
{ <label_dsiv_i> = <int> } (i = 1 to n_dsiv)
{ <label_dssv_i> = <string> } (i = 1 to n_dssv)
{ <label_dsrv_i> = <double> } (i = 1 to n_dsrv)
{ DAKOTA_FNS = <int> }
{ ASV_i:label_response_i = <int> } (i = 1 to m)
{ DAKOTA_DER_VARS = <int> }
{ DVV_i:label_cdv_i = <int> } (i = 1 to p)
{ DAKOTA_AN_COMPS = <int> }
{ AC_i:analysis_driver_name_i = <string> } (i = 1 to q)
{ DAKOTA_EVAL_ID = <string> }
{ DAKOTA_METADATA = <int> }
{ MD_i = <string> } (i = 1 to r)
As with the standard format, batch
parameters files
are simply a concatenation of the information for evaluations in the batch
in APREPRO format.
Parameters file format (JSON)
The JSON format encodes information using two structures, objects and arrays. An object is a collection of name/value pairs. In many programming languages it may be known as a dictionary, associative array, map, or hash table. An array is an ordered list of values, and is commonly known as an array, vector, or list. Objects and arrays may contain other objects or arrays, or scalar values that have “primitive” types such as strings, numbers, or booleans.
In Dakota’s JSON format, information about each evaluation is stored in a top-level object. The object contains the names (also known as keys):
{
"variables": [],
"responses": [],
"derivative_variables": []
"analysis_components": [],
"eval_id": "",
"metadata": []
}
Unlike in the standard and APREPRO formats, the numbers of variables,
responses, derivative variables, etc., are not explicitly included in the JSON
parameters file. They are unnecessary for parsing the file and are simply
the lengths of the arrays in question. Another difference between the JSON
format and the standard and APREPRO formats arises when using Dakota’s
batch
interface. The top-level data structure of a
JSON format batch parameters file is an array, which contains evaluation
objects.
Variables
Variable labels and values are stored within objects that are elements of the variables array. Each object resembles the following, where the variable value is an integer, double, or string, as appropriate.
{
"label": "<label_var_i>",
"value": <variable value>
}
The order of the variables in the array is the same as for the standard and APREPRO format files, described in the previous two sections.
Responses
The responses name is associated with an array of objects that store the label and active set for each expected response.
{
"label": "<label_response_i>",
"active_set": <int>
}
Derivative Variables
Gradients and Hessians, if requested, are expected to be computed with respect to the derivative_variables. The array associated with this key contains 1-based indices into the variables array.
Analysis Components
The analysis_components name is associated with an array of analysis components objects of the form:
{
"driver": "<driver_string>",
"component": "<an_comp_i>"
}
Evaluation ID and Metadata
Finally, the evaluation ID is a string associated with the eval_id key, and the metadata name refers to an array of strings, a list of the expected metadata responses.
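A minimal Python sketch of reading a single-evaluation JSON parameters file (the file name is a placeholder) might be:

import json

with open("params.json") as f:
    params = json.load(f)

variables = {v["label"]: v["value"] for v in params["variables"]}
asv       = {r["label"]: r["active_set"] for r in params["responses"]}
eval_id   = params["eval_id"]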
The Active Set Vector
The active set vector (ASV) specifies the function value or derivative response data needed for a particular interface evaluation. Dakota’s ASV gets its name from managing the active set, i.e., the set of functions that are required by a method on a particular function evaluation. However, it also indicates the derivative data needed for active functions, so has an extended meaning beyond that typically used in the optimization literature.
Note
By default a simulation interface is expected to parse the ASV and only return the requested functions, gradients, and Hessians. To alleviate this requirement, see deactivating below.
The active set vector is a vector of integer codes, 0 through 7, one per response function. The integer values 0 through 7 denote a 3-bit binary representation of all possible combinations of value (1), gradient (2), and Hessian (4) requests for a particular function, with the most significant bit denoting the Hessian, the middle bit denoting the gradient, and the least significant bit denoting the value. The specific translations are shown in Table 2.
| Integer Code | Binary representation | Meaning |
|---|---|---|
| 7 | 111 | Get Hessian, gradient, and value |
| 6 | 110 | Get Hessian and gradient |
| 5 | 101 | Get Hessian and value |
| 4 | 100 | Get Hessian |
| 3 | 011 | Get gradient and value |
| 2 | 010 | Get gradient |
| 1 | 001 | Get value |
| 0 | 000 | No data required, function is inactive |
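In a driver, these codes are conveniently interpreted with bit tests, as in this small Python sketch:

def decode_asv(code):
    # value = least significant bit, gradient = middle bit, Hessian = most significant bit
    return {"value":    bool(code & 1),
            "gradient": bool(code & 2),
            "hessian":  bool(code & 4)}

# decode_asv(3) -> {'value': True, 'gradient': True, 'hessian': False}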
Disabling the ASV: Active set vector control may be turned off to
obviate the need for the interface script to check and respond to its
contents. When deactivate
active_set_vector
is specified, the
interface is expected to return all function, gradient, and Hessian
information enabled in the responses
block on every function
evaluation.
This option affords a simpler interface implementation, but at the expense of efficiency. Disabling the ASV is most appropriate for cases in which only a relatively small penalty is incurred by computing and returning more data than needed on a particular function evaluation.
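For reference, a sketch of an interface block with the ASV disabled (driver name and file names are placeholders) is:

interface
  analysis_drivers = 'my_driver'
    fork
      parameters_file = 'params.in'
      results_file    = 'results.out'
  deactivate active_set_vector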