Distribution Parameters
Variables are characterized by parameters such as the mean and standard deviation or lower and upper bounds. Typically, users provide these parameters as part of their input to Dakota, but Dakota itself may also compute them as it scales and transforms variables, normalizes empirical distributions (e.g., for histogram_bin_uncertain variables), or calculates alternative parameterizations (e.g., lambda and zeta versus mean and standard deviation for a lognormal_uncertain variable).
Beginning with release 6.11, models write their variables’ parameters to HDF5. The information is located in each model’s metadata/variable_parameters subgroup. Within this group, parameters are stored by Dakota variable type (e.g., normal_uncertain), with one 1D dataset per type. The datasets have the same names as their variable types and have one element per variable. Within each element, parameters are stored by name.
Consider the following variable specification, which includes two normal and two uniform variables:
variables
  normal_uncertain 2
    descriptors 'nuv_1' 'nuv_2'
    means 0.0 1.0
    std_deviations 1.0 0.5
  uniform_uncertain 2
    descriptors 'uuv_1' 'uuv_2'
    lower_bounds -1.0 0.0
    upper_bounds 1.0 1.0
Given this specification, and assuming a model ID of “tb_model”, Dakota will write two 1D datasets, both of length 2, to the group /models/simulation/tb_model/metadata/variable_parameters: the first named normal_uncertain and the second named uniform_uncertain. Using a JSON-like representation for illustration, the normal_uncertain dataset will appear as:
[
  {
    "mean": 0.0,
    "std_deviation": 1.0,
    "lower_bound": -inf,
    "upper_bound": inf
  },
  {
    "mean": 1.0,
    "std_deviation": 0.5,
    "lower_bound": -inf,
    "upper_bound": inf
  }
]
The uniform_uncertain dataset will contain:
[
  {
    "lower_bound": -1.0,
    "upper_bound": 1.0
  },
  {
    "lower_bound": 0.0,
    "upper_bound": 1.0
  }
]
In these representations of the normal_uncertain and uniform_uncertain datasets, the outer square brackets ([]) enclose the dataset, and each element within a dataset is enclosed in curly braces ({}). The curly braces indicate that the elements are dictionary-like objects that support access by string field name. More concretely, the following code snippet demonstrates reading the mean of the second normal variable, nuv_2.
import h5py

with h5py.File("dakota_results.h5", "r") as h:
    model = h["/models/simulation/tb_model/"]
    # nu_vars is the dataset that contains distribution parameters for
    # normal_uncertain variables
    nu_vars = model["metadata/variable_parameters/normal_uncertain"]
    # 1 is the 0-based index of nuv_2, and "mean" is the name of the field
    # where the mean is stored; nuv_2_mu now contains 1.0.
    nuv_2_mu = nu_vars[1]["mean"]
The feature in HDF5 that underlies this name-based storage of fields is compound datatypes, which are similar to C/C++ structs or Python dictionaries. Further information about how to work with compound datatypes is available in the h5py documentation.
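As a rough illustration of what a compound datatype looks like once it is in Python, the sketch below builds a NumPy structured array by hand that mirrors the normal_uncertain dataset shown earlier. The array is a stand-in and is not read from a Dakota results file; the 64-bit float field type is an assumption. Reading a compound dataset with h5py yields an array of this kind, which can be indexed by element, by field name, or both.

```python
import numpy as np

# Hand-built stand-in for the normal_uncertain dataset. Field names and values
# are copied from the JSON-like listing; the "f8" storage type is assumed.
nu_dtype = np.dtype([
    ("mean", "f8"),
    ("std_deviation", "f8"),
    ("lower_bound", "f8"),
    ("upper_bound", "f8"),
])
nu_vars = np.array(
    [(0.0, 1.0, -np.inf, np.inf), (1.0, 0.5, -np.inf, np.inf)],
    dtype=nu_dtype,
)

# Index by element, then by field: the mean of the second variable.
nuv_2_mu = float(nu_vars[1]["mean"])

# Index by field alone: one value per variable, in dataset order.
all_means = nu_vars["mean"].tolist()
all_std_devs = nu_vars["std_deviation"].tolist()

print(nuv_2_mu, all_means, all_std_devs)
```

Indexing by field name alone is convenient when the same parameter is needed for every variable of a type.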
Naming Conventions and Layout
In most cases, datasets for storing parameters have names that match their variable types. The normal_uncertain and uniform_uncertain datasets illustrated above are examples. Exceptions include types such as discrete_design_set, which has string, integer, and real subtypes. For these, the dataset name is the top-level type with _string, _int, or _real appended: discrete_design_set_string, discrete_design_set_int, and discrete_design_set_real.
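The naming rule can be written down as a small lookup. The helper below is hypothetical (it is not part of Dakota or h5py) and only encodes the suffix convention described above.

```python
# Hypothetical helper for illustration; not part of Dakota or h5py.
_SUBTYPE_SUFFIX = {"string": "_string", "integer": "_int", "real": "_real"}


def dataset_name(variable_type, subtype=None):
    """Return the dataset name for a variable type and optional subtype."""
    if subtype is None:
        # Types without subtypes use the variable type name unchanged.
        return variable_type
    return variable_type + _SUBTYPE_SUFFIX[subtype]


plain_name = dataset_name("normal_uncertain")
set_names = [
    dataset_name("discrete_design_set", s) for s in ("string", "integer", "real")
]
print(plain_name)
print(set_names)
```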
Most Dakota variable types have scalar parameters. For these, the names of the parameters are generally the singular form of the associated Dakota keyword. For example, triangular_uncertain variables are characterized in Dakota input using the plural keywords modes, lower_bounds, and upper_bounds. The singular field names are, respectively, “mode”, “lower_bound”, and “upper_bound”. In this case, all three parameters are real-valued and stored as floating-point numbers, but variable types/fields can also be integer-valued (e.g., binomial_uncertain/num_trials) or string-valued.
Some variable/parameter fields contain 1D arrays or vectors of information. Consider histogram_bin_uncertain variables, for which the user specifies not just one value, but an ordered collection of abscissas and corresponding ordinates or counts. Dakota stores the abscissas in the “abscissas” field, which is a 1D dataset of floating-point numbers. It similarly stores the counts in the “counts” field. (In this case, only the normalized counts are stored, regardless of whether the user provided counts or ordinates.)
When the user specifies more than one histogram_bin_uncertain variable, it is often also necessary to include the pairs_per_variable keyword to divide the abscissa/count pairs among the variables. This raises the question of how lists of parameters that vary in length across the variables ought to be stored.
Although HDF5 supports variable-length datasets, for simplicity (and due to limitations in h5py at the time of the 6.11 release), Dakota stores vector parameter fields in conventional fixed-length datasets. The lengths of these datasets are determined at runtime in the following way: for a particular variable type and field, the field for all variables is sized to be large enough to accommodate the variable with the longest list of parameters. Any unused space for a particular variable is filled with NaN (if the parameter is real-valued), INTMAX (integer-valued), or an empty string (string-valued). In addition, each variable has an additional field, “num_elements”, that reports the number of elements in the fields that contain actual data and not fill values.
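The padding scheme described above can be sketched in a few lines of plain Python. This is only an illustration of the resulting layout under made-up input values; the function name pad_fields is hypothetical, and the sketch makes no claim about how Dakota implements the step internally.

```python
import math


def pad_fields(values_per_variable, fill):
    """Pad each variable's list to the longest length and report true lengths."""
    width = max(len(values) for values in values_per_variable)
    padded = [
        list(values) + [fill] * (width - len(values))
        for values in values_per_variable
    ]
    num_elements = [len(values) for values in values_per_variable]
    return padded, num_elements


# Two variables with real-valued parameter lists of different lengths.
ragged = [[1.0, 2.0], [3.0, 4.0, 5.0]]
padded, num_elements = pad_fields(ragged, math.nan)

print(num_elements)
print(padded)
```

For integer-valued or string-valued fields, the same function would be called with the corresponding fill value instead of NaN.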
Consider this example, in which the user has specified a pair of histogram_bin_uncertain variables. The first has 3 pairs, and the second has 4.
variables
  histogram_bin_uncertain 2
    pairs_per_variable 3 4
    abscissas 0.0 0.5 1.0
              -1.0 -0.5 0.5 1.0
    counts 0.25 0.75 0.0
           0.2 0.4 0.2 0.0
For this specification, Dakota will write a dataset named histogram_bin_uncertain to the metadata/variable_parameters/ subgroup for the model. It will be of length 2, one element for each variable, and contain the following:
[
  {
    "num_elements": 3,
    "abscissas": [0.0, 0.5, 1.0, NaN],
    "counts": [0.25, 0.75, 0.0, NaN]
  },
  {
    "num_elements": 4,
    "abscissas": [-1.0, -0.5, 0.5, 1.0],
    "counts": [0.2, 0.4, 0.2, 0.0]
  }
]
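When reading such a dataset, the “num_elements” field can be used to discard the fill values. The sketch below demonstrates this on a hand-built NumPy structured array that mirrors the listing above; it is a stand-in and is not read from a Dakota results file. The helper name trimmed and the 32-bit integer type chosen for “num_elements” are assumptions made for illustration.

```python
import numpy as np

# Hand-built stand-in for the histogram_bin_uncertain dataset listed above.
# The "i4" and "f8" storage types are assumed for illustration.
hb_dtype = np.dtype([
    ("num_elements", "i4"),
    ("abscissas", "f8", (4,)),
    ("counts", "f8", (4,)),
])
hb_vars = np.array(
    [
        (3, [0.0, 0.5, 1.0, np.nan], [0.25, 0.75, 0.0, np.nan]),
        (4, [-1.0, -0.5, 0.5, 1.0], [0.2, 0.4, 0.2, 0.0]),
    ],
    dtype=hb_dtype,
)


def trimmed(record, field):
    """Return only the real entries of a vector field, dropping fill values."""
    n = int(record["num_elements"])
    return record[field][:n].tolist()


first_abscissas = trimmed(hb_vars[0], "abscissas")
second_counts = trimmed(hb_vars[1], "counts")
print(first_abscissas, second_counts)
```

Slicing by “num_elements” avoids having to test for NaN, INTMAX, or empty strings separately for each field type.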
h5py Examples
The fields available for a variable parameters dataset can be determined in h5py by examining the datatype of the dataset.
import h5py

with h5py.File("dakota_results.h5", "r") as h:
    model = h["/models/simulation/NO_MODEL_ID/"]
    md = model["metadata/variable_parameters"]
    nu = md["normal_uncertain"]
    nu_param_names = nu.dtype.names
    # nu_param_names is a tuple of strings: ('mean', 'std_deviation',
    # 'lower_bound', 'upper_bound')
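Beyond the field names, the datatype also indicates whether each field is integer- or real-valued and whether it is a scalar or a vector. The following sketch inspects a hand-built NumPy dtype shaped like the histogram_bin_uncertain example; the dtype, the helper name describe_fields, and the labels it returns are assumptions for illustration, and string-valued fields are simply reported as "other".

```python
import numpy as np

# Hypothetical stand-in dtype: one scalar integer field and two length-4
# real vector fields, as in the histogram_bin_uncertain example.
hb_dtype = np.dtype([
    ("num_elements", "i4"),
    ("abscissas", "f8", (4,)),
    ("counts", "f8", (4,)),
])


def describe_fields(dtype):
    """Map each field name to a (type, rank) pair using dtype metadata."""
    kinds = {"i": "integer", "f": "real"}
    summary = {}
    for name in dtype.names:
        field = dtype[name]
        # A non-empty shape means the field holds a fixed-length array.
        rank = "vector" if field.shape else "scalar"
        summary[name] = (kinds.get(field.base.kind, "other"), rank)
    return summary


fields = describe_fields(hb_dtype)
print(fields)
```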
Known Limitations
h5py has a known bug that prevents parameters for some types of variables from being accessed (the Python interpreter crashes with a segfault). These include histogram_point_uncertain string and discrete_uncertain_set string variables.
Metadata
The variable parameter datasets have two dimension scales. The first (index 0) contains the variable descriptors, and the second (index 1) contains variable Ids.
Available Parameters
Parameter Listing for All Types
The table below lists all Dakota variables and parameters that can be stored.
Distribution Parameters
| Variable Type | Parameter Name | Type | Rank |
|---|---|---|---|
| continuous_design | lower_bound | real | scalar |
| | upper_bound | real | scalar |
| discrete_design_range | lower_bound | integer | scalar |
| | upper_bound | integer | scalar |
| discrete_design_set_int | num_elements | integer | scalar |
| | elements | integer | vector |
| discrete_design_set_string | num_elements | integer | scalar |
| | elements | string | vector |
| discrete_design_set_real | num_elements | integer | scalar |
| | elements | real | vector |
| normal_uncertain | mean | real | scalar |
| | std_deviation | real | scalar |
| | lower_bound | real | scalar |
| | upper_bound | real | scalar |
| lognormal_uncertain | lower_bound | real | scalar |
| | upper_bound | real | scalar |
| | mean | real | scalar |
| | std_deviation | real | scalar |
| | error_factor | real | scalar |
| | lambda | real | scalar |
| | zeta | real | scalar |
| uniform_uncertain | lower_bound | real | scalar |
| | upper_bound | real | scalar |
| loguniform_uncertain | lower_bound | real | scalar |
| | upper_bound | real | scalar |
| triangular_uncertain | mode | real | scalar |
| | lower_bound | real | scalar |
| | upper_bound | real | scalar |
| exponential_uncertain | beta | real | scalar |
| beta_uncertain | alpha | real | scalar |
| | beta | real | scalar |
| | lower_bound | real | scalar |
| | upper_bound | real | scalar |
| gamma_uncertain | alpha | real | scalar |
| | beta | real | scalar |
| gumbel_uncertain | alpha | real | scalar |
| | beta | real | scalar |
| frechet_uncertain | alpha | real | scalar |
| | beta | real | scalar |
| weibull_uncertain | alpha | real | scalar |
| | beta | real | scalar |
| histogram_bin_uncertain | num_elements | integer | scalar |
| | abscissas | real | vector |
| | counts | real | vector |
| poisson_uncertain | lambda | real | scalar |
| binomial_uncertain | probability_per_trial | real | scalar |
| | num_trials | integer | scalar |
| negative_binomial_uncertain | probability_per_trial | real | scalar |
| | num_trials | integer | scalar |
| geometric_uncertain | probability_per_trial | real | scalar |
| hypergeometric_uncertain | total_population | integer | scalar |
| | selected_population | integer | scalar |
| | num_drawn | integer | scalar |
| histogram_point_uncertain_int | num_elements | integer | scalar |
| | abscissas | integer | vector |
| | counts | real | vector |
| histogram_point_uncertain_real | num_elements | integer | scalar |
| | abscissas | real | vector |
| | counts | real | vector |
| continuous_interval_uncertain | num_elements | integer | scalar |
| | interval_probabilities | real | vector |
| | lower_bounds | real | vector |
| | upper_bounds | real | vector |
| discrete_interval_uncertain | num_elements | integer | scalar |
| | interval_probabilities | real | vector |
| | lower_bounds | integer | vector |
| | upper_bounds | integer | vector |
| discrete_uncertain_set_int | num_elements | integer | scalar |
| | elements | integer | vector |
| | set_probabilities | real | vector |
| discrete_uncertain_set_real | num_elements | integer | scalar |
| | elements | real | vector |
| | set_probabilities | real | vector |
| continuous_state | lower_bound | real | scalar |
| | upper_bound | real | scalar |
| discrete_state_range | lower_bound | integer | scalar |
| | upper_bound | integer | scalar |
| discrete_state_set_int | num_elements | integer | scalar |
| | elements | integer | vector |
| discrete_state_set_string | num_elements | integer | scalar |
| | elements | string | vector |
| discrete_state_set_real | num_elements | integer | scalar |
| | elements | real | vector |