Distribution Parameters

Variables are characterized by parameters such as the mean and standard deviation or lower and upper bounds. Typically, users provide these parameters as part of their input to Dakota, but Dakota itself may also compute them as it scales and transforms variables, normalizes empirical distributions (e.g. for histogram_bin_uncertain variables), or calculates alternative parameterizations (lambda and zeta vs mean and standard deviation for a lognormal_uncertain).

Beginning with release 6.11, models write their variable’s parameters to HDF5. The information is located in each model’s properties/variable_parameters subgroup. Within this group, parameters are stored by Dakota variable type (e.g. normal_uncertain), with one 1D dataset per type. The datasets have the same names as their variable types and have one element per variable. Parameters are stored by name.

Consider the following variable specification, which includes two normal and two uniform variables:

variables
  normal_uncertain 2
    descriptors 'nuv_1' 'nuv_2'
    means 0.0 1.0
    std_devations 1.0 0.5
  uniform_uncertain 2
  descriptors 'uuv_1' 'uuv_2'
    lower_bounds -1.0 0.0
    upper_bounds  1.0 1.0

Given this specification, and assuming a model ID of “tb_model”, Dakota will write two 1D datasets, both of length 2, to the group /models/simulation/tb_model/metadata/variable_parameters, the first named normal_uncertain, and the second named uniform_uncertain. Using a JSON-like representation for illustration, the normal_uncertain dataset will appear as:

[
  {
    "mean": 0.0,
    "std_deviation": 1.0,
    "lower_bound": -inf,
    "upper_bound": inf
  },
  {
    "mean": 1.0,
    "std_deviation": 0.5,
    "lower_bound": -inf,
    "upper_bound": inf
  }
]

The uniform_uncertain dataset will contain:

[
  {
    "lower_bound": -1.0,
    "upper_bound":  1.0
  },
  {
    "lower_bound": 0.0,
    "upper_bound": 1.0
  }
]

In these representations of the normal_uncertain and uniform_uncertain datasets, the outer square brackets ([]) enclose the dataset, and each element within the datasets are enclosed in curly braces ({}). The curly braces are meant to indicate that the elements are dictionary-like objects that support access by string field name. A bit more concretely, the following code snippet demonstrates reading the mean of the second normal variable, nuv_2.

import h5py
 2
with h5py.File("dakota_results.h5') as h:
   model = h["/models/simulation/tb_model/"]
   # nu_vars is the dataset that contains distribution parameters for
   # normal_uncertain variables
   nu_vars = model["variable_parameters/normal_uncertain"]
   nuv_2_mu = nu_vars[1]["mean"] # 1 is the 0-based index of nuv_2, and
                                 # "mean" is the name of the field where
                                 # the mean is stored; nuv_2_mu now contains
                                 # 1.0.

The feature in HDF5 that underlies this name-based storage of fields is compound datatypes, which are similar to C/C++ structs or Python dictionaries. Further information about how to work with compound datatypes is available in the h5py documentation.

Naming Conventions and Layout

In most cases, datasets for storing parameters have names that match their variable types. The normal_uncertain and uniform_uncertain datasets illustrated above are examples. Exceptions include types such as discrete_design_set, which has string, integer, and real subtypes. For these, the dataset name is the top-level type with _string, _int, or _real appended: discrete_design_set_string, discrete_design_set_int, and discrete_design_set_real.

Most Dakota variable types have scalar parameters. For these, the names of the parameters are generally the singular form of the associated Dakota keyword. For example, triangular_uncertain variables are characterized in Dakota input using the plural keywords modes, lower_bounds, and upper_bounds. The singular field names are, respectively, “mode”, “lower_bound”, and “upper_bound”. In this case, all three parameters are real-valued and stored as floating point numbers, but variable types/fields can also be integer-valued (e.g. binomial_uncertain/num_trials) or string-valued.

Some variable/parameter fields contain 1D arrays or vectors of information. Consider histogram_bin_uncertain variables, for which the user specifies not just one value, but an ordered collection of abscissas and corresponding ordinates or counts. Dakota stores the abscissas in the “abscissas” field, which is a 1D dataset of floating-point numbers. It similarly stores the counts in the “counts” field. (In this case, only the normalized counts are stored, regardless of whether the user provided counts or ordinates.)

When the user specifies more than one histogram_bin_uncertain variable, it often is also necessary to include the pairs_per_variable keyword to divide the abscissa/count pairs among the variables. This raises the question of how lists of parameters that vary in length across the variables ought to be stored.

Although HDF5 supports variable-length datasets, for simplicity (and due to limitations in h5py at the time of the 6.11 release), Dakota stores vector parameter fields in conventional fixed-length datasets. The lengths of these datasets are determined at runtime in the following way: For a particular variable type and field, the field for all variables is sized to be large enough to accommodate the variable with the longest list of parameters. Any unused space for a particular variable is filled with NaN (if the parameter is real-valued), INTMAX (integer-valued), or an empty string (string-valued). In addition, each variable has an additional field, “num_elements”, that reports the number of elements in the fields that contain actual data and not fill values.

Consider this example, in which the user has specified a pair of histogram_bin_uncertain variables. The first has 3 pairs, and the second has 4.

variables
  histogram_bin_uncertain 2
    pairs_per_variable 2 3
    abscissas  0.0   0.5  1.0
              -1.0  -0.5  0.5  1.0
    counts     0.25  0.75 0.0
               0.2   0.4  0.2  0.0

For this specification, Dakota will write a dataset named histogram_bin_uncertain to the metadata/variable_parameters/ subgroup for the model. It will be of length 2, one element for each variable, and contain the following:

[
  {
    "num_elements": 3,
    "abscissas": [0.0, 0.5, 1.0, NaN],
    "counts": [0.25, 0.75, 0.0, NaN]
  },
  {
    "num_elements": 4,
    "abscissas": [-1.0, -0.5, 0.5, 1.0],
    "counts": [0.2, 0.4, 0.2, 0.0]
  }
]

h5py Examples

The fields available for a variable parameters dataset can be determined in h5py by examining the datatype of the dataset.

import h5py
with h5py.File("dakota_results.h5") as h:
   model = h["/models/simulation/NO_MODEL_ID/"]
   md = model["metadata/variable_parameters"]
   nu = md["normal_uncertain"]
   nu_param_names = nu.dtype.names
   # nu_param_names is a tuple of strings: ('mean', 'std_deviation',
   # 'lower_bound', 'upper_bound')

Known Limitations

h5py has a known bug that prevents parameters for some types of variables from being accessed (the Python interpreter crashes with a segfault). These include:

histogram_point_uncertain string

discrete_uncertain_set string

Metadata

The variable parameter datasets have two dimension scales. The first (index 0) contains the variable descriptors, and the second (index 1) contains variable Ids. Available Parameters

Parameter Listing for All Types

The table below lists all Dakota variables and parameters that can be stored.

Distribution Parameters

Variable Type	Parameter Name	Type	Rank
continuous_design	lower_bound	real	scalar
continuous_design	upper_bound	real	scalar
discrete_design_range	lower_bound	integer	scalar
discrete_design_range	upper_bound	integer	scalar
discrete_design_set_int	num_elements	integer	scalar
discrete_design_set_int	elements	integer	vector
discrete_design_set_string	num_elements	integer	scalar
discrete_design_set_string	elements	string	vector
discrete_design_set_real	num_elements	integer	scalar
discrete_design_set_real	elements	real	vector
normal_uncertain	mean	real	scalar
	std_deviation	real	scalar
	lower_bound	real	scalar
	upper_bound	real	scalar
lognormal_uncertain	lower_bound	real	scalar
	upper_bound	real	scalar
	mean	real	scalar
	std_deviation	real	scalar
	error_factor	real	scalar
	lambda	real	scalar
	zeta	real	scalar
uniform_uncertain	lower_bound	real	scalar
uniform_uncertain	upper_bound	real	scalar
loguniform_uncertain	lower_bound	real	scalar
loguniform_uncertain	upper_bound	real	scalar
triangular_uncertain	mode	real	scalar
	lower_bound	real	scalar
	upper_bound	real	scalar
exponential_uncertain	beta	real	scalar
beta_uncertain	alpha	real	scalar
	beta	real	scalar
	lower_bound	real	scalar
	upper_bound	real	scalar
gamma_uncertain	alpha	real	scalar
gamma_uncertain	beta	real	scalar
gumbel_uncertain	alpha	real	scalar
gumbel_uncertain	beta	real	scalar
frechet_uncertain	alpha	real	scalar
frechet_uncertain	beta	real	scalar
weibull_uncertain	alpha	real	scalar
weibull_uncertain	beta	real	scalar
histogram_bin_uncertain	num_elements	integer	scalar
	abscissas	real	vector
	counts	real	vector
poisson_uncertain	lambda	real	scalar
binomial_uncertain	probability_per_trial	real	scalar
binomial_uncertain	num_trials	integer	scalar
negative_binomial_uncertain	probability_per_trial	real	scalar
negative_binomial_uncertain	num_trials	integer	scalar
geometric_uncertain	probability_per_trial	real	scalar
hypergeometric_uncertain	total_population	integer	scalar
	selected_population	integer	scalar
	num_drawn	integer	scalar
histogram_point_uncertain_int	num_elements	integer	scalar
	abscissas	integer	vector
	counts	real	vector
histogram_point_uncertain_real	num_elements	integer	scalar
	abscissas	real	vector
	counts	real	vector
continuous_interval_uncertain	num_elements	integer	scalar
	interval_probabilities	real	vector
	lower_bounds	real	vector
	upper_bounds	real	vector
discrete_interval_uncertain	num_elements	integer	scalar
	interval_probabilities	real	vector
	lower_bounds	integer	vector
	upper_bounds	integer	vector
discrete_uncertain_set_int	num_elements	integer	scalar
	elements	integer	vector
	set_probabilities	real	vector
discrete_uncertain_set_real	num_elements	integer	scalar
	elements	real	vector
	set_probabilities	real	vector
continuous_state	lower_bound	real	scalar
continuous_state	upper_bound	real	scalar
discrete_state_range	lower_bound	integer	scalar
discrete_state_range	upper_bound	integer	scalar
discrete_state_set_int	num_elements	integer	scalar
discrete_state_set_int	elements	integer	vector
discrete_state_set_string	num_elements	integer	scalar
discrete_state_set_string	elements	string	vector
discrete_state_set_real	num_elements	integer	scalar
discrete_state_set_real	elements	real	vector