multifidelity_sampling

Multifidelity Monte Carlo sampling method for UQ

Specification

Alias: multifidelity_mc mfmc
Arguments: None

Child Keywords:

Required/Optional	Dakota Keyword	Dakota Keyword Description
Optional	seed_sequence	Sequence of seed values for multi-stage random sampling
Optional	fixed_seed	Reuses the same seed value for multiple random sampling sets
Optional	pilot_samples	Initial set of samples for multilevel/multifidelity sampling methods.
Optional	solution_mode	Solution mode for multilevel/multifidelity methods
Optional	numerical_solve	Specify the situations where numerical optimization is used for MFMC sample allocation
Optional	search_model_graphs	Perform a search over admissible model relationships for a given model ensemble
Optional	sample_type	Selection of sampling strategy
Optional	export_sample_sequence	Enable export of multilevel/multifidelity sample sequences to individual files
Optional	convergence_tolerance	Stopping criterion based on relative error reduction
Optional	max_iterations	Number of iterations allowed for optimizers and adaptive UQ methods
Optional	max_function_evaluations	Stopping criterion based on maximum function evaluations
Optional	rng	Selection of a random number generator
Optional	model_pointer	Identifier for model block to be used by a method

Description

An adaptive sampling method that utilizes multifidelity relationships in order to improve efficiency through variance reduction.

Multifidelity sampling is a recursive sampling scheme for which model ordering is important. For an ensemble surrogate model with more than two model instances (differing in model form, resolution, or both), the multi-model approach of [PWG16] is supported, in which all model instances are integrated into the scheme. In the special case of two model instances, this collapses to the approach of [NgW14]. The approach can be used with a model form sequence, a resolution level sequence, or a combination of both (all specified form/resolution combinations will be enumerated).

Control Variate Monte Carlo

In the case of two model instances (low fidelity denoted as LF and high fidelity denoted as HF), we employ a control variate approach as described in [NgW14]:

$$\hat{Q}_{HF}^{CV} = \hat{Q}_{HF}^{MC} - \beta \left( \hat{Q}_{LF}^{MC} - E[Q_{LF}] \right)$$

As opposed to the traditional control variate approach, we do not know $E[Q_{LF}]$ precisely, but rather we estimate it more accurately than $\hat{Q}_{LF}^{MC}$ based on a sampling increment applied to the LF model. This sampling increment is based on a total cost minimization procedure that incorporates the relative LF and HF costs and the observed Pearson correlation coefficient $\rho_{LH}$ between $Q_{LF}$ and $Q_{HF}$. The coefficient $\beta$ is then determined from the observed LF-HF covariance and LF variance.
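The two-model estimator above can be sketched in a few lines of Python. This is a minimal illustration and not Dakota's implementation: the function name `control_variate_estimate`, the toy `hf_model` and `lf_model`, and the sample sizes are all assumptions made for the example. The coefficient is computed from the sample LF-HF covariance and LF variance on the shared samples, and $E[Q_{LF}]$ is approximated by the mean over the shared samples plus the LF increment.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def control_variate_estimate(hf, lf_shared, lf_extra):
    """Two-model control variate estimate of E[Q_HF] (illustrative sketch).

    hf, lf_shared: HF and LF outputs at the same N shared sample points.
    lf_extra:      LF outputs at additional points (the LF sampling increment).
    Returns (estimate, beta).
    """
    n = len(hf)
    mu_hf, mu_lf = mean(hf), mean(lf_shared)
    # Sample LF-HF covariance and LF variance from the shared samples.
    cov = sum((h - mu_hf) * (l - mu_lf) for h, l in zip(hf, lf_shared)) / (n - 1)
    var_lf = sum((l - mu_lf) ** 2 for l in lf_shared) / (n - 1)
    beta = cov / var_lf
    # E[Q_LF] is unknown; approximate it with all available LF samples.
    mu_lf_refined = mean(list(lf_shared) + list(lf_extra))
    return mu_hf - beta * (mu_lf - mu_lf_refined), beta


# Hypothetical toy models, chosen only so that LF and HF are correlated.
def hf_model(x):
    return x + 0.1 * x * x


def lf_model(x):
    return 0.9 * x


random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
shared, extra = xs[:50], xs[50:]
estimate, beta = control_variate_estimate(
    [hf_model(x) for x in shared],
    [lf_model(x) for x in shared],
    [lf_model(x) for x in extra],
)
print(estimate, beta)
```

In this sketch the HF model is evaluated only on the 50 shared points, while the cheaper LF model is evaluated on all 1000; in practice the size of the LF increment would come from the cost minimization described above rather than being fixed by hand.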

Multifidelity Monte Carlo

This approach can be extended to a sequence of low-fidelity approximations using a recursive sampling approach, as in [PWG16]:

$$\hat{Q}_{HF}^{CV} = \hat{Q}_{HF}^{MC} - \sum_{i=1}^{M} \beta_{i} \left( \hat{Q}_{LF_{i}}^{MC} - E[Q_{LF_{i}}] \right)$$

In this case, the variance in the estimate of the $i^{th}$ control mean is reduced by the $(i+1)^{th}$ control variate, such that the variance reduction is limited by the case of an exact estimate of the first control mean (referred to as OCV-1 in [GGEJ20]).
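The recursive structure can be illustrated with a short Python sketch. It assumes nested sample sets of non-decreasing size, with each low-fidelity model evaluated on every point used by the model before it plus its own increment; the function name `mfmc_estimate` and its argument layout are hypothetical, and the coefficients are taken as given rather than estimated. Each control mean is approximated using that model's full sample set, while the corresponding Monte Carlo term uses only the samples shared with the previous model.

```python
def mean(xs):
    return sum(xs) / len(xs)


def mfmc_estimate(hf, lfs, betas):
    """Recursive multifidelity estimate of E[Q_HF] (illustrative sketch).

    hf:    HF outputs on the first N_0 sample points.
    lfs:   lfs[i] holds outputs of the i-th LF model on the first N_{i+1}
           points of the same sequence, with N_0 <= N_1 <= ... <= N_M.
    betas: one control variate coefficient per LF model.
    """
    estimate = mean(hf)
    n_prev = len(hf)
    for lf, beta in zip(lfs, betas):
        if len(lf) < n_prev:
            raise ValueError("sample sets must be nested and non-decreasing in size")
        # MC term on the samples shared with the previous model, minus the
        # refined control mean estimated on this model's full sample set.
        estimate -= beta * (mean(lf[:n_prev]) - mean(lf))
        n_prev = len(lf)
    return estimate
```

With all coefficients set to zero the function returns the plain HF Monte Carlo mean, which is a convenient sanity check when experimenting with the sketch.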

Model Selection

Similar to weighted MLMC (see weighted), MFMC is a special case of generalized ACV ([BLWL22]) using the ACV-MF sampling scheme in combination with a hierarchical DAG (each approximation node points to the next approximation of higher fidelity, ending with the truth model at the root node). As such, the MFMC approach can be promoted to the generalized ACV solver in order to gain access to its model selection capabilities. Activating the model_selection option results in a set of numerical solutions that will enumerate model combinations for a fixed hierarchical DAG.

Default Behavior

The multifidelity_sampling method employs a number of important default settings:

The DAG defining control variate pairings is a “hierarchical” DAG where each approximation node points to the next approximation of higher fidelity, ending with the truth model at the root node. For other DAG options, refer to the approximate control variate method (peer DAG by default) and its option to search_model_graphs (enumerates all or some subset of the admissible DAGs).
If the model QoI are well-ordered, then the analytic solution from [PWG16] will be used. If a numerical solution is instead used, whether due to an override specification or due to detection of model misordering, the numerical solver will be global_local by default, starting with the DIRECT global solver and proceeding to available local solvers (SQP and NIP) in competition. The numerical solution reorders models on the fly in order to enforce the required ordering constraints for computing the MFMC estimator variance.
Solution mode will be online_pilot, an approach that iterates toward a set of shared samples whose size is consistent with the optimal allocation.
Monte Carlo sample sets are used by default and are most consistent with the underlying theory, but this default can be overridden to use Latin hypercube sample sets using sample_type lhs. Allocations remain governed by Monte Carlo variance for all cases.
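As an illustration of what "well-ordered" and "analytic solution" mean here, the following Python sketch encodes the closed-form allocation as it is commonly stated for MFMC following [PWG16]. It is a sketch under stated assumptions, not Dakota's solver: the function names, the convention that index 0 is the truth model, the correlation values, and the use of real-valued (unrounded) sample counts are all assumptions for the example. The costs reuse the relative values from the example input on this page.

```python
import math


def is_well_ordered(rho, cost):
    """Check the ordering conditions assumed by the closed-form allocation.

    rho[i]:  correlation of model i with the HF model (rho[0] == 1 for HF).
    cost[i]: cost per evaluation of model i, with model 0 being HF.
    """
    k = len(rho)
    r2 = [r * r for r in rho] + [0.0]
    # Squared correlations must strictly decrease along the sequence.
    if any(r2[i] <= r2[i + 1] for i in range(k)):
        return False
    # Cost ratios must dominate the ratios of correlation differences.
    return all(
        cost[i - 1] / cost[i] > (r2[i - 1] - r2[i]) / (r2[i] - r2[i + 1])
        for i in range(1, k)
    )


def analytic_ratios(rho, cost):
    """Sample-count ratios r_i = N_i / N_HF from the closed-form solution."""
    r2 = [r * r for r in rho] + [0.0]
    return [
        math.sqrt(cost[0] * (r2[i] - r2[i + 1]) / (cost[i] * (1.0 - r2[1])))
        for i in range(len(rho))
    ]


def allocate(rho, cost, budget):
    """Real-valued sample counts per model that exhaust the given budget."""
    ratios = analytic_ratios(rho, cost)
    n_hf = budget / sum(w * r for w, r in zip(cost, ratios))
    return [n_hf * r for r in ratios]


# Hypothetical correlations; costs ordered as HF, LF2, LF1.
rho = [1.0, 0.95, 0.8]
cost = [1.0, 0.1, 0.01]
if is_well_ordered(rho, cost):
    print(allocate(rho, cost, budget=100.0))
else:
    print("models are misordered; a numerical solve would be needed")
```

Swapping the two low-fidelity correlations in this example makes `is_well_ordered` return False, which corresponds to the misordering case in which the text above says a numerical solution is used instead.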

Expected Output

The multifidelity_sampling method reports estimates of the first four moments and a summary of the evaluations performed for each model fidelity and discretization level. The method does not support any level mappings (response, probability, reliability, generalized reliability) at this time.

Expected HDF5 Output

If Dakota was built with HDF5 support and run with the hdf5 keyword, this method writes the following results to HDF5:

Sampling Moments (moments only, not confidence intervals)

In addition, the execution group has the attribute equiv_hf_evals, which records the equivalent number of high-fidelity evaluations.
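For orientation, an equivalent high-fidelity evaluation count is conventionally the cost-weighted total of all model evaluations, normalized by the cost of one HF evaluation. The Python sketch below assumes that convention; the function name and the evaluation counts are hypothetical, and the exact accounting Dakota uses for equiv_hf_evals may differ.

```python
def equivalent_hf_evals(num_evals, costs, hf_cost):
    """Cost-weighted evaluation total expressed in units of one HF evaluation.

    num_evals[i]: number of evaluations of model i.
    costs[i]:     cost per evaluation of model i.
    hf_cost:      cost per evaluation of the HF model.
    """
    return sum(n * w for n, w in zip(num_evals, costs)) / hf_cost


# Hypothetical counts for HF, LF2, LF1 with relative costs 1, 0.1, 0.01.
print(equivalent_hf_evals([20, 100, 500], [1.0, 0.1, 0.01], hf_cost=1.0))
```

Under these assumed counts, 20 HF, 100 LF2, and 500 LF1 evaluations amount to roughly 35 equivalent HF evaluations.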

Usage Tips

The multifidelity_sampling method is used in combination with an ensemble model specification for a model form sequence, a discretization level sequence, or both. For a model form sequence, each model must provide a scalar solution_level_cost. For a discretization level sequence, it is necessary to identify the variable string descriptor that controls the resolution levels using solution_level_control as well as the associated array of relative costs using solution_level_cost. An alternative to prescribing the cost profile is estimating it on the fly using cost metadata that is returned from the different simulation instances.

Examples

We provide an example of a multifidelity Monte Carlo study using an ensemble model specification employing multiple approximations.

The following method block:

method,
 model_pointer = 'NONHIER'
 multifidelity_sampling
   pilot_samples = 20 seed = 1237
   max_iterations = 10
   convergence_tolerance = .001

specifies MFMC in combination with the model identified by the NONHIER pointer.

This NONHIER model specification provides a truth model and a set of unordered approximation models, each with a single (or default) discretization level:

model,
 id_model = 'NONHIER'
 surrogate ensemble
   truth_model = 'HF'
   unordered_model_fidelities = 'LF1' 'LF2'

model,
 id_model = 'LF1'
 interface_pointer = 'LF1_INT'
 simulation
   solution_level_cost = 0.01

model,
 id_model = 'LF2'
 interface_pointer = 'LF2_INT'
 simulation
   solution_level_cost = 0.1

model,
 id_model = 'HF'
 interface_pointer = 'HF_INT'
 simulation
   solution_level_cost = 1.

Refer to dakota/test/dakota_uq_*_cvmc.in and dakota/test/dakota_uq_*_mfmc.in in the source distribution for additional examples.
