gpmsa

(Experimental) Gaussian Process Models for Simulation Analysis (GPMSA) Bayesian calibration

Topics

package_queso, bayesian_calibration

Specification

  • Alias: None

  • Arguments: None

Child Keywords:

Required/Optional      Description of Group  Dakota Keyword            Dakota Keyword Description
---------------------  --------------------  ------------------------  --------------------------
Required                                     chain_samples             Number of Markov Chain Monte Carlo posterior samples
Optional                                     seed                      Seed of the random number generator
Optional                                     rng                       Selection of a random number generator
Required                                     build_samples             Number of initial model evaluations used in build phase
Optional                                     import_build_points_file  File containing points you wish to use to build a surrogate
Optional                                     standardized_space        Perform Bayesian inference in standardized probability space
Optional                                     logit_transform           Utilize the logit transformation to reduce sample rejection for bounded domains
Optional                                     gpmsa_normalize           Enable GPMSA-internal normalization
Optional                                     export_chain_points_file  Export the MCMC chain to the specified filename
Optional (Choose One)  MCMC Algorithm        dram                      Use the DRAM MCMC algorithm
                                             delayed_rejection         Use the Delayed Rejection MCMC algorithm
                                             adaptive_metropolis       Use the Adaptive Metropolis MCMC algorithm
                                             metropolis_hastings       Use the Metropolis-Hastings MCMC algorithm
Optional                                     proposal_covariance       Defines the technique used to generate the MCMC proposal covariance
Optional                                     options_file              File containing advanced QUESO/GPMSA options

Description

GPMSA (Gaussian Process Models for Simulation Analysis) is a surrogate-based Markov Chain Monte Carlo Bayesian calibration method. Dakota’s GPMSA is an experimental capability and not ready for production use at this time.

Central to GPMSA is the construction of a Gaussian Process emulator from simulation runs collected at various settings of the input parameters. The emulator is a statistical model of the system response. It is combined with the observational data to improve system predictions and to constrain, or calibrate, the unknown parameters. The GPMSA code draws heavily on the theory developed in the seminal Bayesian calibration paper by Kennedy and O’Hagan [KOHagan01]. The particular approach in GPMSA was developed by the Los Alamos National Laboratory statistics group and documented in [HGWR08]. Dakota’s GPMSA capability comes from the QUESO package developed at UT Austin.
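
For orientation, the calibration model in the Kennedy and O’Hagan framework is commonly written as shown below. The symbols follow the usual convention in that literature and are not Dakota keywords. This is a sketch of the general formulation, and it should not be read as a statement of which terms Dakota’s experimental implementation activates.

\[
y(x_i) = \eta(x_i, \theta) + \delta(x_i) + \epsilon_i, \qquad i = 1, \dots, n
\]

Here \(y(x_i)\) is the experimental observation at configuration \(x_i\), \(\eta(x, \theta)\) is the simulator response (replaced by the Gaussian Process emulator), \(\theta\) denotes the calibration parameters, \(\delta(x)\) is a model discrepancy term, and \(\epsilon_i\) is the observation error. Calibration then amounts to sampling the posterior \(\pi(\theta \mid y) \propto L(y \mid \theta)\, \pi(\theta)\) with MCMC.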

Usage Tips:

Configuring GPMSA essentially involves identifying the simulation build data, the experiment data, the calibration and configuration (state) variables, and any necessary algorithm controls. The GP surrogate model is automatically constructed internal to the algorithm and need not be specified through Dakota input.

Dakota’s GPMSA implementation is not intended for production use. There are a number of known limitations, including:

  • Only scalar and multivariate responses are supported, not field responses. Field responses are treated as a single multivariate response set, so simulation and experiment data must have the same dimensions.

  • When build data is not imported, a design of experiments is conducted over all calibration and scenario variables present.

  • Experiment data is required (one cannot pose the simulation data as a set of residuals with the assumption of 0-valued experiments).

  • Output and diagnostics are limited. Advanced users will need to examine the QUESO output files in the QuesoDiagnostic directory, which may be written in a transformed, scaled space.
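
Because simulation and experiment data must agree in dimension, it can help to check the column counts of the data files before launching Dakota. The following is a minimal sketch, not part of Dakota. The helper names `count_columns` and `expected_columns` are hypothetical, and the sketch assumes freeform files (whitespace-separated numbers with no header) with one scalar variance value per response in the experiment file.

```python
def count_columns(path):
    """Return (n_rows, n_cols) for a freeform file; raise if rows are ragged."""
    widths = set()
    n_rows = 0
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:  # skip blank lines
                continue
            widths.add(len(fields))
            n_rows += 1
    if len(widths) != 1:
        raise ValueError(f"{path}: inconsistent column counts {sorted(widths)}")
    return n_rows, widths.pop()


def expected_columns(n_calibration, n_config, n_responses):
    """Expected column counts under the assumptions stated above.

    Build file row:      calibration vars, config vars, responses
    Experiment file row: config vars, responses, one variance per response
    """
    sim_cols = n_calibration + n_config + n_responses
    exp_cols = n_config + 2 * n_responses
    return sim_cols, exp_cols
```

For a problem with 3 calibration variables, 3 configuration variables, and 1 response, `expected_columns(3, 3, 1)` gives 7 columns for the build file and 5 for the experiment file, which can be compared against `count_columns` on each file.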

Examples

The following input file fragment illustrates GPMSA-based Bayesian calibration of 3 \(\beta\) variables with a uniform prior, with 3 configuration (scenario) variables \(x\). A total of 60 simulation build points are provided in sim_data.dat, which contains columns for each \(\beta\), followed by each \(x\), and then the simulation response ‘lin’. Each row of the experiment data file y_exp_with_var.dat contains the values of the 3 \(x\) variables, followed by the value of ‘lin’ and its observation error (variance).

method
  bayes_calibration gpmsa
    chain_samples 1000
    seed 2460
    build_samples 60
    import_build_points_file  'sim_data.dat'  freeform
    export_chain_points_file  'posterior.dat'
    burn_in_samples = 100  sub_sampling_period = 2
    posterior_stats kl

variables
  uniform_uncertain  3
    upper_bounds   0.4500   -0.1000    0.4000
    initial_point  0.2750   -0.3000    0.1000
    lower_bounds  -0.1000   -0.5000   -0.2000
    descriptors   'beta0'   'beta1'   'beta2'

  continuous_state  3
    upper_bounds   3 * 1.0
    initial_state  3 * 0.5
    lower_bounds   3 * 0.0
    descriptors    'x0'  'x1'  'x2'

responses
  descriptors 'lin'
  calibration_terms   1
  calibration_data_file 'y_exp_with_var.dat'
    freeform
    num_experiments  5
      num_config_variables   3
    experiment_variance_type 'scalar'
  no_gradients
  no_hessians
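
To make the two file layouts concrete, the following sketch writes synthetic versions of sim_data.dat and y_exp_with_var.dat in the column order described above. It is illustrative only. The response function `lin_model`, the "true" \(\beta\) values, and the variance of 1.0e-4 are assumptions made up for this example, and the data are not those used in any Dakota test. The bounds are copied from the variables block.

```python
import random

# Bounds copied from the variables block above (beta0, beta1, beta2).
BETA_LOWER = [-0.10, -0.50, -0.20]
BETA_UPPER = [0.45, -0.10, 0.40]


def lin_model(beta, x):
    """Hypothetical stand-in for the simulator: lin = sum_i beta_i * x_i."""
    return sum(b * xi for b, xi in zip(beta, x))


def write_freeform(path, rows):
    """Write rows as whitespace-separated numbers, no header or ID column."""
    with open(path, "w") as f:
        for row in rows:
            f.write(" ".join(f"{v:.8e}" for v in row) + "\n")


def make_build_rows(n_build, rng):
    """Build rows in the order: beta0 beta1 beta2 x0 x1 x2 lin."""
    rows = []
    for _ in range(n_build):
        beta = [rng.uniform(lo, hi) for lo, hi in zip(BETA_LOWER, BETA_UPPER)]
        x = [rng.uniform(0.0, 1.0) for _ in range(3)]
        rows.append(beta + x + [lin_model(beta, x)])
    return rows


def make_experiment_rows(n_exp, beta_true, variance, rng):
    """Experiment rows in the order: x0 x1 x2 lin variance."""
    rows = []
    for _ in range(n_exp):
        x = [rng.uniform(0.0, 1.0) for _ in range(3)]
        noisy = lin_model(beta_true, x) + rng.gauss(0.0, variance ** 0.5)
        rows.append(x + [noisy, variance])
    return rows


if __name__ == "__main__":
    rng = random.Random(2460)
    # 60 build points and 5 experiments, matching build_samples and num_experiments.
    write_freeform("sim_data.dat", make_build_rows(60, rng))
    write_freeform(
        "y_exp_with_var.dat",
        make_experiment_rows(5, [0.275, -0.3, 0.1], 1.0e-4, rng),
    )
```

Each build row has 7 columns and each experiment row has 5, which matches the 3 calibration variables, 3 configuration variables, and single response declared in the input fragment.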