Design of Experiments

Overview

Classical design of experiments (DoE) methods and the more modern design and analysis of computer experiments (DACE) methods are both techniques which seek to extract as much trend data from a parameter space as possible using a limited number of sample points. Classical DoE techniques arose from technical disciplines that assumed some randomness and nonrepeatability in field experiments (e.g., agricultural yield, experimental chemistry). DoE approaches such as central composite design, Box-Behnken design, and full and fractional factorial design generally put sample points at the extremes of the parameter space, since these designs offer more reliable trend extraction in the presence of nonrepeatability. DACE methods are distinguished from DoE methods in that the nonrepeatability component can be omitted since computer simulations are involved. In these cases, space filling designs such as orthogonal array sampling and Latin hypercube sampling are more commonly employed in order to accurately extract trend information. Quasi-Monte Carlo sampling techniques which are constructed to fill the unit hypercube with good uniformity of coverage can also be used for DACE.

Dakota supports both DoE and DACE techniques. In common usage, only parameter bounds are used in selecting the samples within the parameter space. Thus, DoE and DACE can be viewed as special cases of the more general probabilistic sampling for uncertainty quantification (see following section), in which the DoE/DACE parameters are treated as having uniform probability distributions. The DoE/DACE techniques are commonly used for investigation of global response trends, identification of significant parameters (e.g., main effects), and as data generation methods for building response surface approximations.

Dakota includes several approaches to sampling and design of experiments, all implemented in third-party software libraries. LHS (Latin hypercube sampling) [SW04] is a general-purpose sampling package developed at Sandia that has been used by the DOE national labs for several decades. DDACE (Distributed Design and Analysis of Computer Experiments) is a more recent package for computer experiments, also developed at Sandia Labs [TM]. DDACE provides the capability for generating orthogonal arrays, Box-Behnken designs, central composite designs, and random designs. The FSUDace (Florida State University’s Design and Analysis of Computer Experiments) package provides the following sampling techniques: quasi-Monte Carlo sampling based on Halton or Hammersley sequences, and centroidal Voronoi tessellation. Lawrence Livermore National Lab’s PSUADE (Problem Solving Environment for Uncertainty Analysis and Design Exploration) [Ton05] includes several methods for model exploration, but only the Morris screening method is exposed in Dakota.

This chapter describes DDACE, FSUDace, and PSUADE, with a focus on designing computer experiments. Latin Hypercube Sampling, also used in uncertainty quantification, is discussed in the section on sampling methods.

Design of Computer Experiments

What distinguishes design of computer experiments? Computer experiments are often different from physical experiments, such as those performed in agriculture, manufacturing, or biology. In physical experiments, one often applies the same treatment or factor level in an experiment several times to get an understanding of the variability of the output when that treatment is applied. For example, in an agricultural experiment, several fields (e.g., 8) may be subject to a low level of fertilizer and the same number of fields may be subject to a high level of fertilizer to see if the amount of fertilizer has a significant effect on crop output. In addition, one is often interested in the variability of the output within a treatment group: is the variability of the crop yields in the low fertilizer group much higher than that in the high fertilizer group, or not?

In physical experiments, the process we are trying to examine is stochastic: that is, the same treatment may result in different outcomes. By contrast, in computer experiments, often we have a deterministic code. If we run the code with a particular set of input parameters, the code will always produce the same output. There certainly are stochastic codes, but the main focus of computer experimentation has been on deterministic codes. Thus, in computer experiments we often do not have the need to do replicates (running the code with the exact same input parameters several times to see differences in outputs). Instead, a major concern in computer experiments is to create an experimental design which can sample a high-dimensional space in a representative way with a minimum number of samples. The number of factors or parameters that we wish to explore in computer experiments is usually much higher than physical experiments. In physical experiments, one may be interested in varying a few parameters, usually five or less, while in computer experiments we often have dozens of parameters of interest. Choosing the levels of these parameters so that the samples adequately explore the input space is a challenging problem. There are many experimental designs and sampling methods which address the issue of adequate and representative sample selection.

There are many goals of running a computer experiment: one may want to explore the input domain or the design space and get a better understanding of the range in the outputs for a particular domain. Another objective is to determine which inputs have the most influence on the output, or how changes in the inputs change the output. This is usually called sensitivity analysis.

Another goal is to use the sampled input points and their corresponding output to create a response surface approximation for the computer code. The response surface approximation (e.g., a polynomial regression model, a Gaussian-process/Kriging model, a neural net) can then be used to emulate the computer code. Constructing a response surface approximation is particularly important for applications where running a computational model is extremely expensive: the computer model may take 10 or 20 hours to run on a high performance machine, whereas the response surface model may only take a few seconds. Thus, one often optimizes the response surface model or uses it within a framework such as surrogate-based optimization. Response surface models are also valuable in cases where the gradient (first derivative) and/or Hessian (second derivative) information required by optimization techniques are either not available, expensive to compute, or inaccurate because the derivatives are poorly approximated or the function evaluation is itself noisy due to roundoff errors. Furthermore, many optimization methods require a good initial point to ensure fast convergence or to converge to good solutions (e.g. for problems with multiple local minima). Under these circumstances, a good design of computer experiment framework coupled with response surface approximations can offer great advantages.

In addition to the sensitivity analysis and response surface modeling mentioned above, we also may want to do uncertainty quantification on a computer model. Uncertainty quantification (UQ) refers to taking a particular set of distributions on the inputs, and propagating them through the model to obtain a distribution on the outputs. For example, if input parameter A follows a normal distribution with mean 5 and variance 1, the computer produces a random draw from that distribution. If input parameter B follows a weibull distribution with alpha = 0.5 and beta = 1, the computer produces a random draw from that distribution. When all of the uncertain variables have samples drawn from their input distributions, we run the model with the sampled values as inputs. We do this repeatedly to build up a distribution of outputs. We can then use the cumulative distribution function of the output to ask questions such as: what is the probability that the output is greater than 10? What is the 99th percentile of the output?
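The propagation loop described above can be sketched in a few lines. The `model` function here is a hypothetical stand-in for an expensive simulation, and the mapping of the stated Weibull alpha/beta parameters onto Python's scale/shape arguments is an assumption for illustration only:

```python
import random

def model(a, b):
    # Hypothetical stand-in for an expensive simulation code.
    return a + 2.0 * b

random.seed(42)
outputs = []
for _ in range(10000):
    a = random.normalvariate(5.0, 1.0)   # A ~ Normal(mean 5, variance 1)
    # Assumed mapping: alpha -> scale (1.0), beta -> shape (0.5)
    b = random.weibullvariate(1.0, 0.5)
    outputs.append(model(a, b))

outputs.sort()
# Empirical probability that the output exceeds 10
p_gt_10 = sum(1 for y in outputs if y > 10.0) / len(outputs)
# Empirical 99th percentile of the output
p99 = outputs[int(0.99 * len(outputs))]
```

The sorted outputs form an empirical CDF, so any percentile or exceedance probability can be read off directly.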

Note that sampling-based uncertainty quantification and design of computer experiments are very similar. There is significant overlap in the purpose and methods used for UQ and for DACE. We have attempted to delineate the differences within Dakota as follows: we use the methods DDACE, FSUDACE, and PSUADE primarily for design of experiments, where we are interested in understanding the main effects of parameters and where we want to sample over an input domain to obtain values for constructing a response surface. We use the nondeterministic sampling methods (sampling) for uncertainty quantification, where we are propagating specific input distributions and interested in obtaining (for example) a cumulative distribution function on the output. If one has a problem with no distributional information, we recommend starting with a design of experiments approach. Note that DDACE, FSUDACE, and PSUADE currently do not support distributional information: they take an upper and lower bound for each uncertain input variable and sample within that. The uncertainty quantification methods in sampling (primarily Latin Hypercube sampling) offer the capability to sample from many distributional types. The distinction between UQ and DACE is somewhat arbitrary: both approaches often can yield insight about important parameters and both can determine sample points for response surface approximations.

Three software packages are available in Dakota for design of computer experiments: DDACE (developed at Sandia Labs), FSUDACE (developed at Florida State University), and PSUADE (developed at LLNL).

DDACE

The Distributed Design and Analysis of Computer Experiments (DDACE) package includes both classical design of experiments methods [TM] and stochastic sampling methods. The classical design of experiments methods in DDACE are central composite design (CCD) and Box-Behnken (BB) sampling. A grid-based (full-factorial) sampling method is also available. The stochastic methods are orthogonal array sampling [KO96] (which permits main effects calculations), Monte Carlo (random) sampling, Latin hypercube sampling, and orthogonal array-Latin hypercube sampling. While DDACE LHS supports variables with normal or uniform distributions, only uniform distributions are supported through Dakota. Also, DDACE does not allow enforcement of a user-specified correlation structure among the variables.

The sampling methods in DDACE can be used alone or in conjunction with other methods. For example, DDACE sampling can be used with both surrogate-based optimization and optimization under uncertainty advanced methods. See Listing 73 for an example of how the DDACE settings are used in Dakota.

The following sections provide more detail about the sampling methods available for design of experiments in DDACE.

Central Composite Design

A Box-Wilson Central Composite Design, commonly called a central composite design (CCD), contains an embedded factorial or fractional factorial design with center points that is augmented with a group of ’star points’ that allow estimation of curvature. If the distance from the center of the design space to a factorial point is \(\pm\)1 unit for each factor, the distance from the center of the design space to a star point is \(\pm\alpha\) with \(\mid\alpha\mid > 1\). The precise value of \(\alpha\) depends on certain properties desired for the design and on the number of factors involved. The CCD design is specified in Dakota with the method command dace central_composite.

As an example, with two input variables or factors, each having two levels, the factorial design is shown in Table 10:

Table 10 Simple Factorial Design

Input 1   Input 2
  -1        -1
  -1        +1
  +1        -1
  +1        +1

With a CCD, the design in Table 10 would be augmented with the points shown in Table 11, if \(\alpha\) = 1.3.

Table 11 Additional Points to make the factorial design a CCD

Input 1   Input 2
   0       +1.3
   0       -1.3
 +1.3        0
 -1.3        0
   0         0

These points define a circle around the original factorial design. Note that the number of sample points specified in a CCD, samples, is a function of the number of variables in the problem:

\[samples = 1 + 2*NumVar + 2^{NumVar}\]
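As a quick check of this count, the CCD points can be generated in coded units (one center point, the \(2^{NumVar}\) factorial corners, and \(2 \cdot NumVar\) star points). This is a minimal sketch; the function name and the \(\alpha\) default are illustrative, not part of DDACE:

```python
from itertools import product

def central_composite(num_var, alpha=1.3):
    """Generate a CCD in coded units: center, factorial corners, star points."""
    points = [tuple([0.0] * num_var)]                     # 1 center point
    points += list(product([-1.0, 1.0], repeat=num_var))  # 2^k factorial corners
    for i in range(num_var):                              # 2k star points
        for sign in (-alpha, alpha):
            star = [0.0] * num_var
            star[i] = sign
            points.append(tuple(star))
    return points

pts = central_composite(2)
# Matches samples = 1 + 2*NumVar + 2^NumVar: 1 + 4 + 4 = 9 points for 2 variables
assert len(pts) == 9
```

For two variables this reproduces exactly the nine points of Tables 10 and 11 combined.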

Box-Behnken Design

The Box-Behnken design is similar to a central composite design, with some differences. The Box-Behnken design is a quadratic design in that it does not contain an embedded factorial or fractional factorial design. In this design the treatment combinations are at the midpoints of edges of the process space and at the center, as compared with CCD designs, where the extra points are placed at ’star points’ on a circle outside of the process space. Box-Behnken designs are rotatable (or nearly rotatable) and require 3 levels of each factor. The designs have limited capability for orthogonal blocking compared to central composite designs. Box-Behnken requires fewer runs than CCD for 3 factors, but this advantage diminishes as the number of factors increases. The Box-Behnken design is specified in Dakota with the method command dace box_behnken.

Note that the number of sample points specified in a Box-Behnken design, samples, is a function of the number of variables in the problem:

\[samples = 1 + 4*NumVar + (NumVar-1)/2\]

Orthogonal Array Designs

Orthogonal array (OA) sampling is a widely used technique for running experiments and systematically testing factor effects [HSS99]. An orthogonal array sample can be described as a 4-tuple \((m,n,s,r)\), where \(m\) is the number of sample points, \(n\) is the number of input variables, \(s\) is the number of symbols, and \(r\) is the strength of the orthogonal array. The number of sample points, \(m\), must be a multiple of the number of symbols, \(s\). The number of symbols refers to the number of levels per input variable. The strength \(r\) means that in any set of \(r\) columns, every possible combination of symbols appears an equal number of times.

For example, Table 12 shows an orthogonal array of strength 2 for \(m\) = 8, with 7 variables:

Table 12 Orthogonal Array for Seven Variables

Input 1  Input 2  Input 3  Input 4  Input 5  Input 6  Input 7
   0        0        0        0        0        0        0
   0        0        0        1        1        1        1
   0        1        1        0        0        1        1
   0        1        1        1        1        0        0
   1        0        1        0        1        0        1
   1        0        1        1        0        1        0
   1        1        0        0        1        1        0
   1        1        0        1        0        0        1

If one picks any two columns, say the first and the third, each of the four possible symbol pairs (0 0, 0 1, 1 0, 1 1) appears exactly the same number of times, twice in this case.
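This strength-2 property is easy to verify mechanically. The sketch below encodes the Table 12 array and checks that every pair of columns shows each symbol pair equally often (the helper name is illustrative):

```python
from itertools import combinations, product

# The strength-2 orthogonal array of Table 12 (m=8 runs, n=7 factors, s=2 symbols)
oa = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

def has_strength_2(array, s=2):
    """Check that every pair of columns shows each symbol pair equally often."""
    m, n = len(array), len(array[0])
    expected = m // (s * s)  # each symbol pair must appear m / s^2 times
    for c1, c2 in combinations(range(n), 2):
        for pair in product(range(s), repeat=2):
            count = sum(1 for row in array if (row[c1], row[c2]) == pair)
            if count != expected:
                return False
    return True

assert has_strength_2(oa)  # each of 00, 01, 10, 11 appears twice per column pair
```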

DDACE creates orthogonal arrays of strength 2. Further, the OAs generated by DDACE do not treat the factor levels as one fixed value (0 or 1 in the above example). Instead, once a level for a variable is determined in the array, DDACE samples a random variable from within that level. The orthogonal array design is specified in Dakota with the method command dace oas.

The orthogonal array method in DDACE is the only method that allows for the calculation of main effects, specified with the command main_effects. Main effects analysis is a sensitivity analysis method which identifies the input variables that have the most influence on the output. The idea is to compare the mean of the response function when variable A (for example) is at level 1 vs. when variable A is at level 2 or level 3. If these mean responses are statistically significantly different at different levels of variable A, this is an indication that variable A has a significant effect on the response. The orthogonality of the columns is critical in performing main effects analysis, since it means that the effects of the other variables ’cancel out’ when looking at the overall effect of one variable across its levels. There are ways of developing orthogonal arrays to calculate higher-order interactions, such as two-way interactions (what is the influence of variable A * variable B on the output?), but DDACE currently supports only one-way (main) effects.

The main effects are presented as a series of ANOVA tables. For each objective function and constraint, the decomposition of variance of that objective or constraint is presented as a function of the input variables. The p-value in the ANOVA table indicates whether the input factor is significant: it is the probability of obtaining samples at least as extreme as those observed if the input factor had no effect on the response. For example, if you set a level of significance of 0.05 and the actual p-value is 0.03, then the input factor has a statistically significant effect on the response.
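A minimal sketch of the main-effects idea (group responses by factor level, then compare group means); the response values here are hypothetical, and the helper is illustrative rather than the DDACE ANOVA computation:

```python
def level_means(levels, responses):
    """Mean response at each level of one input variable."""
    sums, counts = {}, {}
    for lev, y in zip(levels, responses):
        sums[lev] = sums.get(lev, 0.0) + y
        counts[lev] = counts.get(lev, 0) + 1
    return {lev: sums[lev] / counts[lev] for lev in sums}

# Column 1 of the Table 12 array, paired with hypothetical response values
col = [0, 0, 0, 0, 1, 1, 1, 1]
y = [3.1, 2.9, 3.0, 3.2, 7.8, 8.1, 8.0, 7.9]
means = level_means(col, y)
# A large gap between means[0] and means[1] suggests this input matters
```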

Grid Design

In a grid design, a grid is placed over the input variable space. This is very similar to a multi-dimensional parameter study where the samples are taken over a set of partitions on each variable. The main difference is that in grid sampling, a small random perturbation is added to each sample value so that the grid points are not on a perfect grid. This is done to help capture certain features in the output such as periodic functions. A purely structured grid, with the samples exactly on the grid points, has the disadvantage of not being able to capture important features such as periodic functions with relatively high frequency (due to aliasing). Adding a random perturbation to the grid samples helps remedy this problem.

Another disadvantage of grid sampling is that the number of sample points required grows exponentially with the number of input variables. In grid sampling, the number of samples is the number of symbols (grid partitions) raised to the number of variables. For example, if there are 2 variables, each with 5 partitions, the number of samples would be \(5^2\). In this case, doubling the number of variables squares the sample size. The grid design is specified in Dakota with the method command dace grid.
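A sketch of a jittered grid along these lines follows: one sample per grid cell, perturbed within the cell. The perturbation size (up to a quarter of the cell width) is an arbitrary choice for illustration, not the DDACE value:

```python
import random
from itertools import product

def jittered_grid(lower, upper, partitions, seed=0):
    """Grid samples with a small random perturbation, one point per cell."""
    rng = random.Random(seed)
    k = len(lower)
    axes = []
    for i in range(k):
        width = (upper[i] - lower[i]) / partitions
        # cell centers along dimension i
        axes.append([lower[i] + (j + 0.5) * width for j in range(partitions)])
    samples = []
    for center in product(*axes):
        point = []
        for i, c in enumerate(center):
            width = (upper[i] - lower[i]) / partitions
            # perturb within +/- 25% of the cell width (illustrative choice)
            point.append(c + rng.uniform(-0.25, 0.25) * width)
        samples.append(point)
    return samples

pts = jittered_grid([0.0, 0.0], [1.0, 1.0], 5)
assert len(pts) == 5 ** 2  # samples = partitions^num_vars
```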

Note

Refer to the section on multi-dimensional parameter studies for more information.

Monte Carlo Design

Monte Carlo designs simply involve pure Monte Carlo random sampling from uniform distributions between the lower and upper bounds on each of the input variables. Monte Carlo designs, specified by dace random, are a way to generate a set of random samples over an input domain.

LHS Design

DDACE offers the capability to generate Latin Hypercube designs. Note that the version of LHS in DDACE generates uniform samples (uniform between the variable bounds). The version of LHS offered with nondeterministic sampling can generate LHS samples according to a number of distribution types, including normal, lognormal, weibull, beta, etc. To specify the DDACE version of LHS, use the method command dace lhs.
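A minimal sketch of uniform LHS as described here: one sample per equal-probability stratum in each dimension, with the strata shuffled independently per dimension. This illustrates the idea, not the DDACE implementation:

```python
import random

def latin_hypercube(n, lower, upper, seed=0):
    """Uniform LHS: one sample per equal-probability stratum in each dimension."""
    rng = random.Random(seed)
    k = len(lower)
    samples = [[0.0] * k for _ in range(n)]
    for d in range(k):
        # assign each of the n runs a distinct stratum in dimension d
        strata = list(range(n))
        rng.shuffle(strata)
        for row, s in enumerate(strata):
            u = (s + rng.random()) / n  # uniform draw within stratum s
            samples[row][d] = lower[d] + u * (upper[d] - lower[d])
    return samples
```

Because every stratum in every dimension receives exactly one point, each one-dimensional projection of the sample is evenly spread over the variable's range.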

Note

Refer to the section on Latin Hypercube sampling for more information.

OA-LHS Design

DDACE offers a hybrid design which is a combination of an orthogonal array and a Latin hypercube sample. This design is specified with the method command dace oa_lhs. This design has the advantages of both orthogonality of the inputs and stratification of the samples (see [Owe92]).

FSUDace

The Florida State University Design and Analysis of Computer Experiments (FSUDace) package provides quasi-Monte Carlo sampling (Halton and Hammersley) and Centroidal Voronoi Tessellation (CVT) methods. All three methods natively generate sets of uniform random variables on the interval \([0,1]\) (or in Dakota, on user-specified uniform intervals).

The quasi-Monte Carlo and CVT methods are designed with the goal of low discrepancy. Discrepancy refers to the nonuniformity of the sample points within the unit hypercube. Low discrepancy sequences tend to cover the unit hypercube reasonably uniformly. Quasi-Monte Carlo methods produce low discrepancy sequences, especially if one is interested in the uniformity of projections of the point sets onto lower-dimensional faces of the hypercube (usually 1-D: how well do the marginal distributions approximate a uniform distribution?). CVT does very well volumetrically: it spaces the points fairly equally throughout the space, so that the points cover the region and are isotropically distributed, with no directional bias in the point placement. There are various measures of volumetric uniformity which take into account the distances between pairs of points, regularity measures, etc. Note, however, that CVT does not produce low-discrepancy sequences in lower dimensions: the lower-dimensional (such as 1-D) projections of CVT can have high discrepancy.

The quasi-Monte Carlo sequences of Halton and Hammersley are deterministic sequences determined by a set of prime bases. A Halton design is specified in Dakota with the method command fsu_quasi_mc halton, and the Hammersley design is specified with the command fsu_quasi_mc hammersley. For more details about the input specification, see the Reference Manual. CVT points tend to arrange themselves in a pattern of cells that are roughly the same shape. To produce CVT points, an almost arbitrary set of initial points is chosen, and then an internal set of iterations is carried out. These iterations repeatedly replace the current set of sample points by an estimate of the centroids of the corresponding Voronoi subregions [DFG99]. A CVT design is specified in Dakota with the method command fsu_cvt.
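For concreteness, a sketch of the Halton construction: each coordinate is the radical inverse of the point index in a distinct prime base. This mirrors the standard definition, not the FSUDace source:

```python
def halton(index, base):
    """The index-th element (1-based) of the radical-inverse sequence in base."""
    result, f = 0.0, 1.0
    i = index
    while i > 0:
        f /= base
        result += f * (i % base)  # reflect the next base-`base` digit
        i //= base
    return result

def halton_points(n, bases=(2, 3)):
    """First n points of a Halton sequence, one prime base per dimension."""
    return [tuple(halton(i, b) for b in bases) for i in range(1, n + 1)]

# The base-2 sequence begins 1/2, 1/4, 3/4, 1/8, ...
assert [halton(i, 2) for i in range(1, 5)] == [0.5, 0.25, 0.75, 0.125]
```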

The methods in FSUDace are useful for design of experiments because they provide good coverage of the input space, thus allowing global sensitivity analysis.
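The CVT iteration described above (repeatedly replace each generator with an estimate of the centroid of its Voronoi subregion) can be sketched with a Monte Carlo centroid estimate; all parameter names and defaults are illustrative:

```python
import random

def cvt(n_generators, n_iterations=20, n_probes=2000, dim=2, seed=0):
    """Approximate a CVT in the unit hypercube via Lloyd-style iterations."""
    rng = random.Random(seed)
    gens = [[rng.random() for _ in range(dim)] for _ in range(n_generators)]
    for _ in range(n_iterations):
        sums = [[0.0] * dim for _ in range(n_generators)]
        counts = [0] * n_generators
        for _ in range(n_probes):
            p = [rng.random() for _ in range(dim)]
            # assign the probe point to its nearest generator (its Voronoi cell)
            nearest = min(range(n_generators),
                          key=lambda g: sum((p[d] - gens[g][d]) ** 2
                                            for d in range(dim)))
            counts[nearest] += 1
            for d in range(dim):
                sums[nearest][d] += p[d]
        for g in range(n_generators):
            if counts[g] > 0:
                # move the generator to the estimated centroid of its cell
                gens[g] = [sums[g][d] / counts[g] for d in range(dim)]
    return gens
```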

PSUADE MOAT

PSUADE (Problem Solving Environment for Uncertainty Analysis and Design Exploration) is a Lawrence Livermore National Laboratory tool for metamodeling, sensitivity analysis, uncertainty quantification, and optimization. Its features include non-intrusive and parallel function evaluations, sampling and analysis methods, an integrated design and analysis framework, global optimization, numerical integration, response surfaces (MARS and higher order regressions), graphical output with Pgplot or Matlab, and fault tolerance [Ton05]. Dakota includes a prototype interface to its Morris One-At-A-Time (MOAT) screening method, a valuable tool for global sensitivity (including interaction) analysis.

The Morris One-At-A-Time method, originally proposed by M. D. Morris [Mor91], is a screening method, designed to explore a computational model to distinguish between input variables that have negligible, linear and additive, or nonlinear or interaction effects on the output. The computer experiments performed consist of individually randomized designs which vary one input factor at a time to create a sample of its elementary effects.

With MOAT, each dimension of a \(k-\)dimensional input space is uniformly partitioned into \(p\) levels, creating a grid of \(p^k\) points \({\bf x} \in \mathbb{R}^k\) at which evaluations of the model \(y({\bf x})\) might take place. An elementary effect corresponding to input \(i\) is computed by a forward difference

\[d_i({\bf x}) = \frac{y({\bf x} + \Delta {\bf e}_i) - y({\bf x})}{\Delta},\]

where \({\bf e}_i\) is the \(i^{\mbox{th}}\) coordinate vector, and the step \(\Delta\) is typically taken to be large (this is not intended to be a local derivative approximation). In the present implementation of MOAT, for an input variable scaled to \([0,1]\), \(\Delta = \frac{p}{2(p-1)}\), so the step used to find elementary effects is slightly larger than half the input range.

The distribution of elementary effects \(d_i\) over the input space characterizes the effect of input \(i\) on the output of interest. After generating \(r\) samples from this distribution, their mean,

\[\mu_i = \frac{1}{r}\sum_{j=1}^{r}{d_i^{(j)}},\]

modified mean

\[\mu_i^* = \frac{1}{r}\sum_{j=1}^{r}{|d_i^{(j)}|},\]

(using absolute value) and standard deviation

\[\sigma_i = \sqrt{ \frac{1}{r}\sum_{j=1}^{r}{ \left(d_i^{(j)} - \mu_i \right)^2} }\]

are computed for each input \(i\). The mean and modified mean give an indication of the overall effect of an input on the output. Standard deviation indicates nonlinear effects or interactions, since it is an indicator of elementary effects varying throughout the input space.
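These three statistics are straightforward to compute from a sample of elementary effects; a minimal sketch:

```python
import math

def morris_stats(effects):
    """Mean, modified mean, and standard deviation of one input's
    elementary effects d_i^(j)."""
    r = len(effects)
    mu = sum(effects) / r
    mu_star = sum(abs(d) for d in effects) / r
    sigma = math.sqrt(sum((d - mu) ** 2 for d in effects) / r)
    return mu, mu_star, sigma

# Effects of constant sign: large mu and mu*, zero sigma (additive input)
mu, mu_star, sigma = morris_stats([2.0, 2.0, 2.0, 2.0])
assert (mu, mu_star, sigma) == (2.0, 2.0, 0.0)
# Effects that change sign: mu near zero but mu* large, flagging a
# nonlinear or interacting input that plain mu would miss
mu, mu_star, sigma = morris_stats([2.0, -2.0, 2.0, -2.0])
assert mu == 0.0 and mu_star == 2.0 and sigma == 2.0
```

The second case shows why \(\mu^*\) is reported alongside \(\mu\): cancellation can hide an influential input in the plain mean.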

The MOAT method is selected with method keyword psuade_moat as shown in the sample Dakota input file Listing 63.

Listing 63 Dakota input file showing the Morris One-at-a-Time method – see dakota/share/dakota/examples/users/morris_ps_moat.in
# Dakota Input File: morris_ps_moat.in

environment
  tabular_data
    tabular_data_file 'dakota_psuade.0.dat'

method
  psuade_moat
    samples = 84
    partitions = 3
    seed = 500

variables
  continuous_design = 20
    lower_bounds = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    upper_bounds = 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
                   1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

interface
  analysis_drivers = 'morris'
    fork
  asynchronous evaluation_concurrency = 5

responses
  objective_functions = 1
  no_gradients
  no_hessians

The number of samples must be a positive integer multiple of (number of continuous design variables \(k\) + 1) and will be automatically adjusted if misspecified. The number of partitions applies to each variable being studied and must be odd (the number of MOAT levels \(p\) per variable is partitions + 1, similar to Dakota multidimensional parameter studies). This will also be adjusted at runtime as necessary. Finite user-specified lower and upper bounds are required and will be scaled as needed by the method.

Note

For more information on the use of MOAT sampling, see this Morris example, or Saltelli, et al. [STCR04].

Sensitivity Analysis

Sensitivity Analysis Overview

In many engineering design applications, sensitivity analysis techniques and parameter study methods are useful in identifying which of the design parameters have the most influence on the response quantities.

Assessing Sensitivity with DACE

The DACE techniques are useful for characterizing the behavior of the response functions of interest through the parameter ranges of interest. In addition to direct interrogation and visualization of the sampling results, a number of techniques have been developed for assessing which parameters are most influential in the observed variability of the response functions. The range of sensitivity analysis methods supported by Dakota is described and compared in the Sensitivity Analysis documentation.

Correlation coefficients are computed by default for any LHS study. Their description and interpretation is provided in Correlation coefficients.

It is also possible to compute Sobol’ indices through variance-based decomposition (VBD), which attributes fractions of output variance to inputs and their interactions. Discussion of computation and interpretation of these indices is provided in Sobol’ Indices/Variance-Based Decomposition. VBD can be specified for any of the sampling or DACE methods using the command variance_based_decomp.

DOE Usage Guidelines

Parameter studies, classical design of experiments (DOE), design/analysis of computer experiments (DACE), and sampling methods share the purpose of exploring the parameter space. When a global space-filling set of samples is desired, then the DOE, DACE, and sampling methods are recommended. These techniques are useful for scatter plot and variance analysis as well as surrogate model construction.

The distinction between DOE and DACE methods is that the former are intended for physical experiments containing an element of nonrepeatability (and therefore tend to place samples at the extreme parameter vertices), whereas the latter are intended for repeatable computer experiments and are more space-filling in nature.

The distinction between DOE/DACE and sampling is drawn based on the distributions of the parameters. DOE/DACE methods typically assume uniform distributions, whereas the sampling approaches in Dakota support a broad range of probability distributions.

To use sampling in design of experiments mode (as opposed to uncertainty quantification mode), an active view override (e.g., active all) can be included in the variables specification (see “Management of Mixed Variables by Iterator”) of the Dakota input file.
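A sketch of such a variables specification follows; the variable count and bounds are illustrative, and only the active all line is the point of the example:

```
variables
  active all
  continuous_design = 2
    lower_bounds  0.0  0.0
    upper_bounds  1.0  1.0
```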

Design of experiments method selection recommendations are summarized in Table 13:

Table 13 Guidelines for selection of parameter study, DOE, DACE, and sampling methods.

Method Classification           | Applications                                                              | Applicable Methods
parameter study                 | sensitivity analysis, directed parameter space investigations             | centered_parameter_study, list_parameter_study, multidim_parameter_study, vector_parameter_study
classical design of experiments | physical experiments (parameters are uniformly distributed)               | dace (box_behnken, central_composite)
design of computer experiments  | variance analysis, space filling designs (parameters are uniformly distributed) | dace (grid, random, oas, lhs, oa_lhs), fsu_quasi_mc (halton, hammersley), fsu_cvt, psuade_moat
sampling                        | space filling designs (parameters have general probability distributions) | sampling (random or lhs) with optional active view override

Video Resources

Title                                  | Resources
Sensitivity Analysis                   | Slides / Exercises
Introduction to Sensitivity Analysis   |