Efficient Global Optimization
Efficient Global Optimization (EGO) was developed to facilitate the unconstrained minimization of expensive implicit response functions. The method builds an initial Gaussian process model as a global surrogate for the response function, then intelligently selects additional samples for inclusion in updated Gaussian process models in subsequent iterations. The new samples are selected based on how much they are expected to improve the current best solution to the optimization problem. When this expected improvement is acceptably small, the globally optimal solution has been found. The application of this methodology to equality-constrained reliability analysis is the primary contribution of EGRA.
Efficient global optimization was originally proposed by Jones et al. [JSW98] and has been adapted into similar methods such as sequential kriging optimization (SKO) [HANZ06]. The main difference between SKO and EGO lies within the specific formulation of what is known as the expected improvement function (EIF), which is the feature that sets all EGO/SKO-type methods apart from other global optimization methods. The EIF is used to select the location at which a new training point should be added to the Gaussian process model by maximizing the amount of improvement in the objective function that can be expected by adding that point. A point could be expected to produce an improvement in the objective function if its predicted value is better than the current best solution, or if the uncertainty in its prediction is such that the probability of it producing a better solution is high. Because the uncertainty is higher in regions of the design space with fewer observations, this provides a balance between exploiting areas of the design space that predict good solutions, and exploring areas where more information is needed.
The general procedure of these EGO-type methods is:
1. Build an initial Gaussian process model of the objective function.
2. Find the point that maximizes the EIF. If the EIF value at this point is sufficiently small, stop.
3. Evaluate the objective function at the point where the EIF is maximized. Update the Gaussian process model using this new point. Go to Step 2.
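As an illustration only (not Dakota's implementation), the loop above can be sketched in Python as follows, using a scikit-learn Gaussian process as the surrogate and maximizing the EIF over a random candidate set in place of the DIRECT solver discussed later; all function names here are illustrative:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

def ego_minimize(objective, bounds, n_initial=10, n_candidates=2000,
                 eif_tol=1e-8, max_iters=50, seed=0):
    """Sketch of the three-step EGO loop; not Dakota's implementation."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T

    # Step 1: initial space-filling sample and GP surrogate.
    X = rng.uniform(lo, hi, size=(n_initial, len(lo)))
    y = np.array([objective(x) for x in X])

    for _ in range(max_iters):
        kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(len(lo)))
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

        # Step 2: maximize the EIF (here over a random candidate set).
        C = rng.uniform(lo, hi, size=(n_candidates, len(lo)))
        mu, sd = gp.predict(C, return_std=True)
        z = (y.min() - mu) / np.maximum(sd, 1e-12)
        eif = (y.min() - mu) * norm.cdf(z) + sd * norm.pdf(z)
        if eif.max() < eif_tol:
            break  # expected improvement is negligible: stop

        # Step 3: evaluate the objective at the EIF maximizer and update.
        x_new = C[np.argmax(eif)]
        X = np.vstack([X, x_new])
        y = np.append(y, objective(x_new))

    return X[np.argmin(y)], y.min()
```

For example, `ego_minimize(lambda x: float(np.sum((x - 0.3) ** 2)), bounds=[(-2.0, 2.0)] * 2)` would locate the minimizer of a simple quadratic.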
The following sections describe the construction of the Gaussian process model used, the form of the EIF, and how that EIF is modified for application to reliability analysis.
Gaussian Process Model
Gaussian process (GP) models are set apart from other surrogate models because they provide not just a predicted value at an unsampled point, but also an estimate of the prediction variance. This variance gives an indication of the uncertainty in the GP model, which results from the construction of the covariance function. This function is based on the idea that when input points are near one another, the correlation between their corresponding outputs will be high. As a result, the uncertainty associated with the model’s predictions will be small for input points which are near the points used to train the model, and will increase as one moves further from the training points.
It is assumed that the true response function being modeled, $G(\mathbf{u})$, can be described by

$$G(\mathbf{u}) = \mathbf{h}(\mathbf{u})^T\boldsymbol{\beta} + Z(\mathbf{u})$$

where $\mathbf{h}()$ is the trend of the model, $\boldsymbol{\beta}$ is the vector of trend coefficients, and $Z()$ is a stationary Gaussian process with zero mean (and covariance defined below) that describes the departure of the model from its underlying trend. The trend of the model can be assumed to be any function, but taking it to be a constant value is generally sufficient; in that case $\boldsymbol{\beta}$ is simply the mean of the responses at the training points. The covariance between outputs of the Gaussian process $Z()$ at points $\mathbf{a}$ and $\mathbf{b}$ is defined as

$$\mathrm{Cov}\left[Z(\mathbf{a}), Z(\mathbf{b})\right] = \sigma_Z^2\, R(\mathbf{a},\mathbf{b})$$

where $\sigma_Z^2$ is the process variance and $R()$ is the correlation function. The squared-exponential correlation function is used here:

$$R(\mathbf{a},\mathbf{b}) = \exp\left[-\sum_{i=1}^{d}\theta_i\,(a_i-b_i)^2\right]$$

where $d$ is the dimensionality of the problem and $\theta_i$ is a scale parameter that indicates the correlation between points within dimension $i$ (a large $\theta_i$ corresponds to a short correlation length).

The expected value $\mu_G()$ and variance $\sigma_G^2()$ of the GP model prediction at point $\mathbf{u}$ are

$$\mu_G(\mathbf{u}) = \mathbf{h}(\mathbf{u})^T\boldsymbol{\beta} + \mathbf{r}(\mathbf{u})^T\mathbf{R}^{-1}\left(\mathbf{g}-\mathbf{F}\boldsymbol{\beta}\right)$$

$$\sigma_G^2(\mathbf{u}) = \sigma_Z^2\left(1 - \begin{bmatrix}\mathbf{h}(\mathbf{u})^T & \mathbf{r}(\mathbf{u})^T\end{bmatrix}\begin{bmatrix}\mathbf{0} & \mathbf{F}^T\\ \mathbf{F} & \mathbf{R}\end{bmatrix}^{-1}\begin{bmatrix}\mathbf{h}(\mathbf{u})\\ \mathbf{r}(\mathbf{u})\end{bmatrix}\right)$$

where $\mathbf{r}(\mathbf{u})$ is the vector of correlations between $\mathbf{u}$ and each of the $n$ training points, $\mathbf{R}$ is the $n\times n$ matrix of correlations between each pair of training points, $\mathbf{g}$ is the vector of response values at the training points, and $\mathbf{F}$ is the $n\times q$ matrix whose rows are $\mathbf{h}(\mathbf{u}_i)^T$ (for a constant trend, $q=1$ and $\mathbf{F}$ is a column of ones). This form of the variance accounts for the uncertainty in the trend coefficients $\boldsymbol{\beta}$, but assumes that the parameters governing the covariance function ($\sigma_Z^2$ and $\boldsymbol{\theta}$) are known.

The parameters $\sigma_Z^2$ and $\boldsymbol{\theta}$ are determined through maximum likelihood estimation. This involves taking the log of the probability of observing the response values $\mathbf{g}$ given the covariance matrix $\mathbf{R}$, which can be written as

$$\log\left[p(\mathbf{g}\mid\mathbf{R})\right] = -\frac{1}{n}\log|\mathbf{R}| - \log\left(\hat{\sigma}_Z^2\right) \qquad (261)$$

where $|\mathbf{R}|$ denotes the determinant of $\mathbf{R}$, and $\hat{\sigma}_Z^2$ is the optimal value of the process variance given an estimate of $\boldsymbol{\theta}$,

$$\hat{\sigma}_Z^2 = \frac{1}{n}\left(\mathbf{g}-\mathbf{F}\boldsymbol{\beta}\right)^T\mathbf{R}^{-1}\left(\mathbf{g}-\mathbf{F}\boldsymbol{\beta}\right)$$

Maximizing Eq. (261) gives the maximum likelihood estimate of $\boldsymbol{\theta}$, which in turn defines $\hat{\sigma}_Z^2$.
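For concreteness, the predictor defined by these equations can be sketched numerically as follows (constant trend, squared-exponential correlation; the parameters `theta` and `sigma_z2` are assumed to have been estimated already, e.g. by maximizing Eq. (261), and the function names are illustrative rather than Dakota routines):

```python
import numpy as np

def correlation(a, b, theta):
    # R(a, b) = exp( -sum_i theta_i * (a_i - b_i)^2 )
    return np.exp(-np.sum(theta * (a - b) ** 2))

def gp_predict(u, X, g, theta, sigma_z2):
    """Return the GP mean mu_G(u) and variance sigma_G^2(u)."""
    n = X.shape[0]
    beta = g.mean()                      # constant trend coefficient
    R = np.array([[correlation(X[i], X[j], theta) for j in range(n)]
                  for i in range(n)])
    r = np.array([correlation(u, X[i], theta) for i in range(n)])

    mu = beta + r @ np.linalg.solve(R, g - beta)

    # Prediction variance, including uncertainty in the constant trend (q = 1).
    F = np.ones((n, 1))
    v = np.concatenate(([1.0], r))       # [h(u); r(u)] with h(u) = 1
    K = np.block([[np.zeros((1, 1)), F.T],
                  [F, R]])
    var = sigma_z2 * (1.0 - v @ np.linalg.solve(K, v))
    return mu, var
```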
Acquisition Functions
The acquisition function $a(\mathbf{u})$ determines the location of the next sampling point or refinement points, in the sense that maximizing the acquisition function yields the next sampling point,

$$\mathbf{u}_{k+1} = \underset{\mathbf{u}}{\arg\max}\; a(\mathbf{u}).$$
Expected Improvement
The expected improvement function is used to select the location at
which a new training point should be added. The EIF is defined as the
expectation that any point in the search space will provide a better
solution than the current best solution based on the expected values and
variances predicted by the GP model. An important feature of the EIF is
that it provides a balance between exploiting areas of the design space
where good solutions have been found, and exploring areas of the design
space where the uncertainty is high. First, recognize that at any point
in the design space, the GP prediction $\hat{G}(\mathbf{u})$ is a Gaussian distribution:

$$\hat{G}(\mathbf{u}) \sim \mathcal{N}\!\left(\mu_G(\mathbf{u}),\, \sigma_G(\mathbf{u})\right)$$

where the mean $\mu_G$ and the variance $\sigma_G^2$ are as defined for the Gaussian process model above. The EIF is then defined as [JSW98]

$$EI\!\left(\hat{G}(\mathbf{u})\right) \equiv E\!\left[\max\!\left(G(\mathbf{u}^*) - \hat{G}(\mathbf{u}),\, 0\right)\right] \qquad (263)$$

where $G(\mathbf{u}^*)$ is the current best solution chosen from among the true function values at the training points (hereafter referred to simply as $G^*$). This expectation can be computed by integrating over the distribution $\hat{G}(\mathbf{u})$ with $G^*$ held constant,

$$EI\!\left(\hat{G}(\mathbf{u})\right) = \int_{-\infty}^{G^*}\left(G^* - G\right)\,\hat{G}(\mathbf{u})\; dG$$

where $G$ is a realization of $\hat{G}$. This integral can be expressed analytically as [JSW98]

$$EI\!\left(\hat{G}(\mathbf{u})\right) = \left(G^* - \mu_G\right)\Phi\!\left(\frac{G^* - \mu_G}{\sigma_G}\right) + \sigma_G\,\phi\!\left(\frac{G^* - \mu_G}{\sigma_G}\right) \qquad (264)$$

where it is understood that $\mu_G$ and $\sigma_G$ are functions of $\mathbf{u}$, and where $\Phi$ and $\phi$ denote the standard normal cumulative distribution and density functions, respectively.
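As a sketch, the analytic expression in Eq. (264) can be evaluated directly from the GP mean and standard deviation at a point (for example, from the `gp_predict()` sketch above); the function name here is illustrative:

```python
from scipy.stats import norm

def expected_improvement(mu, sigma, g_best):
    """EI of Eq. (264): mu, sigma are the GP prediction at a point,
    g_best is the current best observed value G*."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(g_best - mu, 0.0)
    z = (g_best - mu) / sigma
    return (g_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```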
The point at which the EIF is maximized is selected as an additional training point. With the new training point added, a new GP model is built and then used to construct another EIF, which is then used to choose another new training point, and so on, until the value of the EIF at its maximized point is below some specified tolerance. In Ref. [HANZ06] this maximization is performed using a Nelder-Mead simplex approach, which is a local optimization method. Because the EIF is often highly multimodal [JSW98] it is expected that Nelder-Mead may fail to converge to the true global optimum. In Ref. [JSW98], a branch-and-bound technique for maximizing the EIF is used, but was found to often be too expensive to run to convergence. In Dakota, an implementation of the DIRECT global optimization algorithm is used [Gab01].
It is important to understand how the use of this EIF leads to optimal solutions. Eq. (263) indicates how much the objective function value at a candidate point $\mathbf{u}$ can be expected to be improved (lowered) relative to the current best solution. Because the GP prediction at any point is a Gaussian distribution, this expectation is appreciable both where the predicted mean is lower than the current best solution and where the prediction variance is large; maximizing the EIF therefore adds training points that either refine promising regions of the design space or reduce the model's uncertainty in regions that have not yet been explored.
The application of EGO to reliability analysis, however, is made more complicated due to the inclusion of equality constraints (see Eqs. (58)- (59)). For inverse reliability analysis, this extra complication is small. The response being modeled by the GP is the objective function of the optimization problem (see Eq. (59)) and the deterministic constraint might be handled through the use of a merit function, thereby allowing EGO to solve this equality-constrained optimization problem. Here the problem lies in the interpretation of the constraint for multimodal problems as mentioned previously. In the forward reliability case, the response function appears in the constraint rather than the objective. Here, the maximization of the EIF is inappropriate because feasibility is the main concern. This application is therefore a significant departure from the original objective of EGO and requires a new formulation. For this problem, the expected feasibility function is introduced.
Probability Improvement Acquisition Function
The probability of improvement (PI) acquisition function was proposed by Kushner [Kus64], based on the same argument that the GP prediction at any point is a Gaussian distribution. Similar to Equation (264), the PI acquisition function is given by

$$PI\!\left(\hat{G}(\mathbf{u})\right) = \Phi\!\left(\frac{G^* - \mu_G}{\sigma_G}\right)$$
Generally speaking, the EI acquisition function performs better than the PI acquisition function.
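A corresponding sketch of this acquisition, using the same GP mean and standard deviation as above (the function name is illustrative):

```python
from scipy.stats import norm

def probability_of_improvement(mu, sigma, g_best):
    """Probability, under the Gaussian GP prediction, that the response
    at this point falls below the current best value G*."""
    if sigma <= 0.0:
        return 1.0 if mu < g_best else 0.0
    return norm.cdf((g_best - mu) / sigma)
```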
Lower-Confidence Bound Acquisition Function
Another form of acquisition function is the lower-confidence bound (LCB), proposed by Srinivas et al. [SKKS09, SKKS12], which has been shown to perform very well. For a minimization problem, the LCB acquisition function takes the form

$$a_{\mathrm{LCB}}(\mathbf{u}) = -\mu_G(\mathbf{u}) + \kappa\,\sigma_G(\mathbf{u})$$

where $\mu_G(\mathbf{u})$ and $\sigma_G(\mathbf{u})$ are the mean and standard deviation of the GP prediction at $\mathbf{u}$, and $\kappa \geq 0$ is a parameter that balances exploitation (small $\kappa$) against exploration (large $\kappa$); Srinivas et al. provide schedules that grow $\kappa$ with the iteration count and yield theoretical convergence guarantees.
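A minimal sketch of this acquisition, with a fixed exploration weight (schedules that grow `kappa` with iteration count are also used); the name is illustrative:

```python
def lower_confidence_bound(mu, sigma, kappa=2.0):
    """LCB acquisition for minimization: maximizing -mu + kappa*sigma favors
    points with low predicted mean (exploitation) and high predictive
    uncertainty (exploration)."""
    return -mu + kappa * sigma
```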
Batch-sequential parallel
The batch-sequential parallelization is mainly motivated by exploiting available computational resources: instead of one point per iteration, multiple sampling points are selected by the acquisition function and their responses are evaluated concurrently on multiple workers before the GP model is updated.
The approach by Desautels et al. [DKB14] mainly advocates for the “hallucination” scheme or heuristic liar, in which the unknown observation at the currently querying sampling point is temporarily assigned a fictitious value, typically the GP posterior mean at that point. The GP model is updated with this hallucinated observation, and the acquisition function is maximized again to select the next point in the batch; the process repeats until the batch is filled, at which point the true observations replace the hallucinated values.
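A minimal sketch of this hallucination scheme, shown here with the expected improvement acquisition and a candidate-set maximization (Desautels et al. pair the idea with a UCB-type acquisition); the names are illustrative:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def select_batch(X, y, candidates, batch_size):
    """Pick a batch of points by repeatedly maximizing the EIF, with the GP
    posterior mean standing in ('lying') for each not-yet-run observation."""
    X_aug, y_aug = X.copy(), y.copy()
    batch = []
    for _ in range(batch_size):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X_aug, y_aug)
        mu, sd = gp.predict(candidates, return_std=True)
        z = (y.min() - mu) / np.maximum(sd, 1e-12)
        eif = (y.min() - mu) * norm.cdf(z) + sd * norm.pdf(z)
        idx = int(np.argmax(eif))
        batch.append(candidates[idx])
        # Hallucinate the observation with the current GP mean: the mean
        # prediction is unchanged, but the variance near this point shrinks,
        # pushing the next acquisition maximization elsewhere.
        X_aug = np.vstack([X_aug, candidates[idx]])
        y_aug = np.append(y_aug, mu[idx])
    return np.array(batch)
```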
Asynchronous batch parallel
The asynchronous batch parallel EGO further improves computational efficiency when the cost of a query varies widely from one evaluation to the next. In that scenario, the batch-sequential parallel EGO finishes an iteration only when the last worker in the batch finishes; workers that complete their jobs or simulations earlier must wait for the last worker, creating unnecessary idle time. The asynchronous batch parallel scheme therefore accelerates the optimization process by immediately assigning a new job to any worker that finishes, without waiting for the rest of the batch. Whenever a worker completes a query, the objective GP is updated and the next sampling point is found by maximizing the acquisition function. Numerical comparisons on a number of analytical test functions, as well as some engineering simulations, are presented in one of our previous works [TEW+22].
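Conceptually, the asynchronous scheme can be sketched as follows, where `update_model_and_pick_next` is a hypothetical placeholder for the GP update plus acquisition maximization described above:

```python
import concurrent.futures as cf

def async_optimize(objective, initial_points, update_model_and_pick_next,
                   n_workers=4, budget=100):
    """Sketch of asynchronous point assignment: a worker that finishes is
    immediately given a new point instead of waiting for the whole batch."""
    evaluated = []                                   # list of (x, f(x)) pairs
    with cf.ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = {pool.submit(objective, x): x for x in initial_points}
        submitted = len(pending)
        while pending:
            done, _ = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                x = pending.pop(fut)
                evaluated.append((x, fut.result()))
                if submitted < budget:
                    # Refit the surrogate with all results so far, maximize the
                    # acquisition, and start the freed worker immediately.
                    x_next = update_model_and_pick_next(evaluated)
                    pending[pool.submit(objective, x_next)] = x_next
                    submitted += 1
    return min(evaluated, key=lambda pair: pair[1])
```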