neural_network

Artificial neural network model

Specification

Alias: None
Arguments: None

Child Keywords:

Required/Optional	Dakota Keyword	Dakota Keyword Description
Optional	max_nodes	Maximum number of hidden layer nodes
Optional	range	Range for neural network random weights
Optional	random_weight	(Inactive) Random weight control
Optional	export_model	Exports surrogate model in user-specified format(s)
Optional	import_model	Import surrogate model from archive file

Description

Dakota’s artificial neural network surrogate is a stochastic layered perceptron network, with a single hidden layer. Weights for the input layer are chosen randomly, while those in the hidden layer are estimated from data using a variant of the Zimmerman direct training approach [Zim96].

This typically yields lower training cost than traditional neural networks, yet good out-of-sample performance. This is helpful in surrogate-based optimization and optimization under uncertainty, where multiple surrogates may be repeatedly constructed during the optimization process, e.g., a surrogate per response function, and a new surrogate for each optimization iteration.

The neural network is a non parametric surface fitting method. Thus, along with Kriging (Gaussian Process) and MARS, it can be used to model data trends that have slope discontinuities as well as multiple maxima and minima. However, unlike Kriging, the neural network surrogate is not guaranteed to interpolate the data from which it was constructed.

This surrogate can be constructed from fewer than \(n_{c_{quad}}\) data points, however, it is a good rule of thumb to use at least \(n_{c_{quad}}\) data points when possible.

Known Issue: When using discrete variables, there have been sometimes significant differences in surrogate behavior observed across computing platforms in some cases. The cause has not yet been fully diagnosed and is currently under investigation. In addition, guidance on appropriate construction and use of surrogates with discrete variables is under development. In the meantime, users should therefore be aware that there is a risk of inaccurate results when using surrogates with discrete variables.

Theory

The form of the neural network model is

\[\hat{f}(\mathbf{x}) \approx \tanh\left\{ \mathbf{A}_{1} \tanh\left( \mathbf{A}_{0}^{T} \mathbf{x} +\theta_{0}^T \right)+\theta_{1} \right\}\]

where \(\mathbf{x}\) is the evaluation point in \(n\) -dimensional parameter space; the terms \(\mathbf{A}_{0}, \theta_{0}\) are the random input layer weight matrix and bias vector, respectively; and \(\mathbf{A}_{1}, \theta_{1}\) are a weight vector and bias scalar, respectively, estimated from training data. These coefficients are analogous to the polynomial coefficients obtained from regression to training data. The neural network uses a cross validation-based orthogonal matching pursuit solver to determine the optimal number of nodes and to solve for the weights and offsets.

neural_network

Exceptional service in the national interest