Active Subspace Models

The idea behind active subspaces is to find directions in the input variable space in which the quantity of interest is nearly constant. After rotation of the input variables, this method can allow significant dimension reduction. Below is a brief summary of the process; a minimal code sketch follows the list.

  1. Compute the gradient of the quantity of interest, \(q = f(\mathbf{x})\), at several locations sampled from the full input space,

    \[\nabla_{\mathbf{x}} f_i = \nabla f(\mathbf{x}_i).\]
  2. Compute the eigendecomposition of the matrix \(\hat{\mathbf{C}}\),

    \[\hat{\mathbf{C}} = \frac{1}{M}\sum_{i=1}^{M}\nabla_{\mathbf{x}} f_i\nabla_{\mathbf{x}} f_i^T = \hat{\mathbf{W}}\hat{\mathbf{\Lambda}}\hat{\mathbf{W}}^T,\]

    where \(\hat{\mathbf{W}}\) has eigenvectors as columns, \(\hat{\mathbf{\Lambda}} = \text{diag}(\hat{\lambda}_1,\:\ldots\:,\hat{\lambda}_N)\) contains eigenvalues, and \(N\) is the total number of parameters.

  3. Using either a truncation method or a user-specified dimension to determine the active subspace size, split the eigenvectors into active and inactive directions,

    \[\hat{\mathbf{W}} = \left[\hat{\mathbf{W}}_1\quad\hat{\mathbf{W}}_2\right].\]

    These eigenvectors are used to rotate the input variables.

  4. Next the input variables, \(\mathbf{x}\), are expanded in terms of active and inactive variables,

    \[\mathbf{x} = \hat{\mathbf{W}}_1\mathbf{y} + \hat{\mathbf{W}}_2\mathbf{z}.\]
  5. A surrogate is then built as a function of the active variables,

    \[g(\mathbf{y}) \approx f(\mathbf{x}).\]
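
The procedure above maps directly onto a few lines of linear algebra. Below is a minimal sketch, assuming a user-supplied gradient function and sample points; the names `grad_f`, `X`, and `active_subspace` are illustrative and not part of any particular library:

```python
import numpy as np

def active_subspace(grad_f, X, n_active):
    """Estimate an active subspace from gradient samples.

    grad_f   : callable returning the gradient of f at a point (length N)
    X        : (M, N) array of sampled input locations
    n_active : number of active directions to keep
    """
    grads = np.array([grad_f(x) for x in X])      # M gradient samples, shape (M, N)
    C_hat = grads.T @ grads / len(X)              # C = (1/M) sum_i grad_i grad_i^T
    vals, vecs = np.linalg.eigh(C_hat)            # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]                # reorder to descending
    vals, W = vals[order], vecs[:, order]
    W1, W2 = W[:, :n_active], W[:, n_active:]     # active / inactive split
    return vals, W1, W2
```

A surrogate \(g\) is then trained on the reduced coordinates \(\mathbf{y} = \hat{\mathbf{W}}_1^T\mathbf{x}\) of the original samples.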

As a concrete example, consider the function [Con15]:

\[f(x) = \exp\left(0.7x_1 + 0.3x_2\right).\]

Figure [fig:activesubspace](a) is a contour plot of \(f(x)\). The black arrows indicate the eigenvectors of the matrix \(\hat{\mathbf{C}}\). Figure [fig:activesubspace](b) shows the same function, rotated so that the axes are aligned with the eigenvectors. We arbitrarily label these rotated axes \(y_1\) and \(y_2\). From Figure [fig:activesubspace](b) it is clear that all of the variation is along \(y_1\), so the dimension of the rotated input space can be reduced to one.

[Figure [fig:activesubspace] (image missing): (a) contour plot of \(f(x)\) with the eigenvectors of \(\hat{\mathbf{C}}\) shown as black arrows; (b) the same function rotated so the axes align with the eigenvectors, labeled \(y_1\) and \(y_2\).]
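
For this particular function the active direction can be computed by hand: every sampled gradient is proportional to the same vector, so \(\hat{\mathbf{C}}\) has rank one,

\[\nabla f(\mathbf{x}) = f(\mathbf{x})\begin{bmatrix}0.7\\0.3\end{bmatrix},
\qquad
\hat{\mathbf{C}} = \left(\frac{1}{M}\sum_{i=1}^{M}f(\mathbf{x}_i)^2\right)
\begin{bmatrix}0.7\\0.3\end{bmatrix}
\begin{bmatrix}0.7 & 0.3\end{bmatrix}.\]

Its only nonzero eigenvalue has eigenvector proportional to \((0.7,\:0.3)^T\), which defines \(y_1\); along the orthogonal direction \(y_2 \propto (-0.3,\:0.7)^T\) the function is constant, consistent with the figure.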

For additional information, see references [Con15, CDW14, CG14].

Truncation Methods

Once the eigenvectors of \(\hat{\mathbf{C}}\) are obtained, we must decide how many directions to keep. If the exact subspace size is known a priori, it can be specified. Otherwise, there are three automatic active subspace detection and truncation methods implemented:

  • Constantine metric (default),

  • Bing Li metric,

  • and Energy metric.

Constantine metric

The Constantine metric uses a criterion based on the variability of the subspace estimate. Eigenvectors are computed for bootstrap samples of the gradient matrix. The subspace size that minimizes the distance between the bootstrap and nominal eigenvectors is taken as the estimated active subspace size.

Below is a brief outline of the Constantine method of active subspace identification; a code sketch of the criterion appears after the outline. The first two steps are common to all active subspace truncation methods.

  1. Compute the gradient of the quantity of interest, \(q = f(\mathbf{x})\), at several locations sampled from the input space,

    \[\nabla_{\mathbf{x}} f_i = \nabla f(\mathbf{x}_i).\]
  2. Compute the eigendecomposition of the matrix \(\hat{\mathbf{C}}\),

    \[\hat{\mathbf{C}} = \frac{1}{M}\sum_{i=1}^{M}\nabla_{\mathbf{x}} f_i\nabla_{\mathbf{x}} f_i^T = \hat{\mathbf{W}}\hat{\mathbf{\Lambda}}\hat{\mathbf{W}}^T,\]

    where \(\hat{\mathbf{W}}\) has eigenvectors as columns, \(\hat{\mathbf{\Lambda}} = \text{diag}(\hat{\lambda}_1,\:\ldots\:,\hat{\lambda}_N)\) contains eigenvalues, and \(N\) is the total number of parameters.

  3. Use bootstrap sampling of the gradients found in step 1 to compute replicate eigendecompositions,

    \[\hat{\mathbf{C}}_j^* = \hat{\mathbf{W}}_j^*\hat{\mathbf{\Lambda}}_j^*\left(\hat{\mathbf{W}}_j^*\right)^T.\]
  4. Compute the average distance between nominal and bootstrap subspaces,

    \[e^*_n = \frac{1}{M_{boot}}\sum_j^{M_{boot}} \text{dist}(\text{ran}(\hat{\mathbf{W}}_n), \text{ran}(\hat{\mathbf{W}}_{j,n}^*)) = \frac{1}{M_{boot}}\sum_j^{M_{boot}} \left\| \hat{\mathbf{W}}_n\hat{\mathbf{W}}_n^T - \hat{\mathbf{W}}_{j,n}^*\left(\hat{\mathbf{W}}_{j,n}^*\right)^T\right\|,\]

    where \(M_{boot}\) is the number of bootstrap samples, \(\hat{\mathbf{W}}_n\) and \(\hat{\mathbf{W}}_{j,n}^*\) both contain only the first \(n\) eigenvectors, and \(n < N\).

  5. The estimated subspace rank, \(r\), is then,

    \[r = \operatorname*{arg\,min}_n \, e^*_n.\]
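
Below is a minimal sketch of this bootstrap criterion, assuming the gradient samples have already been collected into an \(M \times N\) array `grads`; the function name, the choice of the spectral norm for the subspace distance, and the bootstrap count are illustrative assumptions:

```python
import numpy as np

def constantine_subspace_size(grads, n_boot=100, seed=None):
    """Estimate the active subspace rank from bootstrap subspace distances."""
    rng = np.random.default_rng(seed)
    M, N = grads.shape

    def sorted_eigvecs(g):
        vals, vecs = np.linalg.eigh(g.T @ g / len(g))
        return vecs[:, np.argsort(vals)[::-1]]        # eigenvectors, descending eigenvalues

    W = sorted_eigvecs(grads)                          # nominal eigenvectors
    W_star = [sorted_eigvecs(grads[rng.integers(0, M, M)])   # bootstrap replicates
              for _ in range(n_boot)]

    e = []                                             # e*_n for n = 1, ..., N-1
    for n in range(1, N):
        P = W[:, :n] @ W[:, :n].T                      # projector onto nominal subspace
        e.append(np.mean([np.linalg.norm(P - Wb[:, :n] @ Wb[:, :n].T, 2)
                          for Wb in W_star]))
    return int(np.argmin(e)) + 1                       # rank r = argmin_n e*_n
```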

For additional information, see Ref. [Con15].

Bing Li metric

The Bing Li metric uses a trade-off criterion to determine where to truncate the active subspace. The criterion is a function of the eigenvalues and eigenvectors of the active subspace gradient matrix. This function compares the decrease in eigenvalue amplitude with the increase in eigenvector variability under bootstrap sampling of the gradient matrix. The active subspace size is taken to be the index of the first minimum of this quantity.

Below is a brief outline of the Bing Li method of active subspace identification; a code sketch appears after the outline. The first two steps are common to all active subspace truncation methods.

  1. Compute the gradient of the quantity of interest, \(q = f(\mathbf{x})\), at several locations sampled from the input space,

    \[\nabla_{\mathbf{x}} f_i = \nabla f(\mathbf{x}_i).\]
  2. Compute the eigendecomposition of the matrix \(\hat{\mathbf{C}}\),

    \[\hat{\mathbf{C}} = \frac{1}{M}\sum_{i=1}^{M}\nabla_{\mathbf{x}} f_i\nabla_{\mathbf{x}} f_i^T = \hat{\mathbf{W}}\hat{\mathbf{\Lambda}}\hat{\mathbf{W}}^T,\]

    where \(\hat{\mathbf{W}}\) has eigenvectors as columns, \(\hat{\mathbf{\Lambda}} = \text{diag}(\hat{\lambda}_1,\:\ldots\:,\hat{\lambda}_N)\) contains eigenvalues, and \(N\) is the total number of parameters.

  3. Normalize the eigenvalues,

    \[\lambda_i = \frac{\hat{\lambda}_i}{\sum_j^N \hat{\lambda}_j}.\]
  4. Use bootstrap sampling of the gradients found in step 1 to compute replicate eigendecompositions,

    \[\hat{\mathbf{C}}_j^* = \hat{\mathbf{W}}_j^*\hat{\mathbf{\Lambda}}_j^*\left(\hat{\mathbf{W}}_j^*\right)^T.\]
  5. Compute variability of eigenvectors,

    \[f_i^0 = \frac{1}{M_{boot}}\sum_j^{M_{boot}}\left\lbrace 1 - \left\vert\text{det}\left(\hat{\mathbf{W}}_i^T\hat{\mathbf{W}}_{j,i}^*\right)\right\vert\right\rbrace ,\]

    where \(\hat{\mathbf{W}}_i\) and \(\hat{\mathbf{W}}_{j,i}^*\) both contain only the first \(i\) eigenvectors and \(M_{boot}\) is the number of bootstrap samples. The value of the variability at the first index, \(f_1^0\), is defined as zero.

  6. Normalize the eigenvector variability,

    \[f_i = \frac{f_i^0}{\sum_j^N f_j^0}.\]
  7. The criterion, \(g_i\), is defined as,

    \[g_i = \lambda_i + f_i.\]
  8. The index of the first minimum of \(g_i\) is then the estimated active subspace rank.
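
A minimal sketch of this criterion follows, again assuming precomputed gradient samples `grads`; interpreting the "first minimum" as the first local minimum of \(g_i\) is an assumption of this sketch:

```python
import numpy as np

def bing_li_subspace_size(grads, n_boot=100, seed=None):
    """Estimate the active subspace rank from the Bing Li trade-off criterion."""
    rng = np.random.default_rng(seed)
    M, N = grads.shape

    def eigdecomp(g):
        vals, vecs = np.linalg.eigh(g.T @ g / len(g))
        order = np.argsort(vals)[::-1]
        return vals[order], vecs[:, order]

    vals, W = eigdecomp(grads)
    lam = vals / vals.sum()                            # normalized eigenvalues

    W_star = [eigdecomp(grads[rng.integers(0, M, M)])[1] for _ in range(n_boot)]

    f0 = np.zeros(N)                                   # eigenvector variability, f_1^0 = 0
    for i in range(2, N + 1):
        f0[i - 1] = np.mean([1.0 - abs(np.linalg.det(W[:, :i].T @ Wb[:, :i]))
                             for Wb in W_star])
    f = f0 / f0.sum()                                  # normalized variability

    g = lam + f                                        # criterion g_i
    for i in range(N):                                 # first (local) minimum of g
        left_ok = (i == 0) or (g[i] < g[i - 1])
        right_ok = (i == N - 1) or (g[i] <= g[i + 1])
        if left_ok and right_ok:
            return i + 1
    return int(np.argmin(g)) + 1                       # fallback: global minimum
```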

For additional information, see Ref. [LL15].

Energy metric

The energy metric truncation method uses a criterion based on the derivative matrix eigenvalue energy. The user can specify the maximum percentage (as a decimal) of the eigenvalue energy that is not captured by the active subspace representation.

Using the eigenvalue energy truncation metric, the subspace size is determined using the following equation:

\[n = \inf \left\lbrace d \in \mathbb{Z} \quad\middle|\quad 1 \le d \le N \quad \wedge\quad 1 - \frac{\sum_{i = 1}^{d} \lambda_i}{\sum_{i = 1}^{N} \lambda_i} \,<\, \epsilon \right\rbrace\]

where \(\epsilon\) is the truncation_tolerance, \(n\) is the estimated subspace size, \(N\) is the size of the full space, and \(\lambda_i\) are the eigenvalues of the derivative matrix.
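
This rule translates directly into code. Below is a minimal sketch, assuming the eigenvalues are available as an array; the function name is illustrative:

```python
import numpy as np

def energy_subspace_size(eigvals, truncation_tolerance):
    """Smallest subspace size whose discarded eigenvalue energy is below the tolerance."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]          # eigenvalues, descending
    discarded = 1.0 - np.cumsum(lam) / lam.sum()                   # 1 - (sum_{i<=d} lam_i)/(sum_i lam_i)
    return int(np.argmax(discarded < truncation_tolerance)) + 1    # first d meeting the bound
```

For example, eigenvalues \((0.9,\:0.09,\:0.01)\) with \(\epsilon = 0.05\) give \(n = 2\), since discarding only the last eigenvalue leaves \(1\%\) of the energy uncaptured.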