.. _sa:

Sensitivity Analysis
====================

.. _`sa:overview`:

Overview
--------

Sensitivity analysis (SA) reveals the extent to which simulation outputs
depend on each simulation input. The primary goal is to identify the most
important input variables and their interactions, enabling analysts to focus
resources on the parameters that matter most. This page summarizes SA
concepts and terminology, describes :ref:`local <sa:local>` and
:ref:`global <sa:global>` methods for SA as well as parameter studies, and
offers :ref:`usage guidelines <sa:usage>`.

Sensitivity analysis serves several key purposes in computational modeling:

- **Screening and ranking**: Identify the most influential variables to
  down-select for further uncertainty quantification or optimization studies.
- **Resource allocation**: Focus data gathering, model development, code
  development, and uncertainty characterization efforts on the most
  impactful parameters.
- **Model understanding**: Identify key model characteristics such as
  smoothness, nonlinear trends, and robustness, while developing intuition
  about model behavior.
- **Quality assurance**: SA can reveal code and model issues as a side
  effect of systematic parameter exploration.
- **Surrogate construction**: Data generated during SA studies can be
  repurposed to construct surrogate models for subsequent analyses.

Sensitivity analysis methods can be broadly categorized by their scope
(local vs. global) and by the metrics they produce for quantifying parameter
influence.

.. _`sa:local`:

Local Sensitivity
-----------------

Local sensitivity analysis examines how the response changes with respect to
small perturbations around a single point in parameter space. The classic
measure of local sensitivity is the partial derivative of the response with
respect to each parameter:

.. math::

   \frac{\partial f}{\partial x_i} \bigg|_{x=x_0}

Local sensitivity provides information about the slope of the response
surface at the nominal point :math:`x_0`.
This can be useful for understanding instantaneous rates of change, but may
not capture the full picture if the response is highly nonlinear or if the
parameter ranges of interest are large relative to the local curvature.

Dakota supports this type of study through numerical finite-differencing or
retrieval of analytic gradients computed within the analysis code. The
desired gradient data is specified in the responses section of the Dakota
input file, and the collection of this data at a single point is
accomplished through a parameter study method with no steps.

This approach to sensitivity analysis should be distinguished from the
activity of augmenting analysis codes to internally compute derivatives
using techniques such as direct or adjoint differentiation, automatic
differentiation, or complex-step modifications. These sensitivity
augmentation activities are completely separate from Dakota and are outside
the scope of this manual. However, once completed, Dakota can utilize these
analytic gradients to perform optimization, uncertainty quantification, and
related studies more reliably and efficiently.

.. _`sa:global`:

Global Sensitivity
------------------

Global sensitivity analysis assesses the relative influence of parameters
over the entire input space, either between upper and lower bounds or over
the support of the parameters' probability distributions. Rather than
examining behavior at a single point, global SA characterizes how parameters
affect the response over the range of plausible parameter values. Global SA
addresses questions such as:

- What is the general trend of the response over all values of a parameter?
- Does the response depend more nonlinearly on one factor than another?
- How do parameters interact to influence the response?

Global SA is performed by evaluating the response at well-distributed points
in the input space and analyzing the resulting input/output pairs. Dakota
primarily focuses on global sensitivity analysis methods.
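To make the local-versus-global distinction concrete, the following sketch
contrasts a finite-difference local sensitivity at a nominal point with the
variation seen over the whole input range. The toy model, nominal point, and
step size are all hypothetical, invented for illustration; this is not a
Dakota example, and Dakota automates the finite-difference bookkeeping via
its responses gradient specification.

```python
import random

# Hypothetical toy model: y depends quadratically on x1 and linearly on x2.
def model(x1, x2):
    return x1 ** 2 + 0.5 * x2

x0 = (0.0, 0.0)  # nominal point (illustrative choice)
h = 1e-6         # forward finite-difference step

# Local sensitivity: forward finite differences at the nominal point.
df_dx1 = (model(x0[0] + h, x0[1]) - model(*x0)) / h
df_dx2 = (model(x0[0], x0[1] + h) - model(*x0)) / h

# At x0 = (0, 0) the derivative w.r.t. x1 is ~0, suggesting x1 is
# unimportant -- even though globally x1 drives much of the variation.
random.seed(0)
samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(2000)]
outputs = [model(x1, x2) for x1, x2 in samples]

# A global study instead looks at output variation over the whole range.
mean_y = sum(outputs) / len(outputs)
var_y = sum((y - mean_y) ** 2 for y in outputs) / len(outputs)
```

The point of the sketch: the local slope at a single point can be near zero
for an input that is globally influential, which is why this page focuses on
global methods.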
Various metrics and approaches can quantify parameter influence, each with
different interpretations and computational requirements:

.. _`sa:global:scatterplots`:

Scatterplots
~~~~~~~~~~~~

Scatterplots are a simple, but highly effective, visualization-based
sensitivity analysis technique performed using external software and the
input-output samples generated by a Dakota study. To generate scatterplots,
samples are projected down so that one output dimension is plotted against
one parameter dimension, for each parameter and output in turn. Scatterplots
with a uniformly distributed cloud of points indicate parameters with little
influence on the results, while scatterplots with a defined shape to the
cloud indicate parameters which are more significant.

Scatterplots are a primary diagnostic tool for interpreting sensitivity
analysis results. They can reveal:

- Monotonic or non-monotonic relationships
- Nonlinear trends (e.g., saturation, thresholds)
- Interaction effects (e.g., changing output variation as a function of
  multiple inputs)

.. figure:: img/scatterplots_only.png
   :alt: An example of scatterplots for a function exhibiting nonmonotonic,
         nonlinear relationships, and interactions.
   :name: sa:figure01
   :align: center

   An example of scatterplots for a function exhibiting nonmonotonic,
   nonlinear relationships, and interactions. Since :math:`X_3` exhibits
   little to no trend, it may not be an influential parameter.
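The projection step described above can be sketched in a few lines. The toy
model below is hypothetical (chosen to loosely mirror the figure: an
interaction between two inputs and an inert third input); external plotting
software would then draw the output against each input's column in turn.

```python
import random

# Hypothetical toy model with an interaction (x1*x2) and an inert input x3.
def model(x1, x2, x3):
    return x1 ** 2 + x1 * x2  # x3 intentionally unused

random.seed(1)
samples = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(500)]
outputs = [model(*x) for x in samples]

# "Scatterplot" data: project the 3-D samples onto each input in turn,
# pairing that input's values with the output. A plotting tool would draw
# y vs. x for each entry of scatter_data.
scatter_data = {
    i: [(x[i], y) for x, y in zip(samples, outputs)] for i in range(3)
}

# A crude numeric stand-in for "no visible trend": for the inert input x3,
# the mean output over the left and right halves of its range is about equal.
def half_means(pairs):
    left = [y for x, y in pairs if x < 0]
    right = [y for x, y in pairs if x >= 0]
    return sum(left) / len(left), sum(right) / len(right)

l3, r3 = half_means(scatter_data[2])
```

A uniform, trend-free cloud for :math:`x_3` (equal half-means here) is
exactly the visual signature of a noninfluential parameter discussed above.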
Unlike quantitative sensitivity metrics, scatterplots:

- Don't provide a scalar measure of importance or ranking
- Are conditional on the sampled input space
- Require interpretation

However, they are essential for:

- Validating assumptions behind sensitivity metrics (e.g., monotonicity for
  correlation)
- Diagnosing misleading or ambiguous metric values
- Identifying structure that may motivate surrogate modeling choices

Scatterplots are most effective when used in conjunction with sampling-based
methods (e.g., Latin hypercube sampling), where the input space is well
covered.

Limitations:

- They can become difficult to interpret in high-dimensional problems
- They don't isolate interaction effects without additional visualization

.. _`sa:global:corr_coeffs`:

Correlation coefficients
~~~~~~~~~~~~~~~~~~~~~~~~

Correlation coefficients, computed from sampling data, measure the strength
of linear or monotonic relationships between two quantities. Correlations
can be computed between input parameters, between output responses, and
between inputs and outputs. They are inexpensive to compute from existing
samples. As such, they are computed by default as part of the results of a
:dakkw:`method-sampling` study. For the purposes of sensitivity analysis,
this discussion focuses on correlations between inputs and outputs.

The sample (Pearson) correlation coefficient measures the strength of the
linear relationship between an input and an output, ignoring all other
inputs. Its efficacy as a sensitivity measure can break down if the
relationship is nonlinear.

The rank (Spearman) correlation coefficient measures the strength of the
monotonic relationship between an input and an output, ignoring all other
inputs. It is therefore robust to nonlinear but monotonic relationships,
though it can break down for non-monotonic cases.

Partial correlation coefficients measure correlation between inputs and
outputs after removing the linear effects of the other inputs. In this
sense, they control for other inputs in the computation of correlation
between a given input and output. However, their accuracy can degrade when
the sample size is small relative to the number of inputs (e.g., less than
10-20x the number of inputs).

Correlation coefficients can struggle in the following contexts:

- non-monotonic, nonlinear input-output relationships
- correlated inputs
- input-output relationships driven by interactions between inputs
- noisy input-output relationships

Below are examples, with scatterplots, showing how simple correlations
behave in a range of scenarios.

.. figure:: img/ccs_noise_effects.png
   :alt: For the same underlying function, increasing noise in the response
         decreases the computed correlation coefficient. The Pearson
         correlation coefficient is shown.
   :name: sa:figure02
   :align: center

   For the same underlying function, increasing noise in the response
   decreases the computed correlation coefficient. The Pearson correlation
   coefficient :math:`\rho` is shown here.

.. figure:: img/ccs_slope_effects.png
   :alt: Correlation coefficients for different variations of y=ax, where
         the coefficient is always -1 or 1, no matter what a is.
   :name: sa:figure03
   :align: center

   Correlation coefficients aren't impacted by the magnitude of the slope of
   a linear relationship; they are only influenced by the direction (upward
   or downward) and the strength of the linear relationship (how closely
   points fall on a line).

.. figure:: img/ccs_edge_cases.png
   :alt: Pearson and Spearman correlation coefficients can diverge for
         monotonic, nonlinear or discontinuous functions. They both can fail
         to identify a significant relationship for non-monotonic
         relationships.
   :name: sa:figure04
   :align: center

   Pearson and Spearman correlation coefficients (:math:`\rho` and
   :math:`\rho_s`, respectively) can diverge for monotonic, nonlinear or
   discontinuous functions. Both can fail to identify a significant
   relationship for non-monotonic relationships.

.. figure:: img/ccs_interaction_driven.png
   :alt: Correlation coefficients can fail to identify significant
         relationships for functions driven by interactions between inputs,
         as is shown in this example.
   :name: sa:figure05
   :align: center

   Correlation coefficients can fail to identify significant relationships
   for functions driven by interactions between inputs, as is shown in this
   example.

.. _`sa:global:std_regression_coeffs`:

Standardized regression coefficients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Standardized regression coefficients, computed from sampling data, measure
the change in an output response (in standard deviations) per standard
deviation change in an input. In contrast to correlation coefficients, they
are robust to correlated inputs. They are inexpensive to compute from
existing samples: inputs and outputs are standardized (i.e., mean=0, std=1),
and a multiple linear regression is performed on the standardized
quantities. The standardized regression coefficients are the coefficients of
this linear regression model.

Standardized regression coefficients can be optionally computed in the
results of a random sampling study. Keyword reference:
:dakkw:`method-sampling-std_regression_coeffs`

As with correlation coefficients, standardized regression coefficients are
not robust to nonlinear or non-monotonic input-output relationships, or to
relationships driven by interactions between inputs.

.. _`sa:global:morris`:

Morris one-at-a-time (elementary effects) method
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Morris one-at-a-time (MOAT) method is a global approach to screen out
unimportant inputs. The algorithm is described in detail in the
:dakkw:`method-psuade_moat` documentation. At a high level, the method
creates a distribution of elementary effects for each input. Elementary
effects are computed at random points using forward finite differences with
large step sizes.
From this sample set, the following metrics are computed:

- **Modified mean** (:math:`\mu^*`): The average absolute value of the
  elementary effects. Large values indicate the input has a significant
  effect on the output.
- **Standard deviation** (:math:`\sigma`): The spread of elementary effects
  across the input space. Large values indicate either nonlinear effects or
  interactions with other inputs.

Inputs with high :math:`\mu^*` are influential and should be retained for
subsequent analyses. Inputs with high :math:`\sigma` relative to
:math:`\mu^*` exhibit strong nonlinear behavior or interactions.

The number of model evaluations needed to compute a size-:math:`r` ensemble
of elementary effects for :math:`d` inputs is :math:`N=r(d+1)`. A common
first choice for :math:`r` is :math:`\sim 10-20`, increasing as needed to
achieve stability in the computed metrics.

Morris sensitivity metrics can be a useful tool for screening out
unimportant inputs at relatively low computational cost while being robust
to nonlinear input-output relationships and relationships driven by
interactions. However, they have some downsides:

- Input-output samples generated by the algorithm are not independent and
  identically distributed, so they are poorly suited for reuse in Monte
  Carlo-based uncertainty analysis.
- While they can provide an indication of the presence of nonlinearity or
  interaction effects, they can't distinguish between the two or attribute
  output variance to specific inputs.

.. _`sa:global:vbd`:

Sobol' Indices/Variance-Based Decomposition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sobol' indices are variance-based global sensitivity measures derived from
the functional ANOVA decomposition of a model. Let
:math:`Y = f(X_1, \ldots, X_d)`, where the inputs are independent. The ANOVA
decomposition represents the function as

.. math::

   Y = f_0 + \sum_{i=1}^d f_i(X_i) + \sum_{i<j} f_{ij}(X_i, X_j) + \cdots
       + f_{1 \cdots d}(X_1, \ldots, X_d)

Because the inputs are independent, the output variance decomposes into
contributions from these terms. The main-effect (first-order) index
:math:`S_i` and total-effect index :math:`T_i` for input :math:`X_i` are

.. math::

   S_i = \frac{\operatorname{Var}\left[E(Y \mid X_i)\right]}
              {\operatorname{Var}(Y)}, \qquad
   T_i = 1 - \frac{\operatorname{Var}\left[E(Y \mid X_{\sim i})\right]}
                  {\operatorname{Var}(Y)}

:math:`S_i` measures the fraction of output variance attributable to
:math:`X_i` alone, while :math:`T_i` also includes variance due to
interactions between :math:`X_i` and the other inputs. In Dakota, Sobol'
indices can be estimated by Saltelli sampling (:dakkw:`method-sampling` with
:dakkw:`method-sampling-variance_based_decomp`) or computed analytically
from a :dakkw:`method-polynomial_chaos` expansion.

.. _`sa:parameter_studies`:

Parameter Studies
-----------------

Parameter studies evaluate the response at structured sets of points in the
input space and are often the first step in exploring a new model. Among
Dakota's parameter study methods, the centered parameter study is
particularly useful for initial sensitivity screening.
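The one-at-a-time sweep pattern underlying the centered parameter study can
be sketched as follows. The two-input model, center point, and step sizes
are hypothetical, chosen only for illustration; in practice Dakota's
``centered_parameter_study`` generates and evaluates these points for you.

```python
# Sketch of a centered parameter sweep: vary each input about a central
# point while holding the others fixed at their nominal values.
def model(x1, x2):          # hypothetical toy response
    return x1 ** 2 + 0.5 * x2

initial_point = [1.0, 2.0]   # central point (cf. initial_point)
step_vector = [0.1, 0.25]    # per-parameter step size (cf. step_vector)
steps_per_variable = 2       # steps in each direction per parameter

sweeps = {}
for i, (center, step) in enumerate(zip(initial_point, step_vector)):
    points = []
    for k in range(-steps_per_variable, steps_per_variable + 1):
        x = list(initial_point)
        x[i] = center + k * step
        points.append((x[i], model(*x)))  # (varied input, response)
    sweeps[i] = points
```

Each ``sweeps[i]`` is a one-dimensional conditional slice of the response,
suitable for plotting the univariate trends discussed below; the total cost
is :math:`2 \cdot \text{steps\_per\_variable} \cdot d + 1` unique points.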
Centered Parameter Study
~~~~~~~~~~~~~~~~~~~~~~~~

The :dakkw:`method-centered_parameter_study` varies each parameter along its
axis, about a central point. The centered parameter study requires
specifying:

- ``initial_point``: The central point around which parameters are varied
- ``steps_per_variable``: Number of steps to take in each direction for each
  parameter
- ``step_vector``: Step size for each parameter

.. literalinclude:: centered_pstudy.in
   :language: dakota
   :tab-width: 2
   :caption: Centered parameter study input excerpt
   :name: sa:centered_pstudy

The resulting data shows how the response changes as each parameter is
varied while the others are held constant. Large response variations
indicate potentially influential parameters, while flat profiles suggest
parameters that may be candidates for fixing at nominal values.

Note that parameter studies are generally analyzed through visualization and
don't provide sensitivity metrics or rankings like the other methods
described on this page.

.. _`sa:method_summary`:

Method Summary
--------------

The following table summarizes Dakota's sensitivity analysis methods and the
metrics they provide:

.. list-table:: Dakota Sensitivity Analysis Methods
   :header-rows: 1
   :widths: 20 30 15 15 15 15

   * - Category
     - Dakota Method
     - Univariate Trends
     - Correlations
     - Morris Metrics
     - Sobol Indices
   * - Parameter Studies
     - ``centered_parameter_study``
     - P
     -
     -
     -
   * -
     - ``multidim_parameter_study``
     -
     - P
     -
     - P
   * - Sampling
     - ``sampling``
     - P
     - D
     -
     -
   * -
     - ``sampling`` with ``variance_based_decomp``
     - P
     - D
     -
     - D
   * - Morris
     - ``psuade_moat``
     -
     -
     - D
     -
   * - Stochastic Expansions
     - ``polynomial_chaos``
     -
     -
     -
     - D

D: Dakota-generated; P: Post-processing required with third-party tools

.. _`sa:usage`:

Usage Guidelines
----------------

This section provides guidance for selecting sensitivity analysis methods
based on model cost, input dimension, response characteristics, and intended
downstream use.
Available methods for sensitivity analysis in Dakota, or via postprocessing
of Dakota output, include:

- Centered parameter studies (one-at-a-time, conditional)
- Scatterplots (visual diagnostics)
- Morris one-at-a-time (MOAT) screening
- Random sampling (e.g., Latin hypercube) with:

  - Correlation coefficients (Pearson, Spearman, partial)
  - Standardized regression coefficients (SRC)
  - Binned Sobol' main-effect indices

- Saltelli sampling for Sobol' indices (main and total effects)
- Polynomial chaos expansions (PCE) for Sobol' indices

The choice of method depends primarily on:

- Computational cost per model evaluation
- Number of input variables
- Smoothness and structure of the response
- Whether model evaluations should be reusable
- Downstream analysis goals (optimization, surrogate modeling, UQ)

**Model cost considerations**

- **Very Expensive Models**

  When the number of model evaluations is severely limited (i.e.,
  :math:`N \lesssim 100` overall or :math:`N \lesssim 10 \, d` to
  :math:`20 \, d`, where :math:`d` is the number of input variables):

  - Use Morris one-at-a-time screening for global sensitivity
  - Supplement with centered parameter sweeps or scatterplots for
    diagnostics
  - Random sampling-based screening can be effective, but Morris designs
    generally provide more stable sensitivity rankings for the same
    computational cost.
  Avoid:

  - Saltelli Sobol' indices (high cost)
  - Large random sampling designs

- **Moderate Cost Models**

  When a moderate number of evaluations is feasible (i.e.,
  :math:`N \sim 100-1000` or :math:`N \sim 10 \, d` to :math:`100 \, d`):

  - Generate a random sample (e.g., Latin hypercube sampling)
  - Compute:

    - Correlation coefficients
    - Standardized regression coefficients
    - Binned Sobol' main-effect indices

  - Compare to scatterplots to corroborate the computed metrics

  Advantages:

  - A single dataset supports multiple sensitivity metrics
  - Enables reuse for surrogate modeling and uncertainty quantification

- **Cheap Models**

  When model evaluations are inexpensive:

  - Use Saltelli sampling to compute Sobol' indices (main and total effects)
  - Optionally compare with correlation or regression-based measures

**Input Dimension Considerations**

- **High-Dimensional Problems** (e.g., :math:`d \gtrsim 20-30`)

  Problems with tens to hundreds of inputs are considered high-dimensional.
  In this regime, the number of model evaluations required for many
  sensitivity metrics grows rapidly with the number of inputs.

  - Prefer:

    - Morris screening (cost scales linearly with dimension, with few
      samples per input)
    - Correlation or SRC from random samples (sample size independent of
      dimension)

  - Use caution with:

    - Saltelli Sobol' indices (cost scales with dimension and requires more
      samples)
    - High-order PCE, unless sparse/adaptive methods are used

  - Recommended goal:

    - Screen out unimportant variables before applying more expensive
      methods

- **Moderate-Dimensional Problems** (e.g., :math:`d \approx 5-20`)

  In this range, most methods are feasible with moderate computational
  effort.
  - Viable approaches:

    - Random sampling with correlation or SRC
    - Binned Sobol' indices (main effects only)
    - Morris screening
    - Saltelli Sobol' indices (budget permitting)
    - PCE (often practical)

  - Recommended goal:

    - Combine screening with more quantitative methods as needed

- **Low-Dimensional Problems** (e.g., :math:`d \lesssim 5-10`)

  With relatively few inputs, more comprehensive methods become practical,
  and screening methods such as MOAT are typically unnecessary.

  - Prefer:

    - Sobol' indices (Saltelli sampling can be tractable)
    - PCE-based Sobol' indices

  - Advantages:

    - Interaction effects can be more fully explored
    - High-fidelity variance decomposition is feasible

**Model Smoothness**

- **Smooth Responses**

  If the model is smooth and well-approximated by polynomials:

  - Use PCE-based Sobol' indices

  Advantages:

  - Analytical computation of Sobol' indices
  - The surrogate can be reused for UQ and optimization

  Caveats:

  - Efficiency depends on input dimension, since the number of PCE terms
    (and correspondingly, the number of model evaluations needed to
    approximate them) grows rapidly with input dimension and polynomial
    order.
  - PCEs are most effective when the response admits a sparse representation
    (e.g., is dominated by low-order terms and limited interactions).
  - High-dimensional problems may require sparse or adaptive truncation
    strategies to remain tractable.
- **Non-smooth or Unknown Structure**

  - Start with:

    - Morris screening
    - Spearman correlation

  - Be cautious with:

    - Regression-based methods
    - Low-order PCE

**Downstream Use Cases**

- **Reusable Sampling Designs**

  If model evaluations should be reused:

  - Prefer random sampling designs (e.g., Latin hypercube)

  These support:

  - Sensitivity analysis (correlation, SRC, Sobol')
  - Surrogate modeling
  - Uncertainty quantification

  Avoid relying solely on:

  - Morris (design-specific)
  - Saltelli sampling (specialized for Sobol' estimation)

- **Surrogate Modeling**

  For building surrogate models:

  - Use space-filling random samples
  - Optionally construct PCE or other surrogates

  Avoid:

  - Centered parameter studies
  - Morris designs (poor coverage of the input space)

- **Optimization**

  If optimization is the primary goal:

  - Use sensitivity methods for insight and screening:

    - Centered parameter studies
    - Scatterplots
    - Correlation or MOAT

- **Uncertainty Quantification**

  For full uncertainty quantification:

  - Prefer:

    - Sobol' indices (Saltelli or PCE-based)
    - Surrogate-based sensitivity analysis

  - Avoid relying solely on:

    - Correlation coefficients
    - Morris screening

**Method Roles and Limitations**

- **Centered Parameter Sweeps**

  - One-at-a-time variation of a single input over its specified range while
    holding the other inputs fixed.
  - Provides a one-dimensional conditional response:

    .. math::

       f(x_i \mid x_{-i} = x_{-i}^*)

  - Useful for:

    - Visualizing response shape (linearity, saturation, thresholds)
    - Identifying inputs with strong influence along the chosen slice
    - Diagnosing model behavior

  - Limitations:

    - Results are conditional on the fixed values of the other inputs
    - Do not account for variability in other inputs
    - Do not capture interactions unless additional sweeps are performed
    - Not a global sensitivity measure

- **Scatterplots**

  - Reveal trends, monotonicity, nonlinear behavior, and interaction effects
  - Useful for exploratory analysis and for corroborating computed metrics

- **Morris One-at-a-Time**

  - Efficient global screening method
  - Detects nonlinearity and interactions qualitatively
  - Does not provide a variance decomposition

- **Correlation and SRC**

  - Low cost and easy to compute
  - Suitable for monotonic or near-linear models
  - Limited in capturing interactions and non-monotonic effects

- **Binned Sobol' Indices**

  - Approximate main effects using existing samples
  - Exploit reusable random samples
  - Don't provide total-effect indices

- **Saltelli Sobol' Indices**

  - Provide quantitative main and total effects
  - Capture interactions
  - High computational cost and a specialized sampling design

- **PCE-Based Sobol' Indices**

  - Efficient for smooth models
  - Provide an analytical variance decomposition
  - Depend on the quality and structure of the surrogate

**Summary**

- Use Morris for low-cost screening
- Use random sampling when reuse and flexibility are important
- Use Sobol' indices for quantitative variance attribution
- Use PCE when the model is smooth and surrogate reuse is desired
- Use sweeps and scatterplots for conditional response insight and
  diagnostics

Practical Tips
~~~~~~~~~~~~~~

- For studies where the number of model evaluations isn't influenced by the
  number of inputs (e.g., random sampling), include a "dummy" variable in
  your sensitivity analyses that is not used in the model.
  This lets you assess how well a noninfluential input is identified by your
  sensitivity metric of interest for your given sample set.

- Assess convergence of sensitivity metrics by considering incremental
  increases in the number of model evaluations/samples (incremental studies
  or an increased sample size exploiting Dakota's restart capability mean
  you don't repeat model evaluations).
- Whenever possible, use plots (e.g., scatterplots) to confirm conclusions.

References
----------

For more detailed treatments of sensitivity analysis theory and methods,
consult:

.. bibliography::
   :filter: False

   Sal04
   Hel00

Additional information is available in the Dakota User's Manual sections on:

- :ref:`Parameter Study Capabilities `
- :ref:`Design of Experiments Capabilities `
- :ref:`Uncertainty Quantification Capabilities `

Corresponding keyword reference pages provide detailed information on method
options and settings.

Video Resources
---------------

+----------------------+---------------------------------------------+----------------+
| Title                | Link                                        | Resources      |
+======================+=============================================+================+
| Sensitivity Analysis | |Training|_                                 | `Slides`__ /   |
|                      |                                             | `Exercises`__  |
+----------------------+---------------------------------------------+----------------+
| Scatterplot Analyses | `Sandia ASC V&V/UQ Scatterplots Webinar`_   |                |
+----------------------+---------------------------------------------+----------------+
| Sobol' Indices       | `Sandia ASC V&V/UQ Sobol' Indices Webinar`_ |                |
+----------------------+---------------------------------------------+----------------+

.. __: https://dakota.sandia.gov/sites/default/files/training/DakotaTraining_SensitivityAnalysis.pdf
.. __: https://dakota.sandia.gov/sites/default/files/training/sensitivity_analysis.zip

.. |Training| image:: img/SensitivityAnalysisTrainingTeaser.png
   :alt: Sensitivity Analysis
.. _Training: https://digitalops.sandia.gov/Mediasite/Play/PLACEHOLDER

.. _Sandia ASC V&V/UQ Scatterplots Webinar: https://digitalops.sandia.gov/Mediasite/Play/7aade94c468a4a408a69ce4251bc8d4d1d
.. _Sandia ASC V&V/UQ Sobol' Indices Webinar: https://digitalops.sandia.gov/Mediasite/Play/8ae40104faa54e429f55dca34c05dee51d