.. _input:

Inputs to Dakota
================

.. _`input:overview`:

Overview of Inputs
------------------

Dakota supports a number of command-line arguments, as described in
:ref:`command-line-main`. Among these are specifications for the Dakota input
file and, optionally, a restart file. Dakota input file syntax is described in
the adjoining sections in :ref:`inputfile-main`, with detailed keyword syntax
in :ref:`keyword-reference-area`. The restart file is described in
:ref:`dakota_restart`.

A Dakota input file may be prepared with a text editor such as Emacs, Vi, or
WordPad, or with the Dakota graphical user interface. The Dakota GUI is built
on the Java-based Eclipse Framework :cite:p:`Eclipse` and presents the Dakota
input specification options in either a text editor view or a graphical view.
It further includes templates and wizards to help create Dakota studies and
can invoke Dakota to run an analysis. Dakota GUI downloads for Linux, Windows,
and Mac are available from the Dakota website http://dakota.sandia.gov/, along
with licensing information and installation tips. See :ref:`dakota-gui-main`
for more documentation.

.. _`input:tabularformat`:

Tabular Data Formats
~~~~~~~~~~~~~~~~~~~~

The Dakota input file and/or command line may identify additional text files
for tabular data import in contexts described in :ref:`input:import`. Examples
include data from which to build a surrogate, points at which to run a list
parameter study, post-run input data, and least squares and Bayesian
calibration data. Dakota writes and reads tabular data with C++ stream
operators/conversions, so most integer and floating point formats are
acceptable for imported numeric data. Dakota supports the following tabular
formats:

- **Annotated:** In most contexts, Dakota tabular data defaults to the
  "annotated" tabular format. An annotated tabular file is a
  whitespace-separated text file with one leading header row of
  comments/column labels. In most imports/exports, each subsequent row
  contains an evaluation ID and interface ID, followed by data for variables,
  or variables followed by responses, depending on context. This example
  shows 5 variables, followed by the 1 ``text_book`` response:

  ::

     %eval_id interface  TF1ln    TF1ln   hpu_r1  hpu_r2  ModelForm  text_book
     1        I1         0.97399  1.0476  12      4.133   3          14753
     2        I1         0.94468  1.0636  4.133   12      3          14753
     3        I1         1.0279   1.0035  12      4.133   3          14753

  Another example is shown in :numref:`output:tabcont`.

  .. note::

     Dakota 6.1 and newer include a column for the interface ID. See the
     discussion of custom-annotated format below for importing/exporting
     Dakota 6.0 format files.

  For scalar experiment data files, each subsequent row contains an
  experiment ID, followed by data for configuration variables, observations,
  and/or observation errors, depending on context. This example shows 3 data
  points for each of two experiments:

  ::

     %experiment  d1    d2     d3
     1            82    15.5   2.02
     2            82.2  15.45  2

- **Free-form:** When optionally specifying ``freeform`` for a given tabular
  import, the data file must be provided in a free-form format, omitting the
  leading header row and ID column(s). The raw ``num_rows`` x ``num_cols``
  numeric data entries may appear separated with any whitespace, including
  arbitrary spaces, tabs, and newlines. In this format, vectors may therefore
  appear as a single row or single column (or a mixture; entries will
  populate the vector in order). This example shows the free-form version of
  the annotated data above:

  ::

     0.97399  1.0476  12     4.133  3  14753
     0.94468  1.0636  4.133  12     3  14753
     1.0279   1.0035  12     4.133  3  14753

- **Custom-annotated:** In Dakota 6.2 and newer, a custom-annotated format is
  supported to allow backward compatibility with Dakota 6.0 tabular formats,
  which had a header and evaluation ID, but no interface ID. This can be
  specified, for example, with

  ::

     method
       list_parameter_study
         import_points_file = 'dakota_pstudy.3.dat'
           custom_annotated header eval_id

  The ``custom_annotated`` keyword has options to control the ``header`` row,
  ``eval_id`` column, and ``interface_id`` column; a sketch of a file in this
  layout appears after this list.
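For reference, the annotated example above, rewritten in the Dakota 6.0-style
layout selected by ``custom_annotated header eval_id`` (header row and
evaluation IDs retained, interface column omitted), would look like:

::

   %eval_id  TF1ln    TF1ln   hpu_r1  hpu_r2  ModelForm  text_book
   1         0.97399  1.0476  12      4.133   3          14753
   2         0.94468  1.0636  4.133   12      3          14753
   3         1.0279   1.0035  12      4.133   3          14753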
In tabular files, variables appear in input specification order as documented
in the reference manual. As of Dakota 6.1, tabular I/O has columns for all of
the variables (active and inactive), not only the active variables as in
previous versions. To import data corresponding only to the active variables,
use the keyword ``active_only`` when specifying the import file.

.. note::

   Prior to October 2011, samples, calibration, and surrogate data files were
   in free-form format. They now default to annotated format, though there
   are ``freeform`` and ``custom_annotated`` options.

For any of these formats, Dakota will generate a warning if a specific number
of data values is expected but extra data is found, and an error when there
is insufficient data. Some third-party libraries (TPLs), such as SCOLIB and
JEGA, manage their own file I/O and support only the free-form option.

.. _`input:import`:

Data Imports
------------

The Dakota input file and/or command line may identify additional files used
to import data into Dakota.

AMPL algebraic mappings
~~~~~~~~~~~~~~~~~~~~~~~

As described in :ref:`advint:algebraic`, an AMPL specification of algebraic
input-to-output relationships may be imported into Dakota and used to define
or augment the mappings of a particular interface. The files :file:`stub.nl`,
:file:`stub.row`, and :file:`stub.col` define the mapping.

Genetic algorithm population import
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Genetic algorithms (GAs) from the JEGA and SCOLIB packages support a
population import feature using the keywords
``initialization_type flat_file = STRING``. This is useful for warm starting
GAs from available data or previous runs. Refer to the ``flat_file`` keywords
in the :dakkw:`method` reference. The flat file must be in free-form format.
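As a minimal sketch (the choice of the JEGA ``soga`` method and the file name
here are illustrative assumptions), a GA might be warm started with:

::

   method
     soga
       initialization_type flat_file = 'seed_population.dat'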
.. _`input:calib_data`:

Calibration data import
~~~~~~~~~~~~~~~~~~~~~~~

Calibration methods (deterministic least squares and Bayesian) require
residuals, or differences between model predictions
:math:`\mathbf{q}(\mathbf{\theta})` and data :math:`\mathbf{d}`:

.. math:: \mathbf{r}(\mathbf{\theta}) = \mathbf{q}(\mathbf{\theta}) - \mathbf{d}.

By default, if a Dakota input file specifies
:dakkw:`responses-calibration_terms`, the simulation interface is required to
return a vector of residuals :math:`\mathbf{r}` to Dakota. If in addition the
input file includes :dakkw:`responses-calibration_terms-calibration_data` or
:dakkw:`responses-calibration_terms-calibration_data_file`, Dakota assumes
the interface will return the model predictions
:math:`\mathbf{q}(\mathbf{\theta})` themselves, and Dakota will compute the
residuals by differencing with the provided data.

There are two calibration data import mechanisms:

#. Scalar responses only, with
   :dakkw:`responses-calibration_terms-calibration_data_file`: This uses a
   single tabular text file to import data values and (optionally) experiment
   numbers, configurations, and observation variances. Each row of the data
   file expresses this information for a single experiment; a sketch of this
   mechanism appears after the list.

#. Field and/or scalar responses, with
   :dakkw:`responses-calibration_terms-calibration_data`: To accommodate the
   richer structure of field-valued responses, this specification requires
   separate data files per response group (descriptor) ``DESC`` and per
   experiment ``NUM``. The files are named ``DESC.NUM.*`` and must each be in
   a tabular text format.
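As a minimal sketch of the first mechanism (the number of terms, file name,
experiment count, and variance option here are illustrative assumptions; see
the keyword reference for the full set of sub-keywords), a scalar calibration
study might specify:

::

   responses
     calibration_terms = 3
       calibration_data_file = 'exp_data.dat'
         num_experiments = 2
         variance_type = 'scalar'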
The tabular data files may be specified to be in ``annotated`` (default),
``custom_annotated``, or ``freeform`` format.

Calibration data imports include the following information:

- **Configuration variables (optional):** state variable values indicating
  the configuration at which this experiment was conducted; the length must
  agree with the number of state variables active in the study.

  .. warning::

     In versions of Dakota prior to 6.14, string-valued configuration
     variables were specified in data files with 0-based indices into the
     admissible values. As of Dakota 6.14, strings must be specified by
     value. For example, a string-valued configuration variable for an
     experimental condition might appear in the file as ``low_pressure`` vs.
     ``high_pressure``.

- **Experimental observations (required):** experimental data values to
  difference with model responses; the length equals the total response
  length (number of scalars + sum of field lengths).

- **Experimental variances (optional):** measurement errors
  (variances/covariances) associated with the experimental observations.

For more on specifying calibration data imports, see the nonlinear least
squares examples and the reference documentation for
:dakkw:`responses-calibration_terms`.

**Note on variance:** Field responses may optionally have scalar, diagonal,
or matrix-valued error covariance information. As an example,
:numref:`fig:input:obs_err_cov` shows an observation vector with 5 responses:
2 scalar responses plus 3 field responses (each field of length > 1). The
corresponding covariance matrix has scalar variances
:math:`\sigma_1^2, \sigma_2^2` for each of the scalars :math:`s1, s2`,
diagonal covariance :math:`D_3` for field :math:`f3`, scalar covariance
:math:`\sigma_4^2` for field :math:`f4`, and full matrix covariance
:math:`C_5` for field :math:`f5`. In total, Dakota supports block diagonal
covariance :math:`\Sigma` across the responses, with blocks
:math:`\Sigma_i`, which could be fully dense within a given field response
group. Covariance across the highest-level responses (off-diagonal blocks) is
not supported, nor is covariance between experiments.

.. figure:: img/ObsErrorCovariance.png
   :alt: An example of scalar and field response data, with associated
         block-diagonal observation error covariance.
   :name: fig:input:obs_err_cov

   An example of scalar and field response data, with associated
   block-diagonal observation error covariance.

PCE coefficient import
~~~~~~~~~~~~~~~~~~~~~~

Polynomial chaos expansion (PCE) methods compute coefficients for response
expansions which employ a basis of multivariate orthogonal polynomials.
Normally, the :dakkw:`method-polynomial_chaos` method calculates these
coefficients based either on a spectral projection or a linear regression
(see :ref:`uq:expansion`). However, Dakota also supports the option of
importing a set of response PCE coefficients from a file specified with
``import_expansion_file = STRING``. Each row of the free-form formatted file
must be comprised of a coefficient followed by its associated multi-index
(the same format used for the output described in :ref:`other:output:pce`).
This file import can be used to evaluate moments analytically or to compute
probabilities numerically from a known response expansion. Refer to
:dakkw:`method-polynomial_chaos-import_expansion_file` for additional
information on this specification.

Surrogate Model Imports
~~~~~~~~~~~~~~~~~~~~~~~

Global data fit surrogates, including some stochastic expansions, may be
constructed from a variety of data sources. One of these sources is an
auxiliary data file, as specified by the keyword
:dakkw:`model-surrogate-global-import_build_points_file`. The file may be in
annotated (default), custom-annotated, or free-form format, with columns
corresponding to variables and responses. For global surrogates specified
directly via the keywords ``model surrogate global``, the keyword
``use_variable_labels`` will trigger validation and potential reordering of
imported variable columns based on labels provided in the tabular header.

Surfpack global surrogate models may also be evaluated at a user-provided
file containing challenge (test) points. Refer to the
:ref:`model-surrogate-global` keywords for additional information on these
specifications.

Previously exported Surfpack and experimental global surrogate models can be
re-imported when used directly in the global surrogate model context.
Importing from a binary or text archive instead of building from data can
sometimes result in significant time savings with models such as Gaussian
processes. See the ``export_model`` and ``import_model`` keywords in
:ref:`keyword-reference-area` for important caveats on their use.

Variables/responses import to post-run
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The post-run mode (supported only for sampling, parameter study, and DACE
methods) requires specification of a file containing parameter and response
data. Annotated is the default format (see :ref:`input:tabularformat`), where
leading columns for evaluation and interface IDs are followed by columns for
variables (active and inactive by default), then those for responses, with an
ignored header row of labels and then one row per evaluation. Typically this
file would be generated by executing

::

   dakota -i dakota.in -pre_run ::variables.dat

and then, separately from Dakota, adding columns of response data to
:file:`variables.dat` to make :file:`varsresponses.dat`. The file is then
specified at the command line with:

::

   dakota -i dakota.in -post_run varsresponses.dat::

To import post-run data in other formats, specify
:dakkw:`environment-post_run` in the input file instead of on the command
line, and provide a format option.
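As a minimal sketch (assuming the file name above and that the format option
is given under the ``input`` sub-keyword of :dakkw:`environment-post_run`), a
free-form import might read:

::

   environment
     post_run
       input = 'varsresponses.dat'
         freeform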