diff --git a/docs/source/math_description.rst b/docs/source/math_description.rst
index e6c4437..bebc337 100644
--- a/docs/source/math_description.rst
+++ b/docs/source/math_description.rst
@@ -4,106 +4,159 @@
 Mathematical abstraction in UM-Bridge
 =====================================
 
-In this section, we will describe UM-Bridge's interface mathematically. 
+In this section, we will describe UM-Bridge's interface mathematically.
 
-Model Evaluation
-================
 
-Let :math:`\mathcal{F}` denote the numerical model that maps the model input vector, :math:`\mathbf{x}` to
-the output vector :math:`\mathbf{f(\mathbf{x})}`:
+Let :math:`\mathbf{F}` denote the numerical model that maps the model input vector :math:`\boldsymbol{\theta}`
+to the output vector :math:`\mathbf{F}(\boldsymbol{\theta})`. We will use bold font to
+indicate vectors. Note that both inputs and outputs are required to be a list of lists in the actual
+implementation. For a list of :math:`d` input vectors, each with :math:`n` dimensions, we have
 
 .. math::
-    \mathcal{F}\, : \,
-    \mathbf{x}
+    \mathbf{F}\, : \,
+    \mathbb{R}^{n \times d}
     \;\longrightarrow\;
-    \mathbf{f}(\mathbf{x}), \quad
-    \mathbf{x} \in \mathbb{R}^d, \;
-    \mathbf{f}(\mathbf{x}) \in \mathbb{R}^n.
+    \mathbb{R}^{m \times d}.
+
+In functions where derivatives are involved, the arguments ``inWrt`` and ``outWrt`` allow the user to
+select the particular indices (out of the :math:`d` indices) with respect to which the derivative is
+evaluated. This will be clarified in the respective sections.
+
+Additionally, there may be an objective function :math:`L = L(\mathbf{F}(\boldsymbol{\theta}))`.
+
+UM-Bridge allows the following four operations.
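As a side illustration of the list-of-lists convention described above (a hypothetical toy model, not part of this patch or of the UM-Bridge API; the name ``toy_model`` is purely illustrative), a forward map with :math:`n = 2` input dimensions and :math:`m = 3` output dimensions might be sketched as:

```python
# Hypothetical toy model illustrating the list-of-lists convention:
# the input is a list of d vectors of length n, the output is a list
# of vectors of length m. Names are illustrative, not UM-Bridge API.

def toy_model(parameters):
    """Map each input vector theta = (t1, t2) to F(theta) = (t1 + t2, t1 * t2, t1 - t2)."""
    outputs = []
    for theta in parameters:        # one entry per input vector (d of them)
        t1, t2 = theta              # n = 2 input dimensions
        outputs.append([t1 + t2, t1 * t2, t1 - t2])  # m = 3 output dimensions
    return outputs

# A list of d = 2 input vectors, each in R^2:
print(toy_model([[1.0, 2.0], [3.0, 4.0]]))
# -> [[3.0, 2.0, -1.0], [7.0, 12.0, -1.0]]
```

Each of the four operations below then acts on one selected vector from such a list.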
+
+Model Evaluation
+================
+
+This is simply the so-called forward map that takes an element from the list of input vectors,
+:math:`\boldsymbol{\theta} = (\theta_1, \ldots, \theta_n) \in \mathbb{R}^n`, and returns the model output,
+:math:`\mathbf{F}(\boldsymbol{\theta}) = (F(\boldsymbol{\theta})_1, \ldots, F(\boldsymbol{\theta})_m) \in \mathbb{R}^m`.
 
-Gradient Evaluation
-===================
+Gradient of the objective
+=========================
 
-The ``gradient`` function evaluates the sensitivity of a scalar
-objective, :math:`L(\mathbf{f}(\mathbf{x}))`, that depends on the model output, with respect to the model input. Using the
-chain rule:
+The gradient function evaluates the sensitivity of the scalar objective with respect to the model
+input. Using the chain rule:
 
 .. math::
-    \nabla_{\mathbf{x}}L
-    = \left(\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right)^{\!\top}
+    :name: eq:1
+
+    \nabla_{\boldsymbol{\theta}}L
+    = \left(\frac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}\right)^{\!\top}
     \boldsymbol{\lambda},
     \qquad
-    \boldsymbol{\lambda} = \frac{\partial L}{\partial \mathbf{f}},
+    \boldsymbol{\lambda} = \frac{\partial L}{\partial \mathbf{F}},
 
-where :math:`\lambda` is known as the sensitivity vector.
+where :math:`\boldsymbol{\lambda}` is known as the sensitivity vector and
+:math:`\dfrac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}` is the Jacobian of the
+forward map.
+Since both the input and the output are lists of vectors, we can select a specific vector from the
+input list (:math:`\boldsymbol{\theta}_i \in \mathbb{R}^n`) and from the output list
+(:math:`\mathbf{F}_j \in \mathbb{R}^m`). These indices are chosen using ``inWrt`` and ``outWrt``,
+respectively, in the implementation.
 
-Applying Jacobian
-=================
+So :ref:`(1) <eq:1>` becomes
+
+.. math::
+
+    \nabla_{\boldsymbol{\theta}_i} L
+    = \left( \dfrac{\partial \mathbf{F}_j}{\partial \boldsymbol{\theta}_i} \right) ^ {\!\top}
+    \boldsymbol{\lambda}_j,
+    \qquad
+    \boldsymbol{\lambda}_j = \dfrac{\partial L}{\partial \mathbf{F}_j},
+
+where :math:`\boldsymbol{\lambda}_j` is the ``sens`` argument in the code.
 
-The ``apply_jacobian`` function evaluates the product of the model's Jacobian, :math:`J`, and a
-vector, :math:`\mathbf{v}`, of the user's choice. The Jacobian of a vector-valued function
+The output of this operation is a vector because we are essentially computing a matrix-vector product.
+
+Applying Jacobian to a vector
+=============================
+
+The apply Jacobian function evaluates the product of the model's Jacobian, :math:`J`, and a
+vector, :math:`\mathbf{v}`, of the user's choice (``vec``). The Jacobian of a vector-valued function
 is given by
 
 .. math::
     J =
-    \frac{\partial \mathbf{f}}{\partial \mathbf{x}} =
+    \frac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}} =
     \left[
     \begin{array}{ccc}
-        \dfrac{\partial \mathbf{f}}{\partial x_1} & \cdots & \dfrac{\partial \mathbf{f}}{\partial x_d}
+        \dfrac{\partial \mathbf{F}}{\partial \theta_1} & \cdots & \dfrac{\partial \mathbf{F}}{\partial \theta_n}
     \end{array}
     \right]
     =
    \begin{pmatrix}
-        \dfrac{\partial f_{1}}{\partial x_{1}} & \cdots &
-        \dfrac{\partial f_{1}}{\partial x_{d}} \\[12pt]
+        \dfrac{\partial F_{1}}{\partial \theta_{1}} & \cdots &
+        \dfrac{\partial F_{1}}{\partial \theta_{n}} \\[12pt]
        \vdots & \ddots & \vdots \\[4pt]
-        \dfrac{\partial f_{n}}{\partial x_{1}} & \cdots &
-        \dfrac{\partial f_{n}}{\partial x_{d}}
+        \dfrac{\partial F_{m}}{\partial \theta_{1}} & \cdots &
+        \dfrac{\partial F_{m}}{\partial \theta_{n}}
    \end{pmatrix}
-    \in \mathbb{R}^{n \times d}.
+    \in \mathbb{R}^{m \times n}.
 
-The output of this function for a chosen :math:`\mathbf{v} \in \mathbb{R}^{d}` is then
+For a chosen :math:`\mathbf{v} \in \mathbb{R}^{n}`, this is simply
 
+.. math::
-    \texttt{output}
-    = J\,\mathbf{v}
-    = \frac{\partial \mathbf{f}}{\partial \mathbf{x}}\,\mathbf{v}.
+    J\,\mathbf{v}
+    = \dfrac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}} \,\mathbf{v}.
 
-Additionally, we can use this (or vice versa) to expression the ``gradient`` function by setting
-:math:`\mathbf{v} = \mathbf{\lambda}`.
+Note that the gradient function above is the analogous product of the transposed Jacobian,
+:math:`J^{\top}`, with the sensitivity vector :math:`\boldsymbol{\lambda}`.
+However, as before, we can choose an index each from the input and output to construct the Jacobian such that
+:math:`J_{ji} = \frac{\partial \mathbf{F}_j}{\partial \boldsymbol{\theta}_i}`. The output of this
+action is then
 
-Applying Hessian
-================
+.. math::
+    \texttt{output} =
+    J_{ji}\,\mathbf{v}
+    = \dfrac{\partial \mathbf{F}_j}{\partial \boldsymbol{\theta}_i}\,\mathbf{v},
+
+where the indices :math:`i` and :math:`j` correspond to ``inWrt`` and ``outWrt``, respectively.
+
+Applying Hessian to a vector
+============================
 
-This is a combination of the previous two sections: the output is still a matrix-vector product, but
+The apply Hessian action is a combination of the previous two sections: the action is still a matrix-vector product, but
 the matrix is the Hessian of an objective function. The Hessian, :math:`H`, is given by
 
 .. math::
     H =
-    \frac{\partial^2 L}{\partial \mathbf{x}\,\partial \mathbf{x}}
-    = \frac{\partial}{\partial \mathbf{x}}
+    \frac{\partial^2 L}{\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}}
+    = \frac{\partial}{\partial \boldsymbol{\theta}}
    \left(
-    \frac{\partial \mathbf{f}}{\partial \mathbf{x}}
+    \frac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}
    \right)^{\!\top}
    \boldsymbol{\lambda}
    =
-    H = \begin{bmatrix}
-    \dfrac{\partial^2 L}{\partial x_1^2} & \dfrac{\partial^2 L}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 L}{\partial x_1 \partial x_n} \\[18pt]
-    \dfrac{\partial^2 L}{\partial x_2 \partial x_1} & \dfrac{\partial^2 L}{\partial x_2^2} & \cdots & \dfrac{\partial^2 L}{\partial x_2 \partial x_n} \\[18pt]
+    \begin{bmatrix}
+    \dfrac{\partial^2 L}{\partial \theta_1^2} & \dfrac{\partial^2 L}{\partial \theta_1 \partial \theta_2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_1 \partial \theta_n} \\[18pt]
+    \dfrac{\partial^2 L}{\partial \theta_2 \partial \theta_1} & \dfrac{\partial^2 L}{\partial \theta_2^2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_2 \partial \theta_n} \\[18pt]
    \vdots & \vdots & \ddots & \vdots \\[6pt]
-    \dfrac{\partial^2 L}{\partial x_n \partial x_1} & \dfrac{\partial^2 L}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 L}{\partial x_n^2}
+    \dfrac{\partial^2 L}{\partial \theta_n \partial \theta_1} & \dfrac{\partial^2 L}{\partial \theta_n \partial \theta_2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_n^2}
    \end{bmatrix},
 
-where :math:`L` is the objective function and :math:`\mathbf{\lambda}` is the sensitivity vector as defined in the ``gradient``
-section.
+where :math:`L` is the objective function and :math:`\boldsymbol{\lambda}` is the sensitivity vector as defined previously.
 
-So the output for a chosen vector can be written as
+So the product of :math:`H` and the chosen vector (of size :math:`n`) can be written as
 
 .. math::
     H\,\mathbf{v}
-    = \frac{\partial^2 \mathcal{L}}{\partial \mathbf{x}\,\partial \mathbf{x}}\,\mathbf{v} =
-    \left[\frac{\partial}{\partial \mathbf{x}}
+    = \dfrac{\partial^2 L}{\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}}\,\mathbf{v} =
+    \left[\dfrac{\partial}{\partial \boldsymbol{\theta}}
    \left(
-    \frac{\partial \mathbf{f}}{\partial \mathbf{x}}
+    \dfrac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}
    \right)^{\!\top}
    \boldsymbol{\lambda}\right]\,\mathbf{v}.
+
+As in the apply Jacobian action, we can select certain indices from the list of lists to construct the Hessian.
+Since :math:`H` contains the second derivative of :math:`L`, we require two indices from the input,
+``inWrt1`` and ``inWrt2``, as well as one from the output, ``outWrt``. The output of this action is
+
+.. math::
+    \texttt{output} =
+    \left( \dfrac{\partial}{\partial \boldsymbol{\theta}_i}
+    \left[ \left( \dfrac{\partial \mathbf{F}_k}{\partial \boldsymbol{\theta}_j} \right) ^ {\!\top} \, \boldsymbol{\lambda}_k \right] \right)
+    \, \mathbf{v},
+
+where the indices :math:`i`, :math:`j`, and :math:`k` correspond to ``inWrt1``, ``inWrt2``, and ``outWrt``, respectively.
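As a side illustration of the chain rule underpinning these operations (hypothetical toy functions, not the UM-Bridge API), one can check numerically that the gradient of the objective equals the transposed Jacobian applied to the sensitivity vector, :math:`\nabla_{\boldsymbol{\theta}}L = J^{\top}\boldsymbol{\lambda}`:

```python
# Toy check of the chain rule used above: grad L = J^T lambda.
# F and L are illustrative stand-ins, not part of UM-Bridge.

def F(theta):                       # forward map R^2 -> R^2
    t1, t2 = theta
    return [t1 * t1, t1 * t2]

def L(out):                         # scalar objective on the model output
    return out[0] + 2.0 * out[1]

def jacobian(theta):                # analytic Jacobian of F (m x n)
    t1, t2 = theta
    return [[2.0 * t1, 0.0],
            [t2,       t1]]

theta = [1.5, -0.5]
lam = [1.0, 2.0]                    # dL/dF, the sensitivity vector ("sens")

# Gradient via the chain rule: (J^T) lambda
J = jacobian(theta)
grad = [sum(J[j][i] * lam[j] for j in range(2)) for i in range(2)]

# Finite-difference gradient of L(F(theta)) for comparison
h = 1e-6
fd = []
for i in range(2):
    tp = list(theta); tp[i] += h
    tm = list(theta); tm[i] -= h
    fd.append((L(F(tp)) - L(F(tm))) / (2.0 * h))

print(grad)  # analytic: [2*t1 + 2*t2, 2*t1] = [2.0, 3.0]
print(fd)    # finite differences agree up to roundoff
```

The same finite-difference idea is a cheap sanity check for any model that advertises gradient, apply Jacobian, or apply Hessian support.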