Mathematical abstraction in UM-Bridge
=====================================

In this section, we will describe UM-Bridge's interface mathematically.

Let :math:`\mathbf{F}` denote the numerical model that maps the model input vector :math:`\boldsymbol{\theta}`
to the output vector :math:`\mathbf{F}(\boldsymbol{\theta})`; we use bold font to
indicate vectors. Note that both inputs and outputs are required to be a list of lists in the actual
implementation. For a list of :math:`d` input vectors, each with :math:`n` dimensions, we have

.. math::

   \mathbf{F}\, : \,
   \mathbb{R}^{n \times d}
   \;\longrightarrow\;
   \mathbb{R}^{m \times d}.

In the functions where derivatives are involved, the arguments ``inWrt`` and ``outWrt`` allow the
user to select the particular indices (out of the :math:`d` entries) with respect to which the
derivative is evaluated. This is clarified in the respective sections below.

Additionally, there may be an objective function :math:`L = L(\mathbf{F}(\boldsymbol{\theta}))`.

UM-Bridge allows the following four operations.

Model Evaluation
================

This is the so-called forward map: it takes an element of the list of input vectors,
:math:`\boldsymbol{\theta} = (\theta_1, \ldots, \theta_n) \in \mathbb{R}^n`, and returns the model output,
:math:`\mathbf{F}(\boldsymbol{\theta}) = (F_1(\boldsymbol{\theta}), \ldots, F_m(\boldsymbol{\theta})) \in \mathbb{R}^m`.
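As a concrete sketch, consider a hypothetical toy model (for illustration only, not part of
UM-Bridge) with :math:`n = 2` and :math:`m = 3`, evaluated with the list-of-lists convention:

```python
import numpy as np

# Toy forward map F : R^2 -> R^3 (hypothetical example for illustration only).
def forward_map(theta):
    theta = np.asarray(theta, dtype=float)
    return np.array([theta[0] + theta[1],
                     theta[0] * theta[1],
                     theta[0] ** 2])

# UM-Bridge passes inputs and outputs as lists of lists:
parameters = [[1.0, 2.0]]                       # one input vector, n = 2
output = [forward_map(parameters[0]).tolist()]  # one output vector, m = 3
print(output)  # [[3.0, 2.0, 1.0]]
```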


Gradient of the objective
=========================

The ``gradient`` function evaluates the sensitivity of the scalar objective :math:`L` with respect
to the model input. Using the chain rule:

.. math::
   :name: eq:1

   \nabla_{\boldsymbol{\theta}}L
   = \left(\frac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}\right)^{\!\top}
   \boldsymbol{\lambda},
   \qquad
   \boldsymbol{\lambda} = \frac{\partial L}{\partial \mathbf{F}},

where :math:`\boldsymbol{\lambda}` is known as the sensitivity vector and
:math:`\dfrac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}` is the Jacobian of the
forward map.

Since the inputs and outputs are lists of vectors, we can select a specific vector from the input
list (:math:`\boldsymbol{\theta}_i \in \mathbb{R}^n`) and from the output list
(:math:`\mathbf{F}_j \in \mathbb{R}^m`). These indices are chosen using ``inWrt`` and ``outWrt``,
respectively, in the implementation.

So :ref:`(1) <eq:1>` becomes

.. math::

   \nabla_{\boldsymbol{\theta}_i} L
   = \left( \dfrac{\partial \mathbf{F}_j}{\partial \boldsymbol{\theta}_i} \right)^{\!\top}
   \boldsymbol{\lambda}_j,
   \qquad
   \boldsymbol{\lambda}_j = \dfrac{\partial L}{\partial \mathbf{F}_j},

where :math:`\boldsymbol{\lambda}_j` is the ``sens`` argument in the code.

The output of this operation is a vector, since we are essentially computing a matrix-vector product.
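A minimal numerical sketch of this operation, assuming a hypothetical two-input, two-output model
and a least-squares-style objective (neither is part of UM-Bridge):

```python
import numpy as np

# Hypothetical model with n = 2 inputs and m = 2 outputs:
# F(theta) = (theta_1 * theta_2, theta_1 + theta_2),
# with objective L(F) = 0.5 * ||F||^2, so lambda = dL/dF = F.
def forward_map(theta):
    return np.array([theta[0] * theta[1], theta[0] + theta[1]])

def jacobian(theta):
    return np.array([[theta[1], theta[0]],
                     [1.0,      1.0]])

theta = np.array([2.0, 3.0])
sens = forward_map(theta)        # sensitivity vector lambda (the `sens` argument)
grad = jacobian(theta).T @ sens  # gradient = J^T lambda
print(grad)  # [23. 17.]
```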

Applying Jacobian to a vector
=============================

The ``apply_jacobian`` function evaluates the product of the model's Jacobian, :math:`J`, and a
vector :math:`\mathbf{v}` of the user's choice (``vec``). The Jacobian of a vector-valued function
is given by

.. math::

   J =
   \frac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}} =
   \left[
   \begin{array}{ccc}
   \dfrac{\partial \mathbf{F}}{\partial \theta_1} & \cdots & \dfrac{\partial \mathbf{F}}{\partial \theta_n}
   \end{array}
   \right] =
   \begin{pmatrix}
   \dfrac{\partial F_{1}}{\partial \theta_{1}} & \cdots &
   \dfrac{\partial F_{1}}{\partial \theta_{n}} \\[12pt]
   \vdots & \ddots & \vdots \\[4pt]
   \dfrac{\partial F_{m}}{\partial \theta_{1}} & \cdots &
   \dfrac{\partial F_{m}}{\partial \theta_{n}}
   \end{pmatrix}
   \in \mathbb{R}^{m \times n}.


The output of this function for a chosen :math:`\mathbf{v} \in \mathbb{R}^{n}` is then

.. math::

   \texttt{output}
   = J\,\mathbf{v}
   = \dfrac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}\,\mathbf{v}.

Note the close relation to the ``gradient`` function: the gradient applies the transposed Jacobian
to the sensitivity vector, :math:`J^{\!\top} \boldsymbol{\lambda}`, whereas this function applies
the Jacobian itself to :math:`\mathbf{v}`.

As before, we can choose one index each from the input and output lists to construct the Jacobian
block :math:`J_{ji} = \frac{\partial \mathbf{F}_j}{\partial \boldsymbol{\theta}_i}`. The output of
this action is then

.. math::

   \texttt{output} =
   J_{ji}\,\mathbf{v}
   = \dfrac{\partial \mathbf{F}_j}{\partial \boldsymbol{\theta}_i}\,\mathbf{v},

where the indices :math:`i` and :math:`j` correspond to ``inWrt`` and ``outWrt``, respectively.
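The Jacobian action can be sketched with a hypothetical model
:math:`\mathbf{F}(\boldsymbol{\theta}) = (\theta_1 \theta_2,\; \theta_1 + \theta_2)`
(an illustration only, not part of UM-Bridge):

```python
import numpy as np

# Jacobian of a hypothetical model F(theta) = (theta_1 * theta_2, theta_1 + theta_2).
def jacobian(theta):
    return np.array([[theta[1], theta[0]],
                     [1.0,      1.0]])

theta = np.array([2.0, 3.0])
vec = np.array([1.0, -1.0])  # the `vec` argument, v in R^n
out = jacobian(theta) @ vec  # apply Jacobian action: J v in R^m
print(out)  # [1. 0.]
```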

Applying Hessian to a vector
============================

The apply Hessian action combines the previous two operations: the output is again a matrix-vector
product, but the matrix is now the Hessian of the objective function. The Hessian, :math:`H`, is
given by

.. math::

   H =
   \frac{\partial^2 L}{\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}}
   = \frac{\partial}{\partial \boldsymbol{\theta}}
   \left(
   \frac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}
   \right)^{\!\top}
   \boldsymbol{\lambda} =
   \begin{bmatrix}
   \dfrac{\partial^2 L}{\partial \theta_1^2} & \dfrac{\partial^2 L}{\partial \theta_1 \partial \theta_2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_1 \partial \theta_n} \\[18pt]
   \dfrac{\partial^2 L}{\partial \theta_2 \partial \theta_1} & \dfrac{\partial^2 L}{\partial \theta_2^2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_2 \partial \theta_n} \\[18pt]
   \vdots & \vdots & \ddots & \vdots \\[6pt]
   \dfrac{\partial^2 L}{\partial \theta_n \partial \theta_1} & \dfrac{\partial^2 L}{\partial \theta_n \partial \theta_2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_n^2}
   \end{bmatrix},

where :math:`L` is the objective function and :math:`\boldsymbol{\lambda}` is the sensitivity vector as defined previously.

So the product of :math:`H` and the chosen vector (of size :math:`n`) can be written as

.. math::

   H\,\mathbf{v}
   = \dfrac{\partial^2 L}{\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}}\,\mathbf{v} =
   \left[\dfrac{\partial}{\partial \boldsymbol{\theta}}
   \left(
   \dfrac{\partial \mathbf{F}}{\partial \boldsymbol{\theta}}
   \right)^{\!\top}
   \boldsymbol{\lambda}\right]\,\mathbf{v}.

As in the apply Jacobian action, we can select particular indices from the lists of input and
output vectors to construct the Hessian. Since :math:`H` contains second derivatives of :math:`L`,
two input indices are required, ``inWrt1`` and ``inWrt2``, in addition to the output index. The
output of this action is

.. math::

   \texttt{output} =
   \left( \dfrac{\partial}{\partial \boldsymbol{\theta}_i}
   \left[ \left( \dfrac{\partial \mathbf{F}_k}{\partial \boldsymbol{\theta}_j} \right)^{\!\top} \boldsymbol{\lambda}_k \right] \right)
   \mathbf{v}.
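A minimal sketch of the Hessian action, assuming a hypothetical model
:math:`\mathbf{F}(\boldsymbol{\theta}) = (\theta_1 \theta_2,\; \theta_1 + \theta_2)` with the
sensitivity vector held fixed: then
:math:`J^{\!\top}\boldsymbol{\lambda} = (\theta_2 \lambda_1 + \lambda_2,\; \theta_1 \lambda_1 + \lambda_2)`,
so differentiating once more gives a constant Hessian.

```python
import numpy as np

# Hessian action for a hypothetical model F(theta) = (theta_1 * theta_2, theta_1 + theta_2)
# with the sensitivity vector lambda held fixed, following H = d/dtheta (J^T lambda).
# Here J^T lambda = (theta_2*l1 + l2, theta_1*l1 + l2), so H = [[0, l1], [l1, 0]].
lam = np.array([6.0, 5.0])   # the `sens` argument
H = np.array([[0.0,    lam[0]],
              [lam[0], 0.0]])
vec = np.array([1.0, -1.0])  # the `vec` argument
print(H @ vec)  # [-6.  6.]
```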

