Revisiting the previous simple ODE: comparing EnKF with Adam

Main advantages of EnKF over Adam:

  • useful when loss.backward() is unavailable, since only forward evaluations are needed (see the sketch after this list)
  • the forward evaluations across the ensemble are independent, so they can run in parallel
  • faster convergence during the early epochs
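
To make the first two points concrete, here is a minimal sketch of one derivative-free Ensemble Kalman update in PyTorch, assuming the standard ensemble-inversion form; the function name `enkf_update` and the noise level `gamma` are illustrative choices, not from the original.

```python
import torch

def enkf_update(thetas, G, y, gamma=1e-2):
    """One derivative-free Ensemble Kalman update of a parameter ensemble.

    thetas : (J, P) tensor, J ensemble members with P parameters each
    G      : callable mapping a (P,) parameter vector to a (K,) output
    y      : (K,) target (here the zero vector: residuals are driven to 0)
    gamma  : assumed observation-noise variance regularizing the solve
    """
    # Only forward evaluations of G are needed (no loss.backward()),
    # and the J evaluations are independent, hence parallelizable.
    Gs = torch.stack([G(t) for t in thetas])        # (J, K)
    dT = thetas - thetas.mean(0)                    # centered parameter anomalies
    dG = Gs - Gs.mean(0)                            # centered output anomalies
    J = thetas.shape[0]
    C_gg = dG.T @ dG / (J - 1)                      # (K, K) output covariance
    C_tg = dT.T @ dG / (J - 1)                      # (P, K) cross-covariance
    nobs = C_gg.shape[0]
    # Kalman gain transpose: (C_gg + gamma*I)^{-1} C_tg^T, solved, not inverted
    gain_T = torch.linalg.solve(C_gg + gamma * torch.eye(nobs), C_tg.T)
    return thetas + (y - Gs) @ gain_T               # Kalman-style correction
```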

We revisit the previous simple ODE: \[ f'(x)=2x, \hspace{2em} x\in[-1, 1] \] with initial condition \(f(0)=0\). The collocation points are still \[ x_1=-1, x_2=-0.99, x_3=-0.98, \cdots, x_{199}=0.98, x_{200}=0.99, x_{201}=1 \] The neural architecture is still an MLP, \[ u(x;\theta):= \begin{bmatrix} \square \\ \vdots \\ \square \end{bmatrix}^\intercal \tanh\left( \begin{bmatrix} \square & \cdots & \square \\ \vdots & \ddots & \vdots \\ \square & \cdots & \square \end{bmatrix} \tanh \left( \begin{bmatrix} \square & \cdots & \square \\ \vdots & \ddots & \vdots \\ \square & \cdots & \square \end{bmatrix}\tanh \left( \begin{bmatrix} \square \\ \vdots \\ \square \end{bmatrix} x + \begin{bmatrix} \square \\ \vdots \\ \square \end{bmatrix} \right) + \begin{bmatrix} \square \\ \vdots \\ \square \end{bmatrix} \right) + \begin{bmatrix} \square \\ \vdots \\ \square \end{bmatrix}\right) . \]
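
As a concrete counterpart to the formulas above, here is a minimal PyTorch sketch of the collocation grid and the tanh MLP; the hidden width of 32 is an assumption, since the text leaves the layer sizes as placeholders.

```python
import torch
import torch.nn as nn

# Collocation points x_1 = -1, ..., x_201 = 1 with spacing 0.01
x = torch.linspace(-1.0, 1.0, 201).unsqueeze(1)   # shape (201, 1)

# u(x; theta): three tanh layers as in the displayed formula;
# the final layer has no bias, matching the formula's outer product
u = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1, bias=False),
)
```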

The corresponding forward mapping is \[ \mathcal{G}(\theta)=\begin{bmatrix} u'(x_1;\theta)-2x_1 & u'(x_2;\theta)-2x_2 & \cdots & u'(x_{201};\theta)-2x_{201} & \lambda u(0;\theta) \end{bmatrix}^\intercal. \] Here \(\lambda\) is the same tunable parameter that balances the importance of the data term against the physics residual, and we set \(\lambda=1000\).
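
A sketch of this forward map, reusing `x` and `u` from above. Note that the derivative \(u'(x;\theta)\) is taken with respect to the input \(x\) via autograd; EnKF only avoids gradients with respect to \(\theta\), so this is still allowed.

```python
def G(theta_vec):
    """Forward map: the 201 ODE residuals u'(x_i) - 2*x_i, then lam * u(0)."""
    # Load this ensemble member's parameters into the network
    torch.nn.utils.vector_to_parameters(theta_vec, u.parameters())
    xr = x.clone().requires_grad_(True)
    ux = u(xr)
    # du/dx via autograd w.r.t. the input (not w.r.t. theta)
    du = torch.autograd.grad(ux.sum(), xr)[0]
    residual = (du - 2.0 * xr).squeeze(1)             # physics residuals, (201,)
    lam = 1000.0                                      # lambda from the text
    data_term = lam * u(torch.zeros(1, 1)).squeeze()  # lambda * u(0; theta)
    return torch.cat([residual.detach(), data_term.detach().unsqueeze(0)])
```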

If we glue the predictions from each epoch together into a single plot, we can see that EnKF changes its prediction earlier in training than Adam does.
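
A hypothetical loop that produces such a stitched plot by snapshotting the ensemble-mean prediction after every EnKF epoch; the ensemble size of 50, the 100 epochs, and the initial spread 0.1 are all illustrative choices.

```python
import matplotlib.pyplot as plt

theta0 = torch.nn.utils.parameters_to_vector(u.parameters()).detach()
thetas = theta0 + 0.1 * torch.randn(50, theta0.numel())  # initial ensemble
y = torch.zeros(202)                                     # 201 residuals + 1 data term

snapshots = []
for epoch in range(100):
    thetas = enkf_update(thetas, G, y)
    # Evaluate the prediction at the ensemble mean and store it
    torch.nn.utils.vector_to_parameters(thetas.mean(0), u.parameters())
    snapshots.append(u(x).detach().squeeze())

# Glue every 10th epoch's prediction into one figure
for k, s in enumerate(snapshots[::10]):
    plt.plot(x.squeeze(), s, alpha=0.3 + 0.07 * k)
plt.title("EnKF prediction across epochs")
plt.show()
```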