We continue with curve fitting. This week: general linear regression.
Reading: Chapra and Canale, introduction to part 5 and chapter 17.
Learning outcomes:
In the case that the largest power of $x$ is $x^2$, we have $$ y_i (a_0, a_1, a_2; x_i) = a_0 + a_1 x_i + a_2 x_i^2 + e_i $$ and an overall error function $$ S_r (a_0, a_1, a_2) = \sum_{i=0}^{n-1} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2 $$
Setting the partial derivatives of $S_r$ with respect to each parameter to zero leads to a set of equations \begin{align*} \frac{\partial S_r (a_0, a_1, a_2)}{\partial a_0} = 0 & \implies \left(n\right)a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 & = \sum y_i \\ \frac{\partial S_r (a_0, a_1, a_2)}{\partial a_1} = 0 & \implies \left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 & = \sum x_i y_i \\ \frac{\partial S_r (a_0, a_1, a_2)}{\partial a_2} = 0 & \implies \left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 & = \sum x_i^2 y_i \\ \end{align*}
We can rewrite these equations as $$ \begin{pmatrix} \left(\sum x_i^0\right) & \left(\sum x_i\right) & \left(\sum x_i^2\right) \\ \left(\sum x_i\right) & \left(\sum x_i^2\right) & \left(\sum x_i^3\right) \\ \left(\sum x_i^2\right) & \left(\sum x_i^3\right) & \left(\sum x_i^4\right) \\ \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{pmatrix} $$
This system is of the form $\textbf{A}\textbf{x} = \textbf{b}$.
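As a sketch of how these normal equations might be assembled and solved in practice, here is a short numpy example. The data values in `x` and `y` are made-up placeholders, not from the notes.

```python
# A minimal sketch of quadratic least-squares fitting via the normal equations.
# The data arrays x and y are placeholders -- substitute your own measurements.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])
n = len(x)

# Build the 3x3 matrix of sums and the right-hand side of the normal equations.
A = np.array([[n,            x.sum(),       (x**2).sum()],
              [x.sum(),      (x**2).sum(),  (x**3).sum()],
              [(x**2).sum(), (x**3).sum(),  (x**4).sum()]])
b = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])

# Solve A a = b for the coefficients a0, a1, a2 (Gauss elimination would also work).
a = np.linalg.solve(A, b)
print(a)
```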
$\newcommand{\vect}[1]{\boldsymbol{#1}}$Simple linear, polynomial and multiple linear regression can all be generalised to the following linear least-squares model $$ y_i = a_0 z_0 (x_i) + a_1 z_1 (x_i) + a_2 z_2 (x_i) + \cdots + a_{m-1} z_{m-1} (x_i) + e_i $$ where the $z_j$ are no longer restricted to powers of $x$ but can be any predefined functions of the independent variable.
$z_0(x), z_1(x), \ldots, z_{m-1}(x)$ are $m$ $\textbf{basis functions}$. These predefined basis functions define the model and depend only on the $x$ coordinate. It is called linear least squares because the parameters $a_0, a_1, \ldots, a_{m-1}$ appear linearly; the $z$s themselves can be highly non-linear in $x$.
For instance, $$ y_i = a_0 \cdot 1 + a_1 \cos (\omega x_i) + a_2 \sin (\omega x_i) $$ fits this model with $z_0(x) = 1$, $z_1(x) = \cos(\omega x)$ and $z_2(x) = \sin(\omega x)$, where $x$ is a single independent variable and $\omega$ is a predefined constant.
We can rewrite $$ y_i = a_0 z_0 (x_i) + a_1 z_1 (x_i) + a_2 z_2 (x_i) + \cdots + a_{m-1} z_{m-1} (x_i) + e_i $$ in matrix notation as $$ \vect{y} = \vect{Z} \vect{a} + \vect{e} $$ where bold lower case indicates a column vector, and bold uppercase indicates a matrix. $\vect{Z}$ contains the calculated values of the $m$ basis functions at the $n$ measured values of the independent variables: $$ \vect{Z} = \left[ \begin{matrix} z_0 (x_0) & z_1 (x_0) & \cdots & z_{m-1} (x_0) \\ z_0 (x_1) & z_1 (x_1) & \cdots & z_{m-1} (x_1) \\ \vdots & \vdots & \ddots & \vdots \\ z_0 (x_{n-1}) & z_1 (x_{n-1}) & \cdots & z_{m-1} (x_{n-1}) \\ \end{matrix} \right] $$
The column vector $\vect{y}$ contains the $n$ observed values of the dependent variable $$ \vect{y}^T = \left[y_0, y_1, y_2, y_3, \ldots, y_{n-1} \right] $$ The column vector $\vect{a}$ contains the $m$ unknown parameters of the model $$ \vect{a}^T = \left[a_0, a_1, a_2, \ldots, a_{m-1} \right] $$ The column vector $\vect{e}$ contains the $n$ residuals (errors) $$ \vect{e}^T = \left[e_0, e_1, e_2, e_3, \ldots, e_{n-1} \right] $$
We can also express the error in our model as a sum of squares, much like before: $$\begin{align*} S_r & = \sum_{i=0}^{n-1}\left(y_i - \sum_{j=0}^{m-1} a_j z_j(x_i) \right)^2 \\ & = \sum e_i^2 = \vect{e}^T \vect{e} = (\vect{y} - \vect{Z} \vect{a})^T (\vect{y} - \vect{Z} \vect{a}) \end{align*}$$ $S_r$ is minimised by taking partial derivatives with respect to $\vect{a}$ and setting them to zero, which yields $$ \vect{Z}^T \vect{Z} \vect{a} = \vect{Z}^T \vect{y} $$ This is exactly equivalent to the set of simultaneous equations for the $a_i$ we found previously when fitting polynomials. More details of the derivation can be found in the reading, though the notation is a little different.
This set of equations is of the form $\vect{A} \vect{x} = \vect{b}$ and can be solved using Gauss elimination or a similar method.
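The whole procedure can be wrapped in a small helper. The sketch below builds $\vect{Z}$ from a list of basis functions and solves the normal equations with numpy; the function name `fit_linear_lsq` and the sinusoidal example basis are my own choices, not from the notes.

```python
# A sketch of general linear least squares: given a list of basis functions z_j(x),
# build the Z matrix and solve the normal equations Z^T Z a = Z^T y.
import numpy as np

def fit_linear_lsq(x, y, basis):
    """Return the parameters a minimising sum_i (y_i - sum_j a_j z_j(x_i))^2."""
    Z = np.column_stack([z(x) for z in basis])   # n rows, m columns
    ZTZ = Z.T @ Z
    ZTy = Z.T @ y
    return np.linalg.solve(ZTZ, ZTy)

# Example: the sinusoidal model y = a0 + a1 cos(w x) + a2 sin(w x) with w fixed.
w = 2.0
basis = [lambda x: np.ones_like(x),
         lambda x: np.cos(w * x),
         lambda x: np.sin(w * x)]
# a = fit_linear_lsq(x, y, basis) would then return [a0, a1, a2] for your data.
```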
Try redoing the earlier fitting problems using this notation and method.
Suppose we have 11 measurements at
$$
x^T = [-3.0, -2.3, -1.6, -0.9, -0.2, 0.5, 1.2, 1.9, 2.6, 3.3, 4.0]
$$
with measured values
$$
y^T = [8.26383742, 6.44045188, 4.74903073, 4.5656476, 3.61011683, 3.32743918, 2.9643915, 1.02239181, 1.09485138, 1.84053372, 1.49110572]
$$
Let us fit it to a model of the form $y_i = a_0 \cdot 1 + a_1 e^{-x_i} + a_2 e^{-2x_i} $
Our $\vect{Z}$ matrix has 3 columns for the basis functions $z_0 (x_i) = 1$, $z_1 (x_i) = e^{-x_i}$ and $z_2 (x_i) = e^{-2x_i}$. It will have 11 rows corresponding to the 11 measurements.
Z =
[
[ 1.00000000e+00, 2.00855369e+01, 4.03428793e+02],
[ 1.00000000e+00, 9.97418245e+00, 9.94843156e+01],
[ 1.00000000e+00, 4.95303242e+00, 2.45325302e+01],
[ 1.00000000e+00, 2.45960311e+00, 6.04964746e+00],
[ 1.00000000e+00, 1.22140276e+00, 1.49182470e+00],
[ 1.00000000e+00, 6.06530660e-01, 3.67879441e-01],
[ 1.00000000e+00, 3.01194212e-01, 9.07179533e-02],
[ 1.00000000e+00, 1.49568619e-01, 2.23707719e-02],
[ 1.00000000e+00, 7.42735782e-02, 5.51656442e-03],
[ 1.00000000e+00, 3.68831674e-02, 1.36036804e-03],
[ 1.00000000e+00, 1.83156389e-02, 3.35462628e-04]]
Then we set up the linear equation problem by forming $\vect{Z}^T \vect{Z}$
ZTZ =
[
[ 1.10000000e+01, 3.98805235e+01, 5.35475292e+02],
[ 3.98805235e+01, 5.35475292e+02, 9.23382518e+03],
[ 5.35475292e+02, 9.23382518e+03, 1.73292733e+05]]
and $\vect{Z}^T \vect{y}$
ZTy =
[ 39.36979777, 272.62352738, 4125.63079852]
The solution of this linear system is
a = [ 2.13758951, 0.58605735, -0.01537541]
This means that our final model for the data is
$$
y = 2.13758951 + 0.58605735e^{-x} - 0.01537541e^{-2x}
$$
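A short numpy sketch that reproduces this worked example; `np.linalg.solve` is used here in place of a hand-written Gauss elimination.

```python
# Reproducing the worked exponential-basis fit above.
import numpy as np

x = np.array([-3.0, -2.3, -1.6, -0.9, -0.2, 0.5, 1.2, 1.9, 2.6, 3.3, 4.0])
y = np.array([8.26383742, 6.44045188, 4.74903073, 4.5656476, 3.61011683,
              3.32743918, 2.9643915, 1.02239181, 1.09485138, 1.84053372,
              1.49110572])

# Z has one column per basis function: 1, exp(-x), exp(-2x).
Z = np.column_stack([np.ones_like(x), np.exp(-x), np.exp(-2 * x)])

# Form and solve the normal equations Z^T Z a = Z^T y.
ZTZ = Z.T @ Z
ZTy = Z.T @ y
a = np.linalg.solve(ZTZ, ZTy)
print(a)   # approximately [ 2.1376, 0.5861, -0.0154]
```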
The matrix $(\vect{Z}^T \vect{Z})^{-1}$ is closely related to the variances (diagonal elements) and covariances (off-diagonal elements) of the $a_i$, so it can be used to estimate the accuracy of the parameter estimates.
We can use the Gauss-Jordan method to find $(\vect{Z}^T \vect{Z})^{-1}$.
The diagonal elements of $(\vect{Z}^T \vect{Z})^{-1}$ are denoted $z_{i,i}^{-1}$.
The standard error of our fitted model with respect to the data is $s_{y/x} = \frac{1}{\sqrt{n-m}}\sqrt{\sum_{i=0}^{n-1} \left[ y_i - (a_0 z_0 (x_i) + a_1 z_1 (x_i) + a_2 z_2 (x_i) + \cdots + a_{m-1} z_{m-1} (x_i)) \right]^2}$
The variances of our parameters are given by $\text{var}(a_i) = s^2(a_i)= z_{i,i}^{-1} s_{y/x}^2$
We can now place error limits on our optimal parameters, $a_0, \ldots, a_{m-1}$.
If our model is good, the real data should be approximately normally distributed around our model.
You can then show that the parameter estimates follow a t-distribution with $n-m$ degrees of freedom.
We can put confidence limits on the parameters using $P\left(\text{true value of the } i\text{th parameter is in the interval } (a_i - t_{c/2,n-m}\, s(a_i),\ a_i + t_{c/2,n-m}\, s(a_i))\right) = c$, where $c$ is our confidence, for instance 0.95 to be 95% certain the parameter lies within those bounds, and $t_{c/2,n-m}$ is the critical value of the appropriate t-distribution.
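Continuing the worked example, the following sketch estimates the parameter errors. It assumes the arrays `Z`, `y` and `a` from the previous code block, and uses `scipy.stats.t` for the critical t values (my choice of tool; a table of t values would do equally well).

```python
# A sketch of the error estimates for the fitted parameters.
import numpy as np
from scipy import stats

n, m = Z.shape                      # number of measurements and of parameters
# The notes suggest Gauss-Jordan for the inverse; numpy's inv is used here for brevity.
ZTZ_inv = np.linalg.inv(Z.T @ Z)

# Standard error of the fit, s_{y/x}.
residuals = y - Z @ a
s_yx = np.sqrt((residuals**2).sum() / (n - m))

# Standard error of each parameter from the diagonal of (Z^T Z)^{-1}.
s_a = np.sqrt(np.diag(ZTZ_inv)) * s_yx

# 95% confidence interval: a_i +/- t * s(a_i), with n - m degrees of freedom.
c = 0.95
t_crit = stats.t.ppf(1 - (1 - c) / 2, df=n - m)
lower = a - t_crit * s_a
upper = a + t_crit * s_a
print(lower, upper)
```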
Perform a fit to the following data.
x = [10.00, 16.30, 23.00, 27.50, 31.00, 35.60, 39.00, 41.50, 42.90, 45.00, 46.00, 45.50, 46.00, 49.00, 50.00]

y = [8.953, 16.405, 22.607, 27.769, 32.065, 35.641, 38.617, 41.095, 43.156, 44.872, 46.301, 47.490, 48.479, 49.303, 49.988]
Remember to break the problem down into smaller ones:
- choose the basis functions and build the $\vect{Z}$ matrix,
- form $\vect{Z}^T \vect{Z}$ and $\vect{Z}^T \vect{y}$,
- solve the resulting linear system for the parameters $\vect{a}$,
- estimate the errors on the parameters.
You should be reading additional material to provide a solid background to what we do in class.
Reading: Chapra and Canale, introduction to part 5 and chapter 17.
Lots of details in chapters 14 and 15 of Numerical Recipes.