# Solving Linear Systems

import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg as la
%matplotlib inline


## Linear Systems

A linear system of equations is a collection of linear equations

\begin{align} a_{0,0}x_0 + a_{0,1}x_1 + \cdots + a_{0,n}x_n & = b_0 \\ a_{1,0}x_0 + a_{1,1}x_1 + \cdots + a_{1,n}x_n & = b_1 \\ & \vdots \\ a_{m,0}x_0 + a_{m,1}x_1 + \cdots + a_{m,n}x_n & = b_m \\ \end{align}

In matrix notation, a linear system is $A \mathbf{x}= \mathbf{b}$ where

$$A = \begin{bmatrix} a_{0,0} & a_{0,1} & \cdots & a_{0,n} \\ a_{1,0} & a_{1,1} & \cdots & a_{1,n} \\ \vdots & & & \vdots \\ a_{m,0} & a_{m,1} & \cdots & a_{m,n} \\ \end{bmatrix} \ \ , \ \ \mathbf{x} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} \ \ , \ \ \mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_m \end{bmatrix}$$

## Gaussian elimination

The general procedure for solving a linear system of equations is called Gaussian elimination. The idea is to perform elementary row operations to reduce the system to its row echelon form and then solve by back substitution.

### Elementary Row Operations

Elementary row operations include:

1. Add $k$ times row $j$ to row $i$.
2. Multiply row $i$ by scalar $k$.
3. Switch rows $i$ and $j$.
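
The same three operations can also be performed directly on the rows with NumPy slicing (a minimal sketch, using an arbitrary 3 by 3 example matrix):

```python
import numpy as np

A = np.array([[1., 1., 2.],
              [-1., 3., 1.],
              [0., 5., 2.]])

# 1. Add k times row j to row i (here: add 3 times row 2 to row 0).
A[0, :] = A[0, :] + 3 * A[2, :]

# 2. Multiply row i by a scalar k (here: multiply row 1 by -2).
A[1, :] = -2 * A[1, :]

# 3. Switch rows i and j (here: switch rows 1 and 2).
A[[1, 2], :] = A[[2, 1], :]

print(A)
```

Below we take the elementary matrix point of view instead, which makes it explicit that every row operation is a matrix multiplication.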

Each of the elementary row operations is the result of matrix multiplication by an elementary matrix (on the left). To add $k$ times row $j$ to row $i$ in a matrix $A$, we multiply $A$ by the matrix $E$ where $E$ is equal to the identity matrix except that the $i,j$ entry is $E_{i,j} = k$. For example, if $A$ is 3 by 3 and we want to add 3 times row 2 to row 0 (using 0 indexing) then

$$E_1 = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Let's verify the calculation:

A = np.array([[1,1,2],[-1,3,1],[0,5,2]])
print(A)

[[ 1  1  2]
[-1  3  1]
[ 0  5  2]]

E1 = np.array([[1,0,3],[0,1,0],[0,0,1]])
print(E1)

[[1 0 3]
[0 1 0]
[0 0 1]]

E1 @ A

array([[ 1, 16,  8],
[-1,  3,  1],
[ 0,  5,  2]])


To multiply row $i$ by $k$ in a matrix $A$, we multiply $A$ by the matrix $E$ where $E$ is equal to the identity matrix except that the $i,i$ entry is $E_{i,i} = k$. For example, if $A$ is 3 by 3 and we want to multiply row 1 by -2 then

$$E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Let's verify the calculation:

E2 = np.array([[1,0,0],[0,-2,0],[0,0,1]])
print(E2)

[[ 1  0  0]
[ 0 -2  0]
[ 0  0  1]]

E2 @ A

array([[ 1,  1,  2],
[ 2, -6, -2],
[ 0,  5,  2]])


Finally, to switch row $i$ and row $j$ in a matrix $A$, we multiply $A$ by the matrix $E$ where $E$ is equal to the identity matrix except $E_{i,i} = 0$, $E_{j,j} = 0$, $E_{i,j} = 1$ and $E_{j,i} = 1$. For example, if $A$ is 3 by 3 and we want to switch row 1 and row 2 then

$$E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$

Let's verify the calculation:

E3 = np.array([[1,0,0],[0,0,1],[0,1,0]])
print(E3)

[[1 0 0]
[0 0 1]
[0 1 0]]

E3 @ A

array([[ 1,  1,  2],
[ 0,  5,  2],
[-1,  3,  1]])


### Implementation

Let's write functions to implement the elementary row operations. First of all, let's write a function called add_row which takes input parameters $A$, $k$, $i$ and $j$ and returns the NumPy array resulting from adding $k$ times row $j$ to row $i$ in the matrix $A$. If $i=j$, then the function scales row $i$ by $k+1$, since this is the result of adding $k$ times row $i$ to row $i$.

def add_row(A,k,i,j):
    "Add k times row j to row i in matrix A."
    n = A.shape[0]
    E = np.eye(n)
    if i == j:
        E[i,i] = k + 1
    else:
        E[i,j] = k
    return E @ A


Let's test our function:

M = np.array([[1,1],[3,2]])
print(M)

[[1 1]
[3 2]]

add_row(M,2,0,1)

array([[7., 5.],
[3., 2.]])

add_row(M,3,1,1)

array([[ 1.,  1.],
[12.,  8.]])


Let's write a function called scale_row which takes 3 input parameters $A$, $k$, and $i$ and returns the matrix that results from multiplying row $i$ by $k$ in the matrix $A$.

def scale_row(A,k,i):
    "Multiply row i by k in matrix A."
    n = A.shape[0]
    E = np.eye(n)
    E[i,i] = k
    return E @ A

M = np.array([[3,1],[-2,7]])
print(M)

[[ 3  1]
[-2  7]]

scale_row(M,3,1)

array([[ 3.,  1.],
[-6., 21.]])

A = np.array([[1,1,1],[1,-1,0]])
print(A)

[[ 1  1  1]
[ 1 -1  0]]

scale_row(A,5,1)

array([[ 1.,  1.,  1.],
[ 5., -5.,  0.]])


Let's write a function called switch_rows which takes 3 input parameters $A$, $i$ and $j$ and returns the matrix that results from switching rows $i$ and $j$ in the matrix $A$.

def switch_rows(A,i,j):
    "Switch rows i and j in matrix A."
    n = A.shape[0]
    E = np.eye(n)
    E[i,i] = 0
    E[j,j] = 0
    E[i,j] = 1
    E[j,i] = 1
    return E @ A

A = np.array([[1,1,1],[1,-1,0]])
print(A)

[[ 1  1  1]
[ 1 -1  0]]

switch_rows(A,0,1)

array([[ 1., -1.,  0.],
[ 1.,  1.,  1.]])


## Examples

### Find the Inverse

Let's apply our functions to the augmented matrix $[M \ | \ I]$ to find the inverse of the matrix $M$:

M = np.array([[5,4,2],[-1,2,1],[1,1,1]])
print(M)

[[ 5  4  2]
[-1  2  1]
[ 1  1  1]]

A = np.hstack([M,np.eye(3)])
print(A)

[[ 5.  4.  2.  1.  0.  0.]
[-1.  2.  1.  0.  1.  0.]
[ 1.  1.  1.  0.  0.  1.]]

A1 = switch_rows(A,0,2)
print(A1)

[[ 1.  1.  1.  0.  0.  1.]
[-1.  2.  1.  0.  1.  0.]
[ 5.  4.  2.  1.  0.  0.]]

A2 = add_row(A1,1,1,0)
print(A2)

[[1. 1. 1. 0. 0. 1.]
[0. 3. 2. 0. 1. 1.]
[5. 4. 2. 1. 0. 0.]]

A3 = add_row(A2,-5,2,0)
print(A3)

[[ 1.  1.  1.  0.  0.  1.]
[ 0.  3.  2.  0.  1.  1.]
[ 0. -1. -3.  1.  0. -5.]]

A4 = switch_rows(A3,1,2)
print(A4)

[[ 1.  1.  1.  0.  0.  1.]
[ 0. -1. -3.  1.  0. -5.]
[ 0.  3.  2.  0.  1.  1.]]

A5 = scale_row(A4,-1,1)
print(A5)

[[ 1.  1.  1.  0.  0.  1.]
[ 0.  1.  3. -1.  0.  5.]
[ 0.  3.  2.  0.  1.  1.]]

A6 = add_row(A5,-3,2,1)
print(A6)

[[  1.   1.   1.   0.   0.   1.]
[  0.   1.   3.  -1.   0.   5.]
[  0.   0.  -7.   3.   1. -14.]]

A7 = scale_row(A6,-1/7,2)
print(A7)

[[ 1.          1.          1.          0.          0.          1.        ]
[ 0.          1.          3.         -1.          0.          5.        ]
[ 0.          0.          1.         -0.42857143 -0.14285714  2.        ]]

A8 = add_row(A7,-3,1,2)
print(A8)

[[ 1.          1.          1.          0.          0.          1.        ]
[ 0.          1.          0.          0.28571429  0.42857143 -1.        ]
[ 0.          0.          1.         -0.42857143 -0.14285714  2.        ]]

A9 = add_row(A8,-1,0,2)
print(A9)

[[ 1.          1.          0.          0.42857143  0.14285714 -1.        ]
[ 0.          1.          0.          0.28571429  0.42857143 -1.        ]
[ 0.          0.          1.         -0.42857143 -0.14285714  2.        ]]

A10 = add_row(A9,-1,0,1)
print(A10)

[[ 1.          0.          0.          0.14285714 -0.28571429  0.        ]
[ 0.          1.          0.          0.28571429  0.42857143 -1.        ]
[ 0.          0.          1.         -0.42857143 -0.14285714  2.        ]]


Let's verify that we found the inverse $M^{-1}$ correctly:

Minv = A10[:,3:]
print(Minv)

[[ 0.14285714 -0.28571429  0.        ]
[ 0.28571429  0.42857143 -1.        ]
[-0.42857143 -0.14285714  2.        ]]

result = Minv @ M
print(result)

[[ 1.00000000e+00  4.44089210e-16  2.22044605e-16]
[-6.66133815e-16  1.00000000e+00 -2.22044605e-16]
[ 0.00000000e+00  0.00000000e+00  1.00000000e+00]]


Success! We can see the result more clearly if we round to 15 decimal places:

np.round(result,15)

array([[ 1.e+00,  0.e+00,  0.e+00],
[-1.e-15,  1.e+00, -0.e+00],
[ 0.e+00,  0.e+00,  1.e+00]])
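
As an independent cross-check (a sketch, separate from the elimination above), scipy.linalg.inv should produce the same inverse:

```python
import numpy as np
import scipy.linalg as la

M = np.array([[5, 4, 2],
              [-1, 2, 1],
              [1, 1, 1]])

# Compute the inverse directly and verify M^{-1} M = I.
Minv = la.inv(M)
print(np.allclose(Minv @ M, np.eye(3)))  # True
```

The entries agree with the hand computation: for instance the top-left entry is $1/7 \approx 0.14285714$.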


### Solve a System

Let's use our functions to perform Gaussian elimination and solve a linear system of equations $A \mathbf{x} = \mathbf{b}$.

A = np.array([[6,15,1],[8,7,12],[2,7,8]])
print(A)

[[ 6 15  1]
[ 8  7 12]
[ 2  7  8]]

b = np.array([[2],[14],[10]])
print(b)

[[ 2]
[14]
[10]]


Form the augmented matrix $M$:

M = np.hstack([A,b])
print(M)

[[ 6 15  1  2]
[ 8  7 12 14]
[ 2  7  8 10]]


Perform row operations:

M1 = scale_row(M,1/6,0)
print(M1)

[[ 1.          2.5         0.16666667  0.33333333]
[ 8.          7.         12.         14.        ]
[ 2.          7.          8.         10.        ]]

M2 = add_row(M1,-8,1,0)
print(M2)

[[  1.           2.5          0.16666667   0.33333333]
[  0.         -13.          10.66666667  11.33333333]
[  2.           7.           8.          10.        ]]

M3 = add_row(M2,-2,2,0)
print(M3)

[[  1.           2.5          0.16666667   0.33333333]
[  0.         -13.          10.66666667  11.33333333]
[  0.           2.           7.66666667   9.33333333]]

M4 = scale_row(M3,-1/13,1)
print(M4)

[[ 1.          2.5         0.16666667  0.33333333]
[ 0.          1.         -0.82051282 -0.87179487]
[ 0.          2.          7.66666667  9.33333333]]

M5 = add_row(M4,-2,2,1)
print(M5)

[[ 1.          2.5         0.16666667  0.33333333]
[ 0.          1.         -0.82051282 -0.87179487]
[ 0.          0.          9.30769231 11.07692308]]

M6 = scale_row(M5,1/M5[2,2],2)
print(M6)

[[ 1.          2.5         0.16666667  0.33333333]
[ 0.          1.         -0.82051282 -0.87179487]
[ 0.          0.          1.          1.19008264]]

M7 = add_row(M6,-M6[1,2],1,2)
print(M7)

[[1.         2.5        0.16666667 0.33333333]
[0.         1.         0.         0.1046832 ]
[0.         0.         1.         1.19008264]]

M8 = add_row(M7,-M7[0,2],0,2)
print(M8)

[[1.         2.5        0.         0.13498623]
[0.         1.         0.         0.1046832 ]
[0.         0.         1.         1.19008264]]

M9 = add_row(M8,-M8[0,1],0,1)
print(M9)

[[ 1.          0.          0.         -0.12672176]
[ 0.          1.          0.          0.1046832 ]
[ 0.          0.          1.          1.19008264]]


Success! The solution of $A \mathbf{x} = \mathbf{b}$ is

x = M9[:,3].reshape(3,1)
print(x)

[[-0.12672176]
[ 0.1046832 ]
[ 1.19008264]]


Or, we can do it the easy way...

x = la.solve(A,b)
print(x)

[[-0.12672176]
[ 0.1046832 ]
[ 1.19008264]]
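
As a quick sanity check (a minimal sketch), we can confirm the computed solution satisfies the original system by checking the residual $A \mathbf{x} - \mathbf{b}$:

```python
import numpy as np
import scipy.linalg as la

A = np.array([[6, 15, 1],
              [8, 7, 12],
              [2, 7, 8]])
b = np.array([[2], [14], [10]])

# Solve and verify that A x reproduces b (up to rounding error).
x = la.solve(A, b)
print(np.allclose(A @ x, b))  # True
```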


## scipy.linalg.solve

We are mostly interested in linear systems $A \mathbf{x} = \mathbf{b}$ where there is a unique solution $\mathbf{x}$. This is the case when $A$ is a square matrix ($m=n$) and $\mathrm{det}(A) \not= 0$. To solve such a system, we can use the function scipy.linalg.solve.
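
Before solving, we can check the determinant to confirm the system has a unique solution (a minimal sketch):

```python
import numpy as np
import scipy.linalg as la

A = np.array([[1, 1],
              [1, -1]])

# A nonzero determinant means A is invertible and Ax = b has a unique solution.
print(la.det(A))  # -2.0
```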

The function returns a solution of the system of equations $A \mathbf{x} = \mathbf{b}$. For example:

A = np.array([[1,1],[1,-1]])
print(A)

[[ 1  1]
[ 1 -1]]

b1 = np.array([2,0])
print(b1)

[2 0]


And solve:

x1 = la.solve(A,b1)
print(x1)

[1. 1.]


Note that the output $\mathbf{x}$ is returned as a 1D NumPy array when the vector $\mathbf{b}$ (the right hand side) is entered as a 1D NumPy array. If we input $\mathbf{b}$ as a 2D NumPy array, then the output is a 2D NumPy array. For example:

A = np.array([[1,1],[1,-1]])
b2 = np.array([2,0]).reshape(2,1)
x2 = la.solve(A,b2)
print(x2)

[[1.]
[1.]]


Finally, if the right hand side $\mathbf{b}$ is a matrix, then the output is a matrix of the same size. It is the solution of $A \mathbf{x} = \mathbf{b}$ when $\mathbf{b}$ is a matrix. For example:

A = np.array([[1,1],[1,-1]])
b3 = np.array([[2,2],[0,1]])
x3 = la.solve(A,b3)
print(x3)

[[1.  1.5]
[1.  0.5]]
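
One handy consequence of matrix right-hand sides: taking $\mathbf{b}$ to be the identity matrix solves for the inverse of $A$ column by column (a sketch):

```python
import numpy as np
import scipy.linalg as la

A = np.array([[1, 1],
              [1, -1]])

# Solving A X = I gives X = A^{-1}, one column per column of I.
Ainv = la.solve(A, np.eye(2))
print(np.allclose(A @ Ainv, np.eye(2)))  # True
```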


### Simple Example

Let's compute the solution of the system of equations

\begin{align} 2x + y &= 1 \\ x + y &= -1 \end{align}

Create the matrix of coefficients:

A = np.array([[2,1],[1,1]])
print(A)

[[2 1]
[1 1]]


And the vector $\mathbf{b}$:

b = np.array([1,-1]).reshape(2,1)
print(b)

[[ 1]
[-1]]


And solve:

x = la.solve(A,b)
print(x)

[[ 2.]
[-3.]]


We can verify the solution by computing the inverse of $A$:

Ainv = la.inv(A)
print(Ainv)

[[ 1. -1.]
[-1.  2.]]


And multiply $A^{-1} \mathbf{b}$ to solve for $\mathbf{x}$:

x = Ainv @ b
print(x)

[[ 2.]
[-3.]]


We get the same result. Success!

### Inverse or Solve

It's a bad idea to use the inverse $A^{-1}$ to solve $A \mathbf{x} = \mathbf{b}$ if $A$ is large. It's too computationally expensive. Let's create a large random matrix $A$ and vector $\mathbf{b}$ and compute the solution $\mathbf{x}$ in 2 ways:

N = 1000
A = np.random.rand(N,N)
b = np.random.rand(N,1)


Check the first entries of $A$:

A[:3,:3]

array([[0.35754719, 0.63135432, 0.6572258 ],
[0.18450506, 0.14639832, 0.23528745],
[0.27576474, 0.46264005, 0.26589724]])


And for $\mathbf{b}$:

b[:4,:]

array([[0.82726751],
[0.96946096],
[0.31351176],
[0.63757837]])


Now we compare the speed of scipy.linalg.solve with scipy.linalg.inv:

%%timeit
x = la.solve(A,b)

2.77 s ± 509 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
x = la.inv(A) @ b

4.46 s ± 2.04 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


Solving with scipy.linalg.solve is noticeably faster (here, roughly 1.6 times as fast)!
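
Both approaches should of course agree on the solution itself. A minimal sketch of that check on a smaller random system (the seed is an arbitrary choice for reproducibility):

```python
import numpy as np
import scipy.linalg as la

np.random.seed(0)  # arbitrary seed so the example is reproducible
N = 100
A = np.random.rand(N, N)
b = np.random.rand(N, 1)

# Solve the system both ways and compare.
x_solve = la.solve(A, b)
x_inv = la.inv(A) @ b
print(np.allclose(x_solve, x_inv))  # True
```

The speed difference is not the only reason to prefer scipy.linalg.solve: solving directly also tends to be more numerically accurate than forming the inverse explicitly.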

## Exercises

Under construction