Solving Linear Systems
import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg as la
%matplotlib inline
Linear Systems
A linear system of equations is a collection of linear equations
\begin{align} a_{0,0}x_0 + a_{0,1}x_2 + \cdots + a_{0,n}x_n & = b_0 \\ a_{1,0}x_0 + a_{1,1}x_2 + \cdots + a_{1,n}x_n & = b_1 \\ & \vdots \\ a_{m,0}x_0 + a_{m,1}x_2 + \cdots + a_{m,n}x_n & = b_m \\ \end{align}
In matrix notation, a linear system is $A \mathbf{x}= \mathbf{b}$ where
$$ A = \begin{bmatrix} a_{0,0} & a_{0,1} & \cdots & a_{0,n} \\ a_{1,0} & a_{1,1} & \cdots & a_{1,n} \\ \vdots & & & \vdots \\ a_{m,0} & a_{m,1} & \cdots & a_{m,n} \\ \end{bmatrix} \ \ , \ \ \mathbf{x} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} \ \ , \ \ \mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_m \end{bmatrix} $$
Gaussian elimination
The general procedure to solve a linear system of equation is called Gaussian elimination. The idea is to perform elementary row operations to reduce the system to its row echelon form and then solve.
Elementary Row Operations
Elementary row operations include:
- Add $k$ times row $j$ to row $i$.
- Multiply row $i$ by scalar $k$.
- Switch rows $i$ and $j$.
Each of the elementary row operations is the result of matrix multiplication by an elementary matrix (on the left). To add $k$ times row $i$ to row $j$ in a matrix $A$, we multiply $A$ by the matrix $E$ where $E$ is equal to the identity matrix except the $i,j$ entry is $E_{i,j} = k$. For example, if $A$ is 3 by 3 and we want to add 3 times row 2 to row 0 (using 0 indexing) then
$$ E_1 = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$
Let's verify the calculation:
A = np.array([[1,1,2],[-1,3,1],[0,5,2]])
print(A)
[[ 1 1 2]
[-1 3 1]
[ 0 5 2]]
E1 = np.array([[1,0,3],[0,1,0],[0,0,1]])
print(E1)
[[1 0 3]
[0 1 0]
[0 0 1]]
E1 @ A
array([[ 1, 16, 8],
[-1, 3, 1],
[ 0, 5, 2]])
To multiply $k$ times row $i$ in a matrix $A$, we multiply $A$ by the matrix $E$ where $E$ is equal to the identity matrix except the $,i,j$ entry is $E_{i,i} = k$. For example, if $A$ is 3 by 3 and we want to multiply row 1 by -2 then
$$ E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$
Let's verify the calculation:
E2 = np.array([[1,0,0],[0,-2,0],[0,0,1]])
print(E2)
[[ 1 0 0]
[ 0 -2 0]
[ 0 0 1]]
E2 @ A
array([[ 1, 1, 2],
[ 2, -6, -2],
[ 0, 5, 2]])
Finally, to switch row $i$ and row $j$ in a matrix $A$, we multiply $A$ by the matrix $E$ where $E$ is equal to the identity matrix except $E_{i,i} = 0$, $E_{j,j} = 0$, $E_{i,j} = 1$ and $E_{j,i} = 1$. For example, if $A$ is 3 by 3 and we want to switch row 1 and row 2 then
$$ E^3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} $$
Let's verify the calculation:
E3 = np.array([[1,0,0],[0,0,1],[0,1,0]])
print(E3)
[[1 0 0]
[0 0 1]
[0 1 0]]
E3 @ A
array([[ 1, 1, 2],
[ 0, 5, 2],
[-1, 3, 1]])
Implementation
Let's write function to implement the elementary row operations. First of all, let's write a function called add_rows
which takes input parameters $A$, $k$, $i$ and $j$ and returns the NumPy array resulting from adding $k$ times row $j$ to row $i$ in the matrix $A$. If $i=j$, then let's say that the function scales row $i$ by $k+1$ since this would be the result of $k$ times row $i$ added to row $i$.
def add_row(A,k,i,j):
"Add k times row j to row i in matrix A."
n = A.shape[0]
E = np.eye(n)
if i == j:
E[i,i] = k + 1
else:
E[i,j] = k
return E @ A
Let's test our function:
M = np.array([[1,1],[3,2]])
print(M)
[[1 1]
[3 2]]
add_row(M,2,0,1)
array([[7., 5.],
[3., 2.]])
add_row(M,3,1,1)
array([[ 1., 1.],
[12., 8.]])
Let's write a function called scale_row
which takes 3 input parameters $A$, $k$, and $i$ and returns the matrix that results from $k$ times row $i$ in the matrix $A$.
def scale_row(A,k,i):
"Multiply row i by k in matrix A."
n = A.shape[0]
E = np.eye(n)
E[i,i] = k
return E @ A
M = np.array([[3,1],[-2,7]])
print(M)
[[ 3 1]
[-2 7]]
scale_row(M,3,1)
array([[ 3., 1.],
[-6., 21.]])
A = np.array([[1,1,1],[1,-1,0]])
print(A)
[[ 1 1 1]
[ 1 -1 0]]
scale_row(A,5,1)
array([[ 1., 1., 1.],
[ 5., -5., 0.]])
Let's write a function called switch_rows
which takes 3 input parameters $A$, $i$ and $j$ and returns the matrix that results from switching rows $i$ and $j$ in the matrix $A$.
def switch_rows(A,i,j):
"Switch rows i and j in matrix A."
n = A.shape[0]
E = np.eye(n)
E[i,i] = 0
E[j,j] = 0
E[i,j] = 1
E[j,i] = 1
return E @ A
A = np.array([[1,1,1],[1,-1,0]])
print(A)
[[ 1 1 1]
[ 1 -1 0]]
switch_rows(A,0,1)
array([[ 1., -1., 0.],
[ 1., 1., 1.]])
Examples
Find the Inverse
Let's apply our functions to the augmented matrix $[M \ | \ I]$ to find the inverse of the matrix $M$:
M = np.array([[5,4,2],[-1,2,1],[1,1,1]])
print(M)
[[ 5 4 2]
[-1 2 1]
[ 1 1 1]]
A = np.hstack([M,np.eye(3)])
print(A)
[[ 5. 4. 2. 1. 0. 0.]
[-1. 2. 1. 0. 1. 0.]
[ 1. 1. 1. 0. 0. 1.]]
A1 = switch_rows(A,0,2)
print(A1)
[[ 1. 1. 1. 0. 0. 1.]
[-1. 2. 1. 0. 1. 0.]
[ 5. 4. 2. 1. 0. 0.]]
A2 = add_row(A1,1,1,0)
print(A2)
[[1. 1. 1. 0. 0. 1.]
[0. 3. 2. 0. 1. 1.]
[5. 4. 2. 1. 0. 0.]]
A3 = add_row(A2,-5,2,0)
print(A3)
[[ 1. 1. 1. 0. 0. 1.]
[ 0. 3. 2. 0. 1. 1.]
[ 0. -1. -3. 1. 0. -5.]]
A4 = switch_rows(A3,1,2)
print(A4)
[[ 1. 1. 1. 0. 0. 1.]
[ 0. -1. -3. 1. 0. -5.]
[ 0. 3. 2. 0. 1. 1.]]
A5 = scale_row(A4,-1,1)
print(A5)
[[ 1. 1. 1. 0. 0. 1.]
[ 0. 1. 3. -1. 0. 5.]
[ 0. 3. 2. 0. 1. 1.]]
A6 = add_row(A5,-3,2,1)
print(A6)
[[ 1. 1. 1. 0. 0. 1.]
[ 0. 1. 3. -1. 0. 5.]
[ 0. 0. -7. 3. 1. -14.]]
A7 = scale_row(A6,-1/7,2)
print(A7)
[[ 1. 1. 1. 0. 0. 1. ]
[ 0. 1. 3. -1. 0. 5. ]
[ 0. 0. 1. -0.42857143 -0.14285714 2. ]]
A8 = add_row(A7,-3,1,2)
print(A8)
[[ 1. 1. 1. 0. 0. 1. ]
[ 0. 1. 0. 0.28571429 0.42857143 -1. ]
[ 0. 0. 1. -0.42857143 -0.14285714 2. ]]
A9 = add_row(A8,-1,0,2)
print(A9)
[[ 1. 1. 0. 0.42857143 0.14285714 -1. ]
[ 0. 1. 0. 0.28571429 0.42857143 -1. ]
[ 0. 0. 1. -0.42857143 -0.14285714 2. ]]
A10 = add_row(A9,-1,0,1)
print(A10)
[[ 1. 0. 0. 0.14285714 -0.28571429 0. ]
[ 0. 1. 0. 0.28571429 0.42857143 -1. ]
[ 0. 0. 1. -0.42857143 -0.14285714 2. ]]
Let's verify that we found the inverse $M^{-1}$ correctly:
Minv = A10[:,3:]
print(Minv)
[[ 0.14285714 -0.28571429 0. ]
[ 0.28571429 0.42857143 -1. ]
[-0.42857143 -0.14285714 2. ]]
result = Minv @ M
print(result)
[[ 1.00000000e+00 4.44089210e-16 2.22044605e-16]
[-6.66133815e-16 1.00000000e+00 -2.22044605e-16]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
Success! We can see the result more clearly if we round to 15 decimal places:
np.round(result,15)
array([[ 1.e+00, 0.e+00, 0.e+00],
[-1.e-15, 1.e+00, -0.e+00],
[ 0.e+00, 0.e+00, 1.e+00]])
Solve a System
Let's use our functions to perform Gaussian elimination and solve a linear system of equations $A \mathbf{x} = \mathbf{b}$.
A = np.array([[6,15,1],[8,7,12],[2,7,8]])
print(A)
[[ 6 15 1]
[ 8 7 12]
[ 2 7 8]]
b = np.array([[2],[14],[10]])
print(b)
[[ 2]
[14]
[10]]
Form the augemented matrix $M$:
M = np.hstack([A,b])
print(M)
[[ 6 15 1 2]
[ 8 7 12 14]
[ 2 7 8 10]]
Perform row operations:
M1 = scale_row(M,1/6,0)
print(M1)
[[ 1. 2.5 0.16666667 0.33333333]
[ 8. 7. 12. 14. ]
[ 2. 7. 8. 10. ]]
M2 = add_row(M1,-8,1,0)
print(M2)
[[ 1. 2.5 0.16666667 0.33333333]
[ 0. -13. 10.66666667 11.33333333]
[ 2. 7. 8. 10. ]]
M3 = add_row(M2,-2,2,0)
print(M3)
[[ 1. 2.5 0.16666667 0.33333333]
[ 0. -13. 10.66666667 11.33333333]
[ 0. 2. 7.66666667 9.33333333]]
M4 = scale_row(M3,-1/13,1)
print(M4)
[[ 1. 2.5 0.16666667 0.33333333]
[ 0. 1. -0.82051282 -0.87179487]
[ 0. 2. 7.66666667 9.33333333]]
M5 = add_row(M4,-2,2,1)
print(M5)
[[ 1. 2.5 0.16666667 0.33333333]
[ 0. 1. -0.82051282 -0.87179487]
[ 0. 0. 9.30769231 11.07692308]]
M6 = scale_row(M5,1/M5[2,2],2)
print(M6)
[[ 1. 2.5 0.16666667 0.33333333]
[ 0. 1. -0.82051282 -0.87179487]
[ 0. 0. 1. 1.19008264]]
M7 = add_row(M6,-M6[1,2],1,2)
print(M7)
[[1. 2.5 0.16666667 0.33333333]
[0. 1. 0. 0.1046832 ]
[0. 0. 1. 1.19008264]]
M8 = add_row(M7,-M7[0,2],0,2)
print(M8)
[[1. 2.5 0. 0.13498623]
[0. 1. 0. 0.1046832 ]
[0. 0. 1. 1.19008264]]
M9 = add_row(M8,-M8[0,1],0,1)
print(M9)
[[ 1. 0. 0. -0.12672176]
[ 0. 1. 0. 0.1046832 ]
[ 0. 0. 1. 1.19008264]]
Success! The solution of $Ax=b$ is
x = M9[:,3].reshape(3,1)
print(x)
[[-0.12672176]
[ 0.1046832 ]
[ 1.19008264]]
Or, we can do it the easy way...
x = la.solve(A,b)
print(x)
[[-0.12672176]
[ 0.1046832 ]
[ 1.19008264]]
scipy.linalg.solve
We are mostly interested in linear systems $A \mathbf{x} = \mathbf{b}$ where there is a unique solution $\mathbf{x}$. This is the case when $A$ is a square matrix ($m=n$) and $\mathrm{det}(A) \not= 0$. To solve such a system, we can use the function scipy.linalg.solve
.
The function returns a solution of the system of equations $A \mathbf{x} = \mathbf{b}$. For example:
A = np.array([[1,1],[1,-1]])
print(A)
[[ 1 1]
[ 1 -1]]
b1 = np.array([2,0])
print(b1)
[2 0]
And solve:
x1 = la.solve(A,b1)
print(x1)
[1. 1.]
Note that the output $\mathbf{x}$ is returned as a 1D NumPy array when the vector $\mathbf{b}$ (the right hand side) is entered as a 1D NumPy array. If we input $\mathbf{b}$ as a 2D NumPy array, then the output is a 2D NumPy array. For example:
A = np.array([[1,1],[1,-1]])
b2 = np.array([2,0]).reshape(2,1)
x2 = la.solve(A,b2)
print(x2)
[[1.]
[1.]]
Finally, if the right hand side $\mathbf{b}$ is a matrix, then the output is a matrix of the same size. It is the solution of $A \mathbf{x} = \mathbf{b}$ when $\mathbf{b}$ is a matrix. For example:
A = np.array([[1,1],[1,-1]])
b3 = np.array([[2,2],[0,1]])
x3 = la.solve(A,b3)
print(x3)
[[1. 1.5]
[1. 0.5]]
Simple Example
Let's compute the solution of the system of equations
\begin{align} 2x + y &= 1 \\ x + y &= 1 \end{align}
Create the matrix of coefficients:
A = np.array([[2,1],[1,1]])
print(A)
[[2 1]
[1 1]]
And the vector $\mathbf{b}$:
b = np.array([1,-1]).reshape(2,1)
print(b)
[[ 1]
[-1]]
And solve:
x = la.solve(A,b)
print(x)
[[ 2.]
[-3.]]
We can verify the solution by computing the inverse of $A$:
Ainv = la.inv(A)
print(Ainv)
[[ 1. -1.]
[-1. 2.]]
And multiply $A^{-1} \mathbf{b}$ to solve for $\mathbf{x}$:
x = Ainv @ b
print(x)
[[ 2.]
[-3.]]
We get the same result. Success!
Inverse or Solve
It's a bad idea to use the inverse $A^{-1}$ to solve $A \mathbf{x} = \mathbf{b}$ if $A$ is large. It's too computationally expensive. Let's create a large random matrix $A$ and vector $\mathbf{b}$ and compute the solution $\mathbf{x}$ in 2 ways:
N = 1000
A = np.random.rand(N,N)
b = np.random.rand(N,1)
Check the first entries $A$:
A[:3,:3]
array([[0.35754719, 0.63135432, 0.6572258 ],
[0.18450506, 0.14639832, 0.23528745],
[0.27576474, 0.46264005, 0.26589724]])
And for $\mathbf{b}$:
b[:4,:]
array([[0.82726751],
[0.96946096],
[0.31351176],
[0.63757837]])
Now we compare the speed of scipy.linalg.solve
with scipy.linalg.inv
:
%%timeit
x = la.solve(A,b)
2.77 s ± 509 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
x = la.inv(A) @ b
4.46 s ± 2.04 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
Solving with scipy.linalg.solve
is about twice as fast!
Exercises
Under construction