02 Linear Algebra

Introduction to Data Science

Linear Algebra

Arijit Mondal
Dept. of Computer Science & Engineering
Indian Institute of Technology Patna
arijit@iitp.ac.in

1 CS244
Matrix representation
• Matrices are everywhere!
• Data — a dataset can be represented as an n × m matrix
• Each row represents an example
• Each column represents a distinct feature / dimension
• Geometric point set — an n × m matrix can denote n points in m-dimensional space
• Systems of equations — an equation like y = c0 + c1 x1 + · · · + cm−1 xm−1 can be modeled as one row of an n × m coefficient matrix
• Graphs & Networks — city networks, chemical structures, etc.
• Rearrangements — permutations of a given set of elements
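The data-as-matrix view can be sketched in a few lines of NumPy (the feature names and values here are made up for illustration):

```python
import numpy as np

# n = 3 examples (rows), m = 2 features (columns): say, height (m) and weight (kg)
X = np.array([[1.70, 65.0],
              [1.80, 80.0],
              [1.65, 55.0]])

n, m = X.shape          # n = 3 examples, m = 2 features
first_example = X[0]    # one row  = one example
heights = X[:, 0]       # one column = one feature across all examples
```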
Geometry & Vectors
• Vector — a 1 × d matrix. In the geometric sense, a ray from the origin through a given point in d-dimensional space
• Normalization — in many scenarios vectors are normalized to have unit norm
• Dot product —
• Useful for reducing a pair of vectors to a scalar
• Can be used to measure the angle between vectors
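A minimal sketch of normalization and of the dot product as an angle measure (vectors chosen so the angle comes out to 45°):

```python
import numpy as np

x = np.array([3.0, 0.0])
y = np.array([1.0, 1.0])

dot = np.dot(x, y)                        # reduces two vectors to a scalar
x_hat = x / np.linalg.norm(x)             # normalization: unit norm
y_hat = y / np.linalg.norm(y)
cos_theta = np.dot(x_hat, y_hat)          # cosine of the angle between them
theta = np.degrees(np.arccos(cos_theta))  # 45 degrees for these two vectors
```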
Matrix operations
• Addition: C = A + B, Cij = Aij + Bij
• Scalar multiplication: A′ = cA, A′ij = c · Aij
• Linear combination: αA + (1 − α)B
(image source: Data Science Design Manual)


Matrix Transpose
• Let M be a matrix and Mᵀ its transpose; then (Mᵀ)ij = Mji
• (Aᵀ)ᵀ = A
• Let C = A + Aᵀ; then Cij = Aij + Aji = Cji, so C is symmetric
(image source: Data Science Design Manual)

Matrix multiplication
• It is an aggregated version of the vector dot (inner) product
• x · y = Σᵢ xᵢ yᵢ
• For row vectors X and Y, the matrix product XYᵀ is a 1 × 1 matrix containing the dot product X · Y
• C = AB, Cij = Σₖ Aik Bkj
• It does not commute: usually AB ≠ BA
• It is associative: A(BC) = (AB)C
• Consider the following matrices: A1×n, Bn×n, Cn×n, Dn×1. Which of the following is better — (AB)(CD) or (A(BC))D?
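The parenthesization question above can be explored by counting scalar multiplications: multiplying a p × q matrix by a q × r matrix costs p·q·r multiplications, so the two orders differ dramatically even though they give the same answer.

```python
# Scalar-multiplication cost of multiplying a p×q matrix by a q×r matrix.
def cost(p, q, r):
    return p * q * r

n = 100
# A: 1×n, B: n×n, C: n×n, D: n×1
ab_cd = cost(1, n, n) + cost(n, n, 1) + cost(1, n, 1)   # (AB)(CD): O(n^2)
a_bc_d = cost(n, n, n) + cost(1, n, n) + cost(1, n, 1)  # (A(BC))D: O(n^3)
```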
Covariance matrix
• Multiplication by the transpose is common, i.e. A · Aᵀ
• Both A · Aᵀ and Aᵀ · A are compatible for multiplication
• Let An×d be a feature matrix: each row represents an item and each column denotes a feature
• C = AAᵀ is an n × n matrix of dot products
• Cij measures how similar item i is to item j (how "in sync" they are)
• D = AᵀA is a d × d matrix of dot products measuring agreement among the features
• Dij represents the similarity between feature i and feature j
• Covariance formula: Cov(X, Y) = Σᵢ₌₁ⁿ (Xi − X̄)(Yi − Ȳ)
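A small sketch of both transpose products, using an illustrative 3-item × 2-feature matrix; centering the columns first makes Aᵀ·A match the slide's covariance formula (which omits the 1/n factor), and dividing by n − 1 recovers NumPy's sample covariance:

```python
import numpy as np

# Hypothetical 3-item × 2-feature matrix
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

C = A @ A.T     # 3×3: item-item dot products
D = A.T @ A     # 2×2: feature-feature dot products

# Center each column, then A^T A gives the (unnormalized) covariance matrix
Ac = A - A.mean(axis=0)
cov = Ac.T @ Ac
```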
Covariance matrix (contd)
• A, A · Aᵀ, Aᵀ · A
(image source: Data Science Design Manual)

Matrix multiplication & Paths
• Square matrices can be multiplied without transposition
• A matrix can represent the connectivity of nodes in a given network
• Let An×n be the adjacency matrix of the network
• A²ij = Σₖ₌₁ⁿ Aik Akj counts the paths of length two from node i to node j
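The path-counting interpretation can be checked on a tiny graph; here a 3-node path graph 0 – 1 – 2:

```python
import numpy as np

# Adjacency matrix of the path graph 0 – 1 – 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

A2 = A @ A
# A2[i, j] counts length-2 paths from i to j:
# exactly one path 0 -> 1 -> 2, and two length-2 round trips from node 1
```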
Matrix multiplication & Permutations
• Multiplication is often used to rearrange the order of the elements in a matrix
• Multiplication with the identity matrix (I) does not rearrange anything
• I contains exactly one non-zero element in each row and each column
• Any matrix with this property is known as a permutation matrix
• For example, multiplication with P(2431):

P(2431) = [ 0 0 1 0 ; 1 0 0 0 ; 0 0 0 1 ; 0 1 0 0 ]

M = [ 11 12 13 14 ; 21 22 23 24 ; 31 32 33 34 ; 41 42 43 44 ]

PM = [ 31 32 33 34 ; 11 12 13 14 ; 41 42 43 44 ; 21 22 23 24 ]
Permutations Example
(image source: Data Science Design Manual)
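The P(2431) example from the slide can be reproduced directly; each row of P picks out one row of M:

```python
import numpy as np

P = np.array([[0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 1, 0, 0]])
M = np.array([[11, 12, 13, 14],
              [21, 22, 23, 24],
              [31, 32, 33, 34],
              [41, 42, 43, 44]])

PM = P @ M   # rows of M reordered: 3rd, 1st, 4th, 2nd
```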


Linear transformation

A = [ 1 3 ; 2 1 ], x = [ 2 ; 1 ]

Ax = [ 1 ; 2 ] × 2 + [ 3 ; 1 ] × 1 = [ 5 ; 5 ]

(figure: x and Ax plotted in the x1–x2 plane)
Rotating points in space
• Multiplying by the right matrix can rotate a set of points about the origin by an angle θ

Rθ = [ cos(θ) −sin(θ) ; sin(θ) cos(θ) ]

• [ x′ ; y′ ] = Rθ [ x ; y ] = [ x cos(θ) − y sin(θ) ; x sin(θ) + y cos(θ) ]
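A sketch of the rotation matrix in action; rotating the point (1, 0) by 90° lands it at (0, 1), and composing a rotation with its inverse rotation gives the identity:

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R = rotation(np.pi / 2)   # rotate 90 degrees about the origin
p = np.array([1.0, 0.0])
p_rot = R @ p             # ends up at (0, 1)
```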
Identity matrix
• Identity plays a big role in algebraic structure
• 0 is the identity element for the addition operation
• 1 is the identity element for the multiplication operation
• An inverse operation takes an element x back to the identity
• For addition, the inverse of x is −x
• For multiplication, the inverse of x is 1/x
• For a matrix, we say A⁻¹ is the multiplicative inverse if A · A⁻¹ = I
• For a 2 × 2 matrix, A⁻¹ = [ a b ; c d ]⁻¹ = (1 / (ad − bc)) [ d −b ; −c a ]
• A matrix that is not invertible is known as a singular matrix
• Gaussian elimination can be used to find the inverse
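The 2 × 2 inverse formula is easy to implement and check against NumPy; a zero determinant signals a singular matrix:

```python
import numpy as np

def inv2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix: no inverse")
    return np.array([[d, -b],
                     [-c, a]]) / det

A = np.array([[1.0, 3.0],
              [2.0, 1.0]])
check = A @ inv2x2(A)   # should be the 2x2 identity
```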
Inversion Example
• Inverse of the Lincoln image and M · M⁻¹
(image source: Data Science Design Manual)


Linear Systems, Matrix Rank
• Linear systems
• Consider the linear equation y = c0 + c1 x1 + · · · + cm−1 xm−1
• The coefficients of n such linear equations can be represented as a matrix C of size n × m
• CX = Y ⇒ X = C⁻¹Y
• What happens if the inverse does not exist?
• Matrix Rank
• The rank of a matrix is the number of linearly independent rows
• Rank can be determined using Gaussian elimination
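A small system illustrates both ideas; `np.linalg.solve` avoids forming C⁻¹ explicitly, and `matrix_rank` exposes when a system's rows are not independent:

```python
import numpy as np

# Two equations in two unknowns: x1 + 3*x2 = 5 and 2*x1 + x2 = 5
C = np.array([[1.0, 3.0],
              [2.0, 1.0]])
y = np.array([5.0, 5.0])

x = np.linalg.solve(C, y)           # preferred over computing C^-1 directly
rank = np.linalg.matrix_rank(C)     # 2: the rows are linearly independent
```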
Factoring matrices
• Factoring a matrix A into matrices B and C represents a particular notion of division
• A non-singular matrix has an inverse: I = M · M⁻¹
• Matrix factorization is an important abstraction in data science, leading to compact feature representations
• Suppose A can be factored as A ≈ BC, where A is n × m, B is n × k, and C is k × m, with k < min(n, m)
(image source: Data Science Design Manual)

Eigenvalues & Eigenvectors
• Multiplying a vector U by a matrix A can have the same effect as multiplying it by a scalar λ

[ −5 2 ; 2 −2 ] · [ 2 ; −1 ] = −6 [ 2 ; −1 ],   [ −5 2 ; 2 −2 ] · [ 1 ; 2 ] = −1 [ 1 ; 2 ]

• λ is an eigenvalue and U is an eigenvector
• Together, the eigenvectors and eigenvalues encode a lot of information about the matrix A
• Properties
• Each eigenvalue has an associated eigenvector
• There are in general n eigenvector-eigenvalue pairs for every full-rank n × n matrix
• Every pair of eigenvectors of a symmetric matrix is mutually orthogonal
• Two vectors are orthogonal if their dot product is 0
• The eigenvectors can play the role of dimensions or bases in an n-dimensional space
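The eigenpairs of the example matrix above can be recovered with `np.linalg.eig`; since the matrix is symmetric, the two eigenvectors also come out orthogonal:

```python
import numpy as np

A = np.array([[-5.0, 2.0],
              [2.0, -2.0]])

lam, U = np.linalg.eig(A)   # columns of U are the eigenvectors

# Each pair satisfies A @ u = lambda * u
for i in range(2):
    assert np.allclose(A @ U[:, i], lam[i] * U[:, i])

ortho = np.dot(U[:, 0], U[:, 1])   # ~0: symmetric matrix, orthogonal eigenvectors
```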
Example " #
y 1.25 0.75
A=
0.75 1.25
CS244

19
Example " #
y 1.25 0.75
A=
0.75 1.25
CS244

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" #
1.414
Av1 =
1.414

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" #
1.414
Av1 =
1.414

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" #
1.414
Av1 =
1.414

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" #
1.414
Av1 =
1.414

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 =
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 =
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Example " #
y 1.25 0.75
A=
0.75 1.25
" #
0.707
v1 = , λ1 = 2.0
0.707
" #
−0.707
v2 = , λ2 = 0.5
CS244

0.707
x
" ##
−0.354
1.414
Av21 = ∥x∥ = 1, find Ax
Given
1.414
0.354

19
Eigenvalue decomposition
• Any n × n symmetric matrix M can be decomposed into the sum of its n eigenvector products
• Let (λi, Ui) be the eigenpairs, i = 1, . . . , n, and assume λi ≥ λi+1
• Each eigenvector Ui is an n × 1 matrix; multiplying it by its transpose yields the n × n matrix Ui Uiᵀ, of the same dimension as M
• A linear combination of these matrices weighted by the corresponding eigenvalues gives the original matrix: M = Σᵢ₌₁ⁿ λi Ui Uiᵀ
• It holds for symmetric matrices
• Can be applied to the covariance matrix
• Using only the vectors associated with the largest eigenvalues, a good approximation of the matrix can be made
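A sketch of the decomposition on a small randomly generated symmetric matrix: summing all n rank-1 terms λi Ui Uiᵀ rebuilds M exactly, and keeping only the largest-magnitude eigenvalues gives an approximation:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
M = B + B.T                        # an arbitrary symmetric matrix

lam, U = np.linalg.eigh(M)         # eigh: eigenpairs of a symmetric matrix

# Full sum of rank-1 eigenvector products rebuilds M exactly
M_rebuilt = sum(lam[i] * np.outer(U[:, i], U[:, i]) for i in range(4))

# Keeping only the two largest-|lambda| pairs gives an approximation
order = np.argsort(-np.abs(lam))
M_top2 = sum(lam[i] * np.outer(U[:, i], U[:, i]) for i in order[:2])
```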
Example
• Covariance of the Lincoln image & M − U1 U1ᵀ
(image source: Data Science Design Manual)

Error plot
• Reconstructing the Lincoln memorial from one, five, and fifty eigenvectors
(image source: Data Science Design Manual)

Singular Value Decomposition
• Eigenvalue decomposition is good, but it works only for symmetric matrices
• Singular value decomposition is a more general matrix factorization approach
• The SVD of an n × m matrix M factors it into three matrices Un×n, Dn×m, Vm×m, i.e. M = UDVᵀ, where D is a diagonal matrix
• The product U · D has the effect of multiplying Uij by Djj
• The relative importance of each column of U is provided by D
• DVᵀ provides the relative importance of each row of Vᵀ
• The weights in D are known as the singular values of M
Singular Value Decomposition
• Let X and Y be vectors of size n × 1 and 1 × m; then the matrix outer product P = X ⊗ Y is an n × m matrix with Pjk = Xj Yk
• Traditional matrix multiplication can be expressed as C = A · B = Σₖ Aₖ ⊗ Bᵀₖ
• Aₖ — the kth column of A, Bᵀₖ — the kth row of B
• M can be expressed as the sum of outer products of the vectors resulting from SVD, namely (UD)ₖ and Vᵀₖ
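The outer-product view of the SVD can be sketched on a small random matrix: summing all the rank-1 terms sₖ Uₖ ⊗ Vᵀₖ recovers M, and truncating the sum gives a low-rank approximation:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Full sum of outer products (UD)_k (V^T)_k recovers M exactly
M_full = sum(s[k] * np.outer(U[:, k], Vt[k]) for k in range(len(s)))

# Truncating to the top k singular values gives a rank-k approximation
k = 2
M_k = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(k))
```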
Example
• Vectors associated with the first 50 singular values; MSE of reconstruction
(image source: Data Science Design Manual)

Example
• Reconstruction with k = 5, 50, and error for k = 50
(image source: Data Science Design Manual)
