A matrix is a two-dimensional collection of numbers, a list of lists, where all inner lists have the same size and represent a row of the matrix. Structurally, matrices and tables are alike.
Per mathematical convention, capital letters are used to represent matrices. For example, below is matrix D, which has 6 rows and 3 columns.
D = [
[70, 170, 80],
[65, 172, 76],
[43, 168, 63],
[41, 176, 58],
[38, 181, 70],
[36, 177, 67]
]
In the context of Data Science, matrices are important for many reasons, such as the ones briefly described below.
1. A matrix can be used to represent a collection of vectors. In the matrix D above, each row represents a person and each column represents a feature of this person: age in years, height in cm and weight in kg.
2. Another use of matrices is to represent a linear function that maps k-dimensional vectors to n-dimensional vectors. As far as we can understand, this is relevant in the context of dimensionality reduction.
3. Matrices can be used to represent binary relationships, like in the example below.
friendships = [
[0, 1, 1, 0, 0, 0], # user 0
[1, 0, 1, 1, 0, 0], # user 1
[1, 1, 0, 1, 0, 0], # user 2
[0, 1, 1, 0, 1, 0], # user 3
[0, 0, 0, 1, 0, 1], # user 4
[0, 0, 0, 0, 1, 0] # user 5
#u0 u1 u2 u3 u4 u5 (users)
]
In the matrix above, each row represents a user and also each column represents a user.
friendships[0][2] == 1 # True
(or in plain English, user 0 and user 2 are friends).
The line of code above represents the value of the cell located in row 0 and column 2 (the third column). And, in this specific case (representation of friendship), the value for row 2 (third row) and column 0 (friendships[2][0]
) should also be true. (And all other pairs that are valued 1, which makes the matrix symmetrical along the diagonal that starts in [0, 0] and ends in [5, 5]. The diagonal itself is filled with zeros, which also makes sense, as users cannot be friends of themselves.)
Just to clarify, not all kinds of binary relationships are represented in a matrix that is symmetrical – users and posts, representing likes for example, are not. User likes a post, but the post does not like the user back.
Reference: Grus, Joel. “Data Science from Scratch”. O’Reilly. 2015