Making Sense of Matrix Multiplication
Let's start our journey with something simple yet important – matrix multiplication.
Matrices are basically the foundation of linear algebra. And the first important thing we learn about matrices is how to multiply them with each other (not counting the trivial operations of adding & subtracting matrices, or multiplying them by scalars, which are technically learned first).
But the moment we are taught how to multiply two matrices, it strikes us as something odd (or it did to me, at least). Why such a convoluted procedure of multiplying each element of a row of one matrix with the corresponding element of a column of the other matrix, and then summing the products to get a single element of the product matrix? Phew. Why not just multiply each element with the corresponding element of the other matrix?
Yet these questions don't really get answered, and we put them out of our minds. We just memorise the procedure till it becomes a habit and continue with our lives, without ever figuring out the why behind what we're doing. Well, no more!
Today, we are going to look at matrix multiplication from a different perspective. Note that what I am going to describe here isn't something novel or unique. If it were, I'd be writing a research paper rather than a blog post on it, no? It's just something that I wasn't exactly taught when I was learning about matrices, but which helped me a lot once I figured it out. Perhaps it's actually taught quite commonly and my experience was just the exception. Whatever the case, I am writing about it here so that it can prove useful for those who, like me, were not aware of it.
——————————
Let's say we have the following matrices:
\(A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\) and \(B = \begin{pmatrix} e & f \\ g & h \end{pmatrix}\)
Their product will naturally be:
\(C = AB = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}\)
What we will be doing is looking at the multiplication of matrices in terms of contributions. What does that mean? Let's analyse this with respect to the matrix \(A\) to start with. The thing to keep in mind is:
- Each element (say the \(j\)-th) of a given row (say row \(i\)) of matrix \(A\) represents the contribution, to that row (row \(i\)) of matrix \(C\), of the corresponding row (row \(j\)) of matrix \(B\).
So, the first row of matrix \(B\) contributes \(a\) units to the first row of matrix \(C\), i.e.
\(a \times \begin{pmatrix} e & f \end{pmatrix} = \begin{pmatrix} ae & af \end{pmatrix}\)
Similarly, the second row of matrix \(B\) contributes \(b\) units to the first row of matrix \(C\), i.e.
\(b \times \begin{pmatrix} g & h \end{pmatrix} = \begin{pmatrix} bg & bh \end{pmatrix}\)
When both contributions are summed, we get the first row of matrix \(C\), i.e.
\(\begin{pmatrix} ae & af \end{pmatrix} + \begin{pmatrix} bg & bh \end{pmatrix} = \begin{pmatrix} ae + bg & af + bh \end{pmatrix}\)
And the second row of matrix \(C\) is obtained in a similar manner, with the contributions of the first and second row of matrix \(B\) being the elements of the second row of matrix \(A\), i.e. \(c\) and \(d\) respectively.
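If it helps to see this row-contribution view spelled out in code, here is a minimal Python sketch (the function name and the small test matrices are purely illustrative):

```python
# A minimal sketch of the row-contribution view, using plain nested lists.
def matmul_by_rows(A, B):
    """Each row of the product is a weighted sum of the rows of B,
    with the weights taken from the corresponding row of A."""
    n_cols = len(B[0])
    C = []
    for row_of_A in A:
        new_row = [0] * n_cols
        for weight, row_of_B in zip(row_of_A, B):
            if weight == 0:      # a zero weight contributes nothing
                continue
            for k in range(n_cols):
                new_row[k] += weight * row_of_B[k]
        C.append(new_row)
    return C

print(matmul_by_rows([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```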
So, this tells us what the elements of a matrix that is left-multiplied to a certain matrix represent, as was the case with matrix \(A\) above. But what about a matrix that is right-multiplied with a given matrix (like matrix \(B\) in the above example)? What contributions do its elements refer to? Well, think about it this way:
- Each element (say the \(i\)-th) of a given column (say column \(j\)) of matrix \(B\) represents the contribution, to that column (column \(j\)) of matrix \(C\), of the corresponding column (column \(i\)) of matrix \(A\).
So, the first column of matrix \(A\) contributes \(e\) units to the first column of matrix \(C\), i.e.
\(e \times \begin{pmatrix} a \\ c \end{pmatrix} = \begin{pmatrix} ae \\ ce \end{pmatrix}\)
Similarly, the second column of matrix \(A\) contributes \(g\) units to the first column of matrix \(C\), i.e.
\(g \times \begin{pmatrix} b \\ d \end{pmatrix} = \begin{pmatrix} bg \\ dg \end{pmatrix}\)
When both contributions are summed, we get the first column of matrix \(C\), i.e.
\(\begin{pmatrix} ae \\ ce \end{pmatrix} + \begin{pmatrix} bg \\ dg \end{pmatrix} = \begin{pmatrix} ae + bg \\ ce + dg \end{pmatrix}\)
The second column of matrix \(C\) is once again obtained in a similar manner, with the contributions of the first and second column of matrix \(A\) being the elements of the second column of matrix \(B\), i.e. \(f\) and \(h\) respectively.
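Here is the mirror image of the previous sketch for the column-contribution view, again with purely illustrative names:

```python
# Each column of the product is a weighted sum of the columns of A,
# with the weights taken from the matching column of B.
def matmul_by_columns(A, B):
    n_rows, n_mid, n_cols = len(A), len(B), len(B[0])
    cols = []
    for j in range(n_cols):
        new_col = [0] * n_rows
        for i in range(n_mid):
            weight = B[i][j]
            if weight == 0:      # a zero weight contributes nothing
                continue
            for r in range(n_rows):
                new_col[r] += weight * A[r][i]
        cols.append(new_col)
    # cols holds the product column by column; zip(*cols) turns it back into rows
    return [list(row) for row in zip(*cols)]

print(matmul_by_columns([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```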
——————————
Up till this point, it might look like what we've discussed above is quite obvious and not that different from how we typically look at matrix multiplication. As it happens, it is rather obvious, yes, but this subtle shift in how we think about multiplying matrices is going to make our lives easier compared with just mindlessly trudging through the algorithm. How? We're going to see just that. But before we move forward, if you have merely glanced through the above section, nodding your head as you read along, please go back and actually work through the steps outlined there. Otherwise, you might not really see what benefit we're supposed to be getting from this method in the examples below.
Okay, so what benefit does this perspective offer us? It makes computations noticeably easier when dealing with matrices containing many zeroes (i.e. when at least one of the two matrices being multiplied has this characteristic). The effect is observable even with just a few zeroes, actually. And there is some benefit even in the absence of zeroes, as long as the numbers involved are nice ones like ones (pun intended) and twos and tens and so on. Let's try an example. Say we have to multiply the following matrices:
\(\begin{pmatrix} 2 & 5 & 7 & 1 \\ 5 & 4 & 10 & 3 \\ 1 & -5 & 20 & 7 \\ 9 & 4 & -2 & 0 \end{pmatrix}\) and \(\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\)
Seems complicated, doesn't it? But with the technique we just discussed, it's almost trivial. We'll work with respect to the contributions specified by the matrix on the right, as it contains more zeroes. As the first column of the right matrix has 1's at the first and third positions, the first column of the product matrix has equal (1 unit) contributions from the first and third columns of the left matrix, i.e. the first column of the product matrix is:
\(\begin{pmatrix} 2 \\ 5 \\ 1 \\ 9 \end{pmatrix} + \begin{pmatrix} 7 \\ 10 \\ 20 \\ -2 \end{pmatrix} = \begin{pmatrix} 9 \\ 15 \\ 21 \\ 7 \end{pmatrix}\)
Next, because the second column of the right matrix has a 2 at the second position and a -1 at the third position, the second column of the product matrix is 2 times the second column of the left matrix plus -1 times (or, minus 1 times) the third column of the left matrix, i.e.
\(2 \times \begin{pmatrix} 5 \\ 4 \\ -5 \\ 4 \end{pmatrix} + (-1) \times \begin{pmatrix} 7 \\ 10 \\ 20 \\ -2 \end{pmatrix} = \begin{pmatrix} 3 \\ -2 \\ -30 \\ 10 \end{pmatrix}\)
Finally, because the third column of the right matrix has a -1 only at the fourth position, the third column of the product matrix is the same as the fourth column of the left matrix, just with its signs flipped:
\((-1) \times \begin{pmatrix} 1 \\ 3 \\ 7 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ -3 \\ -7 \\ 0 \end{pmatrix}\)
This gives us the product matrix:
\(\begin{pmatrix} 9 & 3 & -1 \\ 15 & -2 & -3 \\ 21 & -30 & -7 \\ 7 & 10 & 0 \end{pmatrix}\)
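If you would like a machine check of the result above, a quick numpy snippet (assuming numpy is installed) confirms it:

```python
import numpy as np

left = np.array([[2, 5, 7, 1],
                 [5, 4, 10, 3],
                 [1, -5, 20, 7],
                 [9, 4, -2, 0]])
right = np.array([[1, 0, 0],
                  [0, 2, 0],
                  [1, -1, 0],
                  [0, 0, -1]])

print(left @ right)
# [[  9   3  -1]
#  [ 15  -2  -3]
#  [ 21 -30  -7]
#  [  7  10   0]]
```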
I am sure by now you can see, at least in principle, how much more efficient computing matrix products in this way can be. To realise those gains in practice, though, you need to practise multiplying matrices using this method (even ones that do not have any zeroes and thus do not offer any significant savings of time and effort), just to develop the required muscle memory, of sorts.
Anyway, let's try another example, this time using the left matrix as the focus of our attention. Say we have to multiply:
\(\begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \end{pmatrix}\) and \(\begin{pmatrix} 3 & 5 & 4 \\ 7 & 7 & 1 \\ 9 & 8 & 2 \end{pmatrix}\)
The first row of the product matrix will be the sum of the first and third rows of the right matrix (because the first row of the left matrix has 1's at its first and third positions):
\(\begin{pmatrix} 3 & 5 & 4 \end{pmatrix} + \begin{pmatrix} 9 & 8 & 2 \end{pmatrix} = \begin{pmatrix} 12 & 13 & 6 \end{pmatrix}\)
Whereas the second row of the product matrix will be 2 times the second row of the right matrix (because the second row of the left matrix has a 2 at its second position):
\(2 \times \begin{pmatrix} 7 & 7 & 1 \end{pmatrix} = \begin{pmatrix} 14 & 14 & 2 \end{pmatrix}\)
Giving us the product matrix:
\(\begin{pmatrix} 12 & 13 & 6 \\ 14 & 14 & 2 \end{pmatrix}\)
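Written out in code, the shortcut for this example boils down to just two row combinations (plain Python lists, purely illustrative):

```python
row1, row2, row3 = [3, 5, 4], [7, 7, 1], [9, 8, 2]   # rows of the right matrix

product_row1 = [x + y for x, y in zip(row1, row3)]   # 1*row1 + 1*row3
product_row2 = [2 * x for x in row2]                 # 2*row2

print(product_row1)  # [12, 13, 6]
print(product_row2)  # [14, 14, 2]
```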
Before we conclude our discussion, it is pertinent to mention that the benefits of this method are most pronounced when one of the matrices we are dealing with is a rearrangement of the identity matrix (i.e. a permutation matrix). Let's say we have to compute the following product:
\(\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}\) and \(\begin{pmatrix} 1 & 9 & 12 & -5 \\ 43 & -28 & 7 & 4\\ 1 & -1 & 25 & -2 \\ 0 & 3 & 7 & 13 \end{pmatrix}\)
As soon as we observe that the left matrix is just the identity matrix with its first row swapped with the third, and the second with the fourth, we know that the product will be the same as the right matrix, just with the first and third rows swapped, as well as the second and fourth rows swapped. How? Because each row (say \(i\)) of the matrix on the left has only one 1, and the position of that 1 represents the row of the right matrix that will become the corresponding row (i.e. \(i\)) of the product matrix. (And a similar logic would be used for columns in case the rearranged identity matrix were on the right.) Thus, we get the product:
\(\begin{pmatrix} 1 & -1 & 25 & -2 \\ 0 & 3 & 7 & 13 \\ 1 & 9 & 12 & -5 \\ 43 & -28 & 7 & 4 \end{pmatrix}\)
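In code, the same observation means the product can be read off by simply reindexing the rows of the right matrix; here is a quick numpy check (assuming numpy is available):

```python
import numpy as np

P = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])
M = np.array([[1, 9, 12, -5],
              [43, -28, 7, 4],
              [1, -1, 25, -2],
              [0, 3, 7, 13]])

# Each row of P has its single 1 at positions 2, 3, 0, 1 (zero-indexed),
# so the product is just M with its rows reordered accordingly.
assert (P @ M == M[[2, 3, 0, 1]]).all()
print(M[[2, 3, 0, 1]])
```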
——————————
In a subsequent discussion, we will see another advantage of this approach to matrix multiplication which comes up when working with linear transformations. Be sure to practice the method in the meantime, and stay tuned for more on math and computer science.
~ ASD0x41, signing out.