{"id":377,"date":"2024-12-05T07:03:11","date_gmt":"2024-12-05T07:03:11","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2024\/12\/05\/how-to-interpret-matrix-expressions-transformations-a5e6871cd224\/"},"modified":"2024-12-05T07:03:11","modified_gmt":"2024-12-05T07:03:11","slug":"how-to-interpret-matrix-expressions-transformations-a5e6871cd224","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2024\/12\/05\/how-to-interpret-matrix-expressions-transformations-a5e6871cd224\/","title":{"rendered":"How to Interpret Matrix Expressions\u200a\u2014\u200aTransformations"},"content":{"rendered":"<div>\n<h4>Matrix algebra for a data scientist<\/h4>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/0*57Fx9wZ-ds46JqFq\"><figcaption>Photo by <a href=\"https:\/\/unsplash.com\/@ballonandon?utm_source=medium&amp;utm_medium=referral\">Ben Allan<\/a> on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<p>This article begins a series for anyone who finds matrix algebra overwhelming. My goal is to turn <em>what you\u2019re afraid of<\/em> into <em>what you\u2019re fascinated by<\/em>. 
You\u2019ll find it especially helpful if you want <strong>to understand machine learning concepts and\u00a0methods<\/strong>.<\/p>\n<h4><strong>Table of contents:<\/strong><\/h4>\n<ol>\n<li>Introduction<\/li>\n<li>Prerequisites<\/li>\n<li>Matrix-vector multiplication<\/li>\n<li>Transposition<\/li>\n<li>Composition of transformations<\/li>\n<li>Inverse transformation<\/li>\n<li>Non-invertible transformations<\/li>\n<li>Determinant<\/li>\n<li>Non-square matrices<\/li>\n<li>Inverse and Transpose: similarities and differences<\/li>\n<li>Translation by a\u00a0vector<\/li>\n<li>Final words<\/li>\n<\/ol>\n<h3>1. Introduction<\/h3>\n<p>You\u2019ve probably noticed that while it\u2019s easy to find materials explaining matrix computation algorithms, it\u2019s harder to find ones that teach <strong>how to interpret complex matrix expressions<\/strong>. I\u2019m addressing this gap with my series, focused on <strong>the part of matrix algebra that is most commonly used by data scientists<\/strong>.<\/p>\n<p>We\u2019ll focus on concrete examples rather than general formulas. I\u2019d rather sacrifice generality for the sake of clarity and readability. I\u2019ll often appeal to your imagination and intuition, hoping my materials will inspire you to explore more formal resources on these topics. For precise definitions and general formulas, I recommend two good textbooks: a classic on linear algebra\u00b9 and one focused on machine learning\u00b2.<\/p>\n<p>This part will teach\u00a0you<\/p>\n<blockquote><p>to see a matrix as a representation of the transformation applied to\u00a0data.<\/p><\/blockquote>\n<p>Let\u2019s get started: I\u2019ll guide you through the world of matrices.<\/p>\n<h3>2. 
Prerequisites<\/h3>\n<p>I\u2019m guessing you can handle the expressions that\u00a0follow.<\/p>\n<p>This is <strong>the dot product<\/strong> written using a row vector and a column\u00a0vector:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AEZJjpe-Ca_-UZMzhLHyrkw.png?ssl=1\"><\/figure>\n<p><strong>A matrix<\/strong> is a rectangular array of symbols arranged in rows and columns. Here is an example of a matrix with two rows and three\u00a0columns:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AFhaXuuwdxo2zTIcnrOmmvw.png?ssl=1\"><\/figure>\n<p>You can view it as <strong>a sequence of\u00a0columns<\/strong><\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AQbuf2DkAoanB15iIcru8ZA.png?ssl=1\"><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AMbP2blUaomHoKiYRRwoc0A.png?ssl=1\"><\/figure>\n<p>or <strong>a sequence of rows<\/strong> stacked one on top of\u00a0another:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AJaRvptlxqYhs2vJ_fEGJqg.png?ssl=1\"><\/figure>\n<p>As you can see, I used superscripts for rows and subscripts for columns. 
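<\/p>
<p>In NumPy, the same two views of a matrix are available through indexing. The minimal sketch below uses a hypothetical 2-by-3 matrix with arbitrary entries; keep in mind that NumPy counts from 0, while the superscripts and subscripts in this article start at 1.<\/p>

```python
import numpy as np

# A hypothetical 2-by-3 matrix; the entries are arbitrary
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# shape reports (number of rows, number of columns)
print(A.shape)

# NumPy counts from 0, so index 0 refers to the first row or column
first_row = A[0]         # the row written with superscript (1) in the text
first_column = A[:, 0]   # the column written with subscript 1 in the text

# The matrix as a sequence of columns ...
for j in range(A.shape[1]):
    print('column', j + 1, A[:, j])

# ... or as a sequence of rows stacked on top of each other
for i in range(A.shape[0]):
    print('row', i + 1, A[i])
```

<p>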
In machine learning, it\u2019s important to clearly distinguish between observations, represented as column vectors, and features, which are arranged in\u00a0rows.<\/p>\n<p>Other common notations for this matrix are <strong>A<\/strong><em>\u2082<\/em>\u2093<em>\u2083 <\/em>and <strong>A<\/strong>[<em>a\u1d62<\/em>\u207d<em>\u02b2\u00a0<\/em>\u207e].<\/p>\n<p><strong>Multiplying<\/strong> two matrices <strong>A <\/strong>and <strong>B <\/strong>results in a third matrix <strong>C <\/strong>= <strong>AB<\/strong> containing the scalar products of each row of <strong>A <\/strong>with each column of <strong>B<\/strong>, arranged accordingly. Below is an example for <strong>C<\/strong><em>\u2082<\/em>\u2093<em>\u2082<\/em><strong> <\/strong>= <strong>A<\/strong><em>\u2082<\/em>\u2093<em>\u2083<\/em><strong>B<\/strong><em>\u2083<\/em>\u2093<em>\u2082.<\/em><\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A5ChtiiLBMCSRu770ChVGkA.png?ssl=1\"><\/figure>\n<p>where c<em>\u1d62<\/em>\u207d<em>\u02b2 <\/em>\u207e is the scalar product of the <em>j<\/em>-th row of matrix <strong>A<\/strong> and the <em>i<\/em>-th column of matrix\u00a0<strong>B<\/strong>:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A4QClUtix68I2z3Dr291luA.png?ssl=1\"><\/figure>\n<p>Note that this definition of multiplication requires the number of columns of <em>the left matrix<\/em> to match the number of rows of <em>the right matrix<\/em>. In other words, <strong>the inner dimensions of the matrices must\u00a0match<\/strong>.<\/p>\n<p>Make sure you can manually multiply matrices with arbitrary entries. 
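<\/p>
<p>To see the definition at work, here is a minimal from-scratch sketch of the row-by-column rule in plain Python. The function name matmul and the example matrices (a 2-by-3 matrix A and a 3-by-2 matrix B) are my own illustrative choices, not part of any library; in practice you would let NumPy do this.<\/p>

```python
def matmul(A, B):
    # Multiply matrices stored as lists of rows, using the row-by-column rule
    n_rows, n_inner = len(A), len(A[0])
    if len(B) != n_inner:
        raise ValueError('the number of columns in A must match the number of rows in B')
    n_cols = len(B[0])
    C = [[0] * n_cols for _ in range(n_rows)]
    for j in range(n_rows):          # j-th row of A
        for i in range(n_cols):      # i-th column of B
            # Entry in row j, column i: scalar product of row j of A and column i of B
            C[j][i] = sum(A[j][k] * B[k][i] for k in range(n_inner))
    return C


A = [[ 1, 0, 2],
     [-2, 1, 1]]
B = [[ 0, 3],
     [-3, 1],
     [-2, 2]]

print(matmul(A, B))  # [[-4, 7], [-5, -3]]
print(matmul(B, A))  # a 3-by-3 matrix: the order of the factors matters
```

<p>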
You can use the following code to check the result or to practice multiplying matrices.<\/p>\n<pre>import numpy as np<br><br># Matrices to be multiplied: A is 2x3, B is 3x2<br>A = np.array([<br>    [ 1, 0, 2],<br>    [-2, 1, 1]<br>])<br><br>B = np.array([<br>    [ 0, 3],<br>    [-3, 1],<br>    [-2, 2]<br>])<br><br># Multiply A by B (possible only if A has as many columns as B has rows)<br>try:<br>    C = A @ B<br>    print('A B =')<br>    print(C)<br>    print()<br>except ValueError:<br>    print('ValueError: the number of columns in A does not match the number of rows in B')<br><br># ...and in the reverse order, B by A (if possible)<br>try:<br>    D = B @ A<br>    print('B A =')<br>    print(D)<br>except ValueError:<br>    print('ValueError: the number of columns in B does not match the number of rows in A')<\/pre>\n<pre>A B =<br>[[-4  7]<br> [-5 -3]]<br><br>B A =<br>[[-6  3  3]<br> [-5  1 -5]<br> [-6  2 -2]]<\/pre>\n<h3>3. Matrix-vector multiplication<\/h3>\n<p>In this section, I will explain the effect of matrix multiplication on vectors. The vector <strong>x<\/strong> is multiplied by the matrix <strong>A<\/strong>, producing a new vector\u00a0<strong>y<\/strong>:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AozDrOJvawV8hJnfRVV1lNA.png?ssl=1\"><\/figure>\n<p>This is a common operation in data science, as it enables <strong>a linear transformation of data<\/strong>. 
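<\/p>
<p>Numerically, the product <strong>y<\/strong> = <strong>Ax<\/strong> can be read in two equivalent ways: each entry of <strong>y<\/strong> is the scalar product of one row of <strong>A<\/strong> with <strong>x<\/strong>, and <strong>y<\/strong> as a whole is a weighted sum of the columns of <strong>A<\/strong>, with the entries of <strong>x<\/strong> as weights. Here is a minimal NumPy sketch using an arbitrary matrix and vector chosen only for illustration. The column reading also explains why multiplying by a standard basis vector simply picks out one column of <strong>A<\/strong>.<\/p>

```python
import numpy as np

# An arbitrary matrix and vector, chosen only for illustration
A = np.array([[2, 1],
              [0, 3]])
x = np.array([4, -1])

y = A @ x

# Row view: each entry of y is the scalar product of one row of A with x
y_rows = np.array([A[0] @ x, A[1] @ x])

# Column view: y is a weighted sum of the columns of A, weighted by the entries of x
y_cols = x[0] * A[:, 0] + x[1] * A[:, 1]

print(y, y_rows, y_cols)  # all three equal [7, -3]

# Consequence: multiplying by a standard basis vector picks out a column of A
print(A @ np.array([1, 0]), A @ np.array([0, 1]))
```

<p>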
The use of matrices to represent linear transformations is highly advantageous, as the following examples will show.<\/p>\n<p>Below, you can see your grid space and your standard basis vectors: blue for the <em>x<\/em>\u207d\u00b9\u207e direction and magenta for the <em>x<\/em>\u207d\u00b2\u207e direction.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AUhFmLhFLHnuC28sMGasT8w.png?ssl=1\"><figcaption>Standard basis in a Grid\u00a0Space<\/figcaption><\/figure>\n<p>A good starting point is to work with transformations that map two-dimensional vectors <strong>x<\/strong> into two-dimensional vectors <strong>y<\/strong> in the same grid\u00a0space.<\/p>\n<blockquote><p>Building the matrix for a desired transformation takes one simple trick: specify where the basis vectors land after the transformation and use their new coordinates as the columns of the matrix\u00a0<strong>A<\/strong>.\n<\/p><\/blockquote>\n<p>As an example, consider a linear transformation that produces the effect illustrated below. 
The standard basis vectors are drawn lightly, while the transformed vectors are shown more\u00a0clearly.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AgLmw2TcmgaPXSQUjf8PMWg.png?ssl=1\"><figcaption>Standard basis transformed by matrix\u00a0<strong>A<\/strong><\/figcaption><\/figure>\n<p>Comparing the basis vectors before and after the transformation, you can see that it combines a 45-degree counterclockwise rotation about the origin with a stretch of the vectors by a factor of\u00a0\u221a2.<\/p>\n<p>This effect can be achieved using the matrix <strong>A<\/strong>, composed as\u00a0follows:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AoxVObeo1hmb9s5QRjhJ5AQ.png?ssl=1\"><\/figure>\n<p>The first column of the matrix contains the coordinates of the first basis vector after the transformation, and the second column contains those of the second basis\u00a0vector.<\/p>\n<p>Equation (1) then takes the\u00a0form<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A1eR1hdtG7FMFRc9TArhvuw.png?ssl=1\"><\/figure>\n<p>Let\u2019s take two example points <strong>x<\/strong>\u2081 and <strong>x<\/strong>\u2082:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AgDhnZiCTRv7BZg3imR8mqQ.png?ssl=1\"><\/figure>\n<p>and transform them into the vectors <strong>y<\/strong>\u2081 and <strong>y<\/strong>\u2082:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A5A7vMKfJHPfXMt-_Ehu0KA.png?ssl=1\"><\/figure>\n<p>I encourage you to do these calculations by hand first, and then switch to using a program 
like\u00a0this:<\/p>\n<pre>import numpy as np<br><br># Transformation matrix<br>A = np.array([<br>    [1, -1],<br>    [1,  1]<br>])<br><br># Points (vectors) to be transformed using matrix A<br>points = [<br>    np.array([1, 1\/2]),<br>    np.array([-1\/4, 5\/4])<br>]<br><br># Print out the transformed points (vectors), numbered from 1 as in the text<br>for i, x in enumerate(points, start=1):<br>    y = A @ x<br>    print(f'y_{i} = {y}')<\/pre>\n<pre>y_1 = [0.5 1.5]<br>y_2 = [-1.5  1. ]<\/pre>\n<p>The plot below shows the\u00a0results.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AbGgoW-nTpAVBqOYUl2BUsw.png?ssl=1\"><figcaption>Points transformed by matrix\u00a0<strong>A<\/strong><\/figcaption><\/figure>\n<p>The <strong>x<\/strong> points are gray\u00a0and\u00a0smaller, while their transformed counterparts <strong>y<\/strong> have black edges\u00a0and\u00a0are\u00a0bigger. If you\u2019d prefer to think of these points as arrowheads, here\u2019s the corresponding illustration:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A7TA3I-dmrCdgCvbp9Gvraw.png?ssl=1\"><figcaption>Vectors transformed by matrix\u00a0<strong>A<\/strong><\/figcaption><\/figure>\n<p>Now you can see more clearly that the points have been rotated around the origin and pushed a little farther away from\u00a0it.<\/p>\n<p>Let\u2019s examine another\u00a0matrix:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A6jjW2j7gXTPkLcI5JNOAgw.png?ssl=1\"><\/figure>\n<p>and see how the transformation<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AKA92sPAt4MtpjAyYIC4nIg.png?ssl=1\"><\/figure>\n<p>affects the points on the grid\u00a0lines:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" 
src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AV4ExyXEOXf6BK9zuIZaicQ.png?ssl=1\"><figcaption>Grid lines transformed by matrix\u00a0<strong>B<\/strong><\/figcaption><\/figure>\n<p>Compare the result with that obtained using <strong>B<\/strong>\/2, which corresponds to dividing all elements of the matrix <strong>B<\/strong> by\u00a02:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AFFDcpD5v07TstcvCCD9Pog.png?ssl=1\"><figcaption>Grid lines transformed by matrix\u00a0<strong>B<\/strong>\/2<\/figcaption><\/figure>\n<p>In general, a linear transformation:<\/p>\n<ul>\n<li>keeps the origin in place,<\/li>\n<li>ensures that straight lines remain straight,<\/li>\n<li>keeps parallel lines parallel,<\/li>\n<li>keeps evenly spaced parallel lines evenly spaced, scaling the distance between them by the same factor across the whole\u00a0plane.<\/li>\n<\/ul>\n<p>To keep things concise, I\u2019ll use \u2018<em>transformation <\/em><strong><em>A<\/em><\/strong>\u2019 throughout the text instead of the full phrase \u2018<em>transformation represented by matrix\u00a0<\/em><strong><em>A<\/em><\/strong>\u2019.<\/p>\n<p>Let\u2019s return to the\u00a0matrix<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A6jjW2j7gXTPkLcI5JNOAgw.png?ssl=1\"><\/figure>\n<p>and apply the transformation to a few sample\u00a0points.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2ARW0xa1Vwtb8g_skReutz9g.png?ssl=1\"><figcaption>The effects of transformation <strong>B<\/strong> on various input\u00a0vectors<\/figcaption><\/figure>\n<p>Notice the following:<\/p>\n<ul>\n<li>point <strong>x<\/strong>\u2081 has been rotated counterclockwise and brought closer to the\u00a0origin,<\/li>\n<li>point <strong>x<\/strong>\u2082, on the other hand, has been rotated clockwise and pushed away from the\u00a0origin,<\/li>\n<li>point 
<strong>x<\/strong>\u2083 has only been scaled down, meaning it\u2019s moved closer to the origin while keeping its direction,<\/li>\n<li>point <strong>x<\/strong>\u2084 has undergone a similar transformation, but has been scaled\u00a0up.<\/li>\n<\/ul>\n<p>The transformation compresses in the <em>x<\/em>\u207d\u00b9\u207e-direction and stretches in the <em>x<\/em>\u207d\u00b2\u207e-direction. You can think of the grid lines as behaving like an accordion.<\/p>\n<p>Directions such as those represented by the vectors <strong>x<\/strong>\u2083 and <strong>x<\/strong>\u2084 play an important role in machine learning, but that\u2019s a story for another\u00a0time.<\/p>\n<p>For now, we can call them <strong><em>eigen-directions<\/em><\/strong>, because vectors along these directions are only scaled by the transformation, not rotated. Not every transformation has eigen-directions, though: a rotation by any angle other than 0 or 180 degrees changes the direction of every nonzero vector, and so does transformation <strong>A<\/strong> from the beginning of this section, which rotates by 45 degrees and stretches.<\/p>\n<h3>4. Transposition<\/h3>\n<p>Recall that the transformation matrix is constructed by stacking the transformed basis vectors in columns. 
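<\/p>
<p>As a reminder of how that construction works, here is a minimal sketch that rebuilds the matrix A from Section 3 out of its geometric description: a 45-degree counterclockwise rotation combined with a stretch by a factor of \u221a2. The variable names are illustrative.<\/p>

```python
import numpy as np

# Geometric description of transformation A from Section 3
angle = np.pi / 4      # 45 degrees, counterclockwise
scale = np.sqrt(2)     # stretch factor

# Where the standard basis vectors land after the transformation
e1_new = scale * np.array([np.cos(angle), np.sin(angle)])
e2_new = scale * np.array([-np.sin(angle), np.cos(angle)])

# Stack the transformed basis vectors as columns to get the matrix
A = np.column_stack([e1_new, e2_new])
print(np.round(A, 10))  # matches [[1, -1], [1, 1]] up to rounding

# Multiplying by a standard basis vector returns the corresponding column
print(A @ np.array([1, 0]), A @ np.array([0, 1]))
```

<p>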
Perhaps you\u2019d like to see what happens if we <strong>swap the rows and columns<\/strong> afterwards (the transposition).<\/p>\n<p>Let us take, for example, the\u00a0matrix<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AjNsRF2B5TGfS99qjd-bWJA.png?ssl=1\"><\/figure>\n<p>where <strong>A<\/strong>\u1d40 stands for the transposed matrix.<\/p>\n<p>From a geometric perspective, <em>the coordinates of the first<\/em> new basis vector come from <em>the first coordinates of all<\/em> the old basis vectors, the second from the second coordinates, and so\u00a0on.<\/p>\n<p>In NumPy, it\u2019s as simple as\u00a0this:<\/p>\n<pre>import numpy as np<br><br>A = np.array([<br>    [1, -1],<br>    [1,  1]<br>])<br><br># .T swaps rows and columns<br>print('A transposed:')<br>print(A.T)<\/pre>\n<pre>A transposed:<br>[[ 1  1]<br> [-1  1]]<\/pre>\n<p>I must disappoint you now, as I cannot provide a simple rule that expresses the relationship between the transformations <strong>A<\/strong> and <strong>A<\/strong>\u1d40 in just a few\u00a0words.<\/p>\n<p>Instead, let me show you a property shared by both the original and transposed transformations, which will come in handy\u00a0later.<\/p>\n<p>Here is the geometric interpretation of the transformation represented by the matrix <strong>A<\/strong>. 
The gray area is <strong>the parallelogram<\/strong> spanned by the transformed basis vectors.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A5fUU7d7iV2PFC5g1oyc6Sw.png?ssl=1\"><figcaption>Parallelogram spanned by the basis vectors transformed by matrix\u00a0<strong>A<\/strong><\/figcaption><\/figure>\n<p>Compare this with the parallelogram obtained by applying the matrix\u00a0<strong>A<\/strong>\u1d40:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A_WsE7AkmpjID29RP3ykgwQ.png?ssl=1\"><figcaption>Parallelogram spanned by the basis vectors transformed by matrix\u00a0<strong>A<\/strong>\u1d40<\/figcaption><\/figure>\n<p>Now, let us consider another transformation that applies entirely different scales to the unit\u00a0vectors:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Ajw9jY8mm7R0uAQ_H_5ulDA.png?ssl=1\"><\/figure>\n<p>The parallelogram associated with the matrix <strong>B<\/strong> is much narrower\u00a0now:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AdtVQfsbsF0-u1qIu4QGQIQ.png?ssl=1\"><figcaption>Parallelogram spanned by the basis vectors transformed by matrix\u00a0<strong>B<\/strong><\/figcaption><\/figure>\n<p>but it turns out to have the same area as the one for the matrix\u00a0<strong>B<\/strong>\u1d40:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AViAAZisJ-I-lL0CItlUzoQ.png?ssl=1\"><figcaption>Parallelogram spanned by the basis vectors transformed by matrix\u00a0<strong>B<\/strong>\u1d40<\/figcaption><\/figure>\n<p>Let me put it this way: you have a set of numbers to assign to the components of your vectors. 
If you assign a larger number to one component, you\u2019ll need to use smaller numbers for the others. Transposition only redistributes the same entries among the vectors, so, for example, the sum of the squared lengths of the vectors that span the parallelogram (which equals the sum of the squares of all the matrix entries) stays the same. I know this reasoning is only an intuition, not a proof; if you\u2019re looking for a rigorous one, check the literature in the references section.<\/p>\n<p>And here\u2019s the kicker at the end of this section: the area of the parallelogram equals the absolute value of <strong>the determinant<\/strong> of the matrix (its sign tells you whether the transformation flips the orientation of the plane). What\u2019s more, <em>a matrix and its transpose have the same determinant.<\/em><\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A1rlJHShHX_RCriBz-vEzSw.png?ssl=1\"><\/figure>\n<p>More on the determinant in the upcoming sections.<\/p>\n<h3>5. Composition of transformations<\/h3>\n<p>You can apply a sequence of transformations: for example, start by applying <strong>A<\/strong> to the vector <strong>x<\/strong>, and then pass the result through <strong>B<\/strong>. 
This can be done by first multiplying the vector <strong>x <\/strong>by the matrix <strong>A<\/strong>, and then multiplying the result by the matrix\u00a0<strong>B<\/strong>:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AQlSiFeHzuh2kblRa9AOMTA.png?ssl=1\"><\/figure>\n<p>You can multiply the matrices <strong>B <\/strong>and <strong>A <\/strong>to obtain the matrix <strong>C<\/strong> for further\u00a0use:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AedOo4oS7qlXfJwvAhSuezw.png?ssl=1\"><\/figure>\n<p>This is the effect of the transformation represented by the matrix\u00a0<strong>C<\/strong>:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Avw8-k6-imP46lDSUecKBTg.png?ssl=1\"><figcaption>Transformation described by the composite matrix\u00a0<strong>BA<\/strong><\/figcaption><\/figure>\n<p>You can perform the transformations in reverse order: first apply <strong>B<\/strong>, then apply\u00a0<strong>A<\/strong>:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Agu0GCOmhTuBRZ4h4bSSRhA.png?ssl=1\"><\/figure>\n<p>Let <strong>D<\/strong> represent the sequence of multiplications performed in this\u00a0order:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AD9J3xOj28SqypdiEYjMaoA.png?ssl=1\"><\/figure>\n<p>And this is how it affects the grid\u00a0lines:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2ABBfKXUuVm0J268FIZL6yxQ.png?ssl=1\"><figcaption>Transformation described by the composite matrix\u00a0<strong>AB<\/strong><\/figcaption><\/figure>\n<p>So, 
you can see for yourself that <strong>the order of matrix multiplication matters<\/strong>.<\/p>\n<p>There\u2019s a neat property of <strong>the transpose of a composite transformation<\/strong>. Check out what happens when we multiply <strong>A <\/strong>by\u00a0<strong>B<\/strong>:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A5ChtiiLBMCSRu770ChVGkA.png?ssl=1\"><\/figure>\n<p>and then transpose the result, which gives\u00a0(<strong>AB<\/strong>)\u1d40:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AICWKPywsopTrYJwslM_s8w.png?ssl=1\"><\/figure>\n<p>You can easily extend this observation to the following rule:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Ao5Yq2EWfffwrrUqk7TpLQg.png?ssl=1\"><\/figure>\n<p>To finish off this section, consider the inverse problem: is it possible to recover matrices <strong>A<\/strong> and <strong>B<\/strong> given only <strong>C <\/strong>=\u00a0<strong>AB<\/strong>?<\/p>\n<p>This is the problem of <strong>matrix factorization<\/strong>, which, as you might expect, doesn\u2019t have a unique solution. Matrix factorization is a powerful technique that can provide insight into transformations, because it lets you express them as a composition of simpler, elementary transformations. But that\u2019s a topic for another\u00a0time.<\/p>\n<h3>6. 
Inverse transformation<\/h3>\n<p>You can easily construct a matrix representing a <strong>do-nothing transformation<\/strong> that leaves the standard basis vectors unchanged:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AP_f-Gha50m-o5RLF-VBw6Q.png?ssl=1\"><\/figure>\n<p>It is commonly referred to as <strong>the identity\u00a0matrix<\/strong>.<\/p>\n<p>Take a matrix <strong>A<\/strong> and consider the transformation that undoes its effects. The matrix representing this transformation is <strong>A<\/strong>\u207b\u00b9. Multiplied by <strong>A<\/strong> in either order, it yields the identity matrix\u00a0<strong>I<\/strong>:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AeuWd0sl3KFAAywo2AkKE2A.png?ssl=1\"><\/figure>\n<p>There are many resources that explain how to calculate the inverse by hand. I recommend learning the <a href=\"https:\/\/www.mathsisfun.com\/algebra\/matrix-inverse-row-operations-gauss-jordan.html\">Gauss-Jordan method<\/a> because it involves simple row manipulations on the augmented matrix [<strong>A<\/strong> | <strong>I<\/strong>]. At each step, you can swap two rows, multiply any row by a nonzero number, or add to a selected row a weighted sum of the remaining rows.<\/p>\n<p>Take the following matrix as an example for hand calculations:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2At71Pc8v18V88WhaxdHmp1w.png?ssl=1\"><\/figure>\n<p>You should get the inverse\u00a0matrix:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AxGsMqICk4gpWM_AT-I-l7g.png?ssl=1\"><\/figure>\n<p>Verify by hand that equation (4) holds. 
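<\/p>
<p>If you\u2019d like to check each step of your hand calculation, here is a minimal sketch of Gauss-Jordan elimination in NumPy, tried on the matrix A from Section 3. The function gauss_jordan_inverse is my own hypothetical helper, not a NumPy function. It also picks the row with the largest entry in the current column as the pivot (partial pivoting), a common numerical refinement that the hand method does not require.<\/p>

```python
import numpy as np

def gauss_jordan_inverse(M):
    # Hypothetical helper: invert a square matrix by Gauss-Jordan elimination
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    # Build the augmented matrix [M | I]
    aug = np.hstack([M, np.eye(n)])
    for col in range(n):
        # Choose the row with the largest absolute entry in this column as the pivot
        pivot = col + int(np.argmax(np.abs(aug[col:, col])))
        if np.isclose(aug[pivot, col], 0.0):
            raise ValueError('the matrix is singular (non-invertible)')
        # Swap two rows
        aug[[col, pivot]] = aug[[pivot, col]]
        # Rescale the pivot row so that the pivot equals 1
        aug[col] = aug[col] / aug[col, col]
        # Subtract a multiple of the pivot row from every other row
        for row in range(n):
            if row != col:
                aug[row] = aug[row] - aug[row, col] * aug[col]
    # The left half is now I, so the right half holds the inverse
    return aug[:, n:]


A = np.array([[1, -1],
              [1,  1]])
A_inv = gauss_jordan_inverse(A)
print(A_inv)
print(np.allclose(A @ A_inv, np.eye(2)))  # True if A_inv really undoes A
```

<p>Fed the all-ones matrix that appears in the next section, the function raises an error, because at some step no nonzero pivot is left.<\/p>
<p>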
You can also do this in\u00a0NumPy.<\/p>\n<pre>import numpy as np<br><br>A = np.array([<br>    [1, -1],<br>    [1,  1]<br>])<br><br>print('Inverse of A:')<br>print(np.linalg.inv(A))<\/pre>\n<pre>Inverse of A:<br>[[ 0.5  0.5]<br> [-0.5  0.5]]<\/pre>\n<p>Take a look at how the two transformations differ in the illustrations below.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AOTRv2d4MfyxArQvo0Yc4pg.png?ssl=1\"><figcaption>Transformation <strong>A<\/strong><\/figcaption><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A-zfWNVHMSQO0wUibUydLmw.png?ssl=1\"><figcaption>Transformation <strong>A<\/strong>\u207b\u00b9<\/figcaption><\/figure>\n<p>At first glance, it\u2019s not obvious that one transformation reverses the effects of the\u00a0other.<\/p>\n<p>However, in these plots, you might notice a fascinating and far-reaching <strong>connection between the transformation and its\u00a0inverse<\/strong>.<\/p>\n<p>Take a close look at the first illustration, which shows the effect of transformation <strong>A<\/strong> on the basis vectors. The original unit vectors are depicted semi-transparently, while their transformed counterparts, resulting from multiplication by matrix <strong>A<\/strong>, are drawn clearly and solidly. Now, imagine that these newly drawn vectors are the basis vectors you use to describe the space, and you perceive the original space from their perspective. Then the original basis vectors will appear shorter and rotated 45 degrees clockwise, so both point towards the east: one to the southeast, the other to the northeast. 
And this is exactly what the second illustration shows, demonstrating the effect of the transformation <strong>A<\/strong>\u207b\u00b9.<\/p>\n<p>This is a preview of an upcoming topic I\u2019ll cover in the next article about <em>using matrices to represent different perspectives on\u00a0data<\/em>.<\/p>\n<p>All of this sounds great, but there\u2019s a catch: <strong>some transformations can\u2019t be reversed<\/strong>.<\/p>\n<h3>7. Non-invertible transformations<\/h3>\n<p>The workhorse of the next experiment will be the matrix with 1s on the diagonal and <em>b<\/em> on the antidiagonal:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Aj0AVtGlKRpfM-sERpFlX0w.png?ssl=1\"><\/figure>\n<p>where <em>b<\/em> is a fraction in the interval (0, 1). This matrix is symmetric, meaning it is identical to its own transpose: <strong>A<\/strong>\u00a0=\u00a0<strong>A<\/strong>\u1d40. I mention this only in passing; it\u2019s not particularly relevant\u00a0here.<\/p>\n<p>Invert this matrix using the Gauss-Jordan method, and you will get the following:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Au-sL-bt-5bZt056aZmdFlw.png?ssl=1\"><\/figure>\n<p>Applying the rule for the determinant of a 2&#215;2 matrix (the product of the diagonal entries minus the product of the antidiagonal entries)\u00a0gives<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AKB7PRUA1BvmSZOKA5FXLNQ.png?ssl=1\"><\/figure>\n<p>This is no coincidence. In general, it holds\u00a0that<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2An3R7yQvP8mUmQZcYwqZHEA.png?ssl=1\"><\/figure>\n<p>Notice that when <em>b<\/em> = 0, the two matrices are identical. 
This is no surprise, as <strong>A<\/strong> reduces to the identity matrix\u00a0<strong>I<\/strong>.<\/p>\n<p>Things get tricky when <em>b <\/em>= 1: then det(<strong>A<\/strong>) = 0, so det(<strong>A<\/strong>\u207b\u00b9) = 1\/det(<strong>A<\/strong>) would be infinite. As a result, <strong>A<\/strong>\u207b\u00b9 does not exist for a matrix <strong>A<\/strong> consisting entirely of 1s. In algebra classes, teachers often warn you about a zero determinant. Looking at the inverse shows why: a zero determinant of <strong>A<\/strong> turns into an infinite determinant of <strong>A<\/strong>\u207b\u00b9, which is just as <em>fatal<\/em>.\u00a0In short,<\/p>\n<blockquote><p>a zero determinant means the transformation is non-invertible.<\/p><\/blockquote>\n<p>Now, the stage is set for experiments with different values of <em>b<\/em>. We\u2019ve just seen how calculations fail at <em>b<\/em> = 1, so let\u2019s now visually investigate what happens as we carefully approach\u00a0it.<\/p>\n<p>We start with <em>b <\/em>= \u00bd and end up near\u00a01.<\/p>\n<p>Step 1)<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AT_64p59e_pKyKkG2SJgL4Q.png?ssl=1\"><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AbCLA1Aun47RFnK9gPNaQIA.png?ssl=1\"><figcaption>Transformation <strong>A<\/strong><\/figcaption><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AkUptuu69L5s5fDvOgyOjRA.png?ssl=1\"><figcaption>Transformation <strong>A<\/strong>\u207b\u00b9<\/figcaption><\/figure>\n<p>Step 2)<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" 
src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AH05C27h3OoRl_Z4s8e0vmQ.png?ssl=1\"><figcaption>Transformation <strong>A<\/strong><\/figcaption><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A-ab7enCENR3MxbIr4l0aVw.png?ssl=1\"><figcaption>Transformation <strong>A<\/strong>\u207b\u00b9<\/figcaption><\/figure>\n<p>Recall that <strong>the determinant of the matrix representing the transformation corresponds to the area of the parallelogram<\/strong> formed by the transformed basis\u00a0vectors (strictly speaking, its absolute value does).<\/p>\n<p>This is in line with the illustrations: the smaller the area of the parallelogram for transformation <strong>A<\/strong>, the larger it becomes for transformation <strong>A<\/strong>\u207b\u00b9. In other words, as the parallelogram for transformation <strong>A<\/strong> shrinks, the one for its inverse grows. Note also that I had to extend the range on the axes because the basis vectors for transformation <strong>A<\/strong>\u207b\u00b9 are getting\u00a0longer.<\/p>\n<p>By the way, notice\u00a0that<\/p>\n<blockquote><p>the transformation <strong>A<\/strong> has the same eigen-directions as\u00a0<strong>A<\/strong>\u207b\u00b9.<\/p><\/blockquote>\n<p>Step 3) <em>Almost\u00a0there\u2026<\/em><\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AYL88GpqhQ6nP9CSXshnFsQ.png?ssl=1\"><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Af9ho8NlEe3QtP_ndWWw81A.png?ssl=1\"><figcaption>Transformation <strong>A<\/strong><\/figcaption><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AkiJZ_fSk-8a7Q9PSZ9Q3Lg.png?ssl=1\"><figcaption>Transformation <strong>A<\/strong>\u207b\u00b9<\/figcaption><\/figure>\n<p>The gridlines are 
squeezed so much that they almost overlap, which eventually happens when <em>b<\/em> hits 1. The basis vectors of <strong>A<\/strong>\u207b\u00b9 are stretched so far that they go beyond the axis limits. When <em>b<\/em> reaches exactly 1, both basis vectors of <strong>A<\/strong> lie on the same\u00a0line.<\/p>\n<p>Having seen the previous illustrations, you\u2019re now ready to guess the effect of applying a non-invertible transformation to the vectors. Take a moment to think it through first, then either try running a computational experiment or check out the results I\u2019ve provided\u00a0below.<\/p>\n<p>.<\/p>\n<p>.<\/p>\n<p>.<\/p>\n<p>Think of it this\u00a0way.<\/p>\n<p>When the basis vectors are not parallel, meaning they form an angle other than 0 or 180 degrees, you can use them to address any point on the entire plane (mathematicians say that the vectors <strong><em>span<\/em><\/strong> the plane). Otherwise, they no longer span the plane, and only points along the line covered by the basis vectors can be addressed.<\/p>\n<p>.<\/p>\n<p>.<\/p>\n<p>.<\/p>\n<p>This is what it looks like when you apply the non-invertible transformation to randomly selected\u00a0points:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AbweUieUouNwgTejqF3wDLg.png?ssl=1\"><figcaption>A non-invertible matrix <strong>A<\/strong> reduces the dimensionality of the\u00a0data<\/figcaption><\/figure>\n<p>A consequence of applying a non-invertible transformation is that the two-dimensional space collapses to a one-dimensional subspace. After the transformation, it is no longer possible to uniquely recover the original coordinates of the\u00a0points.<\/p>\n<p>Take a look at the entries of matrix <strong>A<\/strong>. 
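<\/p>\n<p>To experiment with these entries numerically, here is a minimal NumPy sketch. It assumes <strong>A<\/strong> = [[1, <em>b<\/em>], [<em>b<\/em>, 1]]. This form is consistent with the text (it reduces to the identity at <em>b<\/em> = 0 and consists entirely of 1s at <em>b<\/em> = 1), but the figure with the actual entries is not reproduced here, so treat the helper make_A as a hypothetical stand-in. The sketch prints det(<strong>A<\/strong>) and det(<strong>A<\/strong>\u207b\u00b9) as <em>b<\/em> approaches 1. It then checks that at <em>b<\/em> = 1 the rank drops to 1, that inversion fails, and that random points all land on one line.<\/p>\n<pre>import numpy as np<br><br>def make_A(b):<br>    # Hypothetical form of A: the identity at b = 0, all ones at b = 1<br>    return np.array([[1.0, b], [b, 1.0]])<br><br># det(A) shrinks towards 0 while det(inv(A)) grows without bound<br>for b in [0.5, 0.9, 0.99]:<br>    A = make_A(b)<br>    det_A = np.linalg.det(A)<br>    det_A_inv = np.linalg.det(np.linalg.inv(A))<br>    print(f'b = {b}: det(A) = {det_A:.4f}, det(inv(A)) = {det_A_inv:.2f}')<br><br># At b = 1 both columns coincide, so the rank drops to 1<br>A = make_A(1.0)<br>print('rank of A at b = 1:', np.linalg.matrix_rank(A))<br><br># NumPy refuses to invert an exactly singular matrix<br>try:<br>    np.linalg.inv(A)<br>except np.linalg.LinAlgError as err:<br>    print('inversion failed:', err)<br><br># Random 2D points (one per column) all land on the line y1 = y2<br>rng = np.random.default_rng(0)<br>X = rng.normal(size=(2, 100))<br>Y = A @ X<br>print('all images on the line y1 = y2:', np.allclose(Y[0], Y[1]))<\/pre>\n<p>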
When <em>b<\/em> = 1, both columns (and rows) are identical, implying that the transformation matrix effectively behaves as if it were a 1\u00d72 matrix, mapping two-dimensional vectors to a\u00a0scalar.<\/p>\n<p>You can easily verify that the problem would be the same if one row were a multiple of the other. This generalizes to matrices of any size: if any row can be expressed as a weighted sum (<em>linear combination<\/em>) of the others, a dimension collapses. The reason is that such a vector lies within the space spanned by the other vectors, so it does not provide any additional ability to address points beyond those that can already be addressed. You may consider this vector <strong><em>redundant<\/em><\/strong>.<\/p>\n<p>From Section 4 on transposition, we can infer that <strong>if there are redundant rows, there must be an equal number of redundant columns<\/strong>.<\/p>\n<h3>8. Determinant<\/h3>\n<p>You might now ask if there\u2019s a non-geometrical way to verify whether the columns or rows of the matrix are redundant.<\/p>\n<p>Recall the parallelograms from Section 4 and the scalar quantity known as the determinant. 
I mentioned that<\/p>\n<blockquote><p>the determinant of a matrix indicates how the area of a unit parallelogram changes under the transformation.<\/p><\/blockquote>\n<p>The exact definition of the determinant is somewhat tricky, but as you\u2019ve already seen, its graphical interpretation should not cause any problems.<\/p>\n<p>I will demonstrate the behavior of two transformations represented by matrices:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AGG9_PB8u6kKyPnJ_9_wSkQ.png?ssl=1\"><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A_l8NNTtL1z90gAq4teTexw.png?ssl=1\"><figcaption>det(<strong>A<\/strong>) =\u00a02<\/figcaption><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AH7lq36yOjCrHjF7EiVsgzA.png?ssl=1\"><figcaption>det(<strong>B<\/strong>) =\u00a0-3\/4<\/figcaption><\/figure>\n<p>The magnitude of the determinant indicates how much the transformation stretches (if greater than 1) or shrinks (if less than 1) the space overall. While the transformation may stretch along one direction and compress along another, the determinant captures the net effect on area.<\/p>\n<p>Also, a negative determinant indicates a reflection; note that matrix <strong>B<\/strong> reverses the orientation of the basis vectors (their order is flipped).<\/p>\n<p>A parallelogram with zero area corresponds to a transformation that collapses a dimension, meaning <strong>the determinant can be used to test for redundancy in the basis vectors of a\u00a0matrix<\/strong>.<\/p>\n<p>Since the determinant measures the area of a parallelogram under a transformation, we can apply it to a sequence of transformations. 
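<\/p>\n<p>These properties, together with the multiplicative rule discussed next, can be checked numerically. The sketch below is an illustration under stated assumptions. The matrices <strong>S<\/strong>, <strong>R<\/strong> and <strong>P<\/strong> are hypothetical, not the ones in the figures. The helper signed_area is introduced here and applies the familiar 2\u00d72 formula <em>ad<\/em> \u2212 <em>bc<\/em> to the columns. <strong>S<\/strong> doubles areas. <strong>R<\/strong> swaps the basis vectors, which is a reflection, so its determinant is negative. <strong>P<\/strong> has parallel columns, so its parallelogram collapses to zero area.<\/p>\n<pre>import numpy as np<br><br># Hypothetical matrices, chosen only for illustration<br>S = np.array([[2.0, 1.0], [0.0, 1.0]])  # stretches areas by a factor of 2<br>R = np.array([[0.0, 1.0], [1.0, 0.0]])  # swaps the basis vectors: a reflection<br>P = np.array([[1.0, 2.0], [2.0, 4.0]])  # second column is twice the first<br><br>def signed_area(M):<br>    # Signed area of the parallelogram spanned by the columns of a 2x2 matrix<br>    return M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0]<br><br># The 2x2 formula and NumPy agree: 2 for S, -1 for R, 0 for P<br>for name, M in [('S', S), ('R', R), ('P', P)]:<br>    print(name, signed_area(M), np.linalg.det(M))<br><br># Area scaling factors of consecutive transformations multiply<br>print(np.linalg.det(S @ R), np.linalg.det(S) * np.linalg.det(R))<\/pre>\n<p>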
If det(<strong>A<\/strong>) and det(<strong>B<\/strong>) represent the scaling factors of unit areas for transformations <strong>A<\/strong> and <strong>B<\/strong>, then the scaling factor for the unit area after applying both transformations sequentially (first <strong>B<\/strong>, then <strong>A<\/strong>), that is, <strong>AB<\/strong>, is equal to det(<strong>AB<\/strong>). As both transformations act independently and one after the other, the total effect is given by det(<strong>AB<\/strong>) = det(<strong>A<\/strong>) det(<strong>B<\/strong>). Substituting matrix <strong>A<\/strong>\u207b\u00b9 for matrix <strong>B<\/strong> and noting that det(<strong>I<\/strong>) = 1 leads to equation (5) introduced in the previous\u00a0section.<\/p>\n<p>Here\u2019s how you can calculate the determinant of matrix <strong>B<\/strong> from the figure above using\u00a0NumPy:<\/p>\n<pre>import numpy as np<br><br># Matrix B from the figure above, det(B) = -3\/4<br>B = np.array([<br>    [-1\/2, 1\/4],<br>    [2, 1\/2]<br>    ])<br><br>print(f'det(B) = {np.linalg.det(B)}')<\/pre>\n<pre>det(B) = -0.75<\/pre>\n<h3>9. Non-square matrices<\/h3>\n<p>Until now, we\u2019ve focused on square matrices, and you\u2019ve developed a geometric intuition of the transformations they represent. Now is a great time to expand these skills to <strong>matrices with any number of rows and\u00a0columns<\/strong>.<\/p>\n<h4>Wide matrices<\/h4>\n<p>This is an example of <strong>a wide matrix<\/strong>, which has more columns than\u00a0rows:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Adk-065tymz9ROS7cYTATsA.png?ssl=1\"><\/figure>\n<p>From the perspective of equation (1), <strong>y<\/strong> = <strong>Ax<\/strong>, it maps three-dimensional vectors <strong>x<\/strong> to two-dimensional vectors\u00a0<strong>y<\/strong>.<\/p>\n<p>In such a case, at least one column can always be expressed as a multiple of another or as a weighted sum of the others, because three vectors in a two-dimensional space cannot all be linearly independent. 
For example, the third column here equals 3\/4 times the first column plus 5\/4 times the\u00a0second.<\/p>\n<p>Once the vector <strong>x<\/strong> has been transformed into <strong>y<\/strong>, it\u2019s no longer possible to reconstruct the original <strong>x<\/strong> from <strong>y<\/strong>. We say that the transformation <strong>reduces the dimensionality of the input data<\/strong>. These types of transformations are very important in machine learning.<\/p>\n<p>Sometimes, a wide matrix disguises itself as a square matrix, but you can reveal it by checking whether its determinant is zero. We\u2019ve had this situation before, remember?<\/p>\n<p>We can use the matrix <strong>A<\/strong> to create two different square matrices. Try deriving the following result yourself:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AquRfcEdg7diUmUY5uMdj2w.png?ssl=1\"><\/figure>\n<p>and also determinants (I recommend simplified formulas for working with <a href=\"https:\/\/brilliant.org\/wiki\/expansion-of-determinants\/\">2\u00d72<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Rule_of_Sarrus\">3\u00d73<\/a> matrices):<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AZDYa_0cFfC9Ktg2OtggRuQ.png?ssl=1\"><\/figure>\n<p>The matrix <strong>A<\/strong>\u1d40<strong>A<\/strong> is composed of the dot products of all possible pairs of columns from matrix <strong>A<\/strong>, some of which are definitely redundant, thereby transferring this redundancy to\u00a0<strong>A<\/strong>\u1d40<strong>A<\/strong>.<\/p>\n<p>Matrix <strong>AA<\/strong>\u1d40, on the other hand, contains only the dot products of the rows of matrix <strong>A<\/strong>, which are fewer in number than the columns. 
Therefore, the vectors that make up matrix <strong>AA<\/strong>\u1d40 are most likely (though this is not guaranteed) linearly independent, meaning that none of them can be expressed as a multiple of another or as a weighted sum of the\u00a0others.<\/p>\n<p>What would happen if you insisted on determining <strong>x<\/strong> from <strong>y<\/strong>, which was previously computed as <strong>y<\/strong> = <strong>Ax<\/strong>? You could left-multiply both sides by <strong>A<\/strong>\u207b\u00b9 to get equation <strong>A<\/strong>\u207b\u00b9<strong>y<\/strong> = <strong>A<\/strong>\u207b\u00b9<strong>Ax<\/strong> and, since <strong>A<\/strong>\u207b\u00b9<strong>A = I<\/strong>, obtain <strong>x<\/strong> = <strong>A<\/strong>\u207b\u00b9<strong>y<\/strong>. But this would fail from the very beginning, because matrix <strong>A<\/strong>, being non-square, has no inverse <strong>A<\/strong>\u207b\u00b9 (at least not in the sense that was previously introduced).<\/p>\n<p>However, you can extend the original equation <strong>y<\/strong> = <strong>Ax<\/strong> to include a square matrix where it\u2019s needed. You just need to left-multiply both sides of the equation by <strong>A<\/strong>\u1d40, yielding <strong>A<\/strong>\u1d40<strong>y<\/strong> = <strong>A<\/strong>\u1d40<strong>Ax<\/strong>. On the right, we now have a square matrix <strong>A<\/strong>\u1d40<strong>A<\/strong>. Unfortunately, we\u2019ve already seen that its determinant is zero, so it appears that we have once again failed to reconstruct <strong>x<\/strong> from\u00a0<strong>y<\/strong>.<\/p>\n<h4>Tall matrices<\/h4>\n<p>Here is an example of a <strong>tall\u00a0matrix<\/strong><\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AOmPHmE6E0OFHhbEvMXZKqg.png?ssl=1\"><\/figure>\n<p>that maps two-dimensional vectors <strong>x<\/strong> into three-dimensional vectors <strong>y<\/strong>. 
I made a third row by simply squaring the entries of the first row. While this type of extension doesn\u2019t add any new information to the data, it can, surprisingly, improve the performance of certain machine learning\u00a0models.<\/p>\n<p>You might think that, unlike wide matrices, tall matrices allow the reconstruction of the original <strong>x<\/strong> from <strong>y<\/strong>, where <strong>y<\/strong> = <strong>Bx<\/strong>, since no information is discarded\u200a\u2014\u200aonly\u00a0added.<\/p>\n<p>And you\u2019d be right! Look at what happens when we left-multiply by matrix <strong>B<\/strong>\u1d40, just as we tried before without success: <strong>B<\/strong>\u1d40<strong>y<\/strong> = <strong>B<\/strong>\u1d40<strong>Bx<\/strong>. This time, matrix <strong>B<\/strong>\u1d40<strong>B<\/strong> is invertible (the columns of <strong>B<\/strong> are linearly independent), so we can left-multiply both sides by its\u00a0inverse:<\/p>\n<p><strong>(B<\/strong>\u1d40<strong>B)<\/strong>\u207b\u00b9<strong>B<\/strong>\u1d40<strong>y<\/strong> = <strong>(B<\/strong>\u1d40<strong>B)<\/strong>\u207b\u00b9<strong>(B<\/strong>\u1d40<strong>B)x<\/strong><\/p>\n<p>and finally\u00a0obtain:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AB5JZF-nDmJMyyy1zSNrpJQ.png?ssl=1\"><\/figure>\n<p>This is how it works in\u00a0Python:<\/p>\n<pre>import numpy as np<br><br># Tall matrix<br>B = [<br>    [2, -3],<br>    [1, 0],<br>    [3, -3]<br>]<br><br># Convert to numpy array<br>B = np.array(B)<br><br># A column vector from a lower-dimensional space<br>x = np.array([-3, 1]).reshape(2, -1)<br><br># Calculate its corresponding vector in a higher-dimensional space<br>y = B @ x<br><br># Left-multiply by (B^T B)^-1 B^T to recover x<br>reconstructed_x = np.linalg.inv(B.T @ B) @ B.T @ y<br><br>print(reconstructed_x)<\/pre>\n<pre>[[-3.]<br> [ 1.]]<\/pre>\n<p>To summarize: the determinant measures the redundancy (or linear independence) of the columns and rows of a matrix. 
However, it only makes sense when applied to square matrices. Non-square matrices represent transformations between spaces of different dimensions and necessarily have linearly dependent columns (if wide) or rows (if tall). If the target dimension is higher than the input dimension, it\u2019s possible to reconstruct lower-dimensional vectors from higher-dimensional ones, provided the columns of the matrix are linearly independent.<\/p>\n<h3>10. Inverse and Transpose: similarities and differences<\/h3>\n<p>You\u2019ve certainly noticed that the inverse and transpose operations play a key role in matrix algebra. In this section, we bring together the most useful identities related to these operations.<\/p>\n<p>Whenever I apply the inverse operator, I assume that the matrix being operated on is square and\u00a0invertible.<\/p>\n<p>We\u2019ll start with the obvious one that hasn\u2019t appeared\u00a0yet.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Ahad_eyWjowCYo7WLJDnGiA.png?ssl=1\"><\/figure>\n<p>Here are the previously given identities (2) and (5), placed side by\u00a0side:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2Am-ehBK-Vk5jbhPRj_kWe7A.png?ssl=1\"><\/figure>\n<p>Let\u2019s walk through the following reasoning, starting with the identity from equation (4), where <strong>A<\/strong> is replaced by the composite <strong>AB<\/strong>:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A-7ZomFe42oREnHzAAFRz7Q.png?ssl=1\"><\/figure>\n<p>The parentheses on the right are not needed. 
After removing them, I right-multiply both sides by the matrix <strong>B<\/strong>\u207b\u00b9 and then by\u00a0<strong>A<\/strong>\u207b\u00b9.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A0RuG3AVsxkr9L-6rLLVnww.png?ssl=1\"><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AAvCzfaP2FM_vCvi4wkOFmA.png?ssl=1\"><\/figure>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AqFaif1ZAcL-vFC9OpwKRZw.png?ssl=1\"><\/figure>\n<p>Thus, we observe the next similarity between inversion and transposition (see equation\u00a0(3)):<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A4RYRdRfw91fi8FDo6MSKqg.png?ssl=1\"><\/figure>\n<p>You might be disappointed now, as the following only applies to transposition.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AP8-6nvHXbKegYM8rkZkNBg.png?ssl=1\"><\/figure>\n<p>But imagine if <strong>A<\/strong> and <strong>B<\/strong> were scalars. 
The same for the inverse would be a mathematical scandal!<\/p>\n<p>For a change, the identity in equation (4) works only for the\u00a0inverse:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2APSUCikhyzsVfgsb6nYtzlA.png?ssl=1\"><\/figure>\n<p>I\u2019ll finish off this section by discussing the interplay between inversion and transposition.<\/p>\n<p>From the last equation, along with equation (3), we get the following:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AwfIgNiuFhuE1PFfuHSv0_A.png?ssl=1\"><\/figure>\n<p>Keep in mind that <strong>I<\/strong>\u1d40 = <strong>I<\/strong>. Right-multiplying by the inverse of <strong>A<\/strong>\u1d40 yields the following identity:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2A_dPTqwGlxshw5swFsH51fw.png?ssl=1\"><\/figure>\n<h3>11. Translation by a\u00a0vector<\/h3>\n<p>You might be wondering why I\u2019m focusing only on the operation of multiplying a vector by a matrix, while neglecting the translation of a vector by adding another\u00a0vector.<\/p>\n<p>One reason is purely mathematical. 
Linear operations offer significant advantages, such as ease of manipulation, simplicity of expressions, and algorithmic efficiency.<\/p>\n<p>A key property of linear operations is that a linear combination of inputs leads to the same linear combination of\u00a0outputs:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2App4zA9l-0902vWyaeNYRvQ.png?ssl=1\"><\/figure>\n<p>where <em>\u03b1<\/em> and <em>\u03b2<\/em> are real scalars, and <em>Lin<\/em> represents a linear operation.<\/p>\n<p>Let\u2019s first examine the matrix-vector multiplication operator <em>Lin<\/em>[<strong>x<\/strong>] = <strong>Ax<\/strong> from equation\u00a0(1):<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AyzfhRBXzLZC_uA9kw8cdVw.png?ssl=1\"><\/figure>\n<p>This confirms that matrix-vector multiplication is a linear operation.<\/p>\n<p>Now, let\u2019s consider a more general transformation, which involves a shift by a vector\u00a0<strong>b<\/strong>:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AshneBIccfcUFmeLZjgHBDg.png?ssl=1\"><\/figure>\n<p>Plug in a weighted sum and see what comes\u00a0out.<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2ALgZjH1T_9ClerrfAWiZFWA.png?ssl=1\"><\/figure>\n<p>You can see that adding a nonzero <strong>b<\/strong> breaks linearity. Operations like this are called <strong>affine<\/strong> to distinguish them from linear\u00a0ones.<\/p>\n<p>Don\u2019t worry though\u200a\u2014\u200athere\u2019s a simple way to eliminate the need for translation. Simply shift the data beforehand, for example by centering both the inputs and the outputs, so that the vector <strong>b<\/strong> becomes zero. 
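<\/p>\n<p>Both observations can be illustrated with a minimal NumPy sketch. The matrix <strong>A<\/strong>, the shift <strong>b<\/strong> and the random data below are hypothetical, chosen only for illustration. The affine map fails the linearity test. Subtracting the mean of the inputs and the mean of the outputs removes the shift, which leaves a purely linear relationship between the centered data.<\/p>\n<pre>import numpy as np<br><br>rng = np.random.default_rng(42)<br><br># Hypothetical transformation matrix A and shift vector b<br>A = np.array([[1.0, 2.0], [-1.0, 0.5]])<br>b = np.array([3.0, -2.0])<br><br>def affine(x):<br>    return A @ x + b<br><br># Linearity test: compare f(alpha x1 + beta x2) with alpha f(x1) + beta f(x2)<br>x1, x2 = rng.normal(size=2), rng.normal(size=2)<br>alpha, beta = 2.0, -0.5<br>lhs = affine(alpha * x1 + beta * x2)<br>rhs = alpha * affine(x1) + beta * affine(x2)<br># b enters lhs once but rhs (alpha + beta) = 1.5 times, so the two differ<br>print('affine map passes the linearity test:', np.allclose(lhs, rhs))<br><br># Centering: subtract the mean input and the mean output<br>X = rng.normal(size=(100, 2))  # one point per row<br>Y = X @ A.T + b                # the affine map applied to every row<br>Xc = X - X.mean(axis=0)<br>Yc = Y - Y.mean(axis=0)<br># After centering, the shift is gone and only the matrix remains<br>print('centered outputs equal A times centered inputs:', np.allclose(Yc, Xc @ A.T))<\/pre>\n<p>Because the points are stored as rows here, the map is written as <strong>X<\/strong><strong>A<\/strong>\u1d40 + <strong>b<\/strong> instead of <strong>Ax<\/strong> + <strong>b<\/strong>; both describe the same transformation.<\/p>\n<p>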
This is a common approach in data\u00a0science.<\/p>\n<p>Therefore, in many cases the data scientist only needs to worry about matrix-vector multiplication.<\/p>\n<h3>12. Final\u00a0words<\/h3>\n<p>I hope that linear algebra seems easier to understand now, and that you\u2019ve got a sense of how interesting it can\u00a0be.<\/p>\n<p>If I\u2019ve sparked your interest in learning more, that\u2019s great! But even if it\u2019s just that you feel more confident with the course material, that\u2019s still a\u00a0win.<\/p>\n<p>Bear in mind that this is more of a semi-formal introduction to the subject. For more rigorous definitions and proofs, you might need to look at specialised literature.<\/p>\n<p><em>Unless otherwise noted, all images are by the\u00a0author<\/em><\/p>\n<h3>References<\/h3>\n<p>[1] Gilbert Strang. <em>Introduction to Linear Algebra<\/em>. Wellesley-Cambridge Press,\u00a02022.<\/p>\n<p>[2] Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong. <em>Mathematics for Machine Learning<\/em>. 
Cambridge University Press,\u00a02020.<\/p>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/how-to-interpret-matrix-expressions-transformations-a5e6871cd224\">How to Interpret Matrix Expressions\u200a\u2014\u200aTransformations<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>\n<\/div>\n<p>Jaroslaw Drapala<br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fhow-to-interpret-matrix-expressions-transformations-a5e6871cd224\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to Interpret Matrix Expressions\u200a\u2014\u200aTransformations Matrix algebra for a data scientist Photo by Ben Allan on\u00a0Unsplash This article begins a series for anyone who finds matrix algebra overwhelming. My goal is to turn what you\u2019re afraid of into what you\u2019re fascinated by. 
You\u2019ll find it especially helpful if you want to understand machine learning concepts [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,442,441,311,443,440],"tags":[445,419,444],"class_list":["post-377","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-data-transformation","category-determinants","category-getting-started","category-matrix-linear-algebra","category-transposition","tag-expressions","tag-matrix","tag-rows"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/377"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=377"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/377\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=377"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=377"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=377"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}