The last time I studied math seriously was in a Linear Algebra module in university 4 years ago. I'm currently taking a course on NLP and Deep Learning, so I'm documenting the math concepts I've found useful.
Jacobian Matrix
The Jacobian matrix is a way to capture how a function scales, rotates, or distorts space locally around a point $x$ when it maps from one set of variables to another. Concretely, it is just a matrix where each entry is a partial derivative of an output with respect to an input: each row corresponds to an output variable, and each column corresponds to an input variable.
For a function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ (where you have multiple input and output variables), the Jacobian matrix at a point $x$ contains all the first partial derivatives of each output with respect to each input. It’s structured as:
$$J(x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$$
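To make this concrete, here's a small worked example of my own: take $f: \mathbb{R}^2 \rightarrow \mathbb{R}^2$ with $f(x_1, x_2) = (x_1^2 x_2, \; 5x_1 + \sin x_2)$. Its Jacobian is

$$J(x) = \begin{bmatrix} 2x_1 x_2 & x_1^2 \\ 5 & \cos x_2 \end{bmatrix}$$

where the first row holds the partial derivatives of the first output and the second row those of the second output.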
Scaling factor near $x$: The determinant of the Jacobian matrix (when it’s square, i.e., $m = n$) tells you the local “scaling factor” of volume around the point $x$. This is especially useful in cases where you’re dealing with changes of variables in integrals, where the Jacobian determinant helps adjust for volume changes in transformed coordinates.
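The classic example is the change to polar coordinates, $x = r\cos\theta$, $y = r\sin\theta$. The Jacobian of this map is

$$J(r, \theta) = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}, \qquad \det J = r\cos^2\theta + r\sin^2\theta = r,$$

which is exactly where the familiar $dx\,dy = r\,dr\,d\theta$ substitution in double integrals comes from.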
Geometric interpretation: When $f$ represents a transformation (e.g., stretching, compressing, or rotating space), the Jacobian describes how much space near $x$ is being scaled or distorted in each direction.
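For instance, a pure rotation $f(x, y) = (x\cos\alpha - y\sin\alpha, \; x\sin\alpha + y\cos\alpha)$ has a Jacobian equal to the rotation matrix itself, with determinant $\cos^2\alpha + \sin^2\alpha = 1$: space is rotated but not stretched, so volumes near any point are preserved.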
In simpler cases, the Jacobian matrix simplifies as follows:
Single input, single output $(f: \mathbb{R} \rightarrow \mathbb{R})$: The Jacobian matrix is a $1 \times 1$ matrix, which is just the derivative $\frac{df}{dx}$ of the function.
Multiple inputs, single output $(f: \mathbb{R}^n \rightarrow \mathbb{R})$: The Jacobian matrix is a $1 \times n$ row vector containing the partial derivatives of the output with respect to each input. It looks like:
$$J(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix}$$
This row vector is just the gradient of $f$ (written as a row rather than a column), which shows the rate of change of $f$ with respect to each input direction.
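For instance, with $f(x_1, x_2) = x_1^2 + 3x_2$, the Jacobian/gradient is

$$J(x) = \begin{bmatrix} 2x_1 & 3 \end{bmatrix}.$$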
Multiple inputs, multiple outputs $(f: \mathbb{R}^n \rightarrow \mathbb{R}^m)$: Here, the Jacobian is a full $m \times n$ matrix, with each row representing the gradient of each output component with respect to the inputs.
The Jacobian’s dimensions adapt based on the function’s input and output dimensions, giving a compact way to represent all partial derivatives in one matrix structure.
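Since the course uses deep learning tooling anyway, here's a minimal sketch of how to check these shapes with PyTorch's `torch.autograd.functional.jacobian` (assuming PyTorch is installed; the function `f` below is a made-up example, not anything from the course):

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # f: R^3 -> R^2, an arbitrary map from three inputs to two outputs
    return torch.stack([x[0] * x[1], torch.sin(x[2])])

x = torch.tensor([1.0, 2.0, 3.0])
J = jacobian(f, x)

print(J.shape)  # torch.Size([2, 3]): m rows (outputs) by n columns (inputs)
print(J)
# tensor([[ 2.0000,  1.0000,  0.0000],
#         [ 0.0000,  0.0000, -0.9900]])
```

The first row is the gradient of $x_1 x_2$ (namely $[x_2, x_1, 0] = [2, 1, 0]$) and the second is the gradient of $\sin x_3$ (namely $[0, 0, \cos 3] \approx [0, 0, -0.99]$), matching the $m \times n$ layout above.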