In simple linear regression the computer learns a linear relationship between a single input $x$ and a single output $y$ by calculating two values $\theta_0$ and $\theta_1$. These values define a line that best fits the training examples $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$.

Multivariate linear regression is an extension that finds a relationship between multiple inputs $x_1, x_2, \dots, x_n$ and an output. We say that the input is “$n$-dimensional.” The computer calculates $n + 1$ values $\theta_0, \theta_1, \dots, \theta_n$. These values define a plane (if $n = 2$) or a hyperplane (if $n > 2$) that best fits the training examples.

To handle this larger number of inputs and parameters, it is convenient to use vectors and matrix notation.

Let

$$\vec{\theta} = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}.$$

Its transpose is the row vector $\vec{\theta}^T = \begin{bmatrix} \theta_0 & \theta_1 & \cdots & \theta_n \end{bmatrix}$.

Let the inputs be represented as

$$\vec{x} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{where } x_0 = 1.$$

(Below you’ll see why we set $x_0 = 1$.)
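For concreteness, here is a minimal NumPy sketch (with made-up feature values) of building an input vector that includes the extra $x_0 = 1$ entry:

```python
import numpy as np

# Made-up raw feature values for one training example with n = 3.
features = np.array([3.1, 4.7, 0.9])      # [x_1, x_2, x_3]

# Prepend x_0 = 1 so the intercept theta_0 participates in the dot product below.
x = np.concatenate(([1.0], features))     # [1, x_1, x_2, x_3]
print(x)                                  # [1.  3.1 4.7 0.9]
```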

Instead of $y(x) = \theta_0 + \theta_1 x$ as for simple linear regression, we use the rules of matrix multiplication to get the model equation:

$$y(\vec{x}) = \vec{\theta}^T \vec{x} = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n.$$

This is why we set $x_0 = 1$: it makes the intercept $\theta_0$ appear naturally as the first term of the dot product.
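As a quick sketch of the model equation, the prediction is just a dot product in NumPy (the parameter and feature values below are illustrative, not from any real dataset):

```python
import numpy as np

# Illustrative values for n = 3.
theta = np.array([0.5, 1.2, -0.7, 2.0])   # [theta_0, theta_1, theta_2, theta_3]
x = np.array([1.0, 3.1, 4.7, 0.9])        # [x_0 = 1, x_1, x_2, x_3]

# Model equation: y(x) = theta^T x.
y_hat = theta @ x
print(y_hat)                              # equals 0.5 + 1.2*3.1 - 0.7*4.7 + 2.0*0.9
```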

The cost function can also be expressed using matrix notation (with the conventional $\frac{1}{2m}$ factor):

$$J(\vec{\theta}) = \frac{1}{2m} \sum_{i=1}^{m} \left( \vec{\theta}^T \vec{x}^{(i)} - y^{(i)} \right)^2$$
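A sketch of this cost in NumPy, assuming the training inputs are stacked into an $m \times (n+1)$ matrix `X` whose first column is all ones (`cost` is an illustrative helper name):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1 / 2m) * sum over i of (theta^T x_i - y_i)^2."""
    m = len(y)
    residuals = X @ theta - y              # theta^T x_i - y_i for every training example
    return residuals @ residuals / (2 * m)
```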

The partial derivatives are as follows:

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( \vec{\theta}^T \vec{x}^{(i)} - y^{(i)} \right) x_j^{(i)}, \qquad j = 0, 1, \dots, n$$
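All $n + 1$ partial derivatives can be computed at once; here is a sketch under the same assumptions as the cost example above (`X` has a leading column of ones, and `gradient` is an illustrative helper name):

```python
import numpy as np

def gradient(theta, X, y):
    """dJ/dtheta_j = (1/m) * sum over i of (theta^T x_i - y_i) * x_j^(i), for every j."""
    m = len(y)
    residuals = X @ theta - y
    return X.T @ residuals / m             # entry j is the partial derivative w.r.t. theta_j
```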

The model equation, cost function, and partial derivatives should look familiar: in the special case where $n = 1$, they are exactly the equations for simple linear regression.
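To see that correspondence concretely, here is a small check (with made-up numbers) that the matrix form of the model equation collapses to the simple-regression line when $n = 1$:

```python
import numpy as np

# With n = 1, the matrix form is just the simple-regression line.
theta = np.array([2.0, 0.5])              # [theta_0, theta_1], illustrative values
x = np.array([1.0, 3.0])                  # [x_0 = 1, x_1]

matrix_form = theta @ x                   # theta^T x
simple_form = theta[0] + theta[1] * x[1]  # theta_0 + theta_1 * x
assert np.isclose(matrix_form, simple_form)
```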