In simple linear regression the computer learns a linear relationship between a single input $$x$$ and a single output $$y$$ by calculating two values $$\theta_0$$ and $$\theta_1$$. These values define a line that best fits the training examples $$(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$$.

Multivariate linear regression is an extension that finds a relationship between multiple inputs $$x_1, x_2, \dots, x_n$$ and an output. We say that the input is “$$n$$ dimensional.” The computer calculates $$n + 1$$ values $$\theta_0, \theta_1, \dots, \theta_n$$. These values define a plane (if $$n = 2$$) or hyperplane (if $$n > 2$$) that best fits the training examples.

To handle the multiplicity of variables and values, it is convenient to use matrices.

Let $$\vec{\theta} = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$. Its transpose is $$\vec{\theta}^T = [\theta_0, \theta_1, \dots, \theta_n]$$.

Let the inputs be represented as $$\vec{x} = \begin{bmatrix} x_0 = 1 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$$. (Below you’ll see why we set $$x_0 = 1$$.)

Instead of $$y(x) = \theta_0 + \theta_1x$$ as in simple linear regression, we use the rules of matrix multiplication to get the model equation. (This is the payoff of setting $$x_0 = 1$$: the intercept enters the product as $$\theta_0 x_0 = \theta_0$$.)

$y(\vec{x}) = \vec{\theta}^T \times \vec{x} = \theta_0 + \theta_1x_1 + \theta_2x_2 + \dots + \theta_nx_n$
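The model equation is just a dot product, so it can be sketched in a few lines of NumPy (NumPy is not part of the original text; the values of $$\vec{\theta}$$ and $$\vec{x}$$ below are made up for illustration):

```python
import numpy as np

# Hypothetical example with n = 2 features.
theta = np.array([1.0, 2.0, 3.0])     # [theta_0, theta_1, theta_2]
x_raw = np.array([4.0, 5.0])          # the measured inputs [x_1, x_2]
x = np.concatenate(([1.0], x_raw))    # prepend x_0 = 1 so theta_0 is absorbed

y_hat = theta @ x                     # theta^T x = 1 + 2*4 + 3*5 = 24
```

Prepending the constant 1 is exactly the $$x_0 = 1$$ trick: it lets one dot product cover both the intercept and the weighted inputs.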

The cost function can also be expressed using matrix notation:

$J(\vec{\theta}) = \sum_{i=1}^m{\frac{1}{2}(\vec{\theta}^T \times \vec{x}^{(i)} - y^{(i)})^2}$
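Stacking the training inputs as rows of a matrix lets the whole sum be computed at once. A minimal NumPy sketch (the helper name `cost` and the tiny data set are invented for illustration):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = sum over i of (1/2) * (theta^T x^(i) - y^(i))^2.
    X has shape (m, n+1) with a leading column of ones (x_0 = 1)."""
    residuals = X @ theta - y
    return 0.5 * np.sum(residuals ** 2)

# Hypothetical data: m = 2 examples, n = 1 feature.
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 4.0])
theta = np.array([0.0, 2.0])   # y = 2x fits this data exactly, so J = 0
```

A perfect fit drives every residual, and hence the cost, to zero; any other $$\vec{\theta}$$ gives a strictly positive cost.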

The partial derivatives are as follows:

$\frac{\partial J(\vec{\theta})}{\partial \theta_0} = \sum_{i=1}^m(\vec{\theta}^T \times \vec{x}^{(i)} - y^{(i)})$

$\frac{\partial J(\vec{\theta})}{\partial \theta_1} = \sum_{i=1}^m(\vec{\theta}^T \times \vec{x}^{(i)} - y^{(i)})x_1^{(i)}$

$\frac{\partial J(\vec{\theta})}{\partial \theta_2} = \sum_{i=1}^m(\vec{\theta}^T \times \vec{x}^{(i)} - y^{(i)})x_2^{(i)}$

$\vdots$

$\frac{\partial J(\vec{\theta})}{\partial \theta_n} = \sum_{i=1}^m(\vec{\theta}^T \times \vec{x}^{(i)} - y^{(i)})x_n^{(i)}$

(The $$\theta_0$$ case follows the same pattern, since its factor is $$x_0^{(i)} = 1$$.)
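Because every partial derivative has the same shape, all $$n + 1$$ of them can be computed in one matrix expression, $$X^T(X\vec{\theta} - \vec{y})$$. A sketch (the function name and sample numbers are invented; the original text does not give code):

```python
import numpy as np

def gradient(theta, X, y):
    """All n+1 partial derivatives at once: entry j is
    sum_i (theta^T x^(i) - y^(i)) * x_j^(i), i.e. X^T (X theta - y).
    X has shape (m, n+1) with a leading column of ones (x_0 = 1)."""
    return X.T @ (X @ theta - y)

# Hypothetical example: one training example x = (1, 3), y = 5, theta = 0.
X = np.array([[1.0, 3.0]])
y = np.array([5.0])
grad = gradient(np.zeros(2), X, y)   # residual is -5, so grad = (-5, -15)
```

Each residual is weighted by the corresponding input component, exactly as in the equations above.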

The model equation, cost function, and its partial derivatives should look familiar. In the special case where $$n = 1$$ they are exactly the equations for simple linear regression.
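The $$n = 1$$ reduction can be checked numerically: running gradient descent on one-feature data should recover the line's intercept and slope. A sketch under assumed choices (noise-free data from $$y = 1 + 2x$$, and a learning rate and iteration count picked for illustration):

```python
import numpy as np

# One-feature data generated from y = 1 + 2x (no noise), so gradient
# descent should drive theta toward [1, 2].
x1 = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack((np.ones_like(x1), x1))   # leading column of ones (x_0 = 1)
y = 1.0 + 2.0 * x1

theta = np.zeros(2)
alpha = 0.05                                  # learning rate (illustrative)
for _ in range(5000):
    theta -= alpha * (X.T @ (X @ theta - y))  # theta_j -= alpha * dJ/dtheta_j
```

With two columns in $$X$$, this is exactly simple linear regression; adding more columns changes nothing else in the loop, which is the point of the matrix formulation.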