# Even Less Linear Regression

In simple linear regression the computer
learns a linear relationship between a single input and a single output
by calculating two values: a slope $m$ and an intercept $b$. These values
define a *line* $y = mx + b$ that best fits the training examples.

**Polynomial regression** finds a
*non-linear* relationship between an input and the output. The model equation for polynomial regression contains terms that raise the
input value to various powers. For example:

$$\hat{y} = a_0 + a_1 x + a_2 x^2 + a_3 x^3$$

The computer calculates the values $a_0, a_1, a_2, a_3$. These coefficients define a curve that best fits the training examples.
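As a concrete sketch, a cubic model like the one above can be fit in a few lines of NumPy. The data here is synthetic (a hypothetical relationship $y = 1 + 2x - 0.5x^3$ plus noise), chosen purely for illustration:

```python
import numpy as np

# Synthetic training data for an assumed cubic relationship
# y = 1 + 2x - 0.5x^3, plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 1 + 2 * x - 0.5 * x**3 + rng.normal(scale=0.1, size=x.shape)

# Fit a degree-3 polynomial; coefficients come back highest power first.
coeffs = np.polyfit(x, y, deg=3)
predict = np.poly1d(coeffs)

print(coeffs)        # roughly [-0.5, 0, 2, 1]
print(predict(1.0))  # roughly 1 + 2 - 0.5 = 2.5
```

`np.polyfit` solves the same least-squares problem that gradient descent solves iteratively, so it is a convenient way to check results.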

In the multiple-dimensional input case, the model equation looks the same: $\hat{y} = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$. However, the vector $\mathbf{x} = (x_1, \dots, x_n)$ includes products of inputs and inputs raised to various powers as well as the actual inputs.

In this case, we refer to $\mathbf{x}$ as a **feature vector** because it
consists of actual inputs as well as “features” derived from the inputs.

You may be wondering: Which inputs are multiplied together? Which are raised to which powers? This is a matter of discretion and experimentation. If you know that the output depends on the product of two inputs or on an input raised to a power, you should use that feature. However, the more features you include, the more iterations your model will require to converge.
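To make this concrete, here is a minimal sketch of a hand-built feature map. The function name `expand_features` and the particular terms chosen (a product and a square) are illustrative assumptions, not a prescribed recipe:

```python
import numpy as np

def expand_features(x1, x2):
    """Hypothetical feature map for two raw inputs.

    Which products and powers to include is the experimenter's choice;
    this example adds one interaction term and one squared term.
    """
    return np.array([x1, x2, x1 * x2, x1**2])

print(expand_features(2.0, 3.0))  # [2. 3. 6. 4.]
```

The model then learns one coefficient per entry of this vector, exactly as in multivariate linear regression.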

The cost function and its partial derivatives are similar to those for multivariate linear regression (with the convention $x_0^{(i)} = 1$):

$$J(a_0, \dots, a_n) = \frac{1}{2N} \sum_{i=1}^{N} \left( \hat{y}^{(i)} - y^{(i)} \right)^2
\qquad
\frac{\partial J}{\partial a_j} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}^{(i)} - y^{(i)} \right) x_j^{(i)}$$

Therefore, gradient descent can be used to find the best-fit surface.
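The gradient-descent loop can be sketched directly from those derivatives. This is a minimal illustration, assuming a mean-squared-error cost and a noiseless quadratic target; the names `X`, `a`, and `lr` are local choices:

```python
import numpy as np

# Synthetic data from an assumed quadratic: y = 3x^2 - x + 0.5 (no noise,
# so the learned coefficients should match the true ones closely).
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y = 3 * x**2 - x + 0.5

X = np.column_stack([np.ones_like(x), x, x**2])  # features: 1, x, x^2
a = np.zeros(3)                                  # coefficients a0, a1, a2
lr = 0.1                                         # learning rate

for _ in range(5000):
    error = X @ a - y            # prediction error per example
    grad = X.T @ error / len(y)  # partial derivatives of the cost
    a -= lr * grad               # gradient-descent update

print(a)  # approaches [0.5, -1, 3]
```

Note that the update rule is identical to the one for multivariate linear regression; only the contents of `X` changed.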

# Overfitting

You can start with a linear model and gradually add features to $\mathbf{x}$ (and corresponding coefficients to the model) until you get a sufficiently complex model to fit the training examples.

If you include a large number of features, you are likely to encounter the
problem of **overfitting**. The model fits the training data well, but gives
poor or incorrect outputs when fed inputs outside the training data. This problem is common in machine learning.

For this reason, when given a data set, it is best not to use the entire set
for training. Instead, reserve a portion of it (called the **testing set**) for
testing the model after it has been trained. Later, we will look at a couple of
methods for preventing overfitting.
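The train/test split can be sketched as follows. The data, the split sizes, and the two polynomial degrees compared here are all illustrative assumptions; the point is only that a more flexible model always fits the training set at least as well, while typically doing worse on held-out data:

```python
import numpy as np

# Synthetic data from an assumed quadratic relationship plus noise.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 30)
y = x**2 + rng.normal(scale=0.1, size=x.shape)

# Reserve the last 10 examples as the testing set.
train_x, test_x = x[:20], x[20:]
train_y, test_y = y[:20], y[20:]

def errors(degree):
    """Train on the training set; report (training MSE, testing MSE)."""
    coeffs = np.polyfit(train_x, train_y, degree)
    train_mse = np.mean((np.polyval(coeffs, train_x) - train_y) ** 2)
    test_mse = np.mean((np.polyval(coeffs, test_x) - test_y) ** 2)
    return train_mse, test_mse

for d in (2, 9):
    # Higher degree: lower training error, typically higher testing error.
    print(d, errors(d))
```

Evaluating on data the model never saw is what exposes the gap between memorizing the training set and learning the underlying relationship.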