You saw that the cost function for linear regression generates a surface with large flat regions when it is applied to logistic regression. On those flat regions the slope is close to zero, so gradient descent makes little or no progress. Such a cost function is not suitable for gradient descent.

The improved cost function is:

\[J(\theta_0, \theta_1) = -\frac{1}{m}\bigg[\sum_{i=1}^{m}{y^{(i)}\ln\bigg(\frac{1}{1 + e^{\theta_0 + \theta_1x^{(i)}}}\bigg) + (1 - y^{(i)})\ln\bigg(1 - \frac{1}{1 + e^{\theta_0 + \theta_1x^{(i)}}}\bigg)}\bigg]\]
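
If you want to see how this formula translates into code, here is a minimal sketch that computes \(J\) for a given \(\theta_0\) and \(\theta_1\), assuming NumPy arrays `x` (the feature values) and `y` (the 0 or 1 labels). The names `cost`, `x`, and `y` are illustrative and not part of the lesson's dataset.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Improved (log-loss) cost for a one-feature logistic model."""
    # Model output, written exactly as in the formula above:
    # 1 / (1 + e^(theta0 + theta1 * x))
    h = 1.0 / (1.0 + np.exp(theta0 + theta1 * x))
    m = len(y)
    # Average the two log terms over all m training examples
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```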

The three-dimensional graph below shows the values of the improved cost function for the same dataset and values of \(\theta_0\) and \(\theta_1\).

You can click and drag to rotate the graph, scroll to zoom in and out, and hover over the data points in the graph to see each value of \(\theta_0\), \(\theta_1\), and \(J\).

As you can see, for a logistic model, the improved cost function generates a surface with a single “fold.” The process of gradient descent can quickly and easily find the values of \(\theta_0\) and \(\theta_1\) that minimize this function.
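
To give a sense of how that works, here is a rough sketch of gradient descent on this surface. It reuses the `cost` function from the sketch above and estimates the two partial derivatives numerically rather than deriving them; the learning rate `alpha` and the number of `steps` are illustrative values, not tuned for any particular dataset.

```python
def gradient_descent(x, y, alpha=0.1, steps=5000):
    """Minimize the improved cost with gradient descent, estimating the
    partial derivatives numerically from the cost function itself."""
    theta0, theta1 = 0.0, 0.0
    eps = 1e-6
    for _ in range(steps):
        # Central-difference estimates of dJ/dtheta0 and dJ/dtheta1
        g0 = (cost(theta0 + eps, theta1, x, y) - cost(theta0 - eps, theta1, x, y)) / (2 * eps)
        g1 = (cost(theta0, theta1 + eps, x, y) - cost(theta0, theta1 - eps, x, y)) / (2 * eps)
        # Step downhill on the cost surface
        theta0 -= alpha * g0
        theta1 -= alpha * g1
    return theta0, theta1
```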