You saw that the cost function for linear regression, when applied to logistic regression, generates a surface with large flat regions. Such a cost function is not suitable for gradient descent: in those flat regions the gradient is close to zero, so descent makes little or no progress.

The improved cost function is:

$$J(\theta_0, \theta_1) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta\left(x^{(i)}\right)\right) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta\left(x^{(i)}\right)\right)\right]$$

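As a concrete sketch, the cross-entropy cost for a one-feature logistic model $h_\theta(x) = \sigma(\theta_0 + \theta_1 x)$ can be computed as below. The dataset here is a made-up example for illustration, not the one plotted in this lesson.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta0, theta1, x, y):
    """Cross-entropy cost J(theta0, theta1) for a one-feature logistic model."""
    h = sigmoid(theta0 + theta1 * x)  # predicted probabilities
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy dataset (hypothetical): label 1 for the three largest x values
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])

print(cost(0.0, 0.0, x, y))  # at theta = (0, 0), every h is 0.5, so J = ln 2 ≈ 0.693
```

Evaluating `cost` over a grid of $\theta_0$ and $\theta_1$ values is exactly how a surface like the one below is produced.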
The three-dimensional graph below shows the values of the improved cost function for the same dataset and values of $\theta_0$ and $\theta_1$.

You can click and drag to rotate the graph, scroll to zoom in and out, and hover over the data points in the graph to see each value of $\theta_0$, $\theta_1$, and $J$.

As you can see, for a logistic model, the improved cost function generates a surface with a single “fold.” The process of gradient descent can quickly and easily find the values of $\theta_0$ and $\theta_1$ that minimize this function.
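The minimization described above can be sketched as plain batch gradient descent on the cross-entropy cost. The dataset, learning rate, and iteration count here are illustrative assumptions, not values from this lesson.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset (hypothetical): label 1 for the three largest x values
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])

theta0, theta1 = 0.0, 0.0
alpha = 0.5  # learning rate (an assumed value)
for _ in range(5000):
    h = sigmoid(theta0 + theta1 * x)
    # Gradient of the cross-entropy cost with respect to each parameter
    grad0 = np.mean(h - y)
    grad1 = np.mean((h - y) * x)
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

# The fitted model should now classify every training point correctly
predictions = sigmoid(theta0 + theta1 * x) >= 0.5
print(predictions.astype(int))
```

Because the surface has a single fold rather than flat plateaus, the gradient always points downhill toward the minimum, so plain gradient descent converges without getting stuck.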