Suppose, as in linear regression, we try to minimize the sum of half the squared vertical distance between each example and the logistic curve:

$$J(\theta_0, \theta_1) = \sum_{i=1}^{m} \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

The three-dimensional graph below shows the values of this cost function for a simple dataset and different values of $\theta_0$ and $\theta_1$.

You can click and drag to rotate the graph, scroll to zoom in and out, and hover over the data points in the graph to see each value of $\theta_0$, $\theta_1$, and $J$.

As you can see, for a logistic model this cost function produces a non-convex surface with large, nearly flat regions: wherever the sigmoid saturates close to 0 or 1, changing $\theta_0$ or $\theta_1$ barely changes the predictions, so the gradient is close to zero. Gradient descent can easily get “stuck” in these regions, since each step is proportional to the gradient. If you try to minimize this cost function for a logistic model, you’ll be lucky if it converges at all, much less in a reasonable number of iterations.
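You can verify the flat regions numerically. The sketch below (using a hypothetical toy dataset, not the one from the plot above) computes the squared-error cost of a logistic model and compares the gradient magnitude near the origin with the gradient magnitude far out in parameter space, where the sigmoid has saturated:

```python
import numpy as np

# Hypothetical toy dataset: 1-D inputs with binary labels.
x = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def logistic(t0, t1, x):
    """Logistic hypothesis h(x) = 1 / (1 + e^{-(t0 + t1*x)})."""
    return 1.0 / (1.0 + np.exp(-(t0 + t1 * x)))

def cost(t0, t1):
    """Sum of half the squared vertical distances to the curve."""
    return 0.5 * np.sum((logistic(t0, t1, x) - y) ** 2)

def grad_norm(t0, t1, eps=1e-4):
    """Magnitude of the cost gradient, via central finite differences."""
    g0 = (cost(t0 + eps, t1) - cost(t0 - eps, t1)) / (2 * eps)
    g1 = (cost(t0, t1 + eps) - cost(t0, t1 - eps)) / (2 * eps)
    return np.hypot(g0, g1)

# Near the origin the surface still slopes noticeably...
print(grad_norm(0.0, 1.0))

# ...but on a plateau the sigmoid saturates, every prediction is
# pinned near 0 or 1, and the gradient is essentially zero, so a
# gradient-descent step of any reasonable size goes nowhere.
print(grad_norm(-30.0, -10.0))
```

The second gradient norm comes out many orders of magnitude smaller than the first, which is exactly why descent stalls on those flat regions of the surface.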