We can’t use the same cost function for logistic regression as for linear regression. To find a better cost function, we’ll exploit the fact that the model output always lies between 0 and 1, while the outputs in the training set are always exactly 0 (indicating the input is not in the class) or 1 (indicating the input is in the class). We want a cost function with the following properties:

  1. If the model output is near the training output, the cost should be near 0.

  2. If the model output is near 1 when the training output is 0, the cost should be very large.

  3. If the model output is near 0 when the training output is 1, the cost should be equally large.
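
One function with exactly this shape is the negative logarithm: \(-ln(p)\) is near 0 when \(p\) is near 1, and it grows without bound as \(p\) approaches 0. For example, \(-ln(0.99) \approx 0.01\), while \(-ln(0.01) \approx 4.6\). The cost function below applies \(-ln\) to the model output when \(y^{(i)} = 1\), and to one minus the model output when \(y^{(i)} = 0\).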

The following cost function has all three of these properties. It looks quite long, but you can see that the left addend becomes zero when the training output \(y^{(i)} = 0\) and the right addend becomes zero when \(y^{(i)} = 1\).

\[J(\theta_0, \theta_1) = -\frac{1}{m}\bigg[\sum_{i=1}^{m}{y^{(i)}\ln\bigg(\frac{1}{1 + e^{-(\theta_0 + \theta_1x^{(i)})}}\bigg) + (1 - y^{(i)})\ln\bigg(1 - \frac{1}{1 + e^{-(\theta_0 + \theta_1x^{(i)})}}\bigg)}\bigg]\]

\[\frac{\partial J(\theta_0, \theta_1)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^m{\bigg(\frac{1}{1 + e^{-(\theta_0 + \theta_1x^{(i)})}} - y^{(i)}\bigg)}\]

\[\frac{\partial J(\theta_0, \theta_1)}{\partial \theta_1} = \frac{1}{m}\sum_{i=1}^m{\bigg(\frac{1}{1 + e^{-(\theta_0 + \theta_1x^{(i)})}} - y^{(i)}\bigg)x^{(i)}}\]
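To make the formulas concrete, here is a minimal NumPy sketch of the cost and its gradients, followed by a few steps of plain gradient descent. The names (`sigmoid`, `cost`, `gradients`), the toy data set, and the learning rate are illustrative choices of mine, not from the text:

```python
import numpy as np

def sigmoid(z):
    """The logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta0, theta1, x, y):
    """J(theta0, theta1): average cost over all m training examples."""
    h = sigmoid(theta0 + theta1 * x)      # model outputs, one per example
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradients(theta0, theta1, x, y):
    """Partial derivatives of J with respect to theta0 and theta1."""
    h = sigmoid(theta0 + theta1 * x)
    error = h - y                          # model output minus training output
    return np.mean(error), np.mean(error * x)

# Toy training set: inputs below 2 are labeled 0, inputs above 2 are labeled 1.
x = np.array([0.5, 1.0, 1.5, 2.5, 3.0, 3.5])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

theta0, theta1 = 0.0, 0.0
alpha = 0.5                                # learning rate
for _ in range(2000):                      # plain gradient descent
    g0, g1 = gradients(theta0, theta1, x, y)
    theta0 -= alpha * g0
    theta1 -= alpha * g1

print(cost(theta0, theta1, x, y))          # shrinks toward 0 as the fit improves
```

Both partial derivatives contain the same error term, the model output minus the training output, which is why a single helper computes it once and returns both. Starting from \(\theta_0 = \theta_1 = 0\), every model output is 0.5 and the cost is \(-ln(0.5) \approx 0.69\); after gradient descent the printed cost is far smaller, as the three properties demand.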

Summary

Here are different ways of expressing the same thing: