For simple linear regression, it is possible to minimize the cost function analytically: set its partial derivatives to zero and solve the resulting equations for the variables $\theta_0$ and $\theta_1$.
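As a sketch of what that looks like, assume the standard mean-squared-error cost $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \bigl(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\bigr)^2$ over $m$ training examples (the scaling constant varies between texts, but it does not change where the minimum is). Setting both partial derivatives to zero and solving yields the familiar closed-form solution:

$$
\theta_1 = \frac{\sum_{i=1}^{m} \bigl(x^{(i)} - \bar{x}\bigr)\bigl(y^{(i)} - \bar{y}\bigr)}{\sum_{i=1}^{m} \bigl(x^{(i)} - \bar{x}\bigr)^2}, \qquad \theta_0 = \bar{y} - \theta_1 \bar{x},
$$

where $\bar{x}$ and $\bar{y}$ denote the means of the inputs and outputs.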

In modern machine learning, the computer learns vastly more complex, non-linear models. (You’ll encounter them later.) These models sometimes describe relationships between millions of inputs and thousands of outputs. To learn such a model, we can still define a cost function and derive its partial derivatives. However, setting those derivatives to zero and solving for the variables may be impossible or prohibitively time-consuming.

The alternative way to minimize the cost function is gradient descent, an iterative method that repeatedly nudges the variables in the direction that lowers the cost.
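In its most common form, each iteration updates every variable simultaneously using the partial derivatives of the cost. Here $\alpha$ is the learning rate, a small positive step size; the symbol is a conventional choice, not one defined earlier in this section:

$$
\theta_j \leftarrow \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1), \qquad j \in \{0, 1\}.
$$

Far from the minimum the derivatives tend to be large and the steps are big; near the minimum they shrink, so the updates naturally become fine adjustments.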

If you tried to minimize the linear regression cost function interactively, you already have a good intuition for gradient descent. You may have moved the sliders rapidly until the line came close to the data set; during that phase, the cost function decreased quickly. Then you could make finer adjustments, paying close attention to the value of $J(\theta_0, \theta_1)$.
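To make that intuition concrete, here is a minimal Python sketch of gradient descent for simple linear regression. It assumes the mean-squared-error cost from above; the function name, learning rate, and step count are illustrative choices, not part of the original text.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, n_steps=5000):
    """Fit y = theta0 + theta1 * x by minimizing the mean-squared-error cost."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_steps):
        errors = (theta0 + theta1 * x) - y   # prediction errors for all m points
        grad0 = errors.sum() / m             # partial derivative of J w.r.t. theta0
        grad1 = (errors * x).sum() / m       # partial derivative of J w.r.t. theta1
        theta0 -= alpha * grad0              # step downhill, scaled by alpha
        theta1 -= alpha * grad1
    return theta0, theta1

# Example: recover a line close to y = 2x + 1 from noisy samples.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)
print(gradient_descent(x, y))
```

With a well-chosen $\alpha$, the early iterations take large steps, much like the rapid slider movements, while later iterations make only tiny corrections as the gradients shrink near the minimum.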