Many useful problems involve classifying an input in one of multiple classes, for example,

• Is an e-mail spam, work-related, or personal?

• Which numeral between 0 and 9 is shown in an image?

• Which of several thousand words was uttered in a clip of audio?

You might be tempted to handle such problems using a model whose output is an integer identifying the class. However, a more straightforward approach is to have a logistic model for each class whose output indicates the probability that the input is in that class. For a given input, the model that produces the highest probability corresponds to the predicted class.

It is a simple matter to divide each output by the sum of the outputs. This ensures that the sum of the probabilities is 1.0. This combination of outputs is a predicted probability distribution across classes.

In this case, the training data consists of examples $(\vec{x}^{(1)}, \vec{y}^{(1)}), (\vec{x}^{(2)}, \vec{y}^{(2)}), \dots, (\vec{x}^{(m)}, \vec{y}^{(m)})$ where each $\vec{y}^{(i)}$ is the target probability distribution—a vector with a 1 corresponding to the expected class and zeroes elsewhere (conventionally called a one-hot vector).

We can use matrices to keep track of the multiple models. Let $$\vec{\theta} =$$  [[\theta_0^{(1)}, \theta_0^{(2)}, …], [\theta_1^{(1)}, \theta_1^{(2)}, …]] with a column for each class. Then the model equation can be compactly expressed as before:

The matrices and operations will become clearer in the computer code, but

• $\theta^T$ is a matrix with a row for each class and a column for each input or feature.

• $\theta^T \times \vec{x}$ is a matrix multiplication operation resulting in a row for each class.

• $e^{\vec{\theta^T}\times\vec{x}}$ is an elementwise operation, resulting in a row for each class.

• $\vec{1}$ is a column vector with a 1 for each class, and $\vec{1} + e^{\vec{\theta^T}\times\vec{x}}$ is a matrix addition operation, resulting in a row for each class.

• $\frac{1}{\vec{1} + e^{\vec{\theta^T}\times\vec{x}}}$ is also an elementwise operation, resulting in a row for each class.