The Python program below uses a multi-layer perceptron to classify images from the MNIST dataset. The program should run in a few minutes and achieve an accuracy of just under 97%. It has a few notable differences from the previous program that only used logistic regression.
This program has parameters feature_thetas and feature_theta0 for the feature detection models, as well as output_thetas and output_theta0 for the output model.
Feature detection happens with logistic regression models operating on raw inputs:
features = 1.0 / (1.0 + tf.exp(-(tf.matmul(example_input, feature_thetas) + feature_theta0)))
Classification happens with a softmax model operating on features:
model_output = tf.nn.softmax(tf.matmul(features, output_thetas) + output_theta0)
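The two stages can be sketched end to end in plain NumPy. The shapes below (784 pixels, 500 features, 10 classes) match the discussion that follows, but the small random weights are illustrative only, not the program's actual initialization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: 784 raw pixels, 500 features, 10 digit classes.
n_inputs, n_features, n_classes, batch = 784, 500, 10, 4

example_input = rng.random((batch, n_inputs))
feature_thetas = rng.normal(0.0, 0.05, (n_inputs, n_features))
feature_theta0 = np.zeros(n_features)
output_thetas = rng.normal(0.0, 0.05, (n_features, n_classes))
output_theta0 = np.zeros(n_classes)

# Hidden layer: logistic regression on the raw inputs.
features = 1.0 / (1.0 + np.exp(-(example_input @ feature_thetas + feature_theta0)))

# Output layer: softmax over the detected features.
logits = features @ output_thetas + output_theta0
exp_logits = np.exp(logits - logits.max(axis=1, keepdims=True))  # numerically stabilized
model_output = exp_logits / exp_logits.sum(axis=1, keepdims=True)

print(model_output.shape)  # (4, 10); each row sums to 1
```

Each row of model_output is a probability distribution over the ten digit classes, which is exactly what tf.nn.softmax produces.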
Number of Features
The model implemented by this program detects 500 features of the input. The number 500 is called a hyperparameter of the model, in contrast to parameters such as feature_thetas and feature_theta0, which are learned from the data. The choice of hyperparameter values is a matter of experience and experimentation. Generally, if the number of features is smaller than the number of inputs, then the features are a “compressed” representation of the input.
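As a quick sanity check on what this hyperparameter implies, the parameter counts follow directly from the layer shapes (the 784 below assumes MNIST's 28×28-pixel inputs):

```python
# Parameter counts implied by 784-pixel inputs, 500 features, and 10 classes.
n_inputs, n_features, n_classes = 784, 500, 10

feature_params = n_inputs * n_features + n_features  # feature_thetas + feature_theta0
output_params = n_features * n_classes + n_classes   # output_thetas + output_theta0

print(feature_params)         # 392500
print(output_params)          # 5010
print(n_features < n_inputs)  # True: the features compress the input
```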
The previous program initialized its theta and theta0 values to zero. For
multi-layer perceptrons, this is not a good idea. When their parameter values
are exactly the same, the feature detection models are all detecting the same
feature! Their gradients will be the same and they will be updated by the same
amounts, rendering them largely redundant. Researchers have searched for good
initial parameter values, and
VariableInitializerRandom uses random
values in a range recommended by Yoshua Bengio and Xavier Glorot in their
AISTATS 2010 paper, Understanding the difficulty of training deep feedforward
neural networks. To convince yourself of the value of careful initialization,
replace the random initialization with zeros and rerun the program.
You will notice a decline in accuracy.
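The Glorot and Bengio recommendation for a layer with fan_in inputs and fan_out outputs is to sample weights uniformly from the range ±sqrt(6 / (fan_in + fan_out)). A minimal sketch of that rule (glorot_uniform is a hypothetical helper for illustration, not the program's VariableInitializerRandom):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Sample weights uniformly from the Glorot/Bengio range
    [-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))]."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
feature_thetas = glorot_uniform(784, 500, rng)

print(feature_thetas.shape)  # (784, 500)
print(float(abs(feature_thetas).max()) <= np.sqrt(6.0 / (784 + 500)))  # True
```

The range shrinks as the layer gets wider, keeping the variance of each layer's outputs roughly constant so that signals neither explode nor vanish early in training.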