Automatic Differentiation with TensorFlow
You may have noticed a recurring pattern in how we have approached machine learning problems:
- We assume a model of appropriate complexity, having parameters \(\vec{\theta}\).
- We define a cost function that is minimized when the \(\vec{\theta}\) values best fit the model to the training data.
- We find the partial derivative functions of the cost function with respect to the parameters \(\vec{\theta}\).
- Starting with some initial values of the parameters, we calculate the gradients (the values of the partial derivative functions) and use the gradients to update the parameters.
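The four steps above can be sketched concretely. The following is a minimal, hand-rolled example (no TensorFlow yet) that fits a line \(y = \theta_0 + \theta_1 x\) to data by gradient descent; the partial derivatives here are derived by hand, which is exactly the chore that automatic differentiation will remove. The data values, learning rate, and step count are illustrative choices, not taken from the text.

```python
# Training data generated from y = 1 + 2x (noise-free, for clarity).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
n = len(xs)

# Step 1: a model of appropriate complexity, with initial parameter values.
theta0, theta1 = 0.0, 0.0

# Step 2: the cost is J = (1/n) * sum((theta0 + theta1*x - y)^2).
# Step 3: its partial derivatives, derived by hand, appear inside the loop.
# Step 4: repeatedly evaluate the gradients and update the parameters.
learning_rate = 0.05
for _ in range(2000):
    d_theta0 = (2.0 / n) * sum((theta0 + theta1 * x - y)
                               for x, y in zip(xs, ys))
    d_theta1 = (2.0 / n) * sum((theta0 + theta1 * x - y) * x
                               for x, y in zip(xs, ys))
    theta0 -= learning_rate * d_theta0
    theta1 -= learning_rate * d_theta1

print(round(theta0, 2), round(theta1, 2))  # approaches 1.0 and 2.0
```

Writing `d_theta0` and `d_theta1` by hand is manageable for two parameters; it quickly becomes impractical for the models discussed next.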
As we look at models of increasing complexity, with increasing numbers of input variables and parameters, it becomes difficult to find the partial derivatives of cost functions with respect to the parameters. Fortunately, the open source community has produced fast, efficient libraries that can automatically generate partial derivative functions from the definition of a complex cost function. One such library, maintained by Google engineers, is TensorFlow.
The power of TensorFlow is rooted in your ability to represent a cost function (or any other function) as an operation graph. An operation graph decomposes a complex function into simpler operations: nodes represent values or operators, and edges represent the flow of values into operations and the flow of results out of operations.
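To make the idea concrete, here is an illustrative operation graph in plain Python for the function \(f(x, y) = x y + \sin(x)\). The dictionary-of-nodes structure and the `evaluate` helper are assumptions for demonstration only; they are not TensorFlow's internal representation.

```python
import math

# f(x, y) = x * y + sin(x), decomposed into simple operations.
# Each node maps to (operation, list of input node names); the edges of
# the graph are implied by those input lists.
graph = {
    "x":  ("input", []),
    "y":  ("input", []),
    "v1": ("mul", ["x", "y"]),    # v1 = x * y
    "v2": ("sin", ["x"]),         # v2 = sin(x)
    "f":  ("add", ["v1", "v2"]),  # f = v1 + v2
}

OPS = {
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "sin": lambda a: math.sin(a),
}

def evaluate(node, inputs, cache):
    """Recursively evaluate one node of the operation graph."""
    if node in cache:
        return cache[node]
    op, args = graph[node]
    if op == "input":
        value = inputs[node]
    else:
        value = OPS[op](*(evaluate(a, inputs, cache) for a in args))
    cache[node] = value
    return value

result = evaluate("f", {"x": 2.0, "y": 3.0}, {})
print(result)  # 2*3 + sin(2) ≈ 6.909
```

Because each node is a simple, self-describing operation, the graph can be traversed, transformed, or partitioned by machinery that never needs to understand the function as a whole.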
As you can imagine, from such a representation of a cost function it is possible to automatically derive the graph representations of its partial derivative functions. The representation also yields another important benefit: for a large, complex function composed of numerous simpler operations, it becomes possible to distribute those operations across multiple computers. Furthermore, within a single computer, specific operations can be assigned to a GPU or other application-specific processor. The result is fast, parallel execution of the function.
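To show in miniature how derivatives can be produced mechanically rather than by hand, here is a sketch of forward-mode automatic differentiation using dual numbers, again for \(f(x, y) = x y + \sin(x)\). This is a teaching device of my choosing: TensorFlow itself uses reverse-mode differentiation over its operation graphs, but the principle, propagating derivative information through each simple operation, is the same.

```python
import math

class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    def __init__(self, value, deriv):
        self.value, self.deriv = value, deriv

    def __mul__(self, other):  # product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    def __add__(self, other):  # sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

def sin(d):                    # chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)

# Differentiate f(x, y) = x*y + sin(x) with respect to x at (2, 3)
# by seeding x with derivative 1 and y with derivative 0.
x = Dual(2.0, 1.0)
y = Dual(3.0, 0.0)
f = x * y + sin(x)
print(f.deriv)  # df/dx = y + cos(x) = 3 + cos(2) ≈ 2.584
```

Notice that we wrote only the function; the derivative emerged from the local rules attached to each operation. This is exactly why a graph of simple operations is such a fertile representation.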
Unfortunately, this power comes at a cost to the data scientist: instead of simply writing a function in, say, Python or Julia and then executing that code, you must write code that first represents the function as a graph and then executes the graph. In the next post, we'll see how this is done.