If you are building deep learning models, you may need to sit for hours (or even days) before you can see any real results. You may need to stop model training to change the learning rate, push training logs to a database for future use, or show the training progress in TensorBoard. Achieving these basic tasks can take a lot of work; that’s where TensorFlow callbacks come into the picture.
Callbacks are special functions executed at specific stages during the training process. They can help you prevent overfitting, visualize training progress, debug your code, save checkpoints, generate logs, and create TensorBoard visualizations, among other tasks. TensorFlow offers many built-in callbacks, and you can use multiple callbacks concurrently. In this discussion, we will explore the various callbacks available and provide examples of how to use them.
Callbacks are called when a certain event is triggered. There are a few types of events during training that can trigger a callback, such as:

- on_epoch_begin and on_epoch_end: the start and end of an epoch
- on_batch_begin and on_batch_end: the start and end of a batch
- on_train_begin and on_train_end: the start and end of training
To use any callback in the model training, you just need to pass the callback object in the model.fit() call, for example:
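Here is a minimal sketch with a toy model and random data (all names are placeholders for your own dataset and architecture):

```python
import numpy as np
import tensorflow as tf

# Toy data and model; in practice these are your own dataset and architecture.
x_train = np.random.rand(100, 10)
y_train = np.random.randint(2, size=(100, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Any callback object (or a list of them) goes in the `callbacks` argument.
model.fit(
    x_train, y_train,
    epochs=5,
    validation_split=0.2,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor="loss", patience=2)],
)
```

The later snippets reuse this toy model and data, so they show only the callback-specific parts.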
Let’s take a look at the callbacks that are available under the tf.keras.callbacks module.
EarlyStopping is a very frequently used callback. It allows us to monitor our metrics and stop model training when a monitored metric stops improving. For example, assume that you want to stop training if the accuracy is not improving by 0.05; you can use this callback. This is useful in preventing the overfitting of a model to some extent.
The EarlyStopping callback is executed via the on_epoch_end trigger for training.
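Here is a sketch with illustrative thresholds, reusing the toy model from the first example:

```python
import tensorflow as tf

# Stop training when val_accuracy has not improved by at least 0.05
# for three consecutive epochs.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",      # metric to watch
    min_delta=0.05,              # smallest change that counts as an improvement
    patience=3,                  # epochs to wait before stopping
    mode="max",                  # accuracy should be increasing
    restore_best_weights=True,   # roll back to the best epoch when stopping
    verbose=1,
)

# `model`, `x_train`, `y_train` are the placeholders from the first sketch.
model.fit(x_train, y_train, epochs=50, validation_split=0.2,
          callbacks=[early_stopping])
```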
The ModelCheckpoint callback allows us to save the model periodically during training. It’s particularly beneficial for deep learning models that require a significant amount of time to train. The callback monitors the training process and saves model checkpoints at regular intervals, based on specific metrics.
The ModelCheckpoint callback is executed via the on_epoch_end trigger of training.
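A minimal sketch that keeps only the best weights seen so far (the file name is a placeholder):

```python
import tensorflow as tf

# Save a checkpoint at the end of every epoch, overwriting the file
# only when val_loss improves on the best value seen so far.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath="best_model.keras",  # placeholder path for the checkpoint file
    monitor="val_loss",
    save_best_only=True,
    mode="min",
    verbose=1,
)

model.fit(x_train, y_train, epochs=20, validation_split=0.2,
          callbacks=[checkpoint])
```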
TensorBoard is one of the best callbacks if you want to visualize your model’s training summary. This callback generates the logs for TensorBoard, which you can later launch to visualize your training progress. We will cover the details of TensorBoard in a separate article.
For now, we will look at only one parameter, log_dir, which is the path of the folder where the logs are stored. To launch TensorBoard, you need to execute the following command:
tensorboard --logdir=path_to_your_logs
You can launch TensorBoard before or after starting your training.
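Creating the callback itself is a one-liner (the ./logs directory below is a placeholder and should match the --logdir you pass when launching TensorBoard):

```python
import tensorflow as tf

# Write TensorBoard logs under ./logs during training.
tensorboard = tf.keras.callbacks.TensorBoard(log_dir="./logs")

model.fit(x_train, y_train, epochs=10, validation_split=0.2,
          callbacks=[tensorboard])
```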
The TensorBoard callback is also triggered at on_epoch_end.
This callback is handy when the user wants to update the learning rate as training progresses. For instance, you may want to decrease the learning rate after a certain number of epochs. The LearningRateScheduler will let you do exactly that.
Below is an example of reducing the learning rate after the fourth epoch.
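Here is a sketch; the initial learning rate of 0.002 and the ten-times reduction are chosen to match the output described below:

```python
import tensorflow as tf

def scheduler(epoch, lr):
    # `epoch` is 0-indexed: keep the initial rate for the first four
    # epochs, then switch to a ten-times-smaller fixed rate.
    if epoch < 4:
        return lr
    return 0.0002

lr_scheduler = tf.keras.callbacks.LearningRateScheduler(scheduler, verbose=1)

# Reusing the toy model; recompile it with the 0.002 starting rate.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.002),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, callbacks=[lr_scheduler])
```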
This function is passed to the schedule parameter of the LearningRateScheduler callback. With verbose set to 1, Keras prints the learning rate at the start of every epoch, so you can keep tabs on it: in epoch 5, the learning rate drops from 0.002 to 0.0002.
This callback is also triggered at on_epoch_end.
As the name suggests, this callback logs the training details in a CSV file. The logged parameters are epoch, accuracy, loss, val_accuracy, and val_loss. One thing to keep in mind is that you need to pass accuracy as a metric while compiling the model. Otherwise, you will get an execution error.
The logger accepts filename, separator, and append as parameters; append defines whether to append to an existing file or write to a new file instead. The CSVLogger callback is executed via the on_epoch_end trigger of training: when an epoch ends, the logs are written to the file.
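A minimal sketch (the file name is a placeholder):

```python
import tensorflow as tf

# Log per-epoch metrics to a CSV file.
csv_logger = tf.keras.callbacks.CSVLogger(
    filename="training_log.csv",  # placeholder path
    separator=",",
    append=False,  # overwrite any existing file instead of appending
)

# `accuracy` must be among the compiled metrics for it to appear in the CSV.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, validation_split=0.2,
          callbacks=[csv_logger])
```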
This callback is required when you need to call some custom function on any of the events, but the provided callbacks do not suffice. For instance, say you want to put your logs into a database.
The tf.keras.callbacks.LambdaCallback is a flexible Keras callback that allows users to define custom actions at specific points during training by passing lambda functions to its parameters. It provides hooks for different stages of training, including on_epoch_begin and on_epoch_end for executing actions at the start and end of each epoch, on_batch_begin and on_batch_end for actions at the start and end of each batch, and on_train_begin and on_train_end for operations before and after the entire training process. This callback is particularly useful for logging, modifying training behavior dynamically, or implementing custom monitoring functions without defining a full-fledged callback class. The **kwargs parameter allows passing additional arguments, making it highly customizable.
Let’s see an example: a function that writes the logs to a file at the end of each batch.
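A minimal sketch (the file name model_log.txt is a placeholder):

```python
import tensorflow as tf

# Append each batch's metrics dictionary to a text file.
def write_batch_logs(batch, logs):
    with open("model_log.txt", "a") as f:  # placeholder path
        f.write(f"batch {batch}: {logs}\n")

batch_logger = tf.keras.callbacks.LambdaCallback(on_batch_end=write_batch_logs)

model.fit(x_train, y_train, epochs=2, callbacks=[batch_logger])
```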
This callback puts the logs into the file after each batch is processed, so each batch’s metrics appear as one line in the generated log file.
This callback is invoked for all the events and executes the custom functions based on the parameters passed.
This callback is used to change the learning rate when the metrics have stopped improving. As opposed to LearningRateScheduler, it reduces the learning rate based on the metric (not the epoch).
Several parameters resemble those of the EarlyStopping callback, so let’s highlight the ones that differ:
- monitor, patience, verbose, mode, min_delta: these are similar to EarlyStopping.
- factor: the factor by which the learning rate is decreased (new learning rate = old learning rate * factor).
- cooldown: the number of epochs to wait before restarting the monitoring of the metric.
- min_lr: the lower bound for the learning rate (the learning rate can’t go below this).

This callback is also called at the on_epoch_end event.
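A sketch with illustrative values:

```python
import tensorflow as tf

# Halve the learning rate when val_loss has not improved for two epochs,
# but never reduce it below 1e-5.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.5,    # new_lr = old_lr * factor
    patience=2,    # epochs with no improvement before reducing
    cooldown=1,    # epochs to wait before monitoring resumes
    min_lr=1e-5,   # lower bound on the learning rate
    verbose=1,
)

model.fit(x_train, y_train, epochs=30, validation_split=0.2,
          callbacks=[reduce_lr])
```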
The RemoteMonitor callback is useful when you want to post the training logs to an API (and it can also be mimicked using LambdaCallback). For example:
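A sketch that posts the epoch-end logs to a local endpoint (the URL and path are placeholders for your own API; the callback relies on the requests library being installed):

```python
import tensorflow as tf

# Post the epoch-end logs to a local endpoint.
remote_monitor = tf.keras.callbacks.RemoteMonitor(
    root="http://localhost:8000",
    path="/publish/epoch/end/",
    field="data",        # form field used when send_as_json=False
    send_as_json=True,   # send the logs as a JSON body instead
)

model.fit(x_train, y_train, epochs=5, callbacks=[remote_monitor])
```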
To see if the callback is working, you need an endpoint hosted on localhost:8000. You can use Node.js for this; save the code in the file server.js:
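A minimal sketch of such a server (it simply prints whatever it receives and replies with 200):

```javascript
// server.js: print every request body to the console.
const http = require('http');

http.createServer((req, res) => {
  let body = '';
  req.on('data', (chunk) => { body += chunk; });
  req.on('end', () => {
    console.log(`${req.method} ${req.url}: ${body}`);  // the posted metrics
    res.writeHead(200);
    res.end('ok');
  });
}).listen(8000, () => console.log('Listening on http://localhost:8000'));
```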
Then start the server by typing node server.js (you need to have Node installed). At the end of each epoch you will see the log in the Node console. If the server is not running, you will receive a warning at the end of the epoch.
This callback is also called at the on_epoch_end event.
These two callbacks, History and BaseLogger, are automatically applied to all Keras models. The history object is returned by model.fit and contains a dictionary with the average accuracy and loss over the epochs. Its params property contains the dictionary of parameters used for training (epochs, steps, verbose). If you have a callback that changes the learning rate, that will also be part of the history object.
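For instance, a quick sketch reusing the toy model from the earlier examples:

```python
# model.fit returns the History object populated during training.
model_history = model.fit(x_train, y_train, epochs=5, validation_split=0.2)

# Per-epoch metrics, e.g. {'loss': [...], 'accuracy': [...], 'val_loss': [...], ...}
print(model_history.history)

# Parameters used for training, e.g. {'verbose': 1, 'epochs': 5, 'steps': 3}
print(model_history.params)
```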
BaseLogger accumulates an average of your metrics within each epoch. So, the metrics you see at the end of an epoch are an average of the metrics over all of its batches.
The TerminateOnNaN callback terminates the training if the loss becomes NaN.
You can choose callbacks based on your specific needs, and in many cases combining multiple callbacks enhances training efficiency. For example, TensorBoard helps monitor training progress visually, while EarlyStopping and LearningRateScheduler help prevent overfitting by stopping training early or adjusting the learning rate dynamically. Additionally, ModelCheckpoint ensures that model checkpoints are saved periodically, preventing data loss. If you’re running deep learning workloads on the cloud, DigitalOcean’s GPU Droplets provide a powerful and cost-effective environment for training models efficiently. Furthermore, you can use DigitalOcean’s 1-Click AI Models to streamline your workflow and integrate TensorFlow callbacks seamlessly into your training process.