Solving Training Instability of Keras or Tensorflow Artificial Neural Networks
Updated: Feb 3
Are you running into issues of instability when training your artificial neural network in Keras or Tensorflow?
The following symptoms indicate training instability:
Random behavior at the start of training: sometimes the loss does not improve at the start of training
Random behavior after each epoch: sometimes the loss jumps up after an epoch
Exploding gradients: sometimes the loss jumps to infinity and becomes NaN
The following sections explain the methods that I find effective in resolving these instability issues. I create my neural network models in Jupyter Notebook, running Keras version 2.3.1 and TensorFlow version 2.0.0.
Reduce the Learning Rate, Increase the Epochs
In most cases, the Keras default parameters should not be modified, because they represent recommended settings. However, I've noticed that lowering the optimizer's learning rate can lead to better stability.
I use the Adam optimizer, since it is effective across a wide range of machine learning problems. The Adam optimizer has a learning rate parameter, which sets its initial learning rate.
In Keras, the default learning rate for Adam is 1e-3. This can be reduced to 1e-4. Because the initial learning rate is lower, the model takes longer to train: in my experiments, I have to increase the number of epochs by a factor of about 4 for the loss and accuracy to stabilize.
At the start of my notebook, I set the global parameters, such as the learning rate and the number of epochs:

LR = 1e-4
EPOCHS = 2**7
The TensorFlow library also has to be imported:
import tensorflow as tf
Before compiling the model, set the learning rate of the Adam optimizer:

opt = tf.keras.optimizers.Adam(learning_rate=LR)
model.compile(optimizer=opt, ...)
model.fit(epochs=EPOCHS, ...)
I recommend not setting the learning rate below 1e-4. If the model is still unstable at that point, other problems are contributing to the instability.
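To see why a smaller learning rate tames instability, it helps to watch plain gradient descent on a toy problem. The following NumPy sketch is illustrative only — the quadratic loss, learning rates, and step count are my own assumptions, not anything from Keras internals — but it reproduces the symptom described above: too large a step size makes the loss explode, while a smaller one converges.

```python
import numpy as np

def train(lr, steps=2000):
    """Plain gradient descent on the toy loss L(w) = w**2 (gradient 2*w)."""
    w = np.float64(2.0)
    for _ in range(steps):
        w = w - lr * 2 * w  # gradient step: w <- w * (1 - 2*lr)
    return w ** 2           # final loss

# |1 - 2*lr| = 2 > 1: w doubles in magnitude every step, so the loss
# overflows to infinity -- the "exploding" symptom.
with np.errstate(over="ignore"):
    unstable = train(lr=1.5)

# |1 - 2*lr| = 0.8 < 1: w shrinks every step and the loss settles near zero.
stable = train(lr=0.1)

print(unstable, stable)
```

The same trade-off appears here in miniature: the smaller learning rate needs many more steps to reach a low loss, which is why reducing the learning rate in Keras goes hand in hand with increasing the number of epochs.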
Reduce the Step Size for Recurrent Neural Networks
A recurrent neural network is typically built from LSTM cells. An LSTM cell stores the state of the node after each time step. If there are too many steps, the gradients accumulated across the steps grow too quickly, and the loss explodes to infinity.
LSTM layers accept a 3D array of shape [samples, steps, features]. Reducing the step size means decreasing the [steps] dimension and, for a fixed amount of data, correspondingly increasing the [samples] dimension.
Reducing the step size may result in lower accuracy, but will lead to better stability.
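As a sketch of what reducing the step size looks like in practice, the following NumPy snippet reslices the same data into shorter windows; the array sizes here are illustrative assumptions, not values from the original experiments.

```python
import numpy as np

# One long recording: 1024 time steps with 8 features per step.
sequence = np.random.rand(1024, 8)

# Original framing: 16 samples of 64 steps each -> (samples, steps, features).
long_windows = sequence.reshape(16, 64, 8)

# Reduced step size: 128 samples of 8 steps each. The [steps] dimension
# shrinks and the [samples] dimension grows; the total data is unchanged.
short_windows = sequence.reshape(128, 8, 8)

print(long_windows.shape)   # (16, 64, 8)
print(short_windows.shape)  # (128, 8, 8)
```

With fewer steps per sample, gradients are backpropagated through a shorter chain, which limits how large they can grow; the cost is that the network can no longer learn dependencies longer than the window.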
Thank you for reading. I hope you find this guide helpful for solving issues of instability when training your artificial neural network.
Questions or comments? You can reach me at firstname.lastname@example.org
Wayne Cheng is an A.I., machine learning, and deep learning developer at Audoir, LLC. His research involves the use of artificial neural networks to create music. Prior to starting Audoir, LLC, he worked as an engineer in various Silicon Valley startups. He has an M.S.E.E. degree from UC Davis, and a Music Technology degree from Foothill College.