I would like to use Batch Normalization in TensorFlow, since I found it in the source code
BN has different semantics in MLP and CNN, so I am not sure what exactly this BN does.
I did not find a method called
t: A 4D input Tensor.
m: A 1D mean Tensor with size matching the last dimension of t.
This is the first output from MovingMoments.
v: A 1D variance Tensor with size matching the last dimension of t.
This is the second output from MovingMoments.
beta: A 1D beta Tensor with size matching the last dimension of t.
An offset to be added to the normalized tensor.
gamma: A 1D gamma Tensor with size matching the last dimension of t.
If "scale_after_normalization" is true, this tensor will be multiplied
with the normalized tensor.
variance_epsilon: A small float number to avoid dividing by 0.
scale_after_normalization: A bool indicating whether the resulted tensor
needs to be multiplied with gamma.
The documentation string for this has improved since the release - see the docs comment in the master branch instead of the one you found. It clarifies, in particular, that it's the output from
You can see a very simple example of its use in the batch_norm test code. For a more real-world use example, I've included below the helper class and use notes that I scribbled up for my own use (no warranty provided!):
"""A helper class for managing batch normalization state. This class is designed to simplify adding batch normalization (http://arxiv.org/pdf/1502.03167v3.pdf) to your model by managing the state variables associated with it. Important use note: The function get_assigner() returns an op that must be executed to save the updated state. A suggested way to do this is to make execution of the model optimizer force it, e.g., by: update_assignments = tf.group(bn1.get_assigner(), bn2.get_assigner()) with tf.control_dependencies([optimizer]): optimizer = tf.group(update_assignments) """ import tensorflow as tf class ConvolutionalBatchNormalizer(object): """Helper class that groups the normalization logic and variables. Use: ewma = tf.train.ExponentialMovingAverage(decay=0.99) bn = ConvolutionalBatchNormalizer(depth, 0.001, ewma, True) update_assignments = bn.get_assigner() x = bn.normalize(y, train=training?) (the output x will be batch-normalized). """ def __init__(self, depth, epsilon, ewma_trainer, scale_after_norm): self.mean = tf.Variable(tf.constant(0.0, shape=[depth]), trainable=False) self.variance = tf.Variable(tf.constant(1.0, shape=[depth]), trainable=False) self.beta = tf.Variable(tf.constant(0.0, shape=[depth])) self.gamma = tf.Variable(tf.constant(1.0, shape=[depth])) self.ewma_trainer = ewma_trainer self.epsilon = epsilon self.scale_after_norm = scale_after_norm def get_assigner(self): """Returns an EWMA apply op that must be invoked after optimization.""" return self.ewma_trainer.apply([self.mean, self.variance]) def normalize(self, x, train=True): """Returns a batch-normalized version of x.""" if train: mean, variance = tf.nn.moments(x, [0, 1, 2]) assign_mean = self.mean.assign(mean) assign_variance = self.variance.assign(variance) with tf.control_dependencies([assign_mean, assign_variance]): return tf.nn.batch_norm_with_global_normalization( x, mean, variance, self.beta, self.gamma, self.epsilon, self.scale_after_norm) else: mean = self.ewma_trainer.average(self.mean) variance = self.ewma_trainer.average(self.variance) local_beta = tf.identity(self.beta) local_gamma = tf.identity(self.gamma) return tf.nn.batch_norm_with_global_normalization( x, mean, variance, local_beta, local_gamma, self.epsilon, self.scale_after_norm)
Note that I called it a
ConvolutionalBatchNormalizer because it pins the use of
tf.nn.moments to sum across axes 0, 1, and 2, whereas for non-convolutional use you might only want axis 0.
Feedback appreciated if you use it.