Shawn Lee - 8 months ago 1792

Python Question

I would like to use batch normalization in TensorFlow, since I found it in the source code. However, I did not find it documented on tensorflow.org. BN has different semantics in MLP and CNN, so I am not sure what exactly this BN does.

I could not find `MovingMoments` either.

The C++ code is copied here for reference:

```
REGISTER_OP("BatchNormWithGlobalNormalization")
    .Input("t: T")
    .Input("m: T")
    .Input("v: T")
    .Input("beta: T")
    .Input("gamma: T")
    .Output("result: T")
    .Attr("T: numbertype")
    .Attr("variance_epsilon: float")
    .Attr("scale_after_normalization: bool")
    .Doc(R"doc(
Batch normalization.

t: A 4D input Tensor.
m: A 1D mean Tensor with size matching the last dimension of t.
  This is the first output from MovingMoments.
v: A 1D variance Tensor with size matching the last dimension of t.
  This is the second output from MovingMoments.
beta: A 1D beta Tensor with size matching the last dimension of t.
  An offset to be added to the normalized tensor.
gamma: A 1D gamma Tensor with size matching the last dimension of t.
  If "scale_after_normalization" is true, this tensor will be multiplied
  with the normalized tensor.
variance_epsilon: A small float number to avoid dividing by 0.
scale_after_normalization: A bool indicating whether the resulted tensor
  needs to be multiplied with gamma.
)doc");
```

Answer

The documentation string for this has improved since the release; see the docs comment in the master branch instead of the one you found. It clarifies, in particular, that the mean `m` and variance `v` inputs are the outputs of `tf.nn.moments`.
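To make the semantics concrete, here is a minimal NumPy sketch of the arithmetic the op's doc string describes. The function name `batch_norm_reference` is hypothetical (it is not part of TensorFlow), and the sketch assumes the depth dimension is last, as the doc string states:

```python
import numpy as np


def batch_norm_reference(t, m, v, beta, gamma, variance_epsilon,
                         scale_after_normalization):
    """NumPy sketch of the op's arithmetic (hypothetical helper, not TF code)."""
    # m, v, beta and gamma are 1-D with length t.shape[-1]; broadcasting
    # applies them along the last (depth) dimension of t.
    normalized = (t - m) / np.sqrt(v + variance_epsilon)
    if scale_after_normalization:
        normalized = normalized * gamma
    return normalized + beta


rng = np.random.default_rng(0)
t = rng.normal(loc=3.0, scale=2.0, size=(8, 5, 5, 4))  # [batch, height, width, depth]
m = t.mean(axis=(0, 1, 2))  # per-channel mean
v = t.var(axis=(0, 1, 2))   # per-channel (population) variance
out = batch_norm_reference(t, m, v, np.zeros(4), np.ones(4), 1e-3, True)
print(out.mean(axis=(0, 1, 2)))  # close to 0 for every channel
print(out.var(axis=(0, 1, 2)))   # close to 1 for every channel
```

With `beta` at zero and `gamma` at one, the output of `BatchNormWithGlobalNormalization` should therefore have roughly zero mean and unit variance per channel.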

You can see a very simple example of its use in the batch_norm test code. For a more real-world use example, I've included below the helper class and use notes that I scribbled up for my own use (no warranty provided!):

```
"""A helper class for managing batch normalization state.

This class is designed to simplify adding batch normalization
(http://arxiv.org/pdf/1502.03167v3.pdf) to your model by
managing the state variables associated with it.

Important use note: The function get_assigner() returns
an op that must be executed to save the updated state.
A suggested way to do this is to make execution of the
model optimizer force it, e.g., by:

    update_assignments = tf.group(bn1.get_assigner(),
                                  bn2.get_assigner())
    with tf.control_dependencies([optimizer]):
        optimizer = tf.group(update_assignments)
"""

import tensorflow as tf


class ConvolutionalBatchNormalizer(object):
    """Helper class that groups the normalization logic and variables.

    Use:
        ewma = tf.train.ExponentialMovingAverage(decay=0.99)
        bn = ConvolutionalBatchNormalizer(depth, 0.001, ewma, True)
        update_assignments = bn.get_assigner()
        x = bn.normalize(y, train=training?)
        (the output x will be batch-normalized).
    """

    def __init__(self, depth, epsilon, ewma_trainer, scale_after_norm):
        # Batch statistics: set by assignment, never by the optimizer.
        self.mean = tf.Variable(tf.constant(0.0, shape=[depth]),
                                trainable=False)
        self.variance = tf.Variable(tf.constant(1.0, shape=[depth]),
                                    trainable=False)
        # Learned offset and scale.
        self.beta = tf.Variable(tf.constant(0.0, shape=[depth]))
        self.gamma = tf.Variable(tf.constant(1.0, shape=[depth]))
        self.ewma_trainer = ewma_trainer
        self.epsilon = epsilon
        self.scale_after_norm = scale_after_norm

    def get_assigner(self):
        """Returns an EWMA apply op that must be invoked after optimization."""
        return self.ewma_trainer.apply([self.mean, self.variance])

    def normalize(self, x, train=True):
        """Returns a batch-normalized version of x."""
        if train:
            # Training: normalize with the statistics of the current batch
            # and store them so the EWMA can pick them up.
            mean, variance = tf.nn.moments(x, [0, 1, 2])
            assign_mean = self.mean.assign(mean)
            assign_variance = self.variance.assign(variance)
            with tf.control_dependencies([assign_mean, assign_variance]):
                return tf.nn.batch_norm_with_global_normalization(
                    x, mean, variance, self.beta, self.gamma,
                    self.epsilon, self.scale_after_norm)
        else:
            # Inference: normalize with the moving averages instead.
            mean = self.ewma_trainer.average(self.mean)
            variance = self.ewma_trainer.average(self.variance)
            local_beta = tf.identity(self.beta)
            local_gamma = tf.identity(self.gamma)
            return tf.nn.batch_norm_with_global_normalization(
                x, mean, variance, local_beta, local_gamma,
                self.epsilon, self.scale_after_norm)
```
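The helper leans on `tf.train.ExponentialMovingAverage` to keep the running statistics used at inference time. As a rough illustration of that update rule (shadow = decay * shadow + (1 - decay) * value), here is a plain-Python sketch. The function `ewma_update` and the constant batch mean of 2.0 are made up for the example:

```python
def ewma_update(shadow, value, decay):
    """One moving-average step: decay * shadow + (1 - decay) * value."""
    return decay * shadow + (1.0 - decay) * value


decay = 0.99
shadow = 0.0          # same starting value as self.mean in the helper
batch_mean = 2.0      # made-up statistic, held constant for clarity

for _ in range(500):
    shadow = ewma_update(shadow, batch_mean, decay)

print(shadow)  # approaches 2.0 as the number of updates grows
```

Because the shadow value starts at the variable's initial value, the moving averages are biased toward 0 (mean) and 1 (variance) early in training, so evaluating with `train=False` after only a few updates can give misleading results.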

Note that I called it a `ConvolutionalBatchNormalizer` because it pins the use of `tf.nn.moments` to reduce across axes 0, 1, and 2, whereas for non-convolutional use you might only want axis 0.
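The difference between the two cases is only in which axes the moments are taken over. A small NumPy sketch, with array shapes chosen arbitrarily for illustration and assuming a `[batch, height, width, depth]` layout for the convolutional case:

```python
import numpy as np

rng = np.random.default_rng(0)
conv_activations = rng.normal(size=(8, 5, 5, 16))  # [batch, height, width, depth]
dense_activations = rng.normal(size=(8, 32))       # [batch, features]

# Convolutional case: one mean/variance per channel, pooled over
# batch, height and width (axes 0, 1, 2).
conv_mean = conv_activations.mean(axis=(0, 1, 2))
conv_var = conv_activations.var(axis=(0, 1, 2))

# Fully connected case: one mean/variance per feature, pooled over
# the batch only (axis 0).
dense_mean = dense_activations.mean(axis=0)
dense_var = dense_activations.var(axis=0)

print(conv_mean.shape)   # (16,)
print(dense_mean.shape)  # (32,)
```

In both cases the statistics end up 1-D with a size matching the last dimension of the input.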

Feedback appreciated if you use it.