craymichael craymichael - 3 months ago 60
Python Question

Cannot seem to get read_batch_examples working alongside an Estimator

EDIT: I'm using TensorFlow version 0.10.0rc0

I'm currently trying to get tf.contrib.learn.read_batch_examples working with a TensorFlow (SKFlow/tf.contrib) Estimator, specifically the LinearClassifier. I create a read_batch_examples op that reads a CSV file, passing tf.decode_csv as the parse_fn parameter with appropriate default records. I then feed that op to my input_fn for fitting the Estimator, but when that runs I receive the following error:

ValueError: Tensor("centered_bias_weight:0", shape=(1,), dtype=float32_ref) must be from the same graph as Tensor("linear/linear/BiasAdd:0", shape=(?, 1), dtype=float32).


I'm confused because neither of those Tensors appears to come from the read_batch_examples op. The code works if I run the op beforehand and then feed the input as an array of values instead. While this workaround exists, it is unhelpful because I am working with large datasets whose inputs I need to batch. Calling Estimator.fit (currently equivalent to Estimator.partial_fit) over successive iterations isn't nearly as fast as feeding in data as the model trains, so having this working is ideal. Any ideas? I'll post the non-functioning code below.

def input_fn(examples_dict):
    continuous_cols = {k: tf.cast(examples_dict[k], dtype=tf.float32)
                       for k in CONTINUOUS_FEATURES}
    categorical_cols = {
        k: tf.SparseTensor(
            indices=[[i, 0] for i in xrange(examples_dict[k].get_shape()[0])],
            values=examples_dict[k],
            shape=[int(examples_dict[k].get_shape()[0]), 1])
        for k in CATEGORICAL_FEATURES}
    feature_cols = dict(continuous_cols)
    feature_cols.update(categorical_cols)
    label = tf.contrib.layers.one_hot_encoding(labels=examples_dict[LABEL],
                                               num_classes=2,
                                               on_value=1,
                                               off_value=0)
    return feature_cols, label

filenames = [...]
csv_headers = [...] # features and label headers
batch_size = 50
min_after_dequeue = int(num_examples * min_fraction_of_examples_in_queue)
queue_capacity = min_after_dequeue + 3 * batch_size
examples = tf.contrib.learn.read_batch_examples(
    filenames,
    batch_size=batch_size,
    reader=tf.TextLineReader,
    randomize_input=True,
    queue_capacity=queue_capacity,
    num_threads=1,
    read_batch_size=1,
    parse_fn=lambda x: tf.decode_csv(
        x, [tf.constant([''], dtype=tf.string) for _ in xrange(len(csv_headers))]))

examples_dict = {}
for i, header in enumerate(csv_headers):
    examples_dict[header] = examples[:, i]

categorical_cols = []
for header in CATEGORICAL_FEATURES:
    categorical_cols.append(tf.contrib.layers.sparse_column_with_keys(
        header,
        keys  # Keys for that particular feature, source not shown here
    ))
continuous_cols = []
for header in CONTINUOUS_FEATURES:
    continuous_cols.append(tf.contrib.layers.real_valued_column(header))
feature_columns = categorical_cols + continuous_cols

model = tf.contrib.learn.LinearClassifier(
    model_dir=model_dir,
    feature_columns=feature_columns,
    optimizer=optimizer,
    n_classes=num_classes)
# Above code is ok up to this point
model.fit(input_fn=lambda: input_fn(examples_dict),
          steps=200)  # This line causes the error ****


Any alternatives for batching would be appreciated as well!

Answer

I was able to figure out my mistake with help from the great TensorFlow team! read_batch_examples has to be called within input_fn; otherwise the op belongs to a different graph from the one the Estimator builds, and it has to be run beforehand. If someone else has this problem and I wasn't clear enough, just leave a comment.
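
As a rough illustration of the fix described above, the sketch below moves the read_batch_examples call inside the function that the Estimator invokes, so the reader op is created only when the Estimator calls it. It assumes the TensorFlow 0.10 contrib API exactly as it is used in the question and has not been checked against other versions. The names make_input_fn and build_features are hypothetical and introduced only for this example; build_features stands in for the question's original input_fn(examples_dict).

```python
def make_input_fn(filenames, csv_headers, build_features, batch_size, queue_capacity):
    """Return an input_fn that builds the reader op only when it is called.

    Hypothetical helper: `build_features` is assumed to take the dict of
    column tensors and return (feature_cols, label), like the question's
    original input_fn(examples_dict).
    """
    def input_fn():
        # TensorFlow 0.10 contrib API as used in the question (assumption).
        # Imported lazily so no TensorFlow code runs before the Estimator
        # calls this function.
        import tensorflow as tf

        # One string default per CSV column, created alongside the reader op.
        record_defaults = [tf.constant([''], dtype=tf.string) for _ in csv_headers]
        examples = tf.contrib.learn.read_batch_examples(
            filenames,
            batch_size=batch_size,
            reader=tf.TextLineReader,
            randomize_input=True,
            queue_capacity=queue_capacity,
            num_threads=1,
            read_batch_size=1,
            parse_fn=lambda x: tf.decode_csv(x, record_defaults))

        # Same column slicing as in the question, now done inside input_fn.
        examples_dict = {header: examples[:, i]
                         for i, header in enumerate(csv_headers)}
        return build_features(examples_dict)

    return input_fn
```

Under these assumptions, the call in the question would become something like model.fit(input_fn=make_input_fn(filenames, csv_headers, input_fn, batch_size, queue_capacity), steps=200), with no read_batch_examples call left at module level.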
