Yaoi Dirty - 29 days ago
Python Question

TensorFlow: shuffle_batch did not show any error but did not finish

I am trying to use tf.train.shuffle_batch to batch the training data that I load from a .csv file. However, when I run the code, it does not seem to work: it shows no error, but it never finishes.

Could you suggest what is wrong with my code?

Also, what are suitable values for capacity and min_after_dequeue?

import tensorflow as tf
import numpy as np


test_label = []
in_label = []

iris_TRAINING = "iris_training.csv"
iris_TEST = "iris_test.csv"

# Load datasets.
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(filename=iris_TRAINING, target_dtype=np.int, features_dtype=np.float32)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(filename=iris_TEST, target_dtype=np.int, features_dtype=np.float32)

x_train, x_test, y_train, y_test = training_set.data, test_set.data, training_set.target, test_set.target



for n in y_train:
    targets = np.zeros(3)
    targets[int(n)] = 1 # one-hot pixs[0] is label and then use that number as index of one-hot
    in_label.append(targets) #store all of label (one-hot)
training_label = np.asarray(in_label)

for i in y_test:
    test_targets = np.zeros(3)
    test_targets[int(i)] = 1 # one-hot pixs[0] is label and then use that number as index of one-hot
    test_label.append(test_targets)
test_label = np.asarray(test_label)


x = tf.placeholder(tf.float32, [None,4]) #generate placeholder to store value of features for training

W = tf.Variable(tf.zeros([4, 3])) #weight
b = tf.Variable(tf.zeros([3])) #bias

y = tf.matmul(x, W) + b

y_ = tf.placeholder(tf.float32, [None, 3]) #generate placeholder to store value of labels


cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)


sess = tf.InteractiveSession()
# Train
tf.initialize_all_variables().run()

for i in range(5):
    batch_xt, batch_yt = tf.train.shuffle_batch([x_train,training_label],batch_size=10,capacity=200,min_after_dequeue=10)
    sess.run(train_step, feed_dict={x: batch_xt.eval(), y_: batch_yt.eval()})
    print(i)

# Test trained model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


print(sess.run(accuracy, feed_dict={x: x_test, y_: test_label}))

Answer

shuffle_batch builds:

  1. a queue Q into which the examples of your dataset are enqueued
  2. an operation that dequeues from Q and returns a batch
  3. a QueueRunner that enqueues into Q

(see here for more details)
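To make the three pieces above concrete, here is a minimal plain-Python sketch. It is not TensorFlow code and not how TensorFlow implements its queues; ToyShuffleQueue and runner are hypothetical names invented for this illustration. It only shows why a dequeue waits forever when nothing is filling the queue, which is the situation in the question.

```python
import random
import threading


class ToyShuffleQueue:
    """Plain-Python stand-in for the queue behind shuffle_batch (illustration only)."""

    def __init__(self, capacity, min_after_dequeue, seed=0):
        self.capacity = capacity
        self.min_after_dequeue = min_after_dequeue
        self.items = []
        self.cond = threading.Condition()
        self.rng = random.Random(seed)

    def enqueue(self, item, timeout=None):
        with self.cond:
            # Wait while the queue is full; give up after `timeout` seconds.
            if not self.cond.wait_for(lambda: len(self.items) < self.capacity, timeout):
                return False
            self.items.append(item)
            self.cond.notify_all()
            return True

    def dequeue_many(self, n, timeout=None):
        with self.cond:
            # Wait until n items can be removed while leaving min_after_dequeue behind.
            ready = self.cond.wait_for(
                lambda: len(self.items) >= n + self.min_after_dequeue, timeout)
            if not ready:
                return None  # nothing filled the queue in time
            batch = [self.items.pop(self.rng.randrange(len(self.items)))
                     for _ in range(n)]
            self.cond.notify_all()
            return batch


def runner(queue, data, stop):
    """Plays the role of the QueueRunner: cycle over the data and keep enqueuing."""
    while not stop.is_set():
        for row in data:
            # Retry until there is room in the queue or we are asked to stop.
            while not stop.is_set() and not queue.enqueue(row, timeout=0.05):
                pass
            if stop.is_set():
                return


data = list(range(30))
q = ToyShuffleQueue(capacity=20, min_after_dequeue=5)

# No runner thread yet: the queue stays empty, so the dequeue cannot succeed.
stalled = q.dequeue_many(10, timeout=0.2)

# Start the runner, then dequeue a few batches.
stop = threading.Event()
t = threading.Thread(target=runner, args=(q, data, stop), daemon=True)
t.start()
batches = [q.dequeue_many(10, timeout=5) for _ in range(3)]
stop.set()
t.join()
```

In this sketch the first dequeue only returns because it was given a timeout. A dequeue without a timeout, and without a runner thread, would simply block.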

So you don't need to call shuffle_batch at each iteration; call it only once, before your loop. You also have to call tf.train.start_queue_runners() afterwards, otherwise nothing fills the queue and the dequeue blocks forever. The end of your code should look something like this:

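sess = tf.InteractiveSession()
# Train
tf.initialize_all_variables().run()
# Build the queue once. enqueue_many=True makes each row of the arrays one example;
# without it the whole array would be enqueued as a single example.
batch_xt, batch_yt = tf.train.shuffle_batch([x_train, training_label], batch_size=10, capacity=200, min_after_dequeue=10, enqueue_many=True)
# Start the threads that fill the queue.
tf.train.start_queue_runners()

for i in range(5):
    # Fetch both tensors in one run so features and labels come from the same batch.
    xs, ys = sess.run([batch_xt, batch_yt])
    sess.run(train_step, feed_dict={x: xs, y_: ys})
    print(i)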

# Test trained model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


print(sess.run(accuracy, feed_dict={x: x_test, y_: test_label}))

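Suitable values for capacity and min_after_dequeue depend on your available memory and I/O throughput. capacity bounds how much of your dataset the queue holds in memory, and min_after_dequeue sets how many elements must remain in the queue after a dequeue, which determines how thoroughly the examples are mixed. These values mainly affect computation time rather than the final result (See here for more details).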
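As a starting point, TensorFlow's guide on reading data suggests choosing capacity as min_after_dequeue plus (number of enqueue threads + a small safety margin) times the batch size. The helper below is only a sketch of that rule of thumb; suggest_capacity and the default margin of 2 are assumptions made for this example, not part of the TensorFlow API.

```python
def suggest_capacity(min_after_dequeue, batch_size, num_threads=1, safety_margin=2):
    """Rule-of-thumb queue capacity (hypothetical helper, not a TensorFlow function)."""
    return min_after_dequeue + (num_threads + safety_margin) * batch_size


# With the values from the question: 10 + (1 + 2) * 10 = 40
capacity = suggest_capacity(min_after_dequeue=10, batch_size=10)
```

A larger min_after_dequeue gives better mixing but uses more memory and takes longer to fill before the first batch is available, so it is worth tuning on your own data.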
