ngoduyvu - 2 months ago
Python Question

Cost function for word2vec

I am currently doing text classification with embeddings pretrained by word2vec. Before feeding them to a convolutional neural network, I have to write the cost function.

Here is my code:

W = tf.Variable(tf.constant(0.0, shape=[vocabulary_size, embedding_size]),
                trainable=False, name="W")

embedding_placeholder = tf.placeholder(tf.float32, [vocabulary_size, embedding_size])
embedding_init = W.assign(embedding_placeholder)

sess = tf.Session()

sess.run(embedding_init, feed_dict={embedding_placeholder: final_embeddings})

embedded_chars = tf.nn.embedding_lookup(W, data)
embedded_chars_expanded = tf.expand_dims(embedded_chars, -1)


The code for word2vec is word2vec_basic.py.

When I feed it to the convolution function:

filter_shape = [filter_size, embedding_size, 1, num_filters]
W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
conv = tf.nn.conv2d(
    embedding_init,
    W,
    strides=[1, 1, 1, 1],
    padding="VALID",
    name="conv")


It gave me the following error:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-29-9c12d490e7ab> in <module>()
11 strides=[1, 1, 1, 1],
12 padding="VALID",
---> 13 name="conv")
ValueError: Shape (50000, 128) must have rank 4


I suspect my tensor size is wrong, but I am not sure how to set it right.

Answer

You get this error because tf.nn.conv2d expects its input to be a rank-4 tensor of shape:

[batch, in_height, in_width, in_channels]

but what you passed, embedding_init, has shape (50000, 128), which is rank 2 ([vocabulary_size, embedding_size]). You might want to use embedded_chars_expanded as the input instead.
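
To see why embedded_chars_expanded has the right rank, here is a rough sketch that traces the shapes with NumPy stand-ins rather than TensorFlow itself. The batch size, sequence length, filter size and number of filters are made-up values for illustration; only 50000 and 128 come from the error message. Indexing W with data plays the role of tf.nn.embedding_lookup, and np.expand_dims plays the role of tf.expand_dims.

```python
import numpy as np

# 50000 and 128 match the shape in the error message.
vocabulary_size, embedding_size = 50000, 128
# The values below are assumptions, chosen only for illustration.
batch_size, sequence_length = 4, 10
filter_size, num_filters = 3, 8

# Stand-in for the pretrained embedding matrix W (rank 2).
W = np.zeros((vocabulary_size, embedding_size), dtype=np.float32)

# Stand-in for `data`: a batch of word-id sequences.
data = np.random.randint(0, vocabulary_size, size=(batch_size, sequence_length))

# Counterpart of tf.nn.embedding_lookup(W, data): one row of W per word id.
embedded_chars = W[data]

# Counterpart of tf.expand_dims(embedded_chars, -1): append the channel axis.
embedded_chars_expanded = np.expand_dims(embedded_chars, -1)

print(W.shape)                        # rank 2: the kind of input conv2d rejected
print(embedded_chars.shape)           # rank 3: [batch, sequence_length, embedding_size]
print(embedded_chars_expanded.shape)  # rank 4: [batch, in_height, in_width, in_channels]

# Output shape of a stride-1 "VALID" convolution with a filter of shape
# [filter_size, embedding_size, 1, num_filters]: each spatial dimension
# shrinks to (input size - filter size + 1).
conv_shape = (batch_size,
              sequence_length - filter_size + 1,
              embedding_size - embedding_size + 1,
              num_filters)
print(conv_shape)
```

In the question's code, the corresponding change would be to pass embedded_chars_expanded as the first argument to tf.nn.conv2d in place of embedding_init.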