hypernote - 6 months ago 252

Python Question

I have a dataset with 5K rows (1K of them held out for validation) and 17 columns, including the last one (the target, an integer binary label).

My model is simply this 2-layer LSTM:

    model = Sequential()
    model.add(Embedding(output_dim=64, input_dim=17))
    model.add(LSTM(32, return_sequences=True))
    model.add(Dropout(0.5))
    model.add(LSTM(32, return_sequences=False))
    model.add(Dense(1))
    model.compile(loss='binary_crossentropy', optimizer='rmsprop',
                  class_mode='binary')

After loading my dataset with pandas:

    df_train = pd.read_csv(train_file)
    train_X, train_y = df_train.values[:, :-1], df_train['target'].values

and trying to run my model, I get this error:

Exception: When using TensorFlow, you should define explicitly the number of timesteps of your sequences. - If your first layer is an Embedding, make sure to pass it an "input_length" argument. Otherwise, make sure the first layer has an "input_shape" or "batch_input_shape" argument, including the time axis.

What should I put in `input_length`? My data has the shapes train_X=(4000, 17) and train_y=(4000,), so how can I prepare it to feed this kind of model? Do I have to change my input data shape?

Thanks for any help!! (=

Answer

It looks like Keras uses the static unrolling approach to build recurrent networks (such as LSTMs) on TensorFlow, which is why the number of timesteps has to be known when the model is built. The `input_length` should be the length of the longest sequence that you want to train on: if each row of your CSV file `train_file` is a comma-delimited sequence of symbols, it should be the number of symbols in the longest row.
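
As a rough illustration of "the number of symbols in the longest row", here is a minimal pure-Python sketch. The `rows` data, the `pad_rows` helper, and the choice of `0` as the padding id are all assumptions made up for this example, not part of the question's dataset or of Keras itself:

```python
# Hypothetical rows standing in for the CSV contents: each row is a
# sequence of integer symbol ids, and rows may differ in length.
rows = [
    [3, 1, 4, 1, 5],
    [9, 2, 6],
    [5, 3, 5, 8, 9, 7, 9],
]


def pad_rows(rows, pad_value=0):
    """Right-pad every row with pad_value up to the longest row's length.

    Returns the padded rows and that common length, which is the value
    the answer suggests using for input_length.
    """
    input_length = max(len(row) for row in rows)
    padded = [row + [pad_value] * (input_length - len(row)) for row in rows]
    return padded, input_length


padded, input_length = pad_rows(rows)

# Assumption: symbols are non-negative integer ids, so the vocabulary
# size (the Embedding's input_dim) is the largest id plus one.
vocab_size = max(max(row) for row in padded) + 1

print(input_length)  # 7
print(vocab_size)    # 10

# In the Keras version used in the question, these values would then be
# passed along the lines of:
#   model.add(Embedding(output_dim=64, input_dim=vocab_size,
#                       input_length=input_length))
```

If every row already has the same length, as with the (4000, 17) `train_X` in the question, no padding is needed and `input_length` would simply be 17. Note also that `input_dim` is the number of distinct symbol ids the `Embedding` has to index, which is not necessarily the number of columns.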