Vineet Kaushik - 7 months ago 71

Python Question

I am trying to use a deep neural network architecture to classify against a binary label value, 0 or +1. Here is my code to do it in TensorFlow. This question also carries forward from the discussion in a previous question.

```
import tensorflow as tf
import numpy as np
from preprocess import create_feature_sets_and_labels

train_x,train_y,test_x,test_y = create_feature_sets_and_labels()

x = tf.placeholder('float', [None, 5])
y = tf.placeholder('float')

n_nodes_hl1 = 500
n_nodes_hl2 = 500
# n_nodes_hl3 = 500
n_classes = 1
batch_size = 100

def neural_network_model(data):
    hidden_1_layer = {'weights':tf.Variable(tf.random_normal([5, n_nodes_hl1])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl1]))}
    hidden_2_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl2]))}
    # hidden_3_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
    #                   'biases':tf.Variable(tf.random_normal([n_nodes_hl3]))}
    # output_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
    #                 'biases':tf.Variable(tf.random_normal([n_classes]))}
    output_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl2, n_classes])),
                    'biases':tf.Variable(tf.random_normal([n_classes]))}

    l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.relu(l1)
    l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases'])
    l2 = tf.nn.relu(l2)
    # l3 = tf.add(tf.matmul(l2, hidden_3_layer['weights']), hidden_3_layer['biases'])
    # l3 = tf.nn.relu(l3)
    # output = tf.transpose(tf.add(tf.matmul(l3, output_layer['weights']), output_layer['biases']))
    output = tf.add(tf.matmul(l2, output_layer['weights']), output_layer['biases'])
    return output

def train_neural_network(x):
    prediction = tf.sigmoid(neural_network_model(x))
    cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(prediction, y))
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    hm_epochs = 10

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())

        for epoch in range(hm_epochs):
            epoch_loss = 0
            i = 0
            while i < len(train_x):
                start = i
                end = i + batch_size
                batch_x = np.array(train_x[start:end])
                batch_y = np.array(train_y[start:end])
                _, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                                              y: batch_y})
                epoch_loss += c
                i+=batch_size
            print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)

        # correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        # accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        predicted_class = tf.greater(prediction,0.5)
        correct = tf.equal(predicted_class, tf.equal(y,1.0))
        accuracy = tf.reduce_mean( tf.cast(correct, 'float') )

        # print (test_x.shape)
        # accuracy = tf.nn.l2_loss(prediction-y,name="squared_error_test_cost")/test_x.shape[0]
        print('Accuracy:', accuracy.eval({x: test_x, y: test_y}))

train_neural_network(x)
```

Specifically (carrying over the discussion from the previous question), I removed one layer, `hidden_3_layer`, changed `prediction = neural_network_model(x)` to `prediction = tf.sigmoid(neural_network_model(x))`, and added the `predicted_class`, `correct` and `accuracy` computations.

This is my trace:

```
('Epoch', 0, 'completed out of', 10, 'loss:', 37.312037646770477)
('Epoch', 1, 'completed out of', 10, 'loss:', 37.073578298091888)
('Epoch', 2, 'completed out of', 10, 'loss:', 37.035196363925934)
('Epoch', 3, 'completed out of', 10, 'loss:', 37.035196363925934)
('Epoch', 4, 'completed out of', 10, 'loss:', 37.035196363925934)
('Epoch', 5, 'completed out of', 10, 'loss:', 37.035196363925934)
('Epoch', 6, 'completed out of', 10, 'loss:', 37.035196363925934)
('Epoch', 7, 'completed out of', 10, 'loss:', 37.035196363925934)
('Epoch', 8, 'completed out of', 10, 'loss:', 37.035196363925934)
('Epoch', 9, 'completed out of', 10, 'loss:', 37.035196363925934)
('Accuracy:', 0.42608696)
```

As you can see, the loss doesn't decrease. Hence I don't know if it is still working correctly.

Here are results from multiple re-runs. Results are swaying wildly:

```
('Epoch', 0, 'completed out of', 10, 'loss:', 26.513012945652008)
('Epoch', 1, 'completed out of', 10, 'loss:', 26.513012945652008)
('Epoch', 2, 'completed out of', 10, 'loss:', 26.513012945652008)
('Epoch', 3, 'completed out of', 10, 'loss:', 26.513012945652008)
('Epoch', 4, 'completed out of', 10, 'loss:', 26.513012945652008)
('Epoch', 5, 'completed out of', 10, 'loss:', 26.513012945652008)
('Epoch', 6, 'completed out of', 10, 'loss:', 26.513012945652008)
('Epoch', 7, 'completed out of', 10, 'loss:', 26.513012945652008)
('Epoch', 8, 'completed out of', 10, 'loss:', 26.513012945652008)
('Epoch', 9, 'completed out of', 10, 'loss:', 26.513012945652008)
('Accuracy:', 0.60124224)
```

another:

```
('Epoch', 0, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 1, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 2, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 3, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 4, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 5, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 6, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 7, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 8, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 9, 'completed out of', 10, 'loss:', 22.873702049255371)
('Accuracy:', 1.0)
```

and another:

```
('Epoch', 0, 'completed out of', 10, 'loss:', 23.163824260234833)
('Epoch', 1, 'completed out of', 10, 'loss:', 22.88000351190567)
('Epoch', 2, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 3, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 4, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 5, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 6, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 7, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 8, 'completed out of', 10, 'loss:', 22.873702049255371)
('Epoch', 9, 'completed out of', 10, 'loss:', 22.873702049255371)
('Accuracy:', 0.99627328)
```

I have also seen accuracy value of 0.0 -_-

Some details about the data and data processing: I am using daily stock data for IBM from Yahoo! Finance over an almost 20-year period. This amounts to roughly 5200 entries.

Here is how I am processing it:

```
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import csv
import pickle

def create_feature_sets_and_labels(test_size = 0.2):
    df = pd.read_csv("ibm.csv")
    df = df.iloc[::-1]
    features = df.values
    testing_size = int(test_size*len(features))

    train_x = list(features[1:,1:6][:-testing_size])
    train_y = list(features[1:,7][:-testing_size])
    test_x = list(features[1:,1:6][-testing_size:])
    test_y = list(features[1:,7][-testing_size:])

    scaler = MinMaxScaler(feature_range=(-5,5))
    train_x = scaler.fit_transform(train_x)
    train_y = scaler.fit_transform(train_y)
    test_x = scaler.fit_transform(test_x)
    test_y = scaler.fit_transform(test_y)

    return train_x, train_y, test_x, test_y

if __name__ == "__main__":
    train_x, train_y, test_x, test_y = create_feature_sets_and_labels()
    with open('stockdata.pickle', 'wb') as f:
        pickle.dump([train_x, train_y, test_x, test_y], f)
```

Column 0 is the date, so it is not used as a feature. Nor is column 7. I normalized the data using `sklearn`'s `MinMaxScaler()`.

I've noticed that the system doesn't change its accuracy when data is presented in non-normalized form.

Answer

Once you pre-process your data into the wrong shape or range in an ML training task, the rest of the data flow will go wrong. You do this multiple times in different ways in the code in the question.

Taking things in the order that the processing occurs, the first problems are with pre-processing. Your goals here should be:

- X values (input features) in tabular form: each row is an example, each column is a feature. Values should be numeric and scaled for use with a neural network. Test and train data need to be scaled identically. That doesn't mean calling `.fit_transform` on both, because that re-fits the scaler.
- Y values (output labels) in tabular form: each row is an example matching the same row of X, each column is the true value of an output. For classification problems the values are typically 0 and 1, *and should not be re-scaled*, since they represent class membership.
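To see why re-fitting breaks the "scaled identically" goal, here is a minimal NumPy sketch. The helpers `fit_minmax` and `apply_minmax` are hypothetical stand-ins for what a min-max scaler does, not the `sklearn` API, and the prices are made up.

```python
import numpy as np

def fit_minmax(data):
    """Return the per-column (min, max) learned from `data` (hypothetical helper)."""
    return data.min(axis=0), data.max(axis=0)

def apply_minmax(data, lo, hi, out_range=(-5.0, 5.0)):
    """Map values linearly so that lo -> out_range[0] and hi -> out_range[1]."""
    a, b = out_range
    return a + (data - lo) * (b - a) / (hi - lo)

# Made-up prices: the test period trades at a higher level than the training period.
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[30.0], [40.0], [50.0]])

lo, hi = fit_minmax(train)
train_scaled = apply_minmax(train, lo, hi)
test_shared = apply_minmax(test, lo, hi)             # reuse the training statistics
test_refit = apply_minmax(test, *fit_minmax(test))   # what re-fitting on test data does

print(train_scaled.ravel())
print(test_shared.ravel())
print(test_refit.ravel())
```

With shared statistics the raw price 30 maps to 5 in both sets. After re-fitting, the same price maps to -5 in the test set, so the network would see a different input for the same underlying value.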

This re-write of your `create_feature_sets_and_labels` function does things correctly:

```
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def create_feature_sets_and_labels(test_size = 0.2):
    df = pd.read_csv("ibm.csv")
    df = df.iloc[::-1]
    features = df.values
    testing_size = int(test_size*len(features))

    # Features as float arrays; labels reshaped to [n_examples, 1]
    train_x = np.array(features[1:,1:6][:-testing_size]).astype(np.float32)
    train_y = np.array(features[1:,7][:-testing_size]).reshape(-1, 1).astype(np.float32)
    test_x = np.array(features[1:,1:6][-testing_size:]).astype(np.float32)
    test_y = np.array(features[1:,7][-testing_size:]).reshape(-1, 1).astype(np.float32)

    # Fit the scaler on the training features only, then reuse it for the test features
    scaler = MinMaxScaler(feature_range=(-5,5))
    scaler.fit(train_x)
    train_x = scaler.transform(train_x)
    test_x = scaler.transform(test_x)

    # Labels stay as 0/1 and are not scaled
    return train_x, train_y, test_x, test_y
```

Important differences from your version:

- Using typecast `np.array`, not `list` (minor difference).
- y values are tabular, `[n_examples, n_outputs]` (major difference; your row vector shape is the cause of many problems later).
- Scaler is fit once, then applied to features (major difference; if you scale train and test data separately, you are not predicting anything meaningful).
- Scaler is *not* applied to outputs (major difference for a classifier; you want the train and test values to be 0, 1 for meaningful training and reporting of accuracy).
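One way a flat label shape can cause trouble is broadcasting. The following is a small NumPy sketch with made-up values, assuming the element-wise comparison in the graph follows the same broadcasting rules as NumPy: comparing a `[n, 1]` prediction against `[n]` labels silently produces an `[n, n]` grid instead of `n` per-example results.

```python
import numpy as np

# Hypothetical predictions for 4 examples, shaped [n_examples, 1] like the network output.
predicted_class = np.array([[True], [False], [True], [True]])

labels_flat = np.array([1.0, 0.0, 0.0, 1.0])   # shape (4,), like the original train_y
labels_col = labels_flat.reshape(-1, 1)        # shape (4, 1), like the rewritten train_y

# (4, 1) compared with (4,) broadcasts to a (4, 4) grid of every prediction/label pairing.
correct_flat = predicted_class == (labels_flat == 1.0)
# (4, 1) compared with (4, 1) stays element-wise, one result per example.
correct_col = predicted_class == (labels_col == 1.0)

print(correct_flat.shape, correct_flat.mean())
print(correct_col.shape, correct_col.mean())
```

Here the column-shaped labels give the true accuracy of 0.75 (three of four correct), while the flat labels give 0.5, a number that reflects pairings between unrelated examples rather than the model's predictions.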

There are also some problems with your training code for this data:

- `y = tf.placeholder('float')` should be `y = tf.placeholder('float', [None, 1])`. This makes no difference to processing, but correctly throws an error when `y` is the wrong shape. That error would have been a clue much earlier that things were going wrong.
- `n_nodes_hl1 = 500` and `n_nodes_hl2 = 500` can be much lower, and the network will actually work much better with e.g. `n_nodes_hl1 = 10` and `n_nodes_hl2 = 10`. This is mainly because you are using large initial values for the weights. You could alternatively scale the weights down, and for more complex data you might want to do that instead. In this case it is simpler to reduce the number of hidden neurons.
- As we discussed in comments, the start of your `train_neural_network` function should look like this:

  ```
  output = neural_network_model(x)
  prediction = tf.sigmoid(output)
  cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(output, y))
  optimizer = tf.train.AdamOptimizer().minimize(cost)
  ```

  . . . this is a major difference. By using `sigmoid_cross_entropy_with_logits` you have committed to using the pre-transform value of the output layer for training. But you still want the predicted values to measure accuracy (or for any other use of the network where you want to read off a predicted value).
- For a consistent measure of loss, you want the mean loss per example, so you need to divide your sum of mean-per-batch by the number of batches: `'loss:', epoch_loss/(len(train_x)/batch_size)`
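To illustrate why feeding the already-squashed prediction into a loss that expects logits hurts training, here is a plain-Python sketch. The helper `sigmoid_cross_entropy` is a hypothetical reimplementation of the standard formula for a single example, not TensorFlow's function, and the logit value is made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_cross_entropy(logit, label):
    """Cross-entropy of sigmoid(logit) against a 0/1 label (hypothetical helper)."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

logit, label = 4.0, 1.0   # made-up confident and correct raw output

loss_from_logit = sigmoid_cross_entropy(logit, label)
# Passing the prediction instead of the logit applies the sigmoid a second time.
loss_from_prediction = sigmoid_cross_entropy(sigmoid(logit), label)

print(loss_from_logit, loss_from_prediction)
```

Because a sigmoid output lies between 0 and 1, a second sigmoid can only produce probabilities between 0.5 and about 0.73. The loss for a correct, confident example therefore cannot fall much below 0.31, however well the network fits.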

If I make all those corrections and run this with a few more epochs, e.g. 50, then I get a typical loss of `0.7` and an accuracy measure of `0.5`. This occurs reasonably reliably, but does move a little due to changes in starting weights. The accuracy is not very stable, and possibly suffers from over-fit, which you are not allowing for at all (you should read up on techniques to help measure and manage over-fit; it is an important part of training NNs reliably).
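As a rough reading aid for a loss of `0.7`, the sketch below averages some made-up per-batch losses into a per-example figure and compares it with the cross-entropy of a model that outputs a probability of one half for every example. The batch values are hypothetical, not taken from a real run.

```python
import math

# Hypothetical per-batch mean losses, as returned by evaluating `cost` on each batch.
batch_losses = [0.71, 0.69, 0.70, 0.68]

epoch_loss = sum(batch_losses)               # the sum that the original loop prints
mean_loss = epoch_loss / len(batch_losses)   # comparable across batch sizes and dataset sizes

# Cross-entropy of always predicting probability 0.5 on 0/1 labels: -ln(0.5) = ln(2).
chance_loss = -math.log(0.5)

print(mean_loss, chance_loss)
```

A mean loss close to ln(2), roughly 0.693, suggests the network has learned little beyond guessing, which would be consistent with an accuracy near one half on roughly balanced labels.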

The value of `0.5` may seem bad. It is possible to improve upon it by modifying the network architecture or meta-params. For example, I can get down to `0.43` training loss and up to `0.83` test accuracy by swapping `tf.nn.relu` for `tf.tanh` in the hidden layers and running for 500 epochs.

To understand more about neural networks, what to measure when training and what might be worth changing in your model, you will want to study the subject in more depth.

Source (Stackoverflow)