O.rka - 6 months ago
Python Question

Use attribute and target matrices for TensorFlow Linear Regression Python

I'm trying to follow this tutorial.

TensorFlow just came out and I'm really trying to understand it. I'm familiar with penalized linear regression like Lasso, Ridge, and ElasticNet and their usage in scikit-learn.

For scikit-learn Lasso regression, all I need to input into the regression algorithm is DF_X [an M x N dimensional attribute matrix (pd.DataFrame)] and SR_y [an M dimensional target vector (pd.Series)]. The Variable structure in TensorFlow is a bit new to me and I'm not sure how to structure my input data into what it wants.

It seems as if softmax regression is for classification. How can I restructure my DF_X (M x N attribute matrix) and SR_y (M dimensional target vector) to input into tensorflow for linear regression?


My current method for doing a Linear Regression uses pandas, numpy, and sklearn and it's shown below. I think this question will be really helpful for people getting familiar with TensorFlow:

#!/usr/bin/python
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LassoCV

#Create DataFrames for attribute and target matrices
DF_X = pd.DataFrame(np.array([[0,0,1],[2,3,1],[4,5,1],[3,4,1]]),columns=["att1","att2","att3"],index=["s1","s2","s3","s4"])
SR_y = pd.Series(np.array([3,2,5,8]),index=["s1","s2","s3","s4"],name="target")

print(DF_X)
#    att1  att2  att3
#s1     0     0     1
#s2     2     3     1
#s3     4     5     1
#s4     3     4     1

print(SR_y)
#s1    3
#s2    2
#s3    5
#s4    8
#Name: target, dtype: int64

#Create Linear Model (Lasso Regression)
model = LassoCV()
model.fit(DF_X,SR_y)

print(model)
#LassoCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True,
#max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False,
#precompute='auto', random_state=None, selection='cyclic', tol=0.0001,
#verbose=False)

print(model.coef_)
#[ 0. 0.3833346 0. ]

Answer

Softmax is just a function applied to a model's output (in logistic regression, for example); it is not a model like

model = LassoCV()
model.fit(DF_X,SR_y)

Therefore you can't simply pass it data with a fit method. However, you can build the model yourself from TensorFlow functions.

First, you have to create a computational graph. For linear regression, you start by creating placeholder tensors with the shape of your data. They are only symbolic tensors; you supply your actual arrays in another part of the program.

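import numpy as np
import tensorflow as tf
x = tf.placeholder("float", [4, 3])   # attribute matrix: 4 samples x 3 attributes
y_ = tf.placeholder("float", [4])     # target vector: one value per sample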

Then you create two variables that hold the model's weights and bias, initialized to zeros:

W = tf.Variable(tf.zeros([3,1]))
b = tf.Variable(tf.zeros([1]))

Now you can create the model (you want regression, not classification, so you don't need tf.nn.softmax):

# flatten the [4, 1] matrix product to shape [4] so it lines up with y_
y = tf.reshape(tf.matmul(x, W), [-1]) + b

Since this is a linear regression model, you use the sum of squared errors as the loss:

loss=tf.reduce_sum(tf.square(y_ - y))
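Shapes matter when computing this loss: a prediction of shape (4, 1) minus a target of shape (4,) broadcasts to a (4, 4) matrix instead of 4 residuals. The following is a minimal NumPy sketch of that effect (it is not TensorFlow code), using the data from the question and zero-initialized parameters; names such as pred_2d and wrong_residuals are illustrative only.

```python
import numpy as np

# Data from the question, as plain NumPy arrays
X = np.array([[0, 0, 1], [2, 3, 1], [4, 5, 1], [3, 4, 1]], dtype=float)  # shape (4, 3)
y_true = np.array([3, 2, 5, 8], dtype=float)                              # shape (4,)

W = np.zeros((3, 1))  # weights, same shape as in the answer
b = np.zeros(1)       # bias

pred_2d = X.dot(W) + b          # shape (4, 1)
pred_1d = pred_2d.reshape(-1)   # shape (4,)

# (4,) minus (4, 1) broadcasts to (4, 4): every target is paired with every prediction
wrong_residuals = y_true - pred_2d
# (4,) minus (4,) gives one residual per sample
residuals = y_true - pred_1d

loss = np.sum(residuals ** 2)              # sum of squared errors over 4 samples
wrong_loss = np.sum(wrong_residuals ** 2)  # sums 16 terms instead of 4

print(residuals.shape, wrong_residuals.shape)
print(loss, wrong_loss)
```

With all parameters at zero, the 1-D version sums 4 squared residuals while the broadcast version sums 16, so the two losses differ by a factor of 4.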

Then you define the training step, the same way as in the tutorial:

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
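To make clear what a gradient descent step on this loss does, here is a rough NumPy sketch of the same idea, written by hand rather than with TensorFlow. It assumes the data from the question and the same learning rate of 0.01; the helper name sse, the 1-D weight vector, and the choice of 200 iterations are assumptions made for illustration.

```python
import numpy as np

X = np.array([[0, 0, 1], [2, 3, 1], [4, 5, 1], [3, 4, 1]], dtype=float)
y_true = np.array([3, 2, 5, 8], dtype=float)

w = np.zeros(3)        # weights, kept 1-D here for simplicity
b = 0.0                # bias
learning_rate = 0.01   # same value as passed to the optimizer above


def sse(w, b):
    """Sum of squared errors for the given parameters."""
    return np.sum((y_true - (X.dot(w) + b)) ** 2)


initial_loss = sse(w, b)

for step in range(200):  # the number of steps is an arbitrary choice
    residuals = y_true - (X.dot(w) + b)
    grad_w = -2.0 * X.T.dot(residuals)  # gradient of the loss w.r.t. w
    grad_b = -2.0 * np.sum(residuals)   # gradient of the loss w.r.t. b
    w -= learning_rate * grad_w         # one gradient descent update
    b -= learning_rate * grad_b

final_loss = sse(w, b)
print(initial_loss, final_loss)
```

Each pass through the loop corresponds to one parameter update, so the loss only drops noticeably after many repetitions.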

Now that you have created the computational graph, you have to write one more part of the program, where you use this graph with your data.

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)       
sess.run(train_step, feed_dict={x:np.asarray(DF_X),y_:np.asarray(SR_y)})

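Here you pass your data to the computational graph through feed_dict. TensorFlow accepts the data as NumPy arrays. Note that each call to sess.run(train_step, ...) performs a single gradient descent step, so you would normally repeat it in a loop. If you want to see the current error (the loss), you can run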

sess.run(loss,feed_dict={x:np.asarray(DF_X),y_:np.asarray(SR_y)})
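To judge whether a reported loss is good, it can help to compare it with the smallest loss any linear model can reach on this data. The sketch below is one way to get that reference value with NumPy's least-squares solver; it is separate from the TensorFlow program, and the names A, coef, and best_loss are illustrative.

```python
import numpy as np

X = np.array([[0, 0, 1], [2, 3, 1], [4, 5, 1], [3, 4, 1]], dtype=float)
y_true = np.array([3, 2, 5, 8], dtype=float)

# Append a column of ones so the bias is fitted together with the weights
A = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares fit; rcond=None selects NumPy's default cutoff for small singular values
coef, _, rank, _ = np.linalg.lstsq(A, y_true, rcond=None)

# Sum of squared errors at the least-squares solution
best_loss = np.sum((y_true - A.dot(coef)) ** 2)
print(rank, best_loss)
```

On this data the reference loss comes out at about 13.5, so a training loss well above that suggests more gradient steps are needed. The rank is 3 rather than 4 because att3 is constant and therefore duplicates the bias column.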