Forgive my terminology, I'm not an ML pro. I might use the wrong terms below.
I'm trying to perform multivariable linear regression. Let's say I'm trying to work out user gender by analysing page views on a web site.
For each user whose gender I know, I have a feature matrix where each row represents a web site section: the first element is the section number and the second is whether they visited it, e.g.:
male1 = [
[1, 1], # visited section 1
[2, 0], # didn't visit section 2
[3, 1], # visited section 3, etc
]
features = male1
gender = 1
xs = [
[ # user1
[ # user2
ys = [1, 0, ...]
from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit(xs, ys)
ValueError: Found array with dim 3. Estimator expected <= 2.
You need to create xs in a different way. According to the docs:
fit(X, y, sample_weight=None)
X : numpy array or sparse matrix of shape [n_samples, n_features]
    Training data
y : numpy array of shape [n_samples, n_targets]
    Target values
sample_weight : numpy array of shape [n_samples]
    Individual weights for each sample
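One quick way to compare a nested list against that shape requirement is to look at the shape NumPy infers for it. The sketch below uses made-up data in the question's layout (two users, four sections); the values are an assumption for illustration only.

```python
import numpy as np

# Hypothetical data in the question's layout: one [section, visited] pair
# per section, and one list of pairs per user.
xs = [
    [[1, 1], [2, 0], [3, 1], [4, 0]],  # user1
    [[1, 0], [2, 1], [3, 1], [4, 0]],  # user2
]

arr = np.asarray(xs)
print(arr.ndim)   # number of dimensions
print(arr.shape)  # (users, sections, values per section)
```

For this data the array has three dimensions, with shape (2, 4, 2), whereas fit expects two: [n_samples, n_features].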
xs should be a 2D array with as many rows as users and as many columns as web site sections. Your xs is currently a 3D array. To reduce the number of dimensions by one you could get rid of the section numbers through a list comprehension:
xs = [[visit for section, visit in user] for user in xs]
If you do so, the data you provided as an example gets transformed into:
xs = [[1, 0, 1, 0], # user1
      [0, 1, 1, 0], # user2
      ...
      ]
With xs in this 2D form, clf.fit(xs, ys) should work as expected.
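Putting the pieces together, here is a minimal end-to-end sketch. The two users and their labels are made up to match the example above (1 for male, 0 for female is an assumption based on gender = 1 in the question), and xs_2d is just an illustrative name. Two samples are far too few to learn anything meaningful; the point is only that fit accepts the reshaped input.

```python
from sklearn import linear_model

# Hypothetical training data in the question's 3D layout.
xs = [
    [[1, 1], [2, 0], [3, 1], [4, 0]],  # user1
    [[1, 0], [2, 1], [3, 1], [4, 0]],  # user2
]
ys = [1, 0]  # assumed encoding: 1 = male, 0 = female

# Keep only the "visited" flag of each [section, visited] pair, so that
# every user becomes one flat row of features.
xs_2d = [[visit for section, visit in user] for user in xs]

clf = linear_model.LinearRegression()
clf.fit(xs_2d, ys)           # 2D input: no ValueError
print(clf.coef_.shape)       # one coefficient per section
print(clf.predict(xs_2d))    # predictions for the training rows
```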
A more efficient way to drop that extra dimension is to slice a NumPy array:
import numpy as np
xs = np.asarray(xs)[:,:,1]
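As a sanity check, the slice and the list comprehension should give the same 2D result. The sketch below uses the same made-up two-user data; the names by_comprehension and by_slicing are illustrative only.

```python
import numpy as np

# Hypothetical data in the question's 3D layout.
xs = [
    [[1, 1], [2, 0], [3, 1], [4, 0]],  # user1
    [[1, 0], [2, 1], [3, 1], [4, 0]],  # user2
]

# Option 1: pure-Python list comprehension.
by_comprehension = [[visit for section, visit in user] for user in xs]

# Option 2: index 1 along the last axis keeps only the "visited" flag.
by_slicing = np.asarray(xs)[:, :, 1]

print(by_slicing.shape)                              # (users, sections)
print(np.array_equal(by_slicing, by_comprehension))  # same values?
```

Note that the slicing approach relies on every user having the same number of sections, so that the nested list converts to a regular 3D array.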