I have a DataFrame:
import pandas as pd
df = pd.DataFrame({'age': [(1, 2), (1, 3), (1, 1)],
                   'year': [(20, 30), (30, 40), (30, 40)]})
df
Out[58]:
      age      year
0  (1, 2)  (20, 30)
1  (1, 3)  (30, 40)
2  (1, 1)  (30, 40)
I would like to turn it into this NumPy array:
array([[ 1,  2, 20, 30],
       [ 1,  3, 30, 40],
       [ 1,  1, 30, 40]])
Ideally, I would pick the columns by name from a list, col_names:
col_names = ['age', 'year']
numpy_array = some_clever_expression(col_names)
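Apply np.concatenate twice to get a flattened 1D array, then reshape it to one row per DataFrame row. The first call joins the rows of df.values into a 1D object array of tuples; the second joins the numbers inside those tuples:
np.concatenate(np.concatenate(df.values)).reshape(df.shape[0], -1)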
Sample output:
In [460]: np.concatenate(np.concatenate(df.values)).reshape(df.shape[0],-1)
Out[460]:
array([[ 1,  2, 20, 30],
       [ 1,  3, 30, 40],
       [ 1,  1, 30, 40]])
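To make the two concatenation steps visible, here is a small sketch that rebuilds the example DataFrame and prints the shape after each step. It assumes every cell holds a tuple of numbers; the intermediate variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [(1, 2), (1, 3), (1, 1)],
                   'year': [(20, 30), (30, 40), (30, 40)]})

cells = df.values                        # 2D object array, one tuple per cell
step1 = np.concatenate(cells)            # joins the rows: 1D object array of tuples
step2 = np.concatenate(step1)            # joins the tuples: 1D array of plain numbers
result = step2.reshape(df.shape[0], -1)  # one output row per DataFrame row

print(cells.shape, step1.shape, step2.shape, result.shape)
```

On the example data this prints (3, 2) (6,) (12,) (3, 4), so the final reshape only has to split the 12 numbers back into 3 rows.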
Alternatively, apply np.hstack twice to get the same flattened 1D array; reshape it the same way to get the 2D result:
np.hstack(np.hstack(df.values))
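A quick check, under the same assumption that every cell is a tuple of numbers: the double np.hstack result is 1D, and after the same reshape it matches the np.concatenate version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [(1, 2), (1, 3), (1, 1)],
                   'year': [(20, 30), (30, 40), (30, 40)]})

flat = np.hstack(np.hstack(df.values))      # 1D array holding all 12 numbers
via_hstack = flat.reshape(df.shape[0], -1)  # same reshape as the concatenate version
via_concat = np.concatenate(np.concatenate(df.values)).reshape(df.shape[0], -1)

print(flat.ndim, via_hstack.shape)          # flattened input, reshaped output
print(np.array_equal(via_hstack, via_concat))
```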
To select specific columns, simply index into those columns, take the underlying array and proceed as above. Thus, for a list of column names in col_names, use df[col_names].values in place of df.values.
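Putting it together, a minimal sketch of the "clever expression" from the question as a helper function. The name tuple_columns_to_array is hypothetical. It assumes col_names is a list (so df[col_names] stays a DataFrame and .values stays 2D) and that every selected cell holds a tuple of numbers with all rows flattening to the same length; it does not check this.

```python
import numpy as np
import pandas as pd

def tuple_columns_to_array(df, col_names):
    """Hypothetical helper: flatten the tuple cells of the chosen columns
    into one array row per DataFrame row.

    Assumes col_names is a list and every row flattens to the same length.
    """
    cells = df[col_names].values                  # 2D object array of tuples
    flat = np.concatenate(np.concatenate(cells))  # rows first, then tuple contents
    return flat.reshape(len(df), -1)              # one row per DataFrame row

df = pd.DataFrame({'age': [(1, 2), (1, 3), (1, 1)],
                   'year': [(20, 30), (30, 40), (30, 40)]})

print(tuple_columns_to_array(df, ['age', 'year']))
print(tuple_columns_to_array(df, ['year']))
```

Passing a bare string such as 'year' instead of ['year'] would give a 1D .values array, and the second np.concatenate would then fail, which is why the sketch expects a list.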