piRSquared - 7 months ago
Python Question

generate all quadratic combinations of any 2 columns

I have a DataFrame df with columns C1, C2, C3, C4. I want a new DataFrame containing the product of every pair of columns, including each column multiplied by itself. With 4 columns to start with, that gives 4 + 3 + 2 + 1 = 10 columns. Furthermore, the columns should be labeled with a MultiIndex where each level identifies one of the original columns being multiplied.
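As a quick sanity check on the column count, here is a minimal sketch that enumerates the pairs for four column names and compares the result with the closed form n(n+1)/2. The list columns below is only an illustrative stand-in for df.columns.

```python
from itertools import combinations_with_replacement
from math import comb

# Stand-in for df.columns (assumption: four columns, as in the question)
columns = ['C1', 'C2', 'C3', 'C4']
n = len(columns)

# Unordered pairs of column names, a column paired with itself included
pairs = list(combinations_with_replacement(columns, 2))

print(len(pairs))        # number of product columns
print(n * (n + 1) // 2)  # closed form: n + (n - 1) + ... + 1
print(comb(n + 1, 2))    # same count written as a binomial coefficient
```

All three lines print 10 for n = 4.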

So if

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(2, 4) * 10, columns=['C1', 'C2', 'C3', 'C4']).astype(int)

print(df)

   C1  C2  C3  C4
0   8   0   5   6
1   4   5   3   5


I expect df_quad to look like:

   C1              C2          C3      C4
   C1  C2  C3  C4  C2  C3  C4  C3  C4  C4
0  64   0  40  48   0   0   0  25  30  36
1  16  20  12  20  25  15  25   9  15  25

Answer

Try this (it labels the columns with flat names such as C1_C2 rather than a MultiIndex):

import io
from itertools import combinations_with_replacement

import pandas as pd

data = """\
   C1  C2  C3  C4
0   8   0   5   6
1   4   5   3   5
"""
# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col=0)

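# Every unordered pair of column names, including a column paired with itself
combs = list(combinations_with_replacement(df.columns.tolist(), 2))

df_quad = pd.DataFrame()

# One product column per pair, named like C1_C2
for tup in combs:
    df_quad['{0[0]}_{0[1]}'.format(tup)] = df[tup[0]] * df[tup[1]]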

Test:

In [77]: df_quad
Out[77]:
   C1_C1  C1_C2  C1_C3  C1_C4  C2_C2  C2_C3  C2_C4  C3_C3  C3_C4  C4_C4
0     64      0     40     48      0      0      0     25     30     36
1     16     20     12     20     25     15     25      9     15     25

In [78]: combs
Out[78]:
[('C1', 'C1'),
 ('C1', 'C2'),
 ('C1', 'C3'),
 ('C1', 'C4'),
 ('C2', 'C2'),
 ('C2', 'C3'),
 ('C2', 'C4'),
 ('C3', 'C3'),
 ('C3', 'C4'),
 ('C4', 'C4')]
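
If the two-level column labels from the question are needed, one possible variation is to reuse the same list of pairs as a MultiIndex. This is only a sketch, not part of the answer above: it rebuilds the sample frame from a literal dict so that it runs on its own, and the name pairs is chosen just for illustration.

```python
from itertools import combinations_with_replacement

import pandas as pd

# Sample data from the question, written out literally
df = pd.DataFrame({'C1': [8, 4], 'C2': [0, 5], 'C3': [5, 3], 'C4': [6, 5]})

# Unordered pairs of column names, a column paired with itself included
pairs = list(combinations_with_replacement(df.columns.tolist(), 2))

# Build the products under temporary integer labels (insertion order is kept)
df_quad = pd.DataFrame({i: df[a] * df[b] for i, (a, b) in enumerate(pairs)})

# Relabel the columns with a two-level MultiIndex taken from the pairs
df_quad.columns = pd.MultiIndex.from_tuples(pairs)

print(df_quad)
```

A single product can then be selected with a tuple, for example df_quad[('C1', 'C3')], and df_quad['C1'] returns every product whose first factor is C1.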