Vincent - 1 year ago 263

Python Question

I have a

`pandas dataframe`

`a`

`b`

`In [7]: a`

Out[7]:

var1 var2

0 a,b,c 1

1 d,e,f 2

In [8]: b

Out[8]:

var1 var2

0 a 1

1 b 1

2 c 1

3 d 2

4 e 2

5 f 2

So far, I have tried various simple functions, but the

`.apply`

`.transform`

Example data:

`from pandas import DataFrame`

import numpy as np

a = DataFrame([{'var1': 'a,b,c', 'var2': 1},

{'var1': 'd,e,f', 'var2': 2}])

b = DataFrame([{'var1': 'a', 'var2': 1},

{'var1': 'b', 'var2': 1},

{'var1': 'c', 'var2': 1},

{'var1': 'd', 'var2': 2},

{'var1': 'e', 'var2': 2},

{'var1': 'f', 'var2': 2}])

I know this won't work because we lose DataFrame meta-data by going through numpy, but it should give you a sense of what I tried to do:

`def fun(row):`

letters = row['var1']

letters = letters.split(',')

out = np.array([row] * len(letters))

out['var1'] = letters

a['idx'] = range(a.shape[0])

z = a.groupby('idx')

z.transform(fun)

Answer Source

How about something like this:

```
In [55]: pd.concat([Series(row['var2'], row['var1'].split(','))
for _, row in a.iterrows()]).reset_index()
Out[55]:
index 0
0 a 1
1 b 1
2 c 1
3 d 2
4 e 2
5 f 2
```

Then you just have to rename the columns