splinter splinter - 19 days ago 7
Python Question

Creating a new column consisting of lists in a DataFrame using pandas

Given the following DataFrame:

   t
0  3
1  5


I would like to create a new column where each entry is a list computed from the row it is in. Specifically, each list should contain all positive integers that are not greater than the entry in column t. So the output should be:

   t           newCol
0  3        [1, 2, 3]
1  5  [1, 2, 3, 4, 5]


In other words, I want to apply list(range(1,t+1)) to each row. I know how to do it in a loop, but I have a long DataFrame, so I am looking for speed. Thank you.
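
For reference, the row-by-row baseline that the question mentions could look like the sketch below. It rebuilds the two-row sample frame (the name df is an assumption, since the question never names the frame) and applies list(range(1, t + 1)) to each value with a list comprehension.

```python
import pandas as pd

# Sample frame from the question (the variable name df is assumed)
df = pd.DataFrame({"t": [3, 5]})

# Row-by-row baseline: build list(range(1, t + 1)) for each value of t
df["newCol"] = [list(range(1, t + 1)) for t in df["t"]]

print(df)
```

This is the version any faster approach has to match, so it also serves as a correctness check.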

Answer

Here's a vectorized approach using NumPy methods -

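import numpy as np

a = df.t.values                        # run lengths, one per row
idx = a.cumsum()                       # end offset of each run in the flat array
id_arr = np.ones(idx[-1],dtype=int)    # step of +1 everywhere by default
id_arr[idx[:-1]] = -a[:-1]+1           # reset the running count to 1 at each run start
df['newCol'] = np.split(id_arr.cumsum(),idx[:-1])  # cut the flat ramp back into rows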

Sample run -

In [76]: df
Out[76]: 
   t                 newCol
0  4           [1, 2, 3, 4]
1  3              [1, 2, 3]
2  7  [1, 2, 3, 4, 5, 6, 7]
3  2                 [1, 2]
4  5        [1, 2, 3, 4, 5]
5  3              [1, 2, 3]
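
A self-contained sketch of the same cumsum trick, run on the question's two-row sample, may make the steps easier to follow. The helper name ranges_up_to is made up for illustration, and the sketch assumes t is non-empty with every value at least 1 (a zero would make two reset positions collide). Since np.split returns NumPy arrays, the last line converts each one with tolist() to get plain Python lists, as in the question's expected output.

```python
import numpy as np
import pandas as pd


def ranges_up_to(counts):
    """Return the array [1..n] for each n in counts (assumes every n >= 1)."""
    a = np.asarray(counts)
    idx = a.cumsum()                      # for [3, 5]: [3, 8]
    steps = np.ones(idx[-1], dtype=int)   # eight ones
    steps[idx[:-1]] = 1 - a[:-1]          # [1, 1, 1, -2, 1, 1, 1, 1]
    ramp = steps.cumsum()                 # [1, 2, 3, 1, 2, 3, 4, 5]
    return np.split(ramp, idx[:-1])       # split before position 3


df = pd.DataFrame({"t": [3, 5]})
parts = ranges_up_to(df["t"].to_numpy())

# Convert each NumPy array to a plain Python list before storing it
df["newCol"] = [p.tolist() for p in parts]
```

Whether this beats a plain list comprehension depends on the data, so it is worth timing both on a frame of realistic size.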