piRSquared - 9 months ago 52

Python Question

Consider the list of lists `l`:

```
l = [[1, 2, 3], [1, 2]]
```

If I convert this to an `np.array`, I get a one-dimensional object array with `[1, 2, 3]` in the first position and `[1, 2]` in the second:

```
print(np.array(l))

[[1, 2, 3] [1, 2]]
```
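For reference, a small sketch of what that conversion produces. It assumes a recent NumPy, where ragged input generally has to be requested explicitly with `dtype=object`; older releases built the object array without being asked, so the exact behaviour without that argument depends on the installed version.

```python
import numpy as np

l = [[1, 2, 3], [1, 2]]

# The rows have different lengths, so NumPy cannot build a 2-D numeric array.
# Passing dtype=object makes the ragged result explicit.
arr = np.array(l, dtype=object)

print(arr.shape)  # (2,)
print(arr.dtype)  # object
print(arr[0])     # [1, 2, 3]
```

Each element is still a plain Python list, which is why no NaN padding appears.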

I want this instead:

```
print(np.array([[1, 2, 3], [1, 2, np.nan]]))

[[ 1.  2.  3.]
 [ 1.  2. nan]]
```

I can do this with a loop, but we all know how unpopular loops are:

```
def box_pir(l):
    lengths = [len(item) for item in l]
    shape = (len(l), max(lengths))
    a = np.full(shape, np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a

print(box_pir(l))

[[ 1.  2.  3.]
 [ 1.  2. nan]]
```

```
%%cython
import numpy as np

def box_pir_cython(l):
    lengths = [len(item) for item in l]
    shape = (len(l), max(lengths))
    a = np.full(shape, np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a
```

```
from itertools import zip_longest

import numpy as np
import pandas as pd

def box_divikar(v):
    lens = np.array([len(item) for item in v])
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, np.nan)
    out[mask] = np.concatenate(v)
    return out

def box_hpaulj(LoL):
    return np.array(list(zip_longest(*LoL, fillvalue=np.nan))).T

def box_simon(LoL):
    max_len = len(max(LoL, key=len))
    return np.array([x + [np.nan] * (max_len - len(x)) for x in LoL])

def box_dawg(LoL):
    cols = len(max(LoL, key=len))
    rows = len(LoL)
    AoA = np.empty((rows, cols))
    AoA.fill(np.nan)
    for idx in range(rows):
        AoA[idx, 0:len(LoL[idx])] = LoL[idx]
    return AoA

def box_pir(l):
    lengths = [len(item) for item in l]
    shape = (len(l), max(lengths))
    a = np.full(shape, np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a

def box_pandas(l):
    return pd.DataFrame(l).values
```
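Before comparing speed, it may be worth confirming that the variants agree. The following is only a sketch: `pad_loop`, `pad_zip` and `pad_list` are hypothetical names for compact copies of the loop, `zip_longest` and list-extension approaches above, and the sample input is made up for illustration.

```python
from itertools import zip_longest

import numpy as np


def pad_loop(rows):
    """Loop-based padding, following the shape of box_pir."""
    lengths = [len(r) for r in rows]
    out = np.full((len(rows), max(lengths)), np.nan)
    for i, r in enumerate(rows):
        out[i, :lengths[i]] = r
    return out


def pad_zip(rows):
    """zip_longest-based padding, following box_hpaulj."""
    return np.array(list(zip_longest(*rows, fillvalue=np.nan))).T


def pad_list(rows):
    """List-extension padding, following box_simon."""
    width = len(max(rows, key=len))
    return np.array([r + [np.nan] * (width - len(r)) for r in rows])


sample = [[1, 2, 3], [1, 2], [3, 8, 9, 7, 3]]
reference = pad_loop(sample)

for fn in (pad_zip, pad_list):
    # assert_array_equal treats NaNs in matching positions as equal.
    np.testing.assert_array_equal(fn(sample), reference)

print(reference.shape)  # (3, 5)
```

A plain `==` comparison would not work here, because NaN never compares equal to itself.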

Answer Source

This closely resembles an earlier question where the padding was with zeros instead of NaNs. Interesting approaches were posted there, including mine, which is based on broadcasting and boolean indexing. So, I would just modify one line from my post there to solve this case like so -

```
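import numpy as np

def boolean_indexing(v):
    # Length of each sub-list
    lens = np.array([len(item) for item in v])
    # True where a row has a value at that column index
    mask = lens[:, None] > np.arange(lens.max())
    # Start from an all-NaN array, then fill in the real values
    out = np.full(mask.shape, np.nan)
    out[mask] = np.concatenate(v)
    return out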
```

Sample run -

```
In [17]: l
Out[17]: [[1, 2, 3], [1, 2], [3, 8, 9, 7, 3]]
In [18]: boolean_indexing(l)
Out[18]:
array([[ 1.,  2.,  3., nan, nan],
       [ 1.,  2., nan, nan, nan],
       [ 3.,  8.,  9.,  7.,  3.]])
```
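To make the broadcasting and boolean-indexing steps easier to follow, here is the same computation unrolled on the sample input, with the intermediate arrays printed. This is an illustrative walk-through and adds nothing beyond what the function already does.

```python
import numpy as np

v = [[1, 2, 3], [1, 2], [3, 8, 9, 7, 3]]

# Length of each row.
lens = np.array([len(item) for item in v])
print(lens)  # [3 2 5]

# lens[:, None] has shape (3, 1); np.arange(5) has shape (5,).
# Broadcasting compares every row length against every column index,
# giving a (3, 5) boolean mask that is True where a value exists.
mask = lens[:, None] > np.arange(lens.max())
print(mask.astype(int))
# [[1 1 1 0 0]
#  [1 1 0 0 0]
#  [1 1 1 1 1]]

# Start from an all-NaN array, then fill the True slots in row-major
# order with the flattened values.
out = np.full(mask.shape, np.nan)
out[mask] = np.concatenate(v)
print(np.isnan(out).sum())  # 5
```

The assignment works because the number of `True` entries in the mask equals the total number of values in `np.concatenate(v)`.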

I have posted a few runtime results there for all the approaches on that Q&A, which could be useful.
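For anyone who wants to measure this locally, a rough timing sketch with the standard `timeit` module follows. The function names `pad_loop` and `pad_mask` and the test data (1,000 rows of random length between 1 and 50) are assumptions chosen for illustration, not the setup used for the results mentioned above, and the numbers will vary with input shape and machine.

```python
import timeit

import numpy as np


def pad_loop(rows):
    """Loop-based padding: one slice assignment per row."""
    lengths = [len(r) for r in rows]
    out = np.full((len(rows), max(lengths)), np.nan)
    for i, r in enumerate(rows):
        out[i, :lengths[i]] = r
    return out


def pad_mask(rows):
    """Mask-based padding via broadcasting and boolean indexing."""
    lens = np.array([len(r) for r in rows])
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, np.nan)
    out[mask] = np.concatenate(rows)
    return out


# Hypothetical test data: 1,000 rows of random length between 1 and 50.
rng = np.random.default_rng(0)
data = [list(range(n)) for n in rng.integers(1, 51, size=1000)]

repeats = 20
for fn in (pad_loop, pad_mask):
    seconds = timeit.timeit(lambda: fn(data), number=repeats)
    print(f"{fn.__name__}: {seconds / repeats * 1e3:.3f} ms per call")
```

Trying both many short rows and a few long rows is worthwhile, since the relative cost of the Python-level loop depends on the number of rows.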