piRSquared - 1 year ago 100
Python Question

# efficiently convert uneven list of lists to minimal containing array padded with nan

Consider the list of lists `l`:

```python
l = [[1, 2, 3], [1, 2]]
```

If I convert this to an `np.array`, I'll get a one-dimensional object array with `[1, 2, 3]` in the first position and `[1, 2]` in the second position.

```
print(np.array(l))

[[1, 2, 3] [1, 2]]
```
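A small sketch of the behaviour described above. One caveat that is not in the original post: recent NumPy releases (1.24 and later, as far as I know) no longer build a ragged array implicitly and raise a `ValueError` instead, so this example passes `dtype=object` explicitly. The name `ragged` is only illustrative.

```python
import numpy as np

l = [[1, 2, 3], [1, 2]]

# Explicit dtype=object keeps this working on NumPy versions that
# reject ragged input when no dtype is given.
ragged = np.array(l, dtype=object)

print(ragged.shape)          # one dimension, one entry per inner list
print(ragged.dtype)          # object
print(ragged[0], ragged[1])  # the two original Python lists
```

Each element is still a plain Python list, which is why arithmetic and slicing across columns do not work on this array.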

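What I want instead is the smallest two-dimensional array that contains these rows, padded with `np.nan`, as if I had written:

```
print(np.array([[1, 2, 3], [1, 2, np.nan]]))

[[  1.   2.   3.]
 [  1.   2.  nan]]
```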

I can do this with a loop, but we all know how unpopular loops are.

```
import numpy as np

def box_pir(l):
    lengths = list(map(len, l))
    shape = (len(l), max(lengths))
    a = np.full(shape, np.nan)   # start from an all-NaN array
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r    # copy each row into its leading columns
    return a

print(box_pir(l))

[[  1.   2.   3.]
 [  1.   2.  nan]]
```

How do I do this in a fast, vectorized way?

timing

setup functions

```
%%cython
import numpy as np

def box_pir_cython(l):
    lengths = [len(item) for item in l]
    shape = (len(l), max(lengths))
    a = np.full(shape, np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a
```

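```python
from itertools import zip_longest

import numpy as np
import pandas as pd

def box_divikar(v):
    lens = np.array([len(item) for item in v])
    # The middle of this function is missing in the source; the next three
    # lines are a reconstruction from the answer's description
    # (broadcasting plus boolean indexing).
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, np.nan)
    out[mask] = np.concatenate(v)
    return out

def box_hpaulj(LoL):
    # Pad column-wise with zip_longest, then transpose back to rows.
    return np.array(list(zip_longest(*LoL, fillvalue=np.nan))).T

def box_simon(LoL):
    max_len = len(max(LoL, key=len))
    return np.array([x + [np.nan] * (max_len - len(x)) for x in LoL])

def box_dawg(LoL):
    cols = len(max(LoL, key=len))
    rows = len(LoL)
    AoA = np.empty((rows, cols))
    AoA.fill(np.nan)
    for idx in range(rows):
        AoA[idx, 0:len(LoL[idx])] = LoL[idx]
    return AoA

def box_pir(l):
    lengths = [len(item) for item in l]
    shape = (len(l), max(lengths))
    a = np.full(shape, np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a

def box_pandas(l):
    return pd.DataFrame(l).values
```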
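The post sets up the functions but the timing calls themselves are not shown here. As a rough sketch of how such a comparison might be run, the block below times two of the candidates with the standard `timeit` module. The helper `make_ragged`, the input sizes, and the repeat counts are my own assumptions, not the original benchmark, and no timings are claimed.

```python
import timeit
from itertools import zip_longest

import numpy as np


def box_pir(l):
    lengths = [len(item) for item in l]
    a = np.full((len(l), max(lengths)), np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a


def box_hpaulj(LoL):
    return np.array(list(zip_longest(*LoL, fillvalue=np.nan))).T


def make_ragged(n_rows, max_len, seed=0):
    """Hypothetical helper: n_rows lists with random lengths in [1, max_len]."""
    rng = np.random.default_rng(seed)
    lens = rng.integers(1, max_len + 1, size=n_rows)
    return [list(range(n)) for n in lens]


data = make_ragged(n_rows=1000, max_len=50)

# The candidates should agree before their timings are worth comparing.
assert np.array_equal(box_pir(data), box_hpaulj(data), equal_nan=True)

for func in (box_pir, box_hpaulj):
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(data), number=10, repeat=3))
    print(f"{func.__name__}: {best / 10 * 1e3:.3f} ms per call")
```

The ranking can change with the number of rows and with how uneven the row lengths are, so it is worth varying `n_rows` and `max_len` before drawing conclusions.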

This seems to be a close variant of an earlier question, where the padding was with zeros instead of NaNs. Interesting approaches were posted there, along with mine, which is based on broadcasting and boolean indexing. So I would just modify one line from my post there to solve this case:

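```python
def boolean_indexing(v):
    lens = np.array([len(item) for item in v])
    mask = lens[:, None] > np.arange(lens.max())  # reconstructed: broadcast lengths against column indices
    out = np.full(mask.shape, np.nan)             # reconstructed: NaN-filled target array
    out[mask] = np.concatenate(v)                 # reconstructed: write the values through the mask
    return out
```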

Sample run -

```
In [17]: l
Out[17]: [[1, 2, 3], [1, 2], [3, 8, 9, 7, 3]]

In [18]: boolean_indexing(l)
Out[18]:
array([[  1.,   2.,   3.,  nan,  nan],
       [  1.,   2.,  nan,  nan,  nan],
       [  3.,   8.,   9.,   7.,   3.]])
```
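To make the broadcasting and boolean-indexing idea more concrete, here is a step-by-step sketch on the same sample input. It is one plausible way to implement the approach, not necessarily the exact code from the linked post; the intermediate name `mask` is an assumption.

```python
import numpy as np

v = [[1, 2, 3], [1, 2], [3, 8, 9, 7, 3]]

# Length of each row.
lens = np.array([len(item) for item in v])

# Broadcasting: a (3, 1) column of lengths compared against a (5,) row of
# column indices gives a (3, 5) boolean mask that is True where a value exists.
mask = lens[:, None] > np.arange(lens.max())
print(mask.astype(int))

# Boolean indexing: True cells are filled in row-major order, which matches
# the order of the concatenated input values.
out = np.full(mask.shape, np.nan)
out[mask] = np.concatenate(v)
print(out)
```

The only Python-level loop left is the list comprehension that collects the lengths; the padding itself is a single masked assignment.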

I have posted a few runtime results there for all the approaches on that Q&A, which could be useful.
