user2909415 - 1 month ago

Numpy: Fix array with rows of different lengths by filling the empty elements with zeros

I think this one is pretty clear. The functionality I'm looking for is something like this:

Edit: The data is read from disk as a list of lists.

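import numpy as np

# Ragged rows need dtype=object on newer NumPy versions
data = np.array([[1, 2, 3, 4],
                 [2, 3, 1],
                 [5, 5, 5, 5],
                 [1, 1]], dtype=object)
result = fix(data)  # fix() is the function I'm looking for
print(result)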

[[ 1.  2.  3.  4.]
 [ 2.  3.  1.  0.]
 [ 5.  5.  5.  5.]
 [ 1.  1.  0.  0.]]


The arrays I'm working with are very large, so I'd appreciate the most efficient solution possible.
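To check faster versions for correctness, a plain loop-based baseline can help. This is a minimal sketch assuming the rows arrive as a list of lists, as described in the edit above; `fix_reference` and its `fill` parameter are hypothetical names, not part of NumPy.

```python
import numpy as np

def fix_reference(rows, fill=0.0):
    """Hypothetical baseline: pad every row with `fill` up to the
    length of the longest row and return a 2-D float array."""
    width = max((len(r) for r in rows), default=0)
    out = np.full((len(rows), width), fill, dtype=float)
    for i, r in enumerate(rows):
        # Copy the row's values into the left part of row i
        out[i, :len(r)] = r
    return out

data = [[1, 2, 3, 4], [2, 3, 1], [5, 5, 5, 5], [1, 1]]
result = fix_reference(data)
print(result)
```

It loops in Python over the rows, so it is mainly useful as a reference to compare vectorized versions against.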

Answer

Here is one approach:

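import numpy as np

# Input object dtype array (NumPy 1.24+ raises an error for ragged
# input unless dtype=object is given)
data = np.array([[1, 2, 3, 4],
                 [2, 3, 1],
                 [5, 5, 5, 5],
                 [1, 1]], dtype=object)

# Get the length of each row of data
lens = np.array([len(row) for row in data])

# Mask of valid places in each row: column indices 0..lens.max()-1
# compared against each row's length, broadcast to (n_rows, max_len)
mask = np.arange(lens.max()) < lens[:, None]

# Set up the output array and put the elements of data into the masked positions
out = np.zeros(mask.shape)
out[mask] = np.concatenate(data)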
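The masking line is the core of this approach. Below is a small stand-alone sketch of it on a two-row input whose longest row (3) is longer than the number of rows (2); the names `rows`, `lens`, `mask` and `padded` are illustrative only.

```python
import numpy as np

# Two rows, longest row has 3 elements
rows = [[1, 2, 3], [4]]
lens = np.array([len(r) for r in rows])  # array([3, 1])

# (3,) compared with (2, 1) broadcasts to a (2, 3) boolean mask:
# mask[i, j] is True exactly when row i has a value in column j.
mask = np.arange(lens.max()) < lens[:, None]
# mask -> [[True, True, True], [True, False, False]]

# Boolean-mask assignment fills the True slots in row-major order,
# the same order in which np.concatenate lays out the rows' values.
padded = np.zeros(mask.shape)
padded[mask] = np.concatenate(rows)
# padded -> [[1., 2., 3.], [4., 0., 0.]]
```

With `np.arange(lens.size)` in place of `np.arange(lens.max())`, the mask for this input would have only 2 columns and 3 `True` slots for 4 values, so the masked assignment should fail with a `ValueError`. The column range has to cover the longest row, not the number of rows.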

Sample input and output:

In [84]: data
Out[84]: array([[1, 2, 3, 4], [2, 3, 1], [5, 5, 5, 5], [1, 1]], dtype=object)

In [85]: out
Out[85]: 
array([[ 1.,  2.,  3.,  4.],
       [ 2.,  3.,  1.,  0.],
       [ 5.,  5.,  5.,  5.],
       [ 1.,  1.,  0.,  0.]])
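A different technique, not the one used in this answer, is to let `itertools.zip_longest` do the padding and then transpose. This sketch assumes `data` is the plain list of lists read from disk. It iterates in Python, so whether it is faster or slower than the mask approach on very large inputs would need to be measured.

```python
import itertools
import numpy as np

data = [[1, 2, 3, 4], [2, 3, 1], [5, 5, 5, 5], [1, 1]]

# zip_longest walks the rows column by column, padding short rows with 0.
# The resulting array is column-major (one row per column of the input),
# so transpose it to get one row per input list.
out = np.array(list(itertools.zip_longest(*data, fillvalue=0)), dtype=float).T
print(out)
```

Note that `.T` returns a transposed view; wrap it in `np.ascontiguousarray` if a C-contiguous array is needed downstream.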