ith140 - 10 months ago

2D numpy array search (equivalent to Matlab's intersect 'rows' option)

I have two 2D numpy arrays, cap and usp, each with 4 columns and several hundred (float) rows. Considering a subset of 3 columns in each array (e.g. …):


  1. There are many common rows between both arrays.

  2. Each row tuple/"triplet" is unique in each array.

I am looking for an efficient way to identify these common three-value (row) subsets across both arrays while retaining the 4th column from both arrays for further processing. In essence, I'm looking for a good numpy way to do the equivalent of Matlab's intersect function with the 'rows' option, i.e.

[c, ia, ib] = intersect(capind, uspind, 'rows');

which returns the indices ia and ib of the matching rows in each array, so that it is then trivial to get the matching triplets and the values from the 4th column of the original arrays (…).
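For comparison, a rough numpy analogue of that Matlab call can be sketched with `np.unique(..., axis=0)` (NumPy 1.13 or later). This is a sketch, not a built-in numpy function: `intersect_rows` is a hypothetical helper, it assumes (as stated in the question) that every row is unique within each array, and it matches rows by exact float equality. The sample data is made up, since the 4th-column values are cut off in the question.

```python
import numpy as np


def intersect_rows(a, b):
    """Rough analogue of Matlab's [c, ia, ib] = intersect(a, b, 'rows').

    Assumes each row is unique within `a` and within `b`, and compares rows
    with exact equality. Returns c, ia, ib with c == a[ia] == b[ib], where c
    is sorted row-wise (lexicographically), as in Matlab.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # Stack both arrays; because rows are unique per array, a shared row appears exactly twice.
    both = np.vstack([a, b])
    # uniq is sorted row-wise; inverse maps every stacked row to its row in uniq.
    uniq, inverse, counts = np.unique(
        both, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()  # guard against version-dependent shape of inverse
    common = counts == 2
    inv_a, inv_b = inverse[: len(a)], inverse[len(a):]
    # Row indices of a and b whose unique-row group occurs in both arrays.
    ia = np.flatnonzero(common[inv_a])
    ib = np.flatnonzero(common[inv_b])
    # Order both index lists by group id so that a[ia][k] matches b[ib][k].
    ia = ia[np.argsort(inv_a[ia])]
    ib = ib[np.argsort(inv_b[ib])]
    return uniq[common], ia, ib


# Hypothetical data: columns 0-2 are the triplet, column 3 is a made-up value.
cap = np.array([[25.0, 127.0, 1.0, 0.5],
                [26.0, 127.0, 1.0, 0.6],
                [61.0, 180.0, 106.0, 0.7]])
usp = np.array([[61.0, 180.0, 106.0, 9.1],
                [41.0, 131.0, 1.0, 9.2],
                [25.0, 127.0, 1.0, 9.3]])

capind, uspind = cap[:, :3], usp[:, :3]
c, ia, ib = intersect_rows(capind, uspind)
cap_vals = cap[ia, 3]  # 4th column of the matching cap rows
usp_vals = usp[ib, 3]  # 4th column of the matching usp rows
```

Because `ia` and `ib` index the original row order, the 4th columns come straight from `cap[ia, 3]` and `usp[ib, 3]`, much like `cap(ia,4)` would in Matlab.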

My current approach is based on a similar question on this forum, since I could not find an exact match for my problem. However, it seems a little inefficient given my goal (and it does not fully solve my problem yet):

The arrays are something like this:

cap=array([[ 2.50000000e+01, 1.27000000e+02, 1.00000000e+00, ...],
           [ 2.60000000e+01, 1.27000000e+02, 1.00000000e+00, ...],
           [ 2.70000000e+01, 1.27000000e+02, 1.00000000e+00, ...],
           [ 6.10000000e+01, 1.80000000e+02, 1.06000000e+02, ...],
           [ 6.20000000e+01, 1.80000000e+02, 1.06000000e+02, ...],
           [ 6.30000000e+01, 1.80000000e+02, 1.06000000e+02, ...]])

usp=array([[ 4.10000000e+01, 1.31000000e+02, 1.00000000e+00, ...],
           [ 4.20000000e+01, 1.31000000e+02, 1.00000000e+00, ...],
           [ 4.30000000e+01, 1.31000000e+02, 1.00000000e+00, ...],
           [ 4.70000000e+01, 1.80000000e+02, 1.06000000e+02, ...],
           [ 4.80000000e+01, 1.80000000e+02, 1.06000000e+02, ...],
           [ 4.90000000e+01, 1.80000000e+02, 1.06000000e+02, ...]])

I then convert each 4-column array (usp and cap) into a three-column array (capind and uspind, shown below as integers for ease of viewing):

capind=array([[ 25, 127, 1],
[ 26, 127, 1],
[ 27, 127, 1],
[ 61, 180, 106],
[ 62, 180, 106],
[ 63, 180, 106]])
uspind=array([[ 41, 131, 1],
[ 42, 131, 1],
[ 43, 131, 1],
[ 47, 180, 106],
[ 48, 180, 106],
[ 49, 180, 106]])
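A column slice keeps the original row order, so any row index found in `capind` is also a valid row index into `cap`, and therefore into its 4th column. A minimal sketch, assuming the triplet sits in columns 0-2 and the value of interest in column 3 (the 4th-column values here are made up, since they are cut off in the arrays above):

```python
import numpy as np

# Hypothetical stand-in for cap: columns 0-2 are the triplet, column 3 the value.
cap = np.array([[25.0, 127.0, 1.0, 0.5],
                [26.0, 127.0, 1.0, 0.6],
                [61.0, 180.0, 106.0, 0.7]])

capind = cap[:, :3]  # basic slicing gives a view: same rows, same order as cap
values = cap[:, 3]   # the 4th column, aligned row-for-row with capind

# A row index located in capind can be used directly on values (or cap).
i = 2
triplet, value = capind[i], values[i]
```

Since `capind` is a view, writing into it would also modify `cap`; call `.copy()` on the slice if that matters.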

Using a set operation gives me the matching triplets:
carray=np.array([x for x in set(tuple(x) for x in capind) & set(tuple(x) for x in uspind)])

This seems to work fairly well for finding the rows common to uspind and capind. I now need to get the 4th-column value for each matching row, i.e. compare carray with the first three columns of the original source arrays (cap and usp) and grab the corresponding value from the 4th column.
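Given `carray`, one vectorized way to fetch the 4th-column values is to compare every triplet against the first three columns of a source array with broadcasting. This is a sketch under two assumptions: each triplet occurs exactly once in the source (true here, since rows are unique and `carray` only holds common rows), and exact float equality is acceptable. `lookup_fourth` is a hypothetical helper name and the sample values are made up. It builds an n_source × n_triplets boolean matrix, which is cheap for a few hundred rows but grows quadratically.

```python
import numpy as np


def lookup_fourth(source, triplets):
    """Return source's 4th-column value for each row of `triplets`.

    Assumes every triplet occurs at most once in source[:, :3];
    triplets with no match are left as NaN.
    """
    source = np.asarray(source, dtype=float)
    triplets = np.asarray(triplets, dtype=float)
    # match[i, k] is True when source row i's first three columns equal triplets[k].
    match = (source[:, None, :3] == triplets[None, :, :]).all(axis=2)
    src_idx, trip_idx = np.nonzero(match)
    out = np.full(len(triplets), np.nan)
    out[trip_idx] = source[src_idx, 3]  # put each value at its triplet's position
    return out


# Hypothetical data: triplets follow the question's layout, 4th columns are made up.
cap = np.array([[25.0, 127.0, 1.0, 0.5],
                [26.0, 127.0, 1.0, 0.6],
                [61.0, 180.0, 106.0, 0.7]])
usp = np.array([[61.0, 180.0, 106.0, 9.1],
                [41.0, 131.0, 1.0, 9.2],
                [25.0, 127.0, 1.0, 9.3]])
carray = np.array([[61.0, 180.0, 106.0],
                   [25.0, 127.0, 1.0]])

cap_vals = lookup_fourth(cap, carray)  # aligned with the rows of carray
usp_vals = lookup_fourth(usp, carray)
```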

Is there a better, more efficient way to achieve this? Otherwise, any help on the best way to retrieve the 4th-column values from the source arrays would be greatly appreciated.

Answer

Try using dictionaries keyed on the first three columns of each row.

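# Map the first three columns (as a hashable tuple) to the 4th-column value
capind = {tuple(row[:3]): row[3] for row in cap}
uspind = {tuple(row[:3]): row[3] for row in usp}

# Keys present in both dicts are the common triplets
# (Python 3 key views support &; in Python 2.7 use viewkeys() instead)
keys = capind.keys() & uspind.keys()
for key in keys:
    # capind[key] and uspind[key] are the fourth-column values from cap and usp
    print(key, capind[key], uspind[key])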
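A self-contained sketch of the dictionary approach, collecting the matches into one array whose columns are the three key values followed by cap's and usp's 4th-column values. It assumes Python 3, where `dict.keys()` supports `&` directly (`viewkeys()` is the Python 2.7 spelling), and that exact float equality of the keys is acceptable. The names `cap_map` and `usp_map` and the sample values are illustrative only.

```python
import numpy as np

# Hypothetical data: columns 0-2 are the triplet, column 3 the value to keep.
cap = np.array([[25.0, 127.0, 1.0, 0.5],
                [26.0, 127.0, 1.0, 0.6],
                [61.0, 180.0, 106.0, 0.7]])
usp = np.array([[61.0, 180.0, 106.0, 9.1],
                [41.0, 131.0, 1.0, 9.2],
                [25.0, 127.0, 1.0, 9.3]])

# Map each triplet (a hashable tuple of floats) to its 4th-column value.
cap_map = {tuple(row[:3]): row[3] for row in cap}
usp_map = {tuple(row[:3]): row[3] for row in usp}

# Intersect the key views to get the common triplets; sort for a stable order.
common = sorted(cap_map.keys() & usp_map.keys())

# One row per common triplet: x, y, z, cap value, usp value.
result = np.array([key + (cap_map[key], usp_map[key]) for key in common])
```

Dictionary lookups make each match O(1) on average, so this scales roughly linearly with the number of rows.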