Mr. Frobenius - 3 months ago 19

Python Question

My problem is as follows. I have a (large) 3D data set of points in real physical space (x,y,z). It has been generated by a nested for loop that looks like this:

```
import numpy as np

# Generate given dat with its ordering
x_samples = 2
y_samples = 3
z_samples = 4
given_dat = np.zeros(((x_samples*y_samples*z_samples),3))
row_ind = 0
for z in range(z_samples):
    for y in range(y_samples):
        for x in range(x_samples):
            row = [x+.1,y+.2,z+.3]
            given_dat[row_ind,:] = row
            row_ind += 1
for row in given_dat:
    print(row)
```

For the sake of comparing it to another set of data, I want to reorder the given data into my desired order as follows (unorthodox, I know):

```
# Generate data with desired ordering
x_samples = 2
y_samples = 3
z_samples = 4
desired_dat = np.zeros(((x_samples*y_samples*z_samples),3))
row_ind = 0
for z in range(z_samples):
    for x in range(x_samples):
        for y in range(y_samples):
            row = [x+.1,y+.2,z+.3]
            desired_dat[row_ind,:] = row
            row_ind += 1
for row in desired_dat:
    print(row)
```

I have written a function that does what I want, but it is horribly slow and inefficient:

```
def bad_method(x_samp,y_samp,z_samp,data):
    zs = np.unique(data[:,2])
    xs = np.unique(data[:,0])
    rowlist = []
    # For every (z, x) pair, scan the whole array for matching rows
    for z in zs:
        for x in xs:
            for row in data:
                if row[0] == x and row[2] == z:
                    rowlist.append(row)
    new_data = np.vstack(rowlist)
    return new_data

# Shows that my function does what I want
fix = bad_method(x_samples,y_samples,z_samples,given_dat)
print('Unreversed data')
print(given_dat)
print('Reversed Data')
print(fix)

# If it didn't work this will throw an exception
assert(np.array_equal(desired_dat,fix))
```

How could I improve my function so it is faster? My data sets usually have roughly 2 million rows. It must be possible to do this with some clever slicing/indexing which I'm sure will be faster but I'm having a hard time figuring out how. Thanks for any help!

Answer

You could reshape your array, swap the axes as necessary and reshape back again:

```
# (np.copy is optional: none of the steps below modifies given_dat in place)
data = np.copy(given_dat).reshape(( z_samples, y_samples, x_samples, 3))
# swap the "y" and "x" axes
data = np.swapaxes(data, 1,2)
# back to 2-D array
data = data.reshape((x_samples*y_samples*z_samples,3))
assert(np.array_equal(desired_dat,data))
```
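This works because the loops in the question fill the rows with x varying fastest, then y, then z, which is exactly the C-order layout of an array of shape `(z_samples, y_samples, x_samples, 3)`. Below is a minimal, self-contained sketch of the same idea wrapped in a function, plus a sort-based alternative that might help if the rows are not guaranteed to arrive in that loop order. The names `reorder_by_reshape` and `reorder_by_sort` are made up for illustration, and both assume the data covers a full regular grid.

```python
import numpy as np


def reorder_by_reshape(data, x_samples, y_samples, z_samples):
    """Reorder rows from (z outer, y, x inner) to (z outer, x, y inner).

    Hypothetical helper; assumes `data` has exactly
    x_samples * y_samples * z_samples rows in the original loop order.
    """
    n_cols = data.shape[1]
    grid = data.reshape(z_samples, y_samples, x_samples, n_cols)
    # Swap the y and x axes, then flatten back to one row per point.
    return grid.transpose(0, 2, 1, 3).reshape(-1, n_cols)


def reorder_by_sort(data):
    """Sort rows by z, then x, keeping the existing order within each group.

    Hypothetical helper; np.lexsort is stable and treats the LAST key
    as the primary one, so z is primary and x is secondary here.
    """
    order = np.lexsort((data[:, 0], data[:, 2]))
    return data[order]


# Small check built the same way as in the question.
x_samples, y_samples, z_samples = 2, 3, 4
given = np.array([[x + .1, y + .2, z + .3]
                  for z in range(z_samples)
                  for y in range(y_samples)
                  for x in range(x_samples)])
desired = np.array([[x + .1, y + .2, z + .3]
                    for z in range(z_samples)
                    for x in range(x_samples)
                    for y in range(y_samples)])

by_reshape = reorder_by_reshape(given, x_samples, y_samples, z_samples)
by_sort = reorder_by_sort(given)
```

The reshape version only rearranges memory and should scale well to millions of rows. The sort version costs an extra O(n log n) sort but does not need the sample counts.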