Mert Ovn - 3 months ago
Python Question

Generating 2d numpy arrays from random columns

I need to generate a 3×n matrix with random columns, ensuring that no column contains the same number more than once. I am currently using the code below:

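import numpy as np

n = 10
values = np.arange(0, 10)  # candidate values 0..9 (renamed from `set`, which shadows a builtin)
matrix = np.random.choice(values, size=3, replace=False)[:, None]
for i in range(n):
    # draw 3 distinct values and append them as a new column
    column = np.random.choice(values, size=3, replace=False)[:, None]
    matrix = np.concatenate((matrix, column), axis=1)
print(matrix)  # 1 initial column + n appended columns = n + 1 columns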


which gives the output I expected:

[[2 1 7 2 1 9 7 4 5 2 7]
 [4 6 3 5 9 8 1 3 8 4 0]
 [3 5 0 0 4 5 4 0 2 5 3]]


However, the code does not seem to run fast enough. I am aware that implementing the for loop in Cython might help, but I would like to know whether there is a more performant way to write this code solely in Python.

Answer

As was already mentioned in the comments, repeatedly concatenating to a NumPy array is a bad idea: every np.concatenate call allocates a new array and copies all of the existing data into it. Since you already know the final size of the result, you can allocate it once at the beginning and then fill it column by column:

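import numpy as np

# np.int was only an alias for the builtin int and was removed in NumPy 1.24
matrix = np.empty((3, n), dtype=int)
for i in range(n):
    # fill column i in place with 3 distinct values from 0..9
    matrix[:, i] = np.random.choice(10, size=3, replace=False)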
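Counting copied elements makes the difference concrete. Assuming each np.concatenate call copies every element of its inputs into a freshly allocated array (NumPy arrays cannot grow in place), the loop in the question, which starts from one column and appends n more, copies 3(i + 2) elements on iteration i, for a total of

$$\sum_{i=0}^{n-1} 3(i+2) = \frac{3n(n+3)}{2} = O(n^2)$$

elements, whereas filling a preallocated 3×n array writes each of its 3n elements exactly once. For n = 10 that is 195 copied elements versus 30 writes, and the gap widens quadratically as n grows.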

At least on my machine, this is already 6 times faster than your version.
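The preallocated version still calls np.random.choice once per column. If that Python-level loop is the remaining bottleneck, one option (a different technique from the answer above, sketched here under the assumption that the value range stays small, like 0..9) is to drop the loop entirely: draw a random key for every (value, column) pair, argsort each column to get an independent random permutation of 0..9, and keep the first 3 rows. The helper name `random_distinct_columns` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def random_distinct_columns(n, k=3, high=10, rng=None):
    """Hypothetical helper: return a k x n int array whose columns each
    contain k distinct values drawn from range(high)."""
    if k > high:
        raise ValueError("k must not exceed high")
    rng = np.random.default_rng() if rng is None else rng
    # One uniform random key per (candidate value, column) pair.
    keys = rng.random((high, n))
    # Sorting each column of keys yields a random permutation of 0..high-1 per column.
    perms = np.argsort(keys, axis=0)
    # The first k entries of a random permutation are a sample without replacement.
    return perms[:k]


matrix = random_distinct_columns(10, rng=np.random.default_rng(0))
print(matrix)
```

The trade-off is memory: this builds a high×n array of keys, which is cheap for 10 candidate values but wasteful if the range were large and only 3 values were needed per column.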