jackskis - 4 months ago 23

Python Question

**Background:**

I have a list of 44906 items:

`large = [1, 60, 17, ...]`

I need to find all the pair-wise combinations of

`large`

When I use

`itertools.combinations(large, 2)`

`n*(n-1)/2`

`n`

The number of combinations for

`n=44906`

`44906*44905/2 = 1008251965`

`i`

`i`

Let's say I have an array

`small = [1,2,3,4,5]`

In the configuration in which I have the code,

`itertools.combinations(small, 2)`

`[(1, 2), # 1st entry`

(1, 3), # 2nd entry

(1, 4), # 3rd entry

(1, 5), # 4th entry

(2, 3), # 5th entry

(2, 4), # 6th entry

(2, 5), # 7th entry

(3, 4), # 8th entry

(3, 5), # 9th entry

(4, 5)] # 10th entry

A call to a the function like this: `find_pair(10)' would return:

`(4, 5)`

, giving the 10th entry in the would-be array, but without calculating the entire combinatorial explosion beforehand.

The thing is, I need to be able to drop in to the middle of the combinations, not starting from the beginning every time, which is what it seems like an iterator does:

`>>> from itertools import combinations`

>>> it = combinations([1, 2, 3, 4, 5], 2)

>>> next(it)

(1, 2)

>>> next(it)

(1, 3)

>>> next(it)

(1, 4)

>>> next(it)

(1, 5)

So, instead of having to execute next() 10 times to get to the 10th combination, I would like to be able to retrieve the tuple returned by the 10th iteration with one call.

Are there any other combinatorial functions that behave this way designed to deal with huge data sets? If not, is there a good way to implement a memory-saving algorithm that behaves this way?

Answer

If you want random access to any combination you can use this function to return the index of a corresponding lower triangular representation of the cross-product

```
def comb(k):
row=int((math.sqrt(1+8*k)+1)/2)
column=int(k-(row-1)*(row)/2)
return [row,column]
```

using your small array for example

```
small = [1,2,3,4,5]
length = len(small)
size = int(length * (length-1)/2)
for i in range(size):
[n,m] = comb(i)
print(i,[n,m],"(",small[n],",",small[m],")")
```

will give

```
0 [1, 0] ( 2 , 1 )
1 [2, 0] ( 3 , 1 )
2 [2, 1] ( 3 , 2 )
3 [3, 0] ( 4 , 1 )
4 [3, 1] ( 4 , 2 )
5 [3, 2] ( 4 , 3 )
6 [4, 0] ( 5 , 1 )
7 [4, 1] ( 5 , 2 )
8 [4, 2] ( 5 , 3 )
9 [4, 3] ( 5 , 4 )
```

obviously if your access method is in order other methods will be more practical.

Note also that the `comb`

function is independent of the size of the problem.

As suggested by @Blckknght in the comments to get the same order as itertools version change to

```
for i in range(size):
[n,m] = comb(size-1-i)
print(i,[n,m],"(",small[length-1-n],",",small[length-1-m],")")
0 [4, 3] ( 1 , 2 )
1 [4, 2] ( 1 , 3 )
2 [4, 1] ( 1 , 4 )
3 [4, 0] ( 1 , 5 )
4 [3, 2] ( 2 , 3 )
5 [3, 1] ( 2 , 4 )
6 [3, 0] ( 2 , 5 )
7 [2, 1] ( 3 , 4 )
8 [2, 0] ( 3 , 5 )
9 [1, 0] ( 4 , 5 )
```