GrantS - 1 month ago
Python Question

Python set with the ability to pop a random element

I need a Python (2.7) object that functions like a set (fast insertion, deletion, and membership checking) but can also return a random element. Answers to previous questions on Stack Overflow suggest things like:

import random
random.sample(mySet, 1)


But this is quite slow for large sets (it runs in O(n) time).

Other solutions aren't random enough: they depend on the internal representation of Python sets, so the element they return is highly predictable:

for e in mySet:
    break
# e is now an element from mySet
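To illustrate why the loop-and-break approach is not random, here is a small sketch (the names `my_set` and `first_element` are made up for this example). As long as the set is not modified, iterating over it starts from the same element every time, so repeated calls keep returning one value:

```python
my_set = set(range(1000))


def first_element(s):
    # Same trick as above: take whatever the iterator yields first.
    for e in s:
        break
    return e


# Collect the results of 100 "random" picks from the unmodified set.
picks = {first_element(my_set) for _ in range(100)}
print(len(picks))  # 1: every call returned the same element
```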


I coded my own rudimentary class, which has constant-time insertion, lookup, deletion, and random selection.

import random

class randomSet:
    def __init__(self):
        self.dict = {}   # item -> index of that item in self.list
        self.list = []   # items, in arbitrary order

    def add(self, item):
        if item not in self.dict:
            self.dict[item] = len(self.list)
            self.list.append(item)

    def addIterable(self, item):
        for a in item:
            self.add(a)

    def delete(self, item):
        if item in self.dict:
            index = self.dict[item]
            if index == len(self.list)-1:
                del self.dict[self.list[index]]
                del self.list[index]
            else:
                # move the last item into the vacated slot
                self.list[index] = self.list.pop()
                self.dict[self.list[index]] = index
                del self.dict[item]

    def getRandom(self):
        if self.list:
            return self.list[random.randint(0,len(self.list)-1)]

    def popRandom(self):
        if self.list:
            index = random.randint(0,len(self.list)-1)
            if index == len(self.list)-1:
                del self.dict[self.list[index]]
                return self.list.pop()
            returnValue = self.list[index]
            self.list[index] = self.list.pop()
            self.dict[self.list[index]] = index
            del self.dict[returnValue]
            return returnValue


Are there any better implementations for this, or any big improvements to be made to this code?

Answer

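I think the best way to do this would be to use the MutableSet abstract base class in collections (collections.abc in Python 3). Inherit from MutableSet, and then define add, discard, __len__, __iter__, and __contains__; also rewrite __init__ to optionally accept an iterable, just like the set constructor does. MutableSet provides mixin definitions of remove, pop, clear, the comparison operators, and the set operators (|, &, -, ^ and their in-place forms) based on those methods. That way you get most of the set interface cheaply. (And if you do this, you no longer need addIterable: the in-place union operator |= adds every element of an iterable. Named methods such as update and union are not provided by the mixin, so define them yourself if you want them.)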
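As a rough illustration of how much the mixin gives you, here is a minimal sketch. `TinySet` is a hypothetical demo class backed by a plain built-in set; it is not the random-access structure from the question, only a way to show which methods you write and which ones come from MutableSet:

```python
try:
    from collections.abc import MutableSet  # Python 3.3+
except ImportError:
    from collections import MutableSet      # Python 2.7


class TinySet(MutableSet):
    """Hypothetical demo: only the five required methods are written."""

    def __init__(self, iterable=()):
        self._items = set()
        for item in iterable:
            self.add(item)

    def add(self, item):
        self._items.add(item)

    def discard(self, item):
        self._items.discard(item)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._items


s = TinySet([1, 2])
s |= [3, 4]                  # in-place union, supplied by the mixin
s.remove(1)                  # remove() is supplied by the mixin too
merged = s | TinySet([9])    # builds a new TinySet via the constructor
print(sorted(s), sorted(merged))
```

The `|` operator works because the mixin constructs its result by calling the class with an iterable, which is one reason to make `__init__` accept one.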

discard in the standard set interface appears to be what you have called delete here, so rename delete to discard. Also, instead of duplicating the removal logic inside popRandom, you could define popRandom in terms of getRandom and discard, like so:

def popRandom(self):
    item = self.getRandom()
    self.discard(item)
    return item

That way you don't have to maintain two separate item removal methods.

Finally, in your item removal method (delete now, discard according to the standard set interface), you don't need an if statement. Instead of testing whether index == len(self.list) - 1, simply swap the final item in the list with the item at the index of the list to be popped, and make the necessary change to the reverse-indexing dictionary. Then pop the last item from the list and remove it from the dictionary. This works whether index == len(self.list) - 1 or not:

def discard(self, item):
    if item in self.dict:
        index = self.dict[item]
        self.list[index], self.list[-1] = self.list[-1], self.list[index]
        self.dict[self.list[index]] = index
        del self.list[-1]                    # or in one line:
        del self.dict[item]                  # del self.dict[self.list.pop()]
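Putting the suggestions above together, a complete class might look roughly like the sketch below. The attribute names `_index` and `_items` are my own choices, and using random.choice (which raises IndexError on an empty set rather than returning None as the original getRandom does) is an assumption about the behaviour you want:

```python
import random

try:
    from collections.abc import MutableSet  # Python 3.3+
except ImportError:
    from collections import MutableSet      # Python 2.7


class RandomSet(MutableSet):
    def __init__(self, iterable=()):
        self._index = {}   # item -> position of that item in self._items
        self._items = []   # items, in arbitrary order
        for item in iterable:
            self.add(item)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._index

    def add(self, item):
        if item not in self._index:
            self._index[item] = len(self._items)
            self._items.append(item)

    def discard(self, item):
        if item in self._index:
            index = self._index[item]
            last = self._items[-1]
            # Move the last item into the vacated slot, then drop the tail.
            self._items[index] = last
            self._index[last] = index
            self._items.pop()
            del self._index[item]

    def getRandom(self):
        # Assumption: raise IndexError on an empty set instead of returning None.
        return random.choice(self._items)

    def popRandom(self):
        item = self.getRandom()
        self.discard(item)
        return item


rs = RandomSet("abcde")
print(rs.popRandom(), len(rs))
```

Every operation here is a constant number of dict and list-tail operations, so add, discard, membership tests, getRandom, and popRandom all stay O(1) on average.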