Richard Hall - 2 months ago
Python Question

How to randomly remove a percentage of items from a list

I have two lists of equal length: one is a data series, the other is simply a time series. Together they represent simulated values measured over time.

I want to write a function that randomly removes a given percentage (or fraction) of the items from both lists. For example, if the fraction is 0.2, I want to remove 20% of the items from both lists at random, but the removed items must be the same in each list (the same indices).

For example, let n = 0.2 (20% to be deleted)

a = [0,1,2,3,4,5,6,7,8,9]
b = [0,1,4,9,16,25,36,49,64,81]

After randomly removing 20% of the items (here, the ones at indices 2 and 7), they might become:

a_new = [0,1,3,4,5,6,8,9]
b_new = [0,1,9,16,25,36,64,81]
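The number of items to drop follows from the fraction and the list length: 20% of 10 items is 2. The sketch below shows that arithmetic, assuming the count should be rounded to the nearest whole item (the question does not say how to handle fractions that don't divide the length evenly). `count_to_remove` is a hypothetical helper name. It also shows why truncating with `int()` can undercount because of floating-point error:

```python
def count_to_remove(length, frac):
    """Return how many items to drop from a list of `length` items.

    Rounds to the nearest whole item (an assumed policy). Note that
    Python's round() sends exact halves to the nearest even integer.
    """
    if not 0 <= frac <= 1:
        raise ValueError("frac must be between 0 and 1")
    return round(frac * length)

print(count_to_remove(10, 0.2))    # 2, as in the example above
print(int(0.29 * 100))             # 28: 0.29 * 100 evaluates to 28.999999999999996
print(count_to_remove(100, 0.29))  # 29
```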

In my real data the relationship between the two lists isn't as simple as in this example, so I can't filter one list and then recompute the other from it; they already exist as two separate lists. The remaining items also have to stay in their original order.
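Because the two lists are linked only by position, one way to meet both requirements is to choose the indices to *keep*, sort them, and index both lists with the same sorted indices. Sorting is what preserves the original order. This is a sketch, not a definitive implementation: `remove_fraction` and its `seed` parameter are hypothetical names, and rounding the count to the nearest integer is an assumption:

```python
import random

def remove_fraction(xs, ys, frac, seed=None):
    """Randomly drop round(frac * len(xs)) positions from both lists.

    The same indices are dropped from xs and ys, and the surviving
    items keep their original order. `seed` makes the choice repeatable.
    """
    if len(xs) != len(ys):
        raise ValueError("lists must have the same length")
    if not 0 <= frac <= 1:
        raise ValueError("frac must be between 0 and 1")
    rng = random.Random(seed)
    n_keep = len(xs) - round(frac * len(xs))
    # Pick the surviving positions, then sort them to keep the original order.
    keep = sorted(rng.sample(range(len(xs)), n_keep))
    return [xs[i] for i in keep], [ys[i] for i in keep]

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
a_new, b_new = remove_fraction(a, b, 0.2, seed=42)
print(a_new)
print(b_new)
```

Passing a fixed `seed` is handy in simulations, where you may want the same thinned-out series on every run.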


import random

a = [0,1,2,3,4,5,6,7,8,9]
b = [0,1,4,9,16,25,36,49,64,81]

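frac = 0.2  # fraction of a and b to exclude

# Sample the indices to exclude (without replacement) and turn them into a set
# for O(1) membership tests. round() avoids undercounting when frac * len(a)
# lands just below a whole number because of floating-point error.
inds = set(random.sample(range(len(a)), round(frac * len(a))))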

# use `enumerate` to get list indices as well as elements. 
# Filter by index, but take only the elements
new_a = [n for i,n in enumerate(a) if i not in inds]
new_b = [n for i,n in enumerate(b) if i not in inds]
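The same idea can also be written as a single pass over both lists by zipping them, so an element of `a` and its partner in `b` are always kept or dropped together. A minimal sketch using the question's example data; it assumes the lists have equal length (`zip` silently stops at the shorter one) and rounds the count to the nearest integer:

```python
import random

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
frac = 0.2

# Indices to drop, chosen without replacement.
drop = set(random.sample(range(len(a)), round(frac * len(a))))

# Walk both lists together so each (a, b) pair is kept or dropped as a unit;
# enumerate keeps the original order.
pairs = [(x, y) for i, (x, y) in enumerate(zip(a, b)) if i not in drop]
new_a = [x for x, _ in pairs]
new_b = [y for _, y in pairs]
```

Building `new_a` and `new_b` with two comprehensions (rather than `zip(*pairs)`) keeps the code working even when every item is dropped and `pairs` is empty.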