Chaos Chaos - 5 months ago 31
Python Question

Find missing data indices using python

What is the optimum way to return indices where 1-d array has missing data. The missing data is represented by zeros. The data may be genuinely zero but not missing. We only want to return indices where data is zero for more than or equal to 3 places at a time. For example for array [1,2,3,4,0,1,2,3,0,0,0,1,2,3] the function should only return indices for second segment where there are zeros and not the first instance.

This is actually an interview question :) challenge is to do most effeciently in one line


Keep track of the count of zeros in the current run. Then if a run finishes that has at least three zeros calculate the indexes.

def find_dx_of_missing(a):
    runsize = 3 # 3 or more, change to 4 if your need "more than 3"
    zcount = 0
    for i, n in enumerate(a):
        if n == 0:
            zcount += 1
            if zcount >= runsize:
                for j in range(i - zcount, i):
                    yield j
            zcount = 0
    if zcount >= runsize: # needed if sequence ends with missing
        i += 1
        for j in range(i - zcount, i):
            yield j


>>> a = [1,2,3,4,0,1,2,3,0,0,0,1,2,3]
>>> list(find_dx_of_missing(a))
[8, 9, 10]

>>> a = [0,0,0,3,0,5,0,0,0,0,10,0,0,0,0,0]
>>> list(find_dx_of_missing(a))
[0, 1, 2, 6, 7, 8, 9, 11, 12, 13, 14, 15]

Edit: Since you need a one liner here are two candidates assuming a is your list and n is the smallest run of zeros that count as missing data:

[v for vals in (list(vals) for iszeros, vals in itertools.groupby(xrange(len(a)), lambda dx, a=a: a[dx]==0) if iszeros) for v in vals if len(vals) >= n]


sorted({dx for i in xrange(len(a)-n+1) for dx in xrange(i, i+n) if set(a[i:i+n]) == {0}})