joaoricardo000 - 2 months ago
Python Question

Regex to remove bit signal noise spikes

I am dealing with RF signals that sometimes have noise spikes.

The input is something like this:

00000001111100011110001111100001110000001000001111000000111001111000


Before parsing the data in the signal, I need to remove the spike bits, i.e. runs of 0s or 1s shorter than a minimum length (3 in this example).

So basically I need to match the runs shown in parentheses:
0000000111110001111000111110000111000000(1)000001111000000111(00)1111000


After matching, I replace each spike with the bit before it, so a clean signal looks like this:
00000001111100011110001111100001110000000000001111000000111111111000


So far I have achieved this with two different regexes:

self.re_one_spikes = re.compile("(?:[^1])(?P<spike>1{1,%d})(?=[^1])" % (self._SHORTEST_BIT_LEN - 1))
self.re_zero_spikes = re.compile("(?:[^0])(?P<spike>0{1,%d})(?=[^0])" % (self._SHORTEST_BIT_LEN - 1))


Then I iterate over the matches and replace them.
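The loop itself is not shown here. A minimal sketch of what that step could look like, assuming module-level copies of the two patterns; the names SHORTEST_BIT_LEN and clean are made up for illustration and are not the original class code:

```python
import re

SHORTEST_BIT_LEN = 3  # assumed stand-in for self._SHORTEST_BIT_LEN

re_one_spikes = re.compile("(?:[^1])(?P<spike>1{1,%d})(?=[^1])" % (SHORTEST_BIT_LEN - 1))
re_zero_spikes = re.compile("(?:[^0])(?P<spike>0{1,%d})(?=[^0])" % (SHORTEST_BIT_LEN - 1))


def clean(signal):
    """Overwrite every matched spike with the opposite bit (hypothetical helper)."""
    bits = list(signal)
    # Spikes of 1s are filled with "0", spikes of 0s with "1".
    for pattern, fill in ((re_one_spikes, "0"), (re_zero_spikes, "1")):
        for m in pattern.finditer(signal):
            # Only the named group is overwritten, not the bit before it.
            for i in range(m.start("spike"), m.end("spike")):
                bits[i] = fill
    return "".join(bits)
```

Both patterns are run against the unmodified input, so the result does not depend on which one is applied first.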

How can I do this with a single regex? And can I use a regex to replace matches of different lengths?

I tried something like this with no success:

re.compile("(?![\1])([01]{1,2})(?![\1])")

Answer
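import re
THRESHOLD = 3  # runs shorter than this are treated as spikes

def fixer(match):
    # match.group(0) is one complete run of 1s
    ones = match.group(0)
    if len(ones) < THRESHOLD:
        return "0" * len(ones)  # overwrite the spike with 0s
    return ones

my_string = '00000001111100011110001111100001110000001000001111000000111001111000'
print(re.sub("(1+)", fixer, my_string))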

If you also want to remove "spikes" of zeros:

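def fixer(match):
    # match.group(0) is one complete run of identical bits
    items = match.group(0)
    if len(items) < THRESHOLD:
        # "10"[0] is "1" and "10"[1] is "0", so this flips the run's bit
        return "10"[int(items[0])] * len(items)
    return items

print(re.sub("(1+)|(0+)", fixer, my_string))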
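To address the single-regex part more directly: a backreference such as ([01])\1* matches one whole run of either bit, so one pattern covers both cases and any run length. Inside a character class, as in the [\1] attempt from the question, Python treats \1 as a character escape rather than a backreference, which is one reason that attempt cannot work. The following is a minimal sketch rather than a drop-in implementation; remove_spikes and min_len are names made up for illustration, and leaving a short run at the very start untouched (it has no preceding bit) is an assumption about the desired behaviour.

```python
import re


def remove_spikes(bits, min_len=3):
    """Replace each run shorter than min_len with the bit that precedes it.

    Hypothetical helper: a short run at the very start is kept as-is,
    because there is no earlier bit to copy.
    """
    def fix(match):
        run = match.group(0)
        if len(run) >= min_len or match.start() == 0:
            return run
        # The bit just before the run in the original input.
        previous_bit = bits[match.start() - 1]
        return previous_bit * len(run)

    # ([01])\1* matches one maximal run of identical bits.
    return re.sub(r"([01])\1*", fix, bits)


noisy = "00000001111100011110001111100001110000001000001111000000111001111000"
print(remove_spikes(noisy))
```

Each run is judged against the original input, so back-to-back spikes such as 0001011000 come out as 0000100000 and may need a second pass.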