Yash Agarwal - 1 year ago 99
Python Question

# Removing duplicates if paired adjacently

Deleting any pair of adjacent letters with same value. For example, string "aabcc" would become either "aab" or "bcc" after operation.

Sample input = aaabccddd

Sample output = abd

Confused how to iterate the list or the string in a way to match the duplicates and removing them, here is the way I am trying and I know it is wrong.

``````S = input()
removals = []

for i in range(0, len(S)):
if i + 1 >= len(S):
break

elif S[i] == S[i + 1]:
removals.append(i)
# removals is to store all the indexes that are to be deleted.
removals.append(i + 1)
i += 1
print(i)
Array = list(S)
set(removals)    #removes duplicates from removals

for j in range(0, len(removals)):
Array.pop(removals[j])    # Creates IndexOutOfRange error
``````

This is a problem from Hackerrank: Super Reduced String

Removing paired letters can be reduced to reducing runs of letters to an empty sequence if there are an even number of them, 1 if there are an odd number. `aaaaaa` becomes empty, `aaaaa` is reduced to `a`.

To do this on any sequence, use `itertools.groupby()` and count the group size:

``````# only include a value if their consecutive count is odd
[v for v, group in groupby(sequence) if sum(1 for _ in group) % 2]
``````

then repeat until the size of the sequence no longer changes:

``````prev = len(sequence) + 1
while len(sequence) < prev:
prev = len(sequence)
sequence = [v for v, group in groupby(sequence) if sum(1 for _ in group) % 2]
``````

However, since Hackerrank gives you text it'd be faster if you did this with a regular expression:

``````import re

even = re.compile(r'(?:([a-z])\1)+')

prev = len(text) + 1
while len(text) < prev:
prev = len(text)
text = even.sub(r'', text)
``````

`[a-z]` in a regex matches a lower-case letter, (..)`groups that match, and`\1`references the first match and will only match if that letter was repeated.`(?:...)+`asks for repeats of the same two characters.`re.sub()` replaces all those patterns with empty text.

The regex approach is good enough to pass that Hackerrank challenge.