Shrijan Aryal Shrijan Aryal - 27 days ago 10
Python Question

Substring of a string from a point where character starts to repeat

I am a sophomore CS student and I was practicing for interviews. In this problem, I am trying to print substring of an input parameter from the point where character starts to repeat. In other words, for a string like 'college', i want to print 'col', 'lege', 'colleg', 'e'.

The code implementation is shown below, but I wanted to ask about how to think of solving these types of problems, because they are really tricky and I wanted to know if there are certain algorithms to get hang of these dynamic problems quickly.

def checkrepeat(word):
i = 0
temp_w =''
check_char = {}
my_l = list()
while i < len(word)-1:
if word[i] not in check_char:
temp_w += word[i]
check_char[word[i]] = i
else:
my_l.append(temp_w)
temp_w=''
i = check_char[word[i]]
check_char.pop(word[i])
i+=1
return my_l

print(checkrepeat('college'))

Answer

One way of doing it is this:

def split_repeat(your_string):
    temp_list = list()
    temp_dict = dict()
    for char in set(your_string):
        temp_dict[char] = your_string.count(char)

    for key in temp_dict:
        if temp_dict[key] >= 2:
            string_part = your_string.split(key)[0]+key
            temp_list.append((string_part,your_string.lstrip(string_part)))

    return temp_list

But strip (and lstrip in this case) removes all repeat letters too (i.e. if you do: 'College'.lstrip('Col') the output is: 'ege' (missing second l).. As a more obvious example, if you do this: '???????????!'.lstrip('?') the output is: '!'

So a better way of doing it would be with the replace built in function, like this:

def split_repeat(your_string):
    temp_list = list()
    temp_dict = dict()
    for char in set(your_string):
        temp_dict[char] = your_string.count(char)

    for key in temp_dict:
        if temp_dict[key] >= 2:
            string_part = your_string.split(key)[0]+key
            temp_list.append((string_part,your_string.replace(string_part,'')))

    return temp_list

However, both of these split at the FIRST occurance of the character that happens twice.

So with input: split_repeat('College') the output is: [('Colle', 'ge'), ('Col', 'lege')]

But hopefully you can figure out the rest from here :)

EDIT:

Ok, here's probably one of the nicest ways to do it:

def split_on_repeat(your_string):
    temp_list = list()
    temp_dict = dict()
    for char in set(your_string):
        temp_dict[char] = your_string.count(char)

    for key in temp_dict:
        if temp_dict[key] >= 2:
            repeat_index = your_string.rfind(key)
            temp_list.append((your_string[:repeat_index],your_string[repeat_index:]))

    return temp_list

use:

>>> print(split_on_repeat('College'))
[('Colleg', 'e'), ('Col', 'lege')]

And to leave you with some context about how indexing works in python:

'Word'[:1] == 'Wo'

'Word'[-1] == 'd'

'Word'[:-1] == 'Wor'

This indexing works for every object that is indexable.