Shrijan Aryal - 7 months ago 36
Python Question

# Substring of a string from a point where character starts to repeat

I am a sophomore CS student and I was practicing for interviews. In this problem, I am trying to print substring of an input parameter from the point where character starts to repeat. In other words, for a string like 'college', i want to print 'col', 'lege', 'colleg', 'e'.

The code implementation is shown below, but I wanted to ask about how to think of solving these types of problems, because they are really tricky and I wanted to know if there are certain algorithms to get hang of these dynamic problems quickly.

``````def checkrepeat(word):
i = 0
temp_w =''
check_char = {}
my_l = list()
while i < len(word)-1:
if word[i] not in check_char:
temp_w += word[i]
check_char[word[i]] = i
else:
my_l.append(temp_w)
temp_w=''
i = check_char[word[i]]
check_char.pop(word[i])
i+=1
return my_l

print(checkrepeat('college'))
``````

One way of doing it is this:

``````def split_repeat(your_string):
temp_list = list()
temp_dict = dict()
for char in set(your_string):
temp_dict[char] = your_string.count(char)

for key in temp_dict:
if temp_dict[key] >= 2:
string_part = your_string.split(key)[0]+key
temp_list.append((string_part,your_string.lstrip(string_part)))

return temp_list
``````

But strip (and lstrip in this case) removes all repeat letters too (i.e. if you do: `'College'.lstrip('Col')` the output is: `'ege'` (missing second l).. As a more obvious example, if you do this: `'???????????!'.lstrip('?')` the output is: `'!'`

So a better way of doing it would be with the `replace` built in function, like this:

``````def split_repeat(your_string):
temp_list = list()
temp_dict = dict()
for char in set(your_string):
temp_dict[char] = your_string.count(char)

for key in temp_dict:
if temp_dict[key] >= 2:
string_part = your_string.split(key)[0]+key
temp_list.append((string_part,your_string.replace(string_part,'')))

return temp_list
``````

However, both of these split at the FIRST occurance of the character that happens twice.

So with input: `split_repeat('College')` the output is: `[('Colle', 'ge'), ('Col', 'lege')]`

But hopefully you can figure out the rest from here :)

EDIT:

## Ok, here's probably one of the nicest ways to do it:

``````def split_on_repeat(your_string):
temp_list = list()
temp_dict = dict()
for char in set(your_string):
temp_dict[char] = your_string.count(char)

for key in temp_dict:
if temp_dict[key] >= 2:
repeat_index = your_string.rfind(key)
temp_list.append((your_string[:repeat_index],your_string[repeat_index:]))

return temp_list
``````

use:

``````>>> print(split_on_repeat('College'))
[('Colleg', 'e'), ('Col', 'lege')]
``````

And to leave you with some context about how indexing works in python:

`'Word'[:1] == 'Wo'`

`'Word'[-1] == 'd'`

`'Word'[:-1] == 'Wor'`

This indexing works for every object that is indexable.