alex.l alex.l - 2 months ago 12
Python Question

First Unique Character in a String

Given a string, find the first non-repeating character in it and return its index. If it doesn't exist, return -1.

first_unique('leetcode') # 0
first_unique('loveleetcode') # 2


I came up with the following solution. How can I make it more efficient for very long input strings?

def first_unique(self, s):
if s == '':
return -1

for item in s:
if s.count(item) == 1:
return s.index(item)
break

return -1

BPL BPL
Answer

Your version isn't bad for few cases with "nice" strings... but using count is quite expensive for long "bad" strings, I'd suggest you cache items, for instance:

def f1(s):
    if s == '':
        return -1
    for item in s:
        if s.count(item) == 1:
            return s.index(item)
            break
    return -1


def f2(s):
    cache = set()

    if s == '':
        return -1
    for item in s:
        if item not in cache:
            if s.count(item) == 1:
                return s.index(item)
            else:
                cache.add(item)

    return -1

import timeit
import random
import string

random.seed(1)

K, N = 500, 100000
data = ''.join(random.choice(string.ascii_uppercase + string.digits)
               for _ in range(K))

print(
    timeit.timeit('f1(data)', setup='from __main__ import f1, data', number=N))
print(
    timeit.timeit('f2(data)', setup='from __main__ import f2, data', number=N))

The results on my laptop are:

32.05926330029437
4.267771588590406

The version using cache gives you 8x speed up factor vs yours wich is using count function all the time. So, my general advice would be... cache as much as possible whether it's possible

EDIT:

I've added Patrick Haugh version to the benchmark and it gave 10.92784585620725

EDIT2:

I've added Mehmet Furkan Demirel version to the benchmark and it gave 10.325440507549331

EDIT3:

I've added wim version to the benchmark and it gave 12.47985351744839

CONCLUSION:

I'd use the version i've proposed initially using a simple cache without relying on Python counter modules, it's not necessary (in terms of performance)

Comments