Is this considered an efficient way to check whether two strings are permutations of each other? What would be the difference between using two dictionaries instead? Is Counter more optimized in Python?
I should clarify why I'm asking: a comment on "Check if two unordered lists are equal" mentions that it's more efficient to check whether two lists are equal by using sorted(s1) == sorted(s2). Since that is O(n log n), you would assume Counter, at O(n), would have a better run time.
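For reference, the three approaches being compared might be sketched like this (function names are mine, chosen for illustration):

```python
from collections import Counter

def is_perm_sorted(s1, s2):
    # O(n log n): two strings are permutations iff their sorted
    # character sequences are identical.
    return sorted(s1) == sorted(s2)

def is_perm_counter(s1, s2):
    # O(n): compare the multisets of character counts.
    return Counter(s1) == Counter(s2)

def is_perm_dicts(s1, s2):
    # O(n) with plain dicts -- the "two dictionaries" variant.
    def counts(s):
        d = {}
        for ch in s:
            d[ch] = d.get(ch, 0) + 1
        return d
    return counts(s1) == counts(s2)
```

All three return the same answers; the question is purely about constant factors.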
In the other question you referenced, J.F. Sebastian theorized that using sort would be faster than using Counter in nearly all cases, but he wisely noted that if you really care you should measure: "In practice it might always be faster then collections.Counter() (despite asymptotically O(n) time being better then .sort()). Measure it, if it is important."
Such a measurement is shown below. The high points:
- Up to list sizes of roughly a million, sort was faster.
- That advantage steadily dwindles as the list size grows.
- These experiments use lists of distinct values. The benchmarks might differ if the lists had a moderately high percentage of repeated values.
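A quick way to probe that caveat is to draw the list values from a small pool so that repeats are common. This is only a sketch; the pool size and list length here are arbitrary choices, not part of the benchmark below:

```python
import random
import timeit
from collections import Counter

# Values come from a pool of 100, so each value repeats ~100 times
# on average in a 10,000-element list.
size = 10_000
xs = [random.randrange(100) for _ in range(size)]
ys = list(xs)
random.shuffle(ys)

# Time the two equality checks on the repeat-heavy data.
t_counter = timeit.timeit(lambda: Counter(xs) == Counter(ys), number=100)
t_sorted = timeit.timeit(lambda: sorted(xs) == sorted(ys), number=100)
print('counter:', t_counter, 'sorted:', t_sorted)
```

With heavy repetition the Counter has far fewer distinct keys to manage, which is exactly the regime where its relative cost could change.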
These measurements suggest that the overhead of creating a Counter is generally larger than the cost of just sorting the data in place. One way to think about it is to view the overhead of the Counter creation relative to the log(n) portion of the sort approach. Not until n gets fairly large does the algorithmic advantage of the Counter data structure pay off enough to justify that overhead.
But remember: for small data sizes, the speed advantage might be irrelevant -- unless you are repeating the operation at high volume. In that light, the "algorithmically correct" approach (use a Counter) might still be better from a code-design point of view for most use cases -- because it's faster if the data is large and fast enough if the data is small.
    import random
    import timeit
    from collections import Counter

    def prep():
        random.shuffle(xs)
        random.shuffle(ys)

    def use_counter():
        prep()
        return Counter(xs) == Counter(ys)

    def use_sort():
        prep()
        xs.sort()
        ys.sort()
        return xs == ys

    # (repeat, list_size, timeit_n)
    experiments = [
        (3, 10, 100000),
        (3, 100, 10000),
        (3, 1000, 1000),
        (3, 10000, 100),
        (3, 100000, 10),
        (3, 1000000, 1),
        (1, 10000000, 1),
    ]

    for e in experiments:
        repeat, list_size, timeit_n = e
        xs = list(range(list_size))
        ys = list(range(list_size))
        r1 = timeit.repeat(use_counter, repeat=repeat, number=timeit_n)
        r2 = timeit.repeat(use_sort, repeat=repeat, number=timeit_n)
        print()
        print(e)
        print('total ', sum(r1), sum(r2))
        print('m1/m2 ', min(r1) / min(r2))
Example output (a ratio > 1 means Counter is slower):
    (3, 10, 100000)
    total  5.06751918793 2.15432405472
    m1/m2  2.34850470872

    (3, 100, 10000)
    total  3.16299915314 2.06651735306
    m1/m2  1.52879303981

    (3, 1000, 1000)
    total  3.017786026 2.42989587784
    m1/m2  1.24086325316

    (3, 10000, 100)
    total  3.06426525116 2.74061489105
    m1/m2  1.11802891855

    (3, 100000, 10)
    total  3.66198205948 3.35467290878
    m1/m2  1.1028159291

    (3, 1000000, 1)
    total  5.19361901283 5.08777713776
    m1/m2  1.03125948765

    (1, 10000000, 1)
    total  20.0118789673 24.6061840057
    m1/m2  0.813286569044