yusuf - 1 month ago
Python Question

Optimization for faster calculation on Python defaultdicts

I have the following script:

for b in range(len(xy_alignments.keys())):
    print str(b) + " : " + str(len(xy_alignments.keys()))
    x = xy_alignments.keys()[b][0]
    y = xy_alignments.keys()[b][1]
    yx_prob = yx_alignments[(y,x)] / x_phrases[x]
    xy_prob = xy_alignments[(x,y)] / y_phrases[y]
    line_str = x + "\t" + y + "\t" + str(yx_prob) + "\t" + str(xy_prob) + "\n"
    of.write(line_str.encode("utf-8"))
of.close()


xy_alignments, yx_alignments, x_phrases, and y_phrases are Python defaultdict variables with millions of keys each.
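With millions of keys, the way the loop reaches each key matters a lot. In Python 2, dict.keys() returns a freshly built list of every key, and the loop above calls it three times per iteration (progress print, x, y), so the total work grows with the square of the number of keys. The sketch below emulates that pattern in Python 3, where keys() returns a view, so list(d) stands in for the Python 2 call. The function names slow_pairs and fast_pairs and the 2,000-key toy dictionary are made up for illustration.

```python
import timeit


def slow_pairs(d):
    # Emulates the question's loop: list(d) copies every key, like Python 2's
    # d.keys(), and it is called inside the loop, so total work is O(len(d) ** 2).
    pairs = []
    for b in range(len(list(d))):
        x = list(d)[b][0]
        y = list(d)[b][1]
        pairs.append((x, y))
    return pairs


def fast_pairs(d):
    # Walk the keys once and unpack each (x, y) tuple: total work is O(len(d)).
    return [(x, y) for (x, y) in d]


# Hypothetical toy data: 2,000 (x, y) keys standing in for the real phrase pairs.
data = {("x%d" % i, "y%d" % i): i for i in range(2000)}
assert slow_pairs(data) == fast_pairs(data)

for fn in (slow_pairs, fast_pairs):
    seconds = timeit.timeit(lambda: fn(data), number=1)
    print("%s: %.4f s" % (fn.__name__, seconds))
```

Doubling the number of keys should roughly quadruple the time of slow_pairs but only about double that of fast_pairs; exact timings depend on the machine.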

When I run the script, it runs very slowly.

Does anyone have suggestions for making it faster?

Thanks,

Answer

Here's a more idiomatic version that should also be faster.

for (x, y), xy_alignment in xy_alignments.iteritems():
    yx_prob = yx_alignments[(y, x)] / x_phrases[x]
    xy_prob = xy_alignment / y_phrases[y]
    of.write(b'%s\t%s\t%s\t%s\n' % (x, y, yx_prob, xy_prob))

This

  • saves the keys() calls, which in Python 2 build a new list of all keys on every call (three times per iteration in the original loop),
  • saves one dict lookup by using iteritems(),
  • saves intermediate string allocations by using a single % format instead of repeated + concatenation, and
  • saves the encode() call, assuming all output is in the ASCII range (if x or y can contain non-ASCII characters, the UTF-8 encoding is still needed).
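The answer above targets Python 2 (iteritems(), byte-string literals). A possible Python 3 port is sketched below. It assumes the output should be UTF-8, as the question's encode("utf-8") suggests, and lets the file object do the encoding, which stays correct even if a phrase is not ASCII. The function name write_probabilities and the toy dictionaries are hypothetical. Note that / is true division in Python 3, whereas in Python 2 it floors when both operands are ints.

```python
import os
import tempfile
from collections import defaultdict


def write_probabilities(xy_alignments, yx_alignments, x_phrases, y_phrases, path):
    # Python 3: items() replaces Python 2's iteritems() and returns a view,
    # so no list of millions of pairs is built.
    # Text mode with encoding="utf-8" encodes non-ASCII phrases on write,
    # so no per-line .encode() call is needed.
    with open(path, "w", encoding="utf-8") as of:
        for (x, y), xy_alignment in xy_alignments.items():
            yx_prob = yx_alignments[(y, x)] / x_phrases[x]  # true division in Python 3
            xy_prob = xy_alignment / y_phrases[y]
            of.write("%s\t%s\t%s\t%s\n" % (x, y, yx_prob, xy_prob))


# Hypothetical toy data (not from the question), including a non-ASCII phrase.
x_phrases = defaultdict(int, {"das Haus": 4, "grün": 2})
y_phrases = defaultdict(int, {"the house": 2, "green": 1})
xy_alignments = defaultdict(int, {("das Haus", "the house"): 2, ("grün", "green"): 1})
yx_alignments = defaultdict(int, {("the house", "das Haus"): 2, ("green", "grün"): 1})

path = os.path.join(tempfile.mkdtemp(), "probs.tsv")
write_probabilities(xy_alignments, yx_alignments, x_phrases, y_phrases, path)
with open(path, encoding="utf-8") as f:
    content = f.read()
print(content, end="")
```

The with block also closes the file even if a lookup raises, which replaces the explicit of.close() from the question.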