I've noticed that some of my Ruby scripts, which run fine on small data sets, crash from running out of memory when given a large data set to process. For example, one long-running script grows by a hundred or so megabytes of RAM per minute until it crashes, once the input is large enough.
So, the question is: how do I avoid memory leaks in Ruby? What are the do's and don'ts? Any hints or tips for optimizing Ruby memory usage in long-running scripts?
How can I make sure my Ruby scripts don't leak memory?
The quick fix for memory problems is often to sprinkle in calls to `GC.start`, which force-initiates the garbage collector. Ruby can sometimes be lazy about cleaning up garbage, and it can accumulate to a dangerous degree.
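As a rough sketch of what that looks like (the exact `GC.stat` keys vary across Ruby versions, so treat `:heap_live_slots` as an assumption for current CRuby):

```ruby
# Allocate a pile of short-lived strings, drop the reference,
# then force a full garbage-collection pass.
junk = Array.new(100_000) { |i| "temp-#{i}" }
junk = nil

runs_before = GC.count          # total GC runs so far
GC.start                        # force-initiate the garbage collector
puts GC.count > runs_before     # GC.start performed at least one more run
puts GC.stat[:heap_live_slots]  # live object slots after collection
```

Forcing the GC this way is a band-aid, not a cure: it reclaims garbage sooner, but it cannot free objects your code still references.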
It's also easy to inadvertently create structures that are difficult to clean up: wildly inter-linked objects that, when analyzed more deeply, are not actually retained by anything. These make life harder for the garbage collector. For example, a deep Hash-of-Hashes structure with lots and lots of strings can take a lot more work to liberate than a simple Array.
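You can see how many objects such a structure pins in memory with the standard `ObjectSpace` API (a rough sketch; exact counts vary by Ruby version and what else is loaded):

```ruby
# Build a 1000-level-deep Hash-of-Hashes chain.
root = {}
node = root
1_000.times { |i| node = (node[i.to_s] = {}) }

# While `root` is referenced, every hash and key string in the
# chain stays live, so the GC has a long graph to traverse.
GC.start
puts ObjectSpace.each_object(Hash).count >= 1_000

# Drop the root and the whole chain becomes collectible at once,
# but freeing thousands of linked objects is still more work
# for the collector than freeing one flat Array.
root = node = nil
GC.start
```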
If you're having memory problems, pay attention to how much garbage you produce during operations, and look for ways of collapsing steps to remove intermediate products. The classic case is this:
```ruby
s = ''
10.times do |i|
  s += i.to_s
end
```
This produces the string `0123456789` as a final product, but each `+=` also creates a new string holding the intermediate result. That's 11x as much garbage as this solution:
```ruby
s = ''
10.times do |i|
  s << i.to_s
end
```
That creates a single string and appends to it in place. Technically, the `to_s` call on each number also creates garbage, so conversions aren't free either; that's another thing to keep in mind. This is why you see symbols like `:name` used so frequently in Ruby: a symbol is allocated once and only once, whereas every occurrence of the string literal `"name"` can be an independent object.
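You can measure the `+=` versus `<<` difference directly with `ObjectSpace.count_objects`, which reports heap counts per object type (a sketch; the counts include incidental allocations, so treat them as approximate):

```ruby
# Count how many String objects a block allocates.
def string_allocations
  GC.disable                                     # keep counts stable during the block
  before = ObjectSpace.count_objects[:T_STRING]
  yield
  after = ObjectSpace.count_objects[:T_STRING]
  GC.enable
  after - before
end

with_plus   = string_allocations { s = ''; 10.times { |i| s += i.to_s } }
with_shovel = string_allocations { s = ''; 10.times { |i| s << i.to_s } }

puts with_plus > with_shovel   # += pays for the extra intermediate strings
```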
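The symbol-versus-string point is easy to check with object identity (a small sketch; note that enabling frozen string literals changes the string side of this):

```ruby
# Two identical string literals are, by default, two distinct objects.
puts "name".equal?("name")   # => false

# The same symbol is always the very same object, allocated once.
puts :name.equal?(:name)     # => true
```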