Atith Atith - 4 months ago 33
Ruby Question

Errno::ENOMEM: Cannot allocate memory - cat

I have a job running in production which processes XML files.
There are around 4k XML files, 8 to 9 GB in total.

After processing we get CSV files as output. I have a cat command which merges all the CSV files into a single file, and when it runs I get:


Errno::ENOMEM: Cannot allocate memory


on the cat (backtick) command.

Below are a few details:


  • System Memory - 4 GB

  • Swap - 2 GB

  • Ruby : 1.9.3p286



Files are processed using nokogiri and saxbuilder-0.0.8.

Here there is a block of code which processes the 4,000 XML files and saves the output as CSV, one per XML file (sorry, I'm not supposed to share it because of company policy).

Below is the code which merges the output files into a single file:

Dir["#{processing_directory}/*.csv"].sort_by {|file| [file.count("/"), file]}.each {|file|
`cat #{file} >> #{final_output_file}`
}


I've taken memory consumption snapshots during processing. It consumes almost all of the memory, but it doesn't fail.
It always fails on the cat command.

My guess is that the backtick tries to fork a new process, which can't get enough memory, so it fails.

Please let me know your opinion and an alternative to this.

Answer

So it seems that your system is running pretty low on memory, and spawning a shell plus calling cat is too much for the little memory left.

If you don't mind losing some speed, you can merge the files in Ruby with small buffers. This avoids spawning a shell, and you can control the buffer size.

This is untested, but you get the idea:

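buffer_size = 4096

# Open the output once; the block form closes (and flushes) it even on error.
# 'wb' truncates an existing file; use 'ab' to append like `>>` does.
File.open(final_output_file, 'wb') do |output_file|
  Dir["#{processing_directory}/*.csv"].sort_by {|file| [file.count("/"), file]}.each do |file|
    File.open(file, 'rb') do |f|
      # Copy in fixed-size chunks so memory use stays bounded by buffer_size.
      while buffer = f.read(buffer_size)
        output_file.write(buffer)
      end
    end
  end
end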
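
As a variation on the loop above, IO.copy_stream (available since Ruby 1.9) can do the chunked copy for you, so you don't manage the buffer yourself. This is only a sketch: merge_csv_files is a hypothetical helper name I made up for illustration, and it assumes the same sort order as the code in the question. It also skips the output file in case it lives in the same directory as the inputs.

```ruby
# Hypothetical helper (not from the question): merges every CSV found directly
# under processing_directory into final_output_file without spawning a shell.
# Returns the number of files merged.
def merge_csv_files(processing_directory, final_output_file)
  # Same ordering as the original snippet: by path depth, then by name.
  files = Dir["#{processing_directory}/*.csv"].sort_by { |file| [file.count("/"), file] }

  # Assumption: the output may sit next to the inputs, so never copy it into itself.
  files.reject! { |file| File.expand_path(file) == File.expand_path(final_output_file) }

  # 'wb' truncates any existing output; the block form closes the file for us.
  File.open(final_output_file, "wb") do |output|
    files.each do |file|
      File.open(file, "rb") do |input|
        # Streams input to output in chunks; the whole file is never held in memory.
        IO.copy_stream(input, output)
      end
    end
  end

  files.size
end
```

You would call it as merge_csv_files(processing_directory, final_output_file). Since no child process is forked, the merge step should not need any extra memory beyond the copy buffer.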