I have a large file with more than 1M lines, and another file containing the input strings I need to use to find the matching lines in the large file.
I was able to do it this way:
File.foreach(strings_file) do |l|
  File.foreach(large_file) do |line|
    next unless line.include?(l.chomp)
    # ... use the matching line here
  end
end
First of all, you'll have a quadratic scaling problem if you get this wrong. If input file A has N lines and B has M lines, then a nested loop needs N*M tests to check for overlap. For example, 1,000 input strings against 1,000,000 lines is a billion comparisons. That can be impossibly slow.
Instead, pull in the input lines and stick them in something you can use for quick lookups:
require 'set'
match_lines = Set.new(File.readlines(strings_file).map(&:chomp))
Then you can test very quickly here:
File.foreach(large_file) do |line|
  print line if match_lines.include?(line.chomp)
end
Use chomp on both sides here to avoid failing to match if the last line in your match file doesn't have a newline at the end, or if you're using CRLF line endings in one file and LF in the other.
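Putting the pieces together, here is a minimal sketch of the Set approach wrapped in a method. The name matching_lines and its two path parameters are made up for illustration and don't come from your code; it assumes you want whole-line matches and that the list of input strings fits comfortably in memory.

```ruby
require 'set'

# Hypothetical helper (name and parameters are assumptions for this sketch).
# Returns the lines of large_path whose content, minus the line ending,
# exactly equals some line in strings_path.
def matching_lines(strings_path, large_path)
  # Load the input strings once; Set membership checks are hash lookups.
  wanted = Set.new(File.readlines(strings_path).map(&:chomp))

  # Stream the large file line by line instead of reading it all at once,
  # keeping only the lines whose chomped content is in the set.
  File.foreach(large_path).select { |line| wanted.include?(line.chomp) }
end
```

You could call it as matching_lines(strings_file, large_file).each { |line| print line }. Note that this compares whole lines, whereas the include? in the question tests for substrings, so the Set lookup only applies directly if exact-line matching is what you need.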