voltas voltas - 5 months ago 20
Ruby Question

How to delete lines from multiple files

I'm trying to read a file (d:\mywork\list.txt) line by line and search if that string occurs in any of the files (one by one) in a particular directory (d:\new_work).

If present in any of the files (may be one or more) I want to delete the string (car\yrui3,) from the respective files and save the respective file.

list.txt:

car\yrui3,
dom\09iuo,
id\byt65_d,
rfc\some_one,
desk\aa_tyt_99,
.........
.........


Directory having multiple files: d:\new_work:

Rollcar-access.txt
Mycar-access.txt
Newcar-access.txt
.......
......


My code:

value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
print "For the string: #{line}"
Dir.glob("D:/new_work/*-access.txt") do |fn|
print "checking files:#{fn}\n"
text = File.read(fn)
replace = text.gsub(line.strip, "")
File.open(fn, "w") { |file| file.puts replace }
end
end


The issue is, values are not getting deleted as expected. Also,
text
is empty when I tried to print the value.

Answer Source

There are a number of things wrong with your code, and you're not safely handling your file changes.

Meditate on this untested code:

ACCESS_FILES = Dir.glob("D:/new_work/*-access.txt")

File.foreach('D:/mywork/list.txt') do |target|
  target = target.strip.sub(/,$/, '')

  ACCESS_FILES.each do |filename|
    new_filename = "#{filename}.new"
    old_filename = "#{filename}.old"

    File.open(new_filename, 'w') do |fileout|
      File.foreach(filename) do |line_in|
        fileout.puts line_in unless line_in[target]
      end
    end

    File.rename(filename, old_filename)
    File.rename(new_filename, filename)
    File.delete(old_filename)
  end
end
  • In your code you use:

    File.open('D:\\mywork\\list.txt').read
    

    instead, a shorter, and more concise and clear way would be to use:

    File.read('D:/mywork/list.txt')
    

    Ruby will automatically adjust the pathname separators based on the OS so always use forward slashes for readability.

    The problem using read is it isn't scalable. Imagine if you were doing this in a long term production system and your input file had grown into the TB range. You'd halt the processing on your system until the file could be read. Don't do that.

    Instead use foreach to read line-by-line. See "Why is "slurping" a file not a good practice?". That'll remove the need for

    value.gsub!(/\r\n?/, "\n")
    value.each_line do |line|
      line.chomp!
    
  • While

    Dir.glob("D:/new_work/*-access.txt") do |fn|
    

    is fine, its placement isn't. You're doing it for every line processed in your file being read, wasting CPU. Read it first and store the value, then iterate over that value repeatedly.

  • Again,

    text = File.read(fn)
    

    has scalability issues. Using foreach is a better solution. Again.

  • Replacing the text using gsub is fast, but it doesn't outweigh the potential problems of scalability when line-by-line IO is just as fast and sidesteps the issue completely:

    replace = text.gsub(line.strip, "")
    
  • Opening and writing to the same file as you were reading is an accident waiting to happen in a production environment:

    File.open(fn, "w") { |file| file.puts replace }
    

    A better practice is to write to a separate, new, file, rename the old file to something safe, then rename the new file to the old file's name. This preserves the old file in case the code or machine crashes mid-save. Then, when that's finished it's safe to remove the old file. See "How to search file text for a pattern and replace it with a given value" for more information.

A final recommendation is to strip all the trailing commas from your input file. They're not accomplishing anything and are only making you do extra work to process the file.