marcamillion - 1 month ago
Ruby Question

How do I get this loop to write to CSV after each row iteration rather than after each file?

I have the following method:

csvs = Dir["#{@dir_name}/#{@state}/*.csv"]

csvs.each do |csv|
  city = csv.split(/[\/]|.csv-updated|.csv/).last
  new_csv = "#{@dir_name}/#{@state}/emails/#{city}-with-emails.csv"
  CSV.open(new_csv, "a+", write_headers: true, headers: ["Company_Name","Website","Street_Address", "City", "State", "Zip", "Phone","Email1", "Email2", "Email3", "Email4", "Email5"]) do |new_csv_row|
    CSV.foreach(csv, headers: true) do |row|
      website = row['Website']
      begin
        page = YPCrawler::PageParser.new website
        links = page.compile_all_links(website)
        emails = page.compile_all_emails(links)
        new_csv_row << (row << emails.join(","))
      rescue
        next
      end
    end
  end
end


What happens, though, is that it doesn't write to the new CSV after each row is processed; it only writes once an entire CSV file has been processed. I assume it processes the old CSV file and stores the results in memory, and then when that file is done it dumps everything from memory into the new file. I don't particularly like this because the CSV files have different lengths, and I don't want to ever run out of memory since I am processing so many files.

I initially had the CSV.open(new_csv) and CSV.foreach(csv) calls swapped, but the issue I had was that after every row it kept writing a header row, which is not what I wanted.

I just want the header row written once, at the top of the file, and then the row added appropriately.

What's the best way to approach this?

Answer

I think you can write the header explicitly instead of relying on write_headers. This is based on my understanding so far from our comments.

headers = ["Company_Name", "Website", "Street_Address", "City", "State", "Zip", "Phone", "Email1", "Email2", "Email3", "Email4", "Email5"]

csvs.each do |csv|
  # Escape the dots so ".csv" only matches a literal extension.
  city = csv.split(/\/|\.csv-updated|\.csv/).last
  new_csv = "#{@dir_name}/#{@state}/emails/#{city}-with-emails.csv"
  # Each city gets its own output file, so decide per file whether it still needs a header.
  write_header = !File.exist?(new_csv) || File.zero?(new_csv)
  CSV.open(new_csv, "a+") do |new_csv_row|
    new_csv_row << headers if write_header
    CSV.foreach(csv, headers: true) do |row|
      website = row['Website']
      begin
        page = YPCrawler::PageParser.new website
        links = page.compile_all_links(website)
        emails = page.compile_all_emails(links)
        new_csv_row << (row << emails.join(","))
        # Flush the buffered row now instead of waiting for the block to close.
        new_csv_row.flush
      rescue
        next
      end
    end
  end
end
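
As far as I can tell, the delay you are seeing is Ruby's normal IO write buffer rather than the whole file being held in memory: CSV.foreach reads one row at a time, and written rows sit in a small buffer until it fills up or the file is closed. Here is a minimal, self-contained sketch of the same idea that you can run without the crawler. The names append_with_emails and HEADERS are made up for this example, and the block stands in for the YPCrawler calls; I am assuming it returns an array of email strings. It also puts one email per column, which is a guess at what the Email1..Email5 headers are meant for.

```ruby
require "csv"

HEADERS = %w[Company_Name Website Street_Address City State Zip Phone
             Email1 Email2 Email3 Email4 Email5].freeze

# Hypothetical helper, not part of the original code. Appends one output row
# per input row and writes the header only when the output file is new or empty.
# The block receives the website and is assumed to return an array of emails.
def append_with_emails(in_path, out_path)
  needs_header = !File.exist?(out_path) || File.zero?(out_path)
  CSV.open(out_path, "a") do |out|
    out << HEADERS if needs_header
    CSV.foreach(in_path, headers: true) do |row|
      begin
        emails = yield(row["Website"])
      rescue StandardError
        next # skip rows whose lookup fails, like the rescue in the loop above
      end
      # One email per column, padded with nil so every row has five email cells.
      out << (row.fields + Array.new(5) { |i| emails[i] })
      # Push the buffered row out now rather than when the file is closed.
      out.flush
    end
  end
end
```

You would call it once per input file, for example append_with_emails(csv, new_csv) { |site| ... } with the crawler calls inside the block. Note that flush only hands the data to the operating system; if you need it forced to disk you could call fsync as well, at some cost in speed.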