Austin L - 1 month ago
Ruby Question

Find duplicates in a Ruby array of arrays, sum their values, then combine them

I have a pipe-delimited text file that I am looping through. It looks like this:

123|ADAM JOHNSON|AAUA|||||||||||1||||
123||AAUA||||||||8675|90.0|90.0||||||||
444|STEVE SMITH|AAUA|||||||||||1|||||
444||AAUA||||||||2364|50.0|50.0|||||||
444||AAUA||||||||8453|50.0|50.0||||
567|ALLEN JONES|AAUA|||||||||||1||||||
567||AAUA||||||||6578|75.0|75.0||||||
567||AAUA||||||||1234|10.0|10.0||||
567||AAUA||||||||1234|15.0|15.0|||||


I start by grabbing the values at indexes 0, 10 and 11 of each row that has no name and putting them into an array of arrays like so:

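require 'csv'

@group_array = [] # assumed to start empty
# Rows without a name have an empty second column, which CSV parses as nil.
CSV.foreach('data.txt', col_sep: '|') do |row|
  if row[1].nil?
    @group_array << [row[0], [row[10], row[11]]]
  end
end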


So I get something like:

[["123", ["8675", "90.0"]]
------------------
["444", ["2364", "50.0"]]
["444", ["8453", "50.0"]]
------------------
["567", ["6578", "75.0"]]
["567", ["1234", "10.0"]]
["567", ["1234", "15.0"]]]


What I am struggling with is the next step: looping through the arrays and grouping the entries that share the same first element (the 3-digit id); then, within each group, finding entries whose inner array has the same 4-digit id, adding up their float values, and producing a final array with the duplicates removed and their values summed.

Expected output should look like:

[["0006310001", ["789663473", "90.0"]],
["0006410001", ["297103188", "50.0"]],
["0006410001", ["757854164", "50.0"]],
["0006610001", ["557493572", "75.0"]],
["0006610001", ["981894386", "25.0"]]]

Answer

I don't think each item in your data looks like [["123", ["8675", "90.0"]]], based on how it's created. More likely it looks like ["123", ["8675", "90.0"]].

In which case, this should do what you want...

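# Sort so entries with the same ids end up next to each other.
group_array = @group_array.sort
result = []
save_3 = nil   # 3-digit id of the run currently being summed
save_4 = nil   # 4-digit id of the run currently being summed
total = 0.0
group_array.each do |group|
  # A change in either id means the previous run is complete.
  if group[0] != save_3 || group[1][0] != save_4
    result << [save_3, [save_4, total.to_s]] if save_3
    total = 0.0
    save_3 = group[0]
    save_4 = group[1][0]
  end
  total += group[1][1].to_f
end
# Flush the last run, which the loop never closes.
result << [save_3, [save_4, total.to_s]] if save_3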
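
Note that sorting changes the order of the rows, so the combined "1234" entry for 567 ends up before "6578". If you want to keep the order in which each pair first appears, here is an alternative sketch (not the approach above) that accumulates totals in a Hash keyed by the [3-digit id, 4-digit id] pair. Ruby hashes preserve insertion order, so the first occurrence of each pair decides its position. The method name sum_duplicates and the sample rows are made up for illustration, and it assumes every entry has the ["id", ["code", "amount"]] shape.

```ruby
# Hypothetical helper: sums the amount strings for each [id, code] pair,
# keeping pairs in the order in which they first appear.
def sum_duplicates(rows)
  totals = Hash.new(0.0)
  rows.each do |id, (code, amount)|
    totals[[id, code]] += amount.to_f
  end
  # Turn the hash back into the original nested-array shape.
  totals.map { |(id, code), total| [id, [code, total.to_s]] }
end

# Sample rows taken from the question's data.
rows = [
  ["123", ["8675", "90.0"]],
  ["444", ["2364", "50.0"]],
  ["444", ["8453", "50.0"]],
  ["567", ["6578", "75.0"]],
  ["567", ["1234", "10.0"]],
  ["567", ["1234", "15.0"]]
]

result = sum_duplicates(rows)
```

With the sample rows, the two "1234" entries for 567 should collapse into a single entry with their amounts added together, and it stays after the "6578" entry.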