theGreenCabbage - 3 months ago
Ruby Question

Performing union of two arrays with custom rules

I have two arrays

b = ["John Roberts", "William Koleva", "Lili Joe", "Victoria Jane", "Allen Thomas"]

a = ["Jon Roberts", "Wil Koleva", "Lilian Joe", "Vic Jane", "Al Thomas"]


Currently I am using the union operator on these two arrays, like this:

a | b

When the arrays are combined this way, the result contains duplicates, even though the names in each array are the "same" names (one is just a shortened or variant form of the other).
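A quick check (a minimal sketch reusing the two arrays above) shows why `|` alone cannot help: `Array#|` removes only elements that are exactly equal (compared with `eql?` and `hash`), so none of the variant spellings collapse.

```ruby
b = ["John Roberts", "William Koleva", "Lili Joe", "Victoria Jane", "Allen Thomas"]
a = ["Jon Roberts", "Wil Koleva", "Lilian Joe", "Vic Jane", "Al Thomas"]

# Array#| drops only exact duplicates, so all ten names survive
union = a | b
puts union.size  # prints 10
```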

My proposed solution is simply to choose the first occurrence of each first initial + last name combination and perform the union on that. However, I don't recall there being any method in Ruby that can perform such an operation.
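The proposed rule amounts to reducing each name to a comparison key. Here is a minimal sketch of such a key, assuming every name is a first name and a last name separated by whitespace; `name_key` is a hypothetical helper, not a built-in method.

```ruby
# Hypothetical helper: reduce a full name to [first initial, last name].
# Assumes names are whitespace-separated words with the last name last.
def name_key(name)
  words = name.split            # split on runs of whitespace
  [words.first[0], words.last]  # [first initial, last word]
end

p name_key("John Roberts")  # prints ["J", "Roberts"]
p name_key("Jon Roberts")   # prints ["J", "Roberts"]
```

Two names count as the "same" under this rule exactly when their keys are equal.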

So some_method(a | b) should return c, which is just:

["John Roberts", "William Koleva", "Lili Joe", "Victoria Jane", "Allen Thomas"]


How could I go about achieving this?

Answer
b = ["John Roberts", "William Koleva", "Lili Joe", "Victoria Jane", "Allen Thomas"]
a = ["Jon Roberts", "Wil Koleva", "Lilian Joe", "Vic Jane", "Al Thomas"]

r = /
    \s           # match a space
    [[:alpha:]]+ # match > 0 alphabetic characters
    \z           # match end of string
    /x           # free-spacing regex definition mode

(b+a).uniq { |str| [str[0], str[r]] }
  #=> ["John Roberts", "William Koleva", "Lili Joe", "Victoria Jane", "Allen Thomas"]

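This uses the form of Array#uniq that takes a block: elements for which the block returns equal values are treated as duplicates, and only the first of each group (in array order) is kept.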
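To see the block form of `uniq` in isolation, here is a small sketch on made-up data (the `words` array is purely illustrative): strings are deduplicated by their first letter, and the first string with each letter wins.

```ruby
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Elements whose block values are equal count as duplicates;
# uniq keeps the first one it encounters and returns a new array
by_initial = words.uniq { |w| w[0] }
p by_initial  # prints ["apple", "banana", "cherry"]
```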

You may alternatively write (b|a).uniq...
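A minimal sketch comparing the two spellings (the free-spacing regex is written on one line here, which matches the same strings). Because exactly equal strings always produce the same key, the exact-duplicate removal done by `|` never changes which element `uniq` keeps, so both forms give the same result.

```ruby
b = ["John Roberts", "William Koleva", "Lili Joe", "Victoria Jane", "Allen Thomas"]
a = ["Jon Roberts", "Wil Koleva", "Lilian Joe", "Vic Jane", "Al Thomas"]
r = /\s[[:alpha:]]+\z/  # a space followed by the final word

with_plus = (b + a).uniq { |str| [str[0], str[r]] }
with_pipe = (b | a).uniq { |str| [str[0], str[r]] }
p with_plus == with_pipe  # prints true
```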

The steps are as follows.

c = b+a
  #=> ["John Roberts", "William Koleva", "Lili Joe", "Victoria Jane", "Allen Thomas",
  #    "Jon Roberts", "Wil Koleva", "Lilian Joe", "Vic Jane", "Al Thomas"]

The first element of c passed to the block is

str = c.first
  #=> "John Roberts"

so the block calculation is

[str[0], str[r]]
  #=> ["J", " Roberts"]
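To see the key for every element at once, here is a minimal sketch that maps c to the pairs uniq compares. It repeats the arrays and writes r on one line so it runs on its own.

```ruby
b = ["John Roberts", "William Koleva", "Lili Joe", "Victoria Jane", "Allen Thomas"]
a = ["Jon Roberts", "Wil Koleva", "Lilian Joe", "Vic Jane", "Al Thomas"]
r = /\s[[:alpha:]]+\z/  # a space followed by the final word
c = b + a

# Compute the uniqueness key that uniq sees for every element of c
keys = c.map { |str| [str[0], str[r]] }
p keys.first(2)  # prints [["J", " Roberts"], ["W", " Koleva"]]
```

Note that `str[r]` keeps the leading space, which is harmless because every key includes it.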

The calculations are similar for all the other elements of c. The upshot is that

c.uniq { |str| [str[0], str[r]] }

is equivalent to converting each element of c to [<first name initial>, <last name>] and keeping only the first element of c that produces each distinct pair. (Recall that str[r] includes the space before the last name.) The pairs computed for c reduce as follows:

[["J", " Roberts"], ["W", " Koleva"], ["L", " Joe"], ["V", " Jane"], ["A", " Thomas"],
 ["J", " Roberts"], ["W", " Koleva"], ["L", " Joe"], ["V", " Jane"], ["A", " Thomas"]].uniq
  #=> [["J", " Roberts"], ["W", " Koleva"], ["L", " Joe"], ["V", " Jane"], ["A", " Thomas"]]
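Because uniq keeps the first element for each key, the order in which the arrays are concatenated decides which spelling survives. A minimal sketch: putting a first keeps the shortened names instead.

```ruby
b = ["John Roberts", "William Koleva", "Lili Joe", "Victoria Jane", "Allen Thomas"]
a = ["Jon Roberts", "Wil Koleva", "Lilian Joe", "Vic Jane", "Al Thomas"]
r = /\s[[:alpha:]]+\z/  # a space followed by the final word

# a comes first, so a's spellings are the first occurrences of each key
shortened_first = (a + b).uniq { |str| [str[0], str[r]] }
p shortened_first  # prints ["Jon Roberts", "Wil Koleva", "Lilian Joe", "Vic Jane", "Al Thomas"]
```

Keep in mind that this key also merges genuinely different people who share a first initial and last name, such as "John Roberts" and "Jane Roberts".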