Nick D - 1 year ago
Ruby Question

How can I generate a percentage for a regex string match in Ruby?

I'm trying to build a simple method to look at about 100 entries in a database for a last name and pull out all the ones that match above a specific percentage of letters. My current approach is:

  1. Pull all 100 entries from the database into an array

  2. Iterate through them, performing the following actions on each

  3. Split the last name into an array of letters

  4. Subtract that array from another array that contains the letters of the name I am trying to match, which leaves only the letters that weren't matched.

  5. Take the size of the result and divide by the original size of the array from step 3 to get a percentage.

  6. If the percentage is above a predefined threshold, push that database object into a results array.

This works, but I feel like there must be some cool Ruby/regex/Active Record way of doing this more efficiently. I have googled quite a bit but can't find anything.

Answer Source

To comment on the merit of the measure you suggested would require speculation, which is out-of-bounds at SO. I therefore will merely demonstrate how you might implement your proposed approach.


First define a helper method:

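class Array
  def difference(other)
    # Count how many times each element occurs in other.
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    # Drop one occurrence from self for each occurrence counted in other.
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end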

In short, if

a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]

then

a - b           #=> [1]

whereas

a.difference(b) #=> [1, 3, 2, 2]

This method is elaborated in my answer to this SO question. I've found so many uses for it that I've proposed it be added to the Ruby Core.
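
One caveat worth noting: since Ruby 2.6, Array has a built-in difference method that behaves like -, so reopening Array as above replaces it. A minimal sketch that avoids this by using a standalone method is given below. The name multiset_difference is hypothetical, chosen only for illustration; it is not part of Ruby.

```ruby
# Hypothetical standalone helper (not part of Ruby): removes from `a` one
# occurrence of an element for each occurrence of that element in `b`.
def multiset_difference(a, b)
  # Tally the elements of b; missing keys default to 0.
  counts = b.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  a.reject do |e|
    if counts[e] > 0
      counts[e] -= 1 # use up one occurrence from b
      true           # and drop this element of a
    else
      false          # nothing left in b to cancel it, so keep it
    end
  end
end

a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]

multiset_difference(a, b) #=> [1, 3, 2, 2]
a - b                     #=> [1]
```

Order is preserved, and the earliest occurrences in a are the ones removed.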

The following method produces a hash whose keys are the elements of names (strings) and whose values are the fractions of the letters in the target string that are contained in each string in names.

def target_fractions(names, target)
  # Keep only the letters of the target, lowercased.
  target_arr = target.downcase.scan(/[a-z]/)
  target_size = target_arr.size
  names.each_with_object({}) do |s,h|
    s_arr = s.downcase.scan(/[a-z]/)
    # Letters of the target that s does not account for.
    target_remaining = target_arr.difference(s_arr)
    h[s] = (target_size-target_remaining.size)/target_size.to_f
  end
end

If

target = "Jimmy S. Bond"

and the names you are comparing are given by

names = ["Jill Dandy", "Boomer Asad", "Josefine Simbad"]

then

target_fractions(names, target)
  #=> {"Jill Dandy"=>0.5, "Boomer Asad"=>0.5, "Josefine Simbad"=>0.8} 


For the above values of names and target,

target_arr = target.downcase.scan(/[a-z]/)
  #=> ["j", "i", "m", "m", "y", "s", "b", "o", "n", "d"] 
target_size = target_arr.size
  #=> 10

Now consider

s = "Jill Dandy"
h = {}

The block then computes

s_arr = s.downcase.scan(/[a-z]/)
  #=> ["j", "i", "l", "l", "d", "a", "n", "d", "y"]
target_remaining = target_arr.difference(s_arr)
  #=> ["m", "m", "s", "b", "o"]

h[s] = (target_size-target_remaining.size)/target_size.to_f
  #=> (10-5)/10.0 => 0.5
h #=> {"Jill Dandy"=>0.5}

The calculations are similar for Boomer and Josefine.
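
To finish the task from the question, the fractions still have to be compared against a threshold. The following is a rough sketch under stated assumptions: matched_fraction, names_above and the threshold values are hypothetical, and plain strings stand in for the database records. It restates the letter-counting logic so that it runs on its own.

```ruby
# Hypothetical helper: fraction of the target's letters that `name` accounts
# for, with repeated letters counted separately.
def matched_fraction(name, target)
  target_arr = target.downcase.scan(/[a-z]/)
  return 0.0 if target_arr.empty? # avoid dividing by zero

  # Tally the letters available in name; missing keys default to 0.
  available = name.downcase.scan(/[a-z]/)
                  .each_with_object(Hash.new(0)) { |c, h| h[c] += 1 }

  # Count target letters that can be paired with an unused letter of name.
  matched = target_arr.count do |c|
    if available[c] > 0
      available[c] -= 1
      true
    else
      false
    end
  end

  matched / target_arr.size.to_f
end

# Hypothetical helper: keep the names whose fraction reaches the threshold.
def names_above(names, target, threshold)
  names.select { |name| matched_fraction(name, target) >= threshold }
end

names  = ["Jill Dandy", "Boomer Asad", "Josefine Simbad"]
target = "Jimmy S. Bond"

names_above(names, target, 0.6) #=> ["Josefine Simbad"]
names_above(names, target, 0.5) #=> ["Jill Dandy", "Boomer Asad", "Josefine Simbad"]
```

For database records, the same select would run over the array of records, passing each record's last-name attribute in place of the plain string.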
