Dimitri de Ruiter - 28 days ago
Ruby Question

Rails 5 - set value on self based on matching fields with a Regex

In the app that I am building to learn Ruby and Rails, I am having trouble getting the code below to work.

Desired result



When the content of the field self.extracted_data (here self is a Document object) contains the bank account number (bank_account) of a business partner (BusinessPartner), the sender of the document (self.sender_id) needs to equal that BusinessPartner.

What I have so far:

BusinessPartner.active.each do |business_partner|
  unless business_partner == self.receiver_id
    if self.extracted_data =~ /\s#{Regexp.escape(business_partner.bank_account)}?\s/i # need to fix the RGX
      self.sender = business_partner
      self.name = "match: " + business_partner.id.to_s + /\s#{Regexp.escape(business_partner.bank_account)}?\s/i.to_s # to see RGX used
    else
      self.sender = nil
      self.name = "NO match: " + business_partner.id.to_s + /\s#{Regexp.escape(business_partner.bank_account)}?\s/i.to_s # to see RGX used
    end
  end
end


It always gives me "NO match", even though I have records that match business partners exactly. I have been studying the Pickaxe book, the Rails docs, etc. for hours now and cannot find the solution. All help and advice welcome.

P.S. I could DRY the regex up into a variable, but it is only repeated temporarily for debugging.

update



sample data for business partners: [image not available]

sample data for extracted_data

could include the bank-account...


  • enclosed in whitespace eg: ' NL15 INGB 0660 3125 06 '

  • enclosed in whitespace and a dot (.) eg: ' GB99 RBS1 0469 7788 99.'

  • enclosed in brackets () eg: (NL15 INGB 0660 3125 06)

  • although not allowed by the banks, could have special characters; typically dot (.) or dash (-)

  • or like so: ' 19.83.94.527 ' (very uncommon; no need to cater for it)



Note: the bank account should adhere to IBAN formatting rules. These will be applied to the business_partner.bank_account field for data quality; what ends up in extracted_data, however, depends on what was extracted from the file (PDF) attached to the document record.

Answer

You can replace the \s whitespace patterns with word boundaries (\b) so that the pattern no longer requires whitespace around the account number. Word boundaries are zero-width assertions: they match positions in the string rather than characters, so, like lookarounds, they are safe to use in this extraction scenario. Because the account numbers in the extracted text contain whitespace, you can strip it with .gsub(/\s+/, '') before running the regex check:

if self.extracted_data.gsub(/\s+/, '') =~ /\b#{Regexp.escape(business_partner.bank_account)}?\b/i
                       ^^^^^^^^^^^^^^^    ^^^                                               ^^^
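
To see how this condition behaves, here is a minimal sketch run against a few strings modelled on the formats listed in the question. The sample texts and the `samples` hash are made up for illustration, and the sketch assumes bank_account is stored without spaces. It also drops the trailing `?` from the pattern, since that quantifier would only make the last character of the account number optional.

```ruby
# Hypothetical values modelled on the formats listed in the question.
# Assumption: the stored bank_account contains no spaces.
bank_account = "NL15INGB0660312506"

samples = {
  brackets:   "IBAN (NL15 INGB 0660 3125 06)",
  dot:        "Pay to: NL15 INGB 0660 3125 06.",
  whitespace: "Pay to NL15 INGB 0660 3125 06 today"
}

# Word-boundary pattern, without the trailing `?`.
pattern = /\b#{Regexp.escape(bank_account)}\b/i

# Strip all whitespace from each text, then test it against the pattern.
results = samples.map do |label, text|
  [label, !(text.gsub(/\s+/, '') =~ pattern).nil?]
end.to_h

results.each do |label, matched|
  puts "#{label}: #{matched ? 'match' : 'no match'}"
end
```

The bracket and dot cases match. The whitespace case does not: once the whitespace is removed, the account number is glued to the neighbouring words ("to" and "today"), so there is no longer a word boundary on either side of it.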

See more about word boundaries on the Word Boundaries regular-expressions.info Web page.
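
As an alternative to stripping whitespace from the extracted text, the pattern itself can be made to tolerate separators. The sketch below is not the approach described above but a different technique: it allows whitespace, dots, or dashes between the characters of the account number and uses lookarounds instead of \b. The names `account_pattern`, `find_sender`, and the `Partner` struct are hypothetical stand-ins (in the app, the partners would come from BusinessPartner.active). It also uses `find` to stop at the first match, because in the loop from the question the else branch sets sender back to nil for every later partner that does not match.

```ruby
# Hypothetical stand-in for the BusinessPartner model.
Partner = Struct.new(:id, :bank_account)

# Builds a pattern that allows whitespace, dots, or dashes between the
# characters of the account number. Returns nil for a blank account.
def account_pattern(bank_account)
  chars = bank_account.gsub(/[^A-Za-z0-9]/, '').chars
  return nil if chars.empty?

  body = chars.map { |c| Regexp.escape(c) }.join('[\s.\-]*')
  # Lookarounds keep the match from starting or ending inside a longer token.
  /(?<![A-Za-z0-9])#{body}(?![A-Za-z0-9])/i
end

# Returns the first partner (other than the receiver) whose account
# number appears in the text, or nil when none does.
def find_sender(extracted_data, partners, receiver_id)
  partners.find do |partner|
    next false if partner.id == receiver_id

    pattern = account_pattern(partner.bank_account.to_s)
    pattern && extracted_data =~ pattern
  end
end

partners = [
  Partner.new(1, "GB99 RBS1 0469 7788 99"),
  Partner.new(2, "NL15 INGB 0660 3125 06")
]
text = "Please transfer to (NL15 INGB 0660 3125 06) before Friday."

sender = find_sender(text, partners, 1)
puts sender ? "match: #{sender.id}" : "NO match"
```

In the model, the result would be assigned once, along the lines of self.sender = find_sender(extracted_data, BusinessPartner.active, receiver_id), assuming receiver_id holds the receiving partner's id.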