Giorgi Nakeuri - 13 days ago
C# Question

Complex regex: key-value pairs

I am failing to figure out the correct regex for this type of data:


Phone-Work: 1111111111 Phone-Fax Work: 222222222 Phone-General:
(333) 333-3333 Email-: email@email.com


Desired result is:

Col1    Col2        Col3
Phone   Work        1111111111
Phone   Fax Work    222222222
Phone   General     (333) 333-3333
Email   null        email@email.com


The key consists of two parts (the second may be missing), e.g. Phone-Work: or Email-:


There can be 4 types of keys: Phone-, Email-, User ID-, Web address-


I am failing to figure out how to create a regex that will take the value part and stop before the next key.

Here is what I am trying with some data:

https://regex101.com/r/weEc3A/1

Answer

You may use a solution like

(?si)(Phone|Email|User ID|Web address)-([^:]*):\s*((?:(?!(?:Phone|Email|User ID|Web address)-).)*)

which is equivalent to

(?si)(Phone|Email|User ID|Web address)-([^:]*):\s*(.*?)(?=(?:Phone|Email|User ID|Web address)-|$)

See the regex demo
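
To check the claimed equivalence on the sample data from the question, both patterns can be run side by side. This is only a minimal sketch: the class name PatternComparison, its members, and the "g1|g2|g3" output format are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for comparing the two patterns from the answer.
public static class PatternComparison
{
    // Key names taken from the question; shared by both patterns.
    private const string Keys = "Phone|Email|User ID|Web address";

    // Variant 1: tempered greedy token.
    public static readonly string Tempered =
        $@"(?si)({Keys})-([^:]*):\s*((?:(?!(?:{Keys})-).)*)";

    // Variant 2: lazy dot plus a lookahead for the next key or the end of input.
    public static readonly string Lazy =
        $@"(?si)({Keys})-([^:]*):\s*(.*?)(?=(?:{Keys})-|$)";

    // Returns every match as "group1|group2|group3" so the results are easy to compare.
    public static string[] Run(string pattern, string input) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => $"{m.Groups[1].Value}|{m.Groups[2].Value}|{m.Groups[3].Value}")
             .ToArray();
}
```

Calling PatternComparison.Run with each pattern on the question's sample text should give the same four entries. One caveat: without RegexOptions.Multiline, $ in .NET also matches just before a final newline, so on input that ends with \n the lazy variant leaves that newline out of Group 3 while the tempered variant keeps it. Trimming Group 3 hides the difference.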

Details:

  • (Phone|Email|User ID|Web address)- - matches one of the possible key names, captured into Group 1, followed by a literal -
  • ([^:]*) - captures zero or more chars other than : into Group 2
  • :\s* - a colon followed by zero or more whitespace characters
  • ((?:(?!(?:Phone|Email|User ID|Web address)-).)*) - Group 3, capturing zero or more occurrences of any char (.) that does not start a sequence matched by the (?:Phone|Email|User ID|Web address)- pattern (a tempered greedy token), so the value stops right before the next key.
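
In C#, the three groups described above could be mapped onto the Col1/Col2/Col3 layout from the question roughly as follows. This is a sketch under assumptions: the KeyValueParser class, the Parse method, and the tuple element names are hypothetical, and turning an empty second key part into null and trimming the value are choices made here to match the desired output, not something the pattern does by itself.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that turns the raw text into (Col1, Col2, Col3) rows.
public static class KeyValueParser
{
    // First pattern from the answer, unchanged.
    private static readonly Regex Pattern = new Regex(
        @"(?si)(Phone|Email|User ID|Web address)-([^:]*):\s*((?:(?!(?:Phone|Email|User ID|Web address)-).)*)");

    // Returns one tuple per key-value pair found in the input.
    public static List<(string Col1, string? Col2, string Col3)> Parse(string input)
    {
        var rows = new List<(string Col1, string? Col2, string Col3)>();
        foreach (Match m in Pattern.Matches(input))
        {
            string subKey = m.Groups[2].Value.Trim();
            rows.Add((
                m.Groups[1].Value,
                subKey.Length == 0 ? null : subKey, // a missing second key part becomes null
                m.Groups[3].Value.Trim()));         // drop whitespace left in front of the next key
        }
        return rows;
    }
}
```

The Trim call on Group 3 is needed because the tempered token also consumes the whitespace that separates a value from the following key, for example the space after 1111111111 in the sample.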

Since the (?s) modifier is used, the . matches a newline, too. In C#, you may use the RegexOptions.Singleline flag instead of this inline option. The (?i) is the inline equivalent of the RegexOptions.IgnoreCase flag. When combined, the inline modifiers can be written inside one pair of parentheses: (?si).
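
A short sketch of the two ways to set these options in C#. The class and member names (OptionsExample, WithOptions, WithInline) are hypothetical. RegexOptions.Singleline and RegexOptions.IgnoreCase are the standard .NET flags.

```csharp
using System.Text.RegularExpressions;

// Hypothetical holder for two equivalent Regex instances.
public static class OptionsExample
{
    // Pattern from the answer with the inline (?si) prefix removed.
    public const string PatternWithoutInlineFlags =
        @"(Phone|Email|User ID|Web address)-([^:]*):\s*((?:(?!(?:Phone|Email|User ID|Web address)-).)*)";

    // Options passed through the constructor.
    public static readonly Regex WithOptions = new Regex(
        PatternWithoutInlineFlags,
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Same options expressed inline at the start of the pattern.
    public static readonly Regex WithInline = new Regex("(?si)" + PatternWithoutInlineFlags);
}
```

Both instances should behave the same. With Singleline dropped, Group 3 would stop at the first line break inside a value, because . no longer matches \n.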
