Mrinal Kamboj - 19 days ago
C# Question

Regular Expression for specific requirements

The following is my collection of strings. Each string describes a route from start to end, and each element, such as `ABC`, is a city terminal:

"ABC-DEF-MNO-JKL-LOO"
"BYT-JKU-PLO-MNO"
"DEF-BYT-IOT-POC-LOO"
"LMN-RTX-PQS-JYY"
"LMN-PQS-IRJ"


I have developed the following regex patterns to cover the business requirements:

Requirement1 -
Start with ABC or DEF, pass via MNO or BYT, end with LOO, JYY or IRJ


Pattern 1 - `@"(^ABC|^DEF).*(MNO|BYT).*(LOO$|JYY$|IRJ$)";`
Result 1 - "ABC-DEF-MNO-JKL-LOO", "DEF-BYT-IOT-POC-LOO"


Requirement2 -
Start with ABC or DEF, pass via MNO or BYT, exclude routes containing IOT, and end with LOO, JYY or IRJ


Expected Result - "ABC-DEF-MNO-JKL-LOO", since the other route has IOT

Expected Pattern - `@"(^ABC|^DEF).*(MNO|BYT).*^(IOT).*(LOO$|JYY$|IRJ$)";`


but this one does not return any result.

Successful Pattern - `@"(^ABC|^DEF).*(MNO|BYT.*^(IOT)).*(LOO$|JYY$|IRJ$)";`


However, I am not convinced that this is the correct way to achieve it. Can anyone help me understand:


  • Why the Expected Pattern doesn't yield the correct result; I cannot see what is wrong with it

  • A better regular expression to achieve the same result



Edit 1:

Based on the response provided by @Sebastian, the following pattern also succeeds:

@"^(?:ABC|DEF).*(?:MNO|BYT)(?!.*IOT).*(?:LOO|JYY|IRJ)$"


but the following pattern fails when there is just one extra `.*`:


@"^(?:ABC|DEF).*(?:MNO|BYT).*(?!.*IOT).*(?:LOO|JYY|IRJ)$"
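The difference between the two Edit 1 patterns can be checked against the sample routes. The sketch below is only an illustration: the class name `LookaheadPlacementDemo` and the helper `Filter` are made up for this example. It suggests why the extra `.*` matters: the extra `.*` is free to consume `-IOT` before the lookahead is evaluated, so the lookahead no longer sees it.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadPlacementDemo
{
    // Sample routes copied from the question.
    public static readonly string[] Routes =
    {
        "ABC-DEF-MNO-JKL-LOO",
        "BYT-JKU-PLO-MNO",
        "DEF-BYT-IOT-POC-LOO",
        "LMN-RTX-PQS-JYY",
        "LMN-PQS-IRJ"
    };

    // Lookahead sits directly after the via-terminal, so it inspects everything after it.
    public const string Tight = @"^(?:ABC|DEF).*(?:MNO|BYT)(?!.*IOT).*(?:LOO|JYY|IRJ)$";

    // The extra .* can swallow "-IOT" before the lookahead is evaluated.
    public const string Loose = @"^(?:ABC|DEF).*(?:MNO|BYT).*(?!.*IOT).*(?:LOO|JYY|IRJ)$";

    // Hypothetical helper: returns the routes that the given pattern matches.
    public static string[] Filter(string pattern) =>
        Routes.Where(route => Regex.IsMatch(route, pattern)).ToArray();

    public static void Main()
    {
        // Prints: ABC-DEF-MNO-JKL-LOO
        Console.WriteLine(string.Join(", ", Filter(Tight)));
        // Prints: ABC-DEF-MNO-JKL-LOO, DEF-BYT-IOT-POC-LOO
        Console.WriteLine(string.Join(", ", Filter(Loose)));
    }
}
```

So the second pattern still matches, but it no longer excludes the route through IOT.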

Answer

You could use `^(?:ABC|DEF)(?!.*IOT).*(?:MNO|BYT).*(?:LOO|JYY|IRJ)$` to meet your second requirement. It uses a negative lookahead to reject the string if IOT is present anywhere after the start terminal. The rest is taken from your pattern, with the groups made non-capturing and the anchors moved outside the groups.
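As a minimal sketch of how this pattern might be used, the snippet below runs it over the routes from the question. The class name `Requirement2Demo`, the method `IsValidRoute`, and the extra route `ABC-IOT-MNO-LOO` are assumptions added for illustration; the extra route is there to show that IOT is also rejected when it appears before the via-terminal.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Requirement2Demo
{
    // Pattern from the answer: the negative lookahead runs right after the start terminal.
    public const string Pattern = @"^(?:ABC|DEF)(?!.*IOT).*(?:MNO|BYT).*(?:LOO|JYY|IRJ)$";

    // Hypothetical helper: true when the route satisfies Requirement 2.
    public static bool IsValidRoute(string route) => Regex.IsMatch(route, Pattern);

    public static void Main()
    {
        string[] routes =
        {
            "ABC-DEF-MNO-JKL-LOO",
            "BYT-JKU-PLO-MNO",
            "DEF-BYT-IOT-POC-LOO",
            "LMN-RTX-PQS-JYY",
            "LMN-PQS-IRJ",
            "ABC-IOT-MNO-LOO" // hypothetical route, not part of the question's data
        };

        // Only "ABC-DEF-MNO-JKL-LOO" is reported as True.
        foreach (var route in routes)
            Console.WriteLine($"{route}: {IsValidRoute(route)}");
    }
}
```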

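Your seemingly successful pattern contains `(MNO|BYT.*^(IOT))`, which matches either `MNO` or `BYT.*^(IOT)`, so nothing checks for IOT when MNO is present in your string. Also, `^(IOT)` means "start of the string followed by IOT", which can never match in the middle of these strings; outside a character class, `^` is an anchor, not a negation. For the same reason, your expected pattern never matches anything.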
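Both points can be checked with a short sketch. The class name `AnchorPitfallDemo`, its helper methods, and the route `ABC-MNO-IOT-LOO` are hypothetical additions for this illustration; the patterns themselves are copied from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnchorPitfallDemo
{
    // Both patterns are copied from the question.
    public const string Expected = @"(^ABC|^DEF).*(MNO|BYT).*^(IOT).*(LOO$|JYY$|IRJ$)";
    public const string PseudoSuccessful = @"(^ABC|^DEF).*(MNO|BYT.*^(IOT)).*(LOO$|JYY$|IRJ$)";

    // Sample routes from the question.
    public static readonly string[] Routes =
    {
        "ABC-DEF-MNO-JKL-LOO",
        "BYT-JKU-PLO-MNO",
        "DEF-BYT-IOT-POC-LOO",
        "LMN-RTX-PQS-JYY",
        "LMN-PQS-IRJ"
    };

    // True if the Expected pattern matches at least one sample route.
    public static bool ExpectedMatchesAnything() =>
        Routes.Any(route => Regex.IsMatch(route, Expected));

    // True if the pseudo-successful pattern accepts the given route.
    public static bool PseudoAccepts(string route) =>
        Regex.IsMatch(route, PseudoSuccessful);

    public static void Main()
    {
        // ^ after already-consumed characters cannot match, so this prints False.
        Console.WriteLine(ExpectedMatchesAnything());

        // Hypothetical route with MNO and IOT: the MNO branch never looks for IOT, so this prints True.
        Console.WriteLine(PseudoAccepts("ABC-MNO-IOT-LOO"));
    }
}
```

The pseudo-successful pattern only appears to work on the sample data because the one route containing IOT happens to go via BYT rather than MNO.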