Asked by Daler, 4 years ago
Java Question

Regex for a comma- and dash-separated list of items

I have a Java web application that gets some input from the user. Once I have this input, I have to parse it, and the parsing part depends on what kind of input I get. I decided to use Java's Pattern class to recognize some of the predefined user inputs.

I still need the last two regex patterns:

a) Enumeration:

Input can be: A03,B24.1,A25.7

The simple way would be to check for commas ([^,]+), but that would require a lot of changes to the parsing function, which I would like to avoid. So, in addition to the comma, it should check that:


  • each item starts with a letter

  • each item has a minimum of 3 characters (letters combined with digits)

  • each item can contain one dot

  • there is a minimum of 1 comma (updated)



b) Mixed

Input can be: A03,B24.1-B35.5,A25.7

So everything the Enumeration part allows, with the addition that it can contain a dash (a minimum of one).

I've tried multiple online regex generators but couldn't get it right. Any help would be much appreciated.

Here is what I have for the case where the input is just a simple range, such as B24.1-B35.5:

"='.{1}\\d{0,2}-.{1}\\d{0,2}'|='.{1}\\d{1,2}.\\d{1,2}-.{1}\\d{1,2}.\\d{1,2}'";

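One thing worth noting about this attempt: in the second alternative, the . between the digit groups is not escaped, so it matches any character, not only a literal dot. A minimal sketch showing the effect (RangeDotCheck is a hypothetical class name; the test inputs keep the =' and ' wrapping only because the pattern above expects it, and the question does not say where it comes from):

```java
import java.util.regex.Pattern;

public class RangeDotCheck {
    // The range pattern from the question, copied verbatim.
    static final String ORIGINAL =
        "='.{1}\\d{0,2}-.{1}\\d{0,2}'|='.{1}\\d{1,2}.\\d{1,2}-.{1}\\d{1,2}.\\d{1,2}'";

    // Same pattern with the dots between digit groups escaped (\\.),
    // so they only match a literal '.' character.
    static final String ESCAPED =
        "='.{1}\\d{0,2}-.{1}\\d{0,2}'|='.{1}\\d{1,2}\\.\\d{1,2}-.{1}\\d{1,2}\\.\\d{1,2}'";

    public static void main(String[] args) {
        String intended = "='B24.1-B35.5'";
        String bogus = "='B24x1-B35x5'"; // 'x' where a dot should be

        System.out.println(Pattern.matches(ORIGINAL, intended)); // true
        System.out.println(Pattern.matches(ORIGINAL, bogus));    // true: unescaped '.' accepts 'x'
        System.out.println(Pattern.matches(ESCAPED, intended));  // true
        System.out.println(Pattern.matches(ESCAPED, bogus));     // false
    }
}
```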

Edit 1: Valid and invalid inputs

For a) Enumeration:


  • A03,B24.1,A25.7 valid

  • A03,B24.1 valid

  • A03,B24.1-B25.1 invalid, because in this case (Enumeration) the input must not contain a dash

  • A03 invalid, because there is no comma



For b) Mixed:

Everything that Enumeration accepts, with the addition that it can contain dashes too.

Answer

You can use this regex for the (a) Enumeration part, following your rules:

[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?(?:,[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?)+

Rules:

  • Verifies that each segment starts with a letter
  • Requires a minimum of three letters or digits: [A-Za-z][A-Za-z0-9]{2,}
  • Optionally followed by a dot and one or more letters or digits, i.e. (?:\.[A-Za-z0-9]{1,})?
  • The same thing is repeated, separated by a comma (,). There must be at least one comma, hence the +, i.e. (?:,[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?)+
  • ?: marks a non-capturing group
  • [A-Za-z0-9] is used instead of \w to exclude underscores
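A minimal Java sketch of how these rules map onto code (EnumerationCheck, ITEM and isEnumeration are hypothetical names, not from the question). It builds the same regex from a single item fragment and runs it against the valid/invalid inputs from Edit 1. Because the regex has no ^/$ anchors, it should be applied with Matcher.matches(), which requires the whole input to match, rather than Matcher.find(), which accepts a match anywhere in the input.

```java
import java.util.regex.Pattern;

public class EnumerationCheck {
    // One item: a letter, at least two more letters/digits, then optionally
    // a dot followed by one or more letters/digits (e.g. "A03", "B24.1").
    static final String ITEM = "[A-Za-z][A-Za-z0-9]{2,}(?:\\.[A-Za-z0-9]{1,})?";

    // An item followed by one or more ",item" groups; the + enforces at least one comma.
    // The resulting string is identical to the Enumeration regex in the answer.
    static final Pattern ENUMERATION = Pattern.compile(ITEM + "(?:," + ITEM + ")+");

    // matches() succeeds only if the WHOLE input fits the pattern.
    static boolean isEnumeration(String input) {
        return ENUMERATION.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"A03,B24.1,A25.7", "A03,B24.1", "A03,B24.1-B25.1", "A03"};
        for (String in : inputs) {
            // find() only needs a matching substring, so it would accept
            // "A03,B24.1-B25.1" by matching its "A03,B24.1" prefix.
            System.out.println(in + ": matches=" + isEnumeration(in)
                + ", find=" + ENUMERATION.matcher(in).find());
        }
    }
}
```

With matches(), the four inputs come out as true, true, false, false, which agrees with Edit 1; with find(), the dashed input A03,B24.1-B25.1 would also be accepted.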


For (b) Mixed, you haven't shared many valid and invalid cases, but based on my current understanding, here's what I have:

[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?(?:[,-][A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?)+

Note that the , from the previous regex has been replaced with [,-] to allow - as well.


// Will match
A03,B24.1-B35.5,A25.7
A03,B24.1,A25.7
A03,B24.1-B25.1
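A similar sketch for the Mixed regex (MixedCheck and MIXED_WITH_DASH are hypothetical names). It confirms that the three inputs listed above match with Matcher.matches(). Since the question says the input can have "a dash minimum one", it also shows one possible stricter variant, assuming a dash really is required: a (?=.*-) lookahead placed in front of the same pattern.

```java
import java.util.regex.Pattern;

public class MixedCheck {
    // Mixed regex from the answer: same items, separated by a comma or a dash.
    static final Pattern MIXED = Pattern.compile(
        "[A-Za-z][A-Za-z0-9]{2,}(?:\\.[A-Za-z0-9]{1,})?(?:[,-][A-Za-z][A-Za-z0-9]{2,}(?:\\.[A-Za-z0-9]{1,})?)+");

    // Hypothetical stricter variant: the lookahead (?=.*-) additionally requires
    // at least one dash somewhere in the input before the main pattern is tried.
    static final Pattern MIXED_WITH_DASH = Pattern.compile("(?=.*-)" + MIXED.pattern());

    public static void main(String[] args) {
        String[] inputs = {"A03,B24.1-B35.5,A25.7", "A03,B24.1,A25.7", "A03,B24.1-B25.1"};
        for (String in : inputs) {
            System.out.println(in + ": mixed=" + MIXED.matcher(in).matches()
                + ", withDash=" + MIXED_WITH_DASH.matcher(in).matches());
        }
    }
}
```

With the plain MIXED pattern, A03,B24.1,A25.7 (no dash) also matches; the lookahead variant rejects it while still accepting the other two inputs. Which behavior is right depends on how "minimum one" dash is meant.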

Hope this helps!

EDIT: Made sure each group starts with a letter (and not a number). Thanks to @diginoise and @anubhava for pointing this out! Changed [A-Za-z0-9]{3,} to [A-Za-z][A-Za-z0-9]{2,}
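To illustrate what this edit changes, here is a small sketch comparing the two versions on an item that starts with a digit. The "old" pattern is reconstructed by undoing the substitution described in the edit ([A-Za-z][A-Za-z0-9]{2,} back to [A-Za-z0-9]{3,}), so it is an assumption about the earlier regex rather than a verbatim copy; FirstCharCheck is a hypothetical class name.

```java
import java.util.regex.Pattern;

public class FirstCharCheck {
    // Assumed pre-edit Enumeration regex: any three or more letters/digits per item.
    static final String OLD =
        "[A-Za-z0-9]{3,}(?:\\.[A-Za-z0-9]{1,})?(?:,[A-Za-z0-9]{3,}(?:\\.[A-Za-z0-9]{1,})?)+";

    // Post-edit regex from the answer: each item must begin with a letter.
    static final String NEW =
        "[A-Za-z][A-Za-z0-9]{2,}(?:\\.[A-Za-z0-9]{1,})?(?:,[A-Za-z][A-Za-z0-9]{2,}(?:\\.[A-Za-z0-9]{1,})?)+";

    public static void main(String[] args) {
        String digitFirst = "103,B24.1"; // first item starts with a digit
        System.out.println(Pattern.matches(OLD, digitFirst)); // true
        System.out.println(Pattern.matches(NEW, digitFirst)); // false
    }
}
```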
