Mateusz Mateusz - 3 months ago 5
Java Question

Using regex in Java to extract string after first comma and before two capital letters and a comma

I am currently working with strings that follow this format:

4,Matt, Hopkins,MI,5.75,Wood,33.0,2.25,2.1,2016-09-02,74.25,69.3,8.254125,151.804125


and I am trying to use regex to extract every word and number as a separate string (MI, Wood, 33.0 and so forth), with one exception: I want to treat everything that follows the first comma as a single string, up to the two-letter all-caps field. The regex would therefore extract this:

[4] [Matt, Hopkins] [MI] [5.75] [Wood] and so forth.


Note that the name part can have no commas at all, e.g. [Hopkins], or more than one, e.g. [Matt, Jr., Hopkins]. The all-caps field describes a state and so always follows the same format.

I do not understand regex well enough to do that. So far I have only come up with

[a-zA-Z(?:\d*\.)?\d+-]+


which handles all fields correctly except the name.

Answer

Using regex might just make things harder for you here.

This looks like CSV data. You can use a CSV library to correctly parse this into individual fields (*):

String[] fields = YourCsvLibrary.parseRow(string);  // or string.split(","), maybe.

and then recombine the fields as appropriate. For example, for a name that contains exactly one comma, the grouping you want can be expressed with the following code:

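// Requires java.util.Arrays. Assumes the name spans exactly fields[1] and fields[2].
String[] output = Arrays.copyOfRange(fields, 1, fields.length);  // one slot shorter: two name fields become one
output[0] = fields[0];                    // restore the leading field
output[1] = fields[1] + "," + fields[2];  // rejoin the name around its comma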

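
The snippet above only covers a name with exactly one comma. As a rough sketch of how the same idea could be extended to names with zero or several commas, the version below looks for the state field and joins everything between the first field and it. The class name NameMerge and the helper mergeName are made up for this illustration, and it assumes the state is the first field (after the name starts) that consists of exactly two capital letters, so a name part such as "JR" would be mistaken for the state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NameMerge {
    // Hypothetical helper: merges every field between fields[0] and the
    // state field into one name string.
    static List<String> mergeName(String[] fields) {
        int stateIdx = -1;
        // The name occupies at least fields[1], so the state is at index 2 or later.
        for (int i = 2; i < fields.length; i++) {
            if (fields[i].matches("[A-Z]{2}")) {  // assumption: state = exactly two capitals
                stateIdx = i;
                break;
            }
        }
        if (stateIdx < 0) {
            throw new IllegalArgumentException("no two-letter state field found");
        }
        List<String> out = new ArrayList<>();
        out.add(fields[0]);
        // Rejoin the name parts with the commas that split() removed.
        out.add(String.join(",", Arrays.copyOfRange(fields, 1, stateIdx)));
        // Copy the state field and everything after it unchanged.
        out.addAll(Arrays.asList(fields).subList(stateIdx, fields.length));
        return out;
    }

    public static void main(String[] args) {
        String row = "4,Matt, Jr., Hopkins,MI,5.75,Wood,33.0";
        for (String field : mergeName(row.split(","))) {
            System.out.println("[" + field + "]");
        }
    }
}
```

The same caveat as for String.split(",") applies: this only makes sense while the fields are not quoted.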


(*) String.split(",") might work, provided the field data doesn't contain quotes, commas, newlines, etc.
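
If a regex is still preferred, one possible sketch is to capture the name lazily up to the first field made of exactly two capital letters, and then split only the remainder on commas. The class name NameRegex and the method parse are invented for this example; it assumes unquoted fields, at least one field after the state, and that no part of the name is itself two capital letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameRegex {
    // group 1: first field; group 2: name (lazy, may contain commas);
    // group 3: two-capital state field; group 4: all remaining fields.
    private static final Pattern ROW =
            Pattern.compile("^([^,]*),(.*?),([A-Z]{2}),(.*)$");

    // Hypothetical helper: returns the row's fields with the name kept whole.
    static List<String> parse(String row) {
        Matcher m = ROW.matcher(row);
        if (!m.matches()) {
            throw new IllegalArgumentException("row does not match expected format: " + row);
        }
        List<String> out = new ArrayList<>();
        out.add(m.group(1));
        out.add(m.group(2));
        out.add(m.group(3));
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        for (String field : m.group(4).split(",", -1)) {
            out.add(field);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String field : parse("4,Matt, Hopkins,MI,5.75,Wood,33.0,2.25")) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Because the name group is lazy, it stops at the earliest comma that is followed by two capitals and another comma, which is why the name may contain any number of commas.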