Mateusz - 1 year ago
Java Question

Using regex in Java to extract string after first comma and before two capital letters and a comma

I am currently working with strings that follow this format:

4,Matt, Hopkins,MI,5.75,Wood,33.0,2.25,2.1,2016-09-02,74.25,69.3,8.254125,151.804125

and I am trying to use regex to extract all the words and numbers as separate strings (MI, Wood, 33.0 and so forth), with one exception: I want to treat everything after the first comma as a single string, up to the all-caps field. The regex would then extract this:

[4] [Matt, Hopkins] [MI] [5.75] [Wood] and so forth.

Note that the name part can have no commas at all, e.g. [Hopkins], or more than one, e.g. [Matt, Jr., Hopkins]. The all-caps field describes a state and so always follows the same format.

I do not understand Regex well enough to do that - so far I only came up with


which handles all fields alright, except the name.

Answer Source

Using regex might just make things harder for you here.

This looks like CSV data. You can use a CSV library to correctly parse this into individual fields (*):

String[] fields = YourCsvLibrary.parseRow(string);  // or string.split(","), maybe.

and then recombine the fields as appropriate. For example, your regex's logic can be expressed via the following code:

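// Requires java.util.Arrays. Drop one slot, then overwrite the first two entries.
String[] output = Arrays.copyOfRange(fields, 1, fields.length);
output[0] = fields[0];                    // leading number
output[1] = fields[1] + "," + fields[2];  // rejoin a name that was split into two fields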
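
The snippet above covers a name that spans exactly two fields. Since the question says the name may contain no commas or several, here is a minimal sketch of a more general merge. It assumes the state is the first field (after the name's first field) that consists of exactly two capital letters; the class and method names (NameMerger, mergeName) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NameMerger {
    // Hypothetical helper: joins every field between the first field and the
    // state field back into a single name entry.
    static List<String> mergeName(String[] fields) {
        int state = -1;
        // Start at index 2 so the name always keeps at least one field.
        for (int i = 2; i < fields.length; i++) {
            if (fields[i].matches("[A-Z]{2}")) {  // assumption: state is two capitals
                state = i;
                break;
            }
        }
        if (state < 0) {
            // No state field found: return the fields unchanged.
            return new ArrayList<>(Arrays.asList(fields));
        }
        List<String> out = new ArrayList<>();
        out.add(fields[0]);
        out.add(String.join(",", Arrays.copyOfRange(fields, 1, state)));
        out.addAll(Arrays.asList(fields).subList(state, fields.length));
        return out;
    }

    public static void main(String[] args) {
        String line = "4,Matt, Hopkins,MI,5.75,Wood,33.0";
        String shown = mergeName(line.split(",")).stream()
                .map(s -> "[" + s + "]")
                .collect(Collectors.joining(" "));
        System.out.println(shown);
        // prints: [4] [Matt, Hopkins] [MI] [5.75] [Wood] [33.0]
    }
}
```

A name part that itself happens to be exactly two capital letters would be mistaken for the state, so this only holds if the data rules that out.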


(*) String.split(",") might work, provided the field data doesn't contain quotes, commas, newlines, etc.
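
If a single regex is still preferred, as the question asks, one possible shape is sketched below. It is not the asker's pattern (which is not shown above) but an illustration under the same assumption that the state is the first complete field made of exactly two capital letters. The name group is lazy, so it stops at the first such field; RegexRowSplitter and splitRow are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexRowSplitter {
    // group 1: first field
    // group 2: name, matched lazily so it may contain commas
    // group 3: state, exactly two capital letters forming a whole field
    // group 4: remaining fields, if any
    private static final Pattern ROW =
            Pattern.compile("^([^,]*),(.*?),([A-Z]{2})(?:,(.*))?$");

    static List<String> splitRow(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected row format: " + line);
        }
        List<String> out = new ArrayList<>();
        out.add(m.group(1));
        out.add(m.group(2));
        out.add(m.group(3));
        if (m.group(4) != null) {
            // The tail has no special cases, so a plain split is enough here.
            out.addAll(Arrays.asList(m.group(4).split(",")));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitRow("4,Hopkins,MI,5.75"));
        // prints: [4, Hopkins, MI, 5.75]
    }
}
```

The same caveat about quoted or embedded commas applies to the tail, since it is still split on plain commas.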