speedRS - 2 months ago
Java Question

How do I parse delimited rows of text with differing field counts into objects, while allowing for extension?

An example is as follows:

SEG1|asdasd|20111212|asdsad
SEG2|asdasd|asdasd
SEG3|sdfsdf|sdfsdf|sdfsdf|sdfsfsdf
SEG4|sdfsfs|


Basically, each SEG* line needs to be parsed into a corresponding object that defines what each of those fields is. Some fields, such as the third field in SEG1, will be parsed as a Date.

Each object will generally stay the same but there may be instances in which an additional field may be added, like so:

SEG1|asdasd|20111212|asdsad|12334455


At the moment, I'm thinking of using the following type of algorithm:

List<String> segments = Arrays.asList(string.split("\r")); // Will always be a CR.
List<String> fields;
String fieldName;
for (String segment : segments) {
    fields = Arrays.asList(segment.split("\\|"));
    fieldName = fields.get(0); // The first field is the segment name.
    if (fieldName.equals("SEG1")) {
        SEG1 seg1 = new SEG1();
        seg1.setField1(fields.get(1));
        seg1.setField2(fields.get(2));
        seg1.setField3(fields.get(3));
    } else if (fieldName.equals("SEG2")) {
        ...
    } else if (fieldName.equals("SEG3")) {
        ...
    } else {
        // Erroneous/failure case.
    }
}


Some fields may also be optional, depending on the object being populated. My concern is that if I add a new field to a class, any checks that rely on the expected field count will also need to be updated. How can I parse the rows while allowing for new or modified fields in the classes being populated?

Answer

If you can define a common interface for all the classes to be parsed, I would suggest the following:

interface Segment {}

class SEG1 implements Segment
{
    void setField1(final String field) {}
    void setField2(final String field) {}
    void setField3(final String field) {}
}

enum Parser {
    SEGMENT1("SEG1") {
        @Override
        protected Segment parse(final String[] fields)
        {
            final SEG1 segment = new SEG1();
            segment.setField1(fields[0]);
            segment.setField2(fields[1]);
            segment.setField3(fields[2]);
            return segment;
        }
    },        
    ...
    ;

    private final String name;

    private Parser(final String name)
    {
        this.name = name;
    }

    protected abstract Segment parse(String[] fields);

    public static Segment parse(final String segment)
    {
        final int firstSeparator = segment.indexOf('|');

        final String name = segment.substring(0, firstSeparator);
        final String[] fields = segment.substring(firstSeparator + 1).split("\\|");

        for (final Parser parser : values())
            if (parser.name.equals(name))
                return parser.parse(fields);

        return null;
    }
}

For each type of segment, add an element to the enum and handle that segment's fields in its parse(String[]) method.
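The question also asks about optional fields and about field-count checks that break when a field is added. One way to handle that, as a rough sketch only, is to wrap the split row in a small helper that distinguishes required from optional positions, so each segment type states its own expectations and nothing compares against a global field count. The names Fields, Seg1Row, required and optional below are hypothetical and not part of the answer above, and LocalDate with DateTimeFormatter.BASIC_ISO_DATE is an assumption standing in for the Date mentioned in the question. Note that String.split drops trailing empty strings unless a negative limit is passed, which matters for rows like SEG4|sdfsfs|.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

// Hypothetical helper (not part of the answer above): an index-safe view of one row.
final class Fields {
    private final String[] values;

    Fields(final String row) {
        // A negative limit keeps trailing empty fields such as the one in "SEG4|sdfsfs|".
        this.values = row.split("\\|", -1);
    }

    // The segment name is always the first field.
    String name() {
        return values[0];
    }

    // Required field: fails loudly when the row is too short.
    String required(final int index) {
        if (index >= values.length) {
            throw new IllegalArgumentException("Missing field " + index + " in " + name());
        }
        return values[index];
    }

    // Optional field: empty when the row ends early or the field is an empty string.
    Optional<String> optional(final int index) {
        if (index < values.length && !values[index].isEmpty()) {
            return Optional.of(values[index]);
        }
        return Optional.empty();
    }
}

// Hypothetical segment class; the field names are placeholders.
final class Seg1Row {
    final String field1;
    final LocalDate date;         // assumed to be in yyyyMMdd form, e.g. 20111212
    final String field3;
    final Optional<String> extra; // the field that may be added later

    private Seg1Row(final String field1, final LocalDate date,
                    final String field3, final Optional<String> extra) {
        this.field1 = field1;
        this.date = date;
        this.field3 = field3;
        this.extra = extra;
    }

    // Each segment type declares which positions it needs; no global count check.
    static Seg1Row from(final Fields f) {
        return new Seg1Row(
                f.required(1),
                LocalDate.parse(f.required(2), DateTimeFormatter.BASIC_ISO_DATE),
                f.required(3),
                f.optional(4));
    }
}
```

With this shape, both SEG1|asdasd|20111212|asdsad and SEG1|asdasd|20111212|asdsad|12334455 parse with the same code, and adding another trailing field later only means adding one more optional(...) call in that segment's factory method. An enum constant's parse method could delegate to such a factory if the two approaches are combined.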
