dariober - 5 months ago
Java Question

Inconsistent behaviour of StrTokenizer to split string

I'm trying to split a string at a given delimiter, ignoring any delimiters that appear inside quotes. E.g.

"foo; bar; 'foo; bar'"


should be split into 3 strings, given the delimiter ; and the quote character ':

foo
bar
foo; bar


I'm using StrTokenizer as below. It doesn't seem to work for
"foo; bar; 'foo; bar'"
but it does work for
"'foo; bar'; foo; bar"


Can anyone explain what is wrong?

import org.apache.commons.lang3.text.StrTokenizer;

public class Main {
    public static void main(String[] args) {

        String x = "foo; bar; 'foo; bar'";

        StrTokenizer tokens = new StrTokenizer(x, ';', '\'');

        for (String token : tokens.getTokenArray()) {
            System.out.println(token.trim());
        }
        // Prints:
        // foo
        // bar
        // 'foo
        // bar'

        /* --------- */
        // THIS IS OK:
        x = "'foo; bar'; foo; bar";

        tokens = new StrTokenizer(x, ';', '\'');

        for (String token : tokens.getTokenArray()) {
            System.out.println(token.trim());
        }
        // Prints:
        // foo; bar
        // foo
        // bar
    }
}

Answer

It looks like, by default, a quoted section can't be preceded by any character (not even a space) other than the delimiter, so ; 'quote' is not OK but ;'quote' is fine. This is a little strange, because a space between the end of the quote and the next delimiter doesn't seem to cause any problem, which may suggest that this is a bug.

Explicitly setting the characters that should be trimmed seems to solve this problem (and you no longer need to call trim() in your print statements):

StrTokenizer tokens = new StrTokenizer(x, ';', '\'');
tokens.setTrimmerMatcher(StrMatcher.spaceMatcher()); // <- add this line

To trim spaces, tabs, newlines and form feeds, use StrMatcher.splitMatcher() instead.
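
To make the explanation above concrete, here is a small hand-written splitter that only treats a quote as opening a quoted section when it is the first character of a token. It is a simplified model for illustration, not the Commons Lang source: the class QuoteAwareSplit, the method split and the flag skipLeadingSpaces are all made up for this sketch, and the flag is only a rough stand-in for what setting a trimmer matcher appears to do.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of quote-aware splitting; NOT the Commons Lang implementation.
class QuoteAwareSplit {

    // Splits input at delim. A quote opens a quoted section only if it is the
    // first character of a token. When skipLeadingSpaces is true, spaces before
    // the token are dropped first, so a quote that follows them still counts.
    static List<String> split(String input, char delim, char quote, boolean skipLeadingSpaces) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        int len = input.length();
        while (pos < len) {
            if (skipLeadingSpaces) {
                while (pos < len && input.charAt(pos) == ' ') {
                    pos++;
                }
            }
            StringBuilder token = new StringBuilder();
            // The quote check happens once, at the start of the token.
            boolean inQuote = pos < len && input.charAt(pos) == quote;
            if (inQuote) {
                pos++; // consume the opening quote
            }
            while (pos < len) {
                char c = input.charAt(pos);
                if (inQuote) {
                    if (c == quote) {
                        inQuote = false; // closing quote, not copied
                    } else {
                        token.append(c); // delimiters inside quotes are kept
                    }
                    pos++;
                } else if (c == delim) {
                    break; // end of this token
                } else {
                    token.append(c); // a quote seen here is an ordinary character
                    pos++;
                }
            }
            tokens.add(token.toString());
            pos++; // step over the delimiter
        }
        return tokens;
    }

    public static void main(String[] args) {
        String x = "foo; bar; 'foo; bar'";
        // Without skipping, the space before the quote makes it an ordinary character.
        System.out.println(split(x, ';', '\'', false)); // [foo,  bar,  'foo,  bar']
        // With skipping, the quote is the first character of the third token.
        System.out.println(split(x, ';', '\'', true));  // [foo, bar, foo; bar]
    }
}
```

With skipLeadingSpaces set to false the model gives the same four tokens as in the question (before trim()), and with it set to true it gives the expected three, which matches the effect seen with setTrimmerMatcher.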