Chaklader Chaklader - 2 months ago 12
Java Question

How to split a String while ignoring certain keywords?

I need to split a String on white space, but I need to ignore some keywords that may appear in it. Those keywords themselves contain white space. For example, I have a String as follows:

String testCase = "The patient is currently being treated for Diabetes with Thiazide diuretics";


Now I need the String to be split, but with "Thiazide diuretics" kept as a whole word in the array I get from the following code:

String[] array = testCase.split(" ");


And the result needs to be as follows:

The
patient
is
currently
being
treated
for
Diabetes
with
Thiazide diuretics


Note that each drug name, such as "Thiazide diuretics", comes into the method as a whole String.

How can I do that?

Answer

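You need to work with the regex directly in this case; .split() is not a good fit for your purpose. Instead of splitting on the separators, match the tokens: the pattern below tries the keyword first and falls back to any run of non-whitespace characters.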

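import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "The patient is currently being treated for Diabetes with Thiazide diuretics";

// Try the keyword first; otherwise take the next run of non-whitespace characters.
Matcher m = Pattern.compile("\\b(?:Thiazide diuretics)\\b|\\S+").matcher(s);
ArrayList<String> result = new ArrayList<>();
while (m.find()) {
    result.add(m.group()); // each match is one token
}
System.out.println(result);
// [The, patient, is, currently, being, treated, for, Diabetes, with, Thiazide diuretics]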

Note: Technically it is possible to do so with .split() using lookarounds:

import java.util.Arrays;

String s = "Thiazide not-a-keyword diuretics and Thiazide diuretics keyword";

// Split on a space unless it is both preceded by "Thiazide" and followed by "diuretics".
String[] result = s.split("(?<!Thiazide) | (?!diuretics)");
System.out.println(Arrays.toString(result));
// [Thiazide, not-a-keyword, diuretics, and, Thiazide diuretics, keyword]

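But this doesn't scale once you have more keywords: every keyword has to be worked into both the lookbehind and the lookahead, and the pattern quickly becomes hard to read and maintain. Try to avoid it.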
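
For several keywords, the Matcher approach from the first snippet extends more naturally: build the alternation from the keywords that are passed in. The following is only a sketch under some assumptions. The class name KeywordSplitter and the method splitKeepingKeywords are made up for illustration, the keywords are assumed to start and end with word characters (otherwise the \b anchors would not match), and longer keywords are tried first so that a shorter keyword does not shadow a longer one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordSplitter {

    // Hypothetical helper: splits on whitespace but keeps each keyword as one token.
    static List<String> splitKeepingKeywords(String text, List<String> keywords) {
        // Sort a copy, longest first, so a longer keyword wins over a shorter one.
        List<String> sorted = new ArrayList<>(keywords);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));

        // Quote each keyword so it is matched literally, not as regex syntax.
        StringBuilder alternation = new StringBuilder();
        for (String keyword : sorted) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(keyword));
        }

        // With no keywords, fall back to plain non-whitespace runs.
        String regex = alternation.length() == 0
                ? "\\S+"
                : "\\b(?:" + alternation + ")\\b|\\S+";

        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> keywords = new ArrayList<>();
        keywords.add("Thiazide diuretics");
        keywords.add("ACE inhibitors");
        System.out.println(splitKeepingKeywords(
                "treated with Thiazide diuretics and ACE inhibitors", keywords));
        // [treated, with, Thiazide diuretics, and, ACE inhibitors]
    }
}
```

Pattern.quote is used because the keywords arrive as plain Strings and may contain characters that have a special meaning in a regex.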