Angie94 - 1 month ago 6
Java Question

StackOverflowError when splitting a string using regex

I'm doing a project in MapReduce using Amazon Web Services and I'm having this error:

    FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child :
    java.lang.StackOverflowError at
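A StackOverflowError from java.util.regex usually comes from a repeated group that contains alternatives. The matcher handles each repetition of such a group through nested method calls, so a long enough input exhausts the thread's stack. The sketch below is not the asker's regex. It uses a hypothetical stand-in, `(a|b)*`, and compares it with the equivalent character class `[ab]*`, which has no repeated group. The class name and the 1,000,000-character input length are assumptions. The input length is chosen to be far beyond what a default-sized stack can absorb, though the exact threshold depends on the JVM and its stack size.

```java
import java.util.regex.Pattern;

public class RegexRecursionDemo {

    // Builds "abab..." of the given length.
    static String alternating(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(i % 2 == 0 ? 'a' : 'b');
        }
        return sb.toString();
    }

    // Hypothetical stand-in for the question's regex: a group of alternatives, repeated.
    // Each repetition of the group nests deeper in java.util.regex's matcher,
    // so recursion depth grows with the input length.
    static boolean groupOfAlternativesOverflows(String input) {
        try {
            Pattern.matches("(a|b)*", input);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    // The same language written as a character class: no group is repeated.
    static boolean characterClassMatches(String input) {
        return Pattern.matches("[ab]*", input);
    }

    public static void main(String[] args) {
        String input = alternating(1_000_000);
        System.out.println("(a|b)* overflowed: " + groupOfAlternativesOverflows(input));
        System.out.println("[ab]* matched: " + characterClassMatches(input));
    }
}
```

Rewriting alternation-heavy repeated groups as character classes, or avoiding them altogether, is the usual way around this error.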

I read a few other questions to understand why this happens, and it seems my regex has repeated alternative paths. This is the regex:


What it does is split on spaces, except for spaces inside < > or " ". So basically it keeps together the strings that are inside those two types of symbols. I have tried many other versions but none of them works, so I am far from an optimal one. I am kind of lost, and this is the first time I'm using such complicated regexes. Can someone please suggest a better option for my regex?

I would truly appreciate any feedback on this!
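Because the requirement is only "split on whitespace that is not inside <...> or "..."", a plain character scan can do it in one linear pass with no regex, and therefore no regex recursion. This is a minimal sketch with hypothetical class and method names. It assumes the delimiters are never nested or escaped, and it lets an unclosed < or " run to the end of the line.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {

    // Splits on runs of whitespace, except whitespace inside <...> or "...".
    static List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char closer = 0; // closing delimiter we are waiting for; 0 means "outside"
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (closer != 0) {
                // Inside a delimited section: keep everything, including spaces.
                current.append(c);
                if (c == closer) {
                    closer = 0;
                }
            } else if (Character.isWhitespace(c)) {
                // Separator outside any section: finish the current token, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
                if (c == '"') {
                    closer = '"';
                } else if (c == '<') {
                    closer = '>';
                }
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "<\\> \"HEY\" <.org/1999/02/22-rdf-syntax-ns#type/>";
        for (String token : split(line)) {
            System.out.println(token);
        }
    }
}
```

On the example line, this prints <\>, "HEY" and <.org/1999/02/22-rdf-syntax-ns#type/> on separate lines. It also keeps spaces inside <...> together, which a quote-only regex does not.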


For example, this string, with URLs inside <>, text inside "", and spaces between them:

<\> "HEY" <.org/1999/02/22-rdf-syntax-ns#type/>

should produce these three strings:

1. <\> (with or without <>)

2. "HEY"

3. <.org/1999/02/22-rdf-syntax-ns#type/>


I think the <> symbols are confusing the issue. Since the URLs do not contain spaces, what I need is a regex that splits on one or more spaces while ignoring any spaces inside " ".
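Another option is to match the tokens instead of splitting between them. A token is either a quoted run, which may contain spaces, or a run of non-space characters. This pattern has no repeated group, and a repeated group is the construct generally blamed for deep recursion in java.util.regex. This is a sketch with hypothetical names. It assumes each quoted section is preceded by whitespace or the start of the line. In abc"d e", for instance, the quote would not be recognised as the start of a token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenMatcher {

    // Either "..." (quotes included, spaces allowed inside) or a run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|\\S+");

    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("<\\> \"HEY\" <.org/1999/02/22-rdf-syntax-ns#type/>"));
    }
}
```

Compiling the Pattern once as a static field also avoids recompiling it for every input line. String.split compiles its argument on each call, except for certain single-character patterns.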

Answer

Try this:



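    // Split on whitespace only when an even number of quotes follows it, i.e. outside "..."
    String string = "abc d<\\ &sioc_id=1/> \"HEY 1\" 2 3 <.org/1999/02/22-rdf-syntax-ns#type/> \"tra la\" <asdfadsf sadfasdf/> 4    \"sdf sdf\" 5 6";
    String[] res = string.split("\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)");
    System.out.println(java.util.Arrays.toString(res));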

Will output:

[abc, d<\, &sioc_id=1/>, "HEY 1", 2, 3, <.org/1999/02/22-rdf-syntax-ns#type/>, "tra la", <asdfadsf, sadfasdf/>, 4, "sdf sdf", 5, 6]
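Here is how the answer's pattern works. A run of whitespace (\\s+) counts as a separator only if the lookahead (?=(?:(?:[^\"]*\"){2})*[^\"]*$) succeeds. That happens when the rest of the line contains an even number of " characters, which, assuming quotes are balanced, means the whitespace is outside a quoted section. Applied to the example line from the question, it gives the three requested strings. The class and method names below are hypothetical.

```java
import java.util.Arrays;

public class EvenQuoteSplit {

    // Whitespace is a separator only if an even number of '"' follow it
    // up to the end of the line, i.e. it is not inside a quoted section.
    static final String OUTSIDE_QUOTES_SPLIT = "\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)";

    static String[] split(String line) {
        return line.split(OUTSIDE_QUOTES_SPLIT);
    }

    public static void main(String[] args) {
        String line = "<\\> \"HEY\" <.org/1999/02/22-rdf-syntax-ns#type/>";
        // prints [<\>, "HEY", <.org/1999/02/22-rdf-syntax-ns#type/>]
        System.out.println(Arrays.toString(split(line)));
    }
}
```

Two caveats apply. The lookahead rescans the remainder of the line at every run of whitespace, so the cost grows roughly quadratically with line length. Also, the repeated group (?:...){2})* recurses once per remaining pair of quotes, so a line with a very large number of quoted sections could in principle still hit a StackOverflowError. For short lines like the ones above, neither should matter.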