Nana Nana - 1 year ago 117
Java Question

Extract substrings using regex in Java

I have a string that contains several NP groups, written as (NP ...). The data I want is between "(NP" and its closing ")". But I want only the data of the inner NPs, not of the outer NP that encloses them.

How can I write a regex to extract "(DT a) (NN sign)" and "(DT the) (NN facade)" from the following text? For each piece of text that contains an NP, I want to extract only the data inside that NP. I hope I have explained the problem clearly.

(ROOT (NP (NP (DT a) (NN sign)) (PP (IN on) (NP (NP (DT the) (NN facade)) (PP (IN of) (NP (DT the) (NN building)))))))

Answer

This regex will match all the data you are asking for:

\(DT\s\w+.{3}NN\s\w+\)

Here \(DT\s\w+ matches the determiner tag, one whitespace character and the word; .{3} matches the three characters ") ("; and NN\s\w+\) matches the singular or mass noun tag (NN), its word and the closing parenthesis.
The regex matches the data in regexpal, but to use it in Java code you need to escape each backslash in the string literal, so it looks like this:

Pattern p = Pattern.compile("\\(DT\\s\\w+.{3}NN\\s\\w+\\)");
