Java Question

Why doesn't my regexp work?

The following code prints nothing. What am I doing wrong?
The regexp tester myregexp says that the regular expression is correct.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String page = "<div id=\"foo\" class=\"foo\" style=\"background-image: url(foo.jpg); width: 320px; height: 245px\">\n" +
        " <a href=\"foo\" onclick=\"return bar('foo', 'foo', {foo: bar, foo: bar}, foo)\"></a>\n" +
        "</div>";

Pattern pattern = Pattern.compile("<div.*?</div>");
Matcher matcher = pattern.matcher(page);
// Nothing is printed: find() never succeeds on this input.
while (matcher.find()) {
    System.out.println(matcher.start() + " " + matcher.end());
}

Answer

By default, the "." metacharacter in a Java regex matches any character except a line terminator such as "\n". Your page string contains newlines between <div and </div>, so ".*?" can never reach the </div>, and find() never succeeds.
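
A minimal sketch of the default behaviour versus the DOTALL flag, using a hypothetical two-line string "a\nb" (the class name DotNewlineDemo is illustrative, not from the question). It also shows the embedded flag (?s), which has the same effect as Pattern.DOTALL:

```java
import java.util.regex.Pattern;

public class DotNewlineDemo {
    public static void main(String[] args) {
        String text = "a\nb";

        // Without flags, '.' matches any character except line terminators,
        // so "a.b" cannot match across the '\n'.
        boolean plain = Pattern.compile("a.b").matcher(text).matches();

        // DOTALL makes '.' match line terminators too.
        boolean dotAll = Pattern.compile("a.b", Pattern.DOTALL).matcher(text).matches();

        // The embedded flag (?s) is equivalent to Pattern.DOTALL.
        boolean embedded = Pattern.compile("(?s)a.b").matcher(text).matches();

        System.out.println(plain + " " + dotAll + " " + embedded); // false true true
    }
}
```

The embedded form is handy when the pattern string is configured elsewhere and you cannot pass flags to Pattern.compile.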

Pass the Pattern.DOTALL flag, which makes "." match line terminators as well, by replacing your compile line with:

Pattern pattern = Pattern.compile("<div.*?</div>", Pattern.DOTALL);
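
For reference, a self-contained version of the question's code with the fix applied. It assumes the page string ends with a closing </div> on its own line, as the explanation of the newline before </div> implies; the class name DivMatchDemo is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivMatchDemo {
    // Assumption: the question's page string, closed with "</div>" on its own line.
    static final String PAGE =
            "<div id=\"foo\" class=\"foo\" style=\"background-image: url(foo.jpg); width: 320px; height: 245px\">\n" +
            " <a href=\"foo\" onclick=\"return bar('foo', 'foo', {foo: bar, foo: bar}, foo)\"></a>\n" +
            "</div>";

    public static void main(String[] args) {
        // DOTALL lets ".*?" run across the embedded '\n' characters.
        Pattern pattern = Pattern.compile("<div.*?</div>", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(PAGE);
        while (matcher.find()) {
            // start() is inclusive, end() is exclusive.
            System.out.println(matcher.start() + " " + matcher.end()); // prints "0 183"
        }
    }
}
```

The single match spans the whole string, so end() equals PAGE.length().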

But, as noted in the comments, a regex is only reliable for HTML in simple cases where you control its structure (no comments, no JavaScript, etc.). Otherwise, parse the HTML with an HTML parser such as JSoup rather than with a regex.
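
To illustrate why the regex approach is fragile, here is a small sketch with two hypothetical inputs where the DOTALL pattern returns the wrong span: nested divs, and a </div> inside an HTML comment. The class and method names (RegexHtmlPitfalls, firstDiv) are illustrative only; JSoup usage is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexHtmlPitfalls {
    static final Pattern DIV = Pattern.compile("<div.*?</div>", Pattern.DOTALL);

    // Returns the first substring matched by DIV, or null if there is none.
    static String firstDiv(String html) {
        Matcher m = DIV.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Nested divs: the lazy ".*?" stops at the FIRST "</div>",
        // which closes the inner div, so the outer div is cut short.
        String nested = "<div><div>inner</div> tail</div>";
        System.out.println(firstDiv(nested)); // <div><div>inner</div>

        // HTML comments: the regex does not know that a "</div>" inside
        // a comment is not a real closing tag.
        String commented = "<div>a<!-- </div> -->b</div>";
        System.out.println(firstDiv(commented)); // <div>a<!-- </div>
    }
}
```

A parser builds the element tree, so it matches opening and closing tags correctly and ignores markup inside comments.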