Alberto Andeliero - 1 year ago
Scala Question

How to match an HTML tag

I have to parse a string like this:

foo <img ... > <strong>foo</strong> bar

and I need to replace the img tag with an empty string:

foo <strong>foo</strong> bar

I've tried with


but the result is

foo bar

How can I do this?

PS: The HTML string is malformed.

Answer Source

To match the taste of SO, this answer has three parts:

* Answer to your problem
* Official rant
* Cleaner solution

Answer to the problem

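The * quantifier is greedy, so it matches too much: it runs on to the last > in the string instead of stopping at the first. Two solutions are possible:

1.) *? makes the match non-greedy
2.) <[^>]+> matches only what is within one pair of brackets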
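The difference between the greedy pattern and the two fixes can be sketched as follows. The pattern the asker tried is not shown above, so `<img.*>` is only an assumption about what it looked like, and the `src` attribute in the input is made up for illustration. The third pattern narrows the bracket-based fix to img tags, since `<[^>]+>` on its own would strip every tag.

```scala
object StripImg {
  // Sample input modelled on the question; the src attribute is hypothetical.
  val input = "foo <img src=\"a.png\" > <strong>foo</strong> bar"

  // Assumed original attempt: greedy .* runs to the LAST '>' in the string,
  // so the <strong> element is swallowed along with the img tag.
  val greedy = input.replaceAll("<img.*>", "")

  // Fix 1: reluctant quantifier stops at the FIRST '>' after "<img".
  val lazyMatch = input.replaceAll("<img.*?>", "")

  // Fix 2: negated character class can never cross a '>'.
  val negated = input.replaceAll("<img[^>]*>", "")

  def main(args: Array[String]): Unit = {
    println(greedy)
    println(lazyMatch)
    println(negated)
  }
}
```

Both fixes leave the whitespace that surrounded the removed tag in place, so the result contains two spaces where the img tag used to be. Neither pattern handles a > inside an attribute value, which is one reason a real parser is safer for malformed input.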


Official rant

Never parse HTML using regex. There are many subtle errors you can run into. See also this post: RegEx match open tags except XHTML self-contained tags

Cleaner solution

Parse using an XML parser with TagSoup. Here is an example that lets you treat HTML as an XML-like structure with Scala and TagSoup: