dotfreelancer dotfreelancer - 1 year ago 70
SQL Question

how to strip all html tags and special characters from string using sql server

I work as developer using C#, I receive text like this from the client:

> <p><a
> href="">KATE
> WINSLET</a> has given birth to a 9lb baby boy. The
> Oscar-winning actress welcomed the baby with her husband Ned Rocknroll
> at a hospital in Sussex.</p>
> <p>"Kate had 'Baby Boy Winslet' on
> Saturday at an NHS Hospital," Winslet's spokeswoman
> said, adding that the family were "thrilled to
> bits".</p>
> <p>The announcement suggests that the child might bear his
> mother's surname, rather than his father's slightly
> more unusual moniker.</p>
> <p>The baby is Winslet's third - she is already mother
> to Mia, 13, and Joe, eight,  from previous relationships -
> and her husband's first. They met on Necker Island, owned by
> Rocknroll's uncle, Richard Branson, and<a
> href="">married almost a year ago</a> in New York.</p>

I need a way to extract the real text without tags and special characters using sql server 2008 or above ??

gbn gbn
Answer Source

The best I can suggest is to use a .net HTML parser or such which is wrapped in a SQL CLR function. Or to wrap the regex in SQL CLR if you want.

Note regex limitations:

Raw SQL language won't do it: it is not a string (or HTML) processing language

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download