dotfreelancer dotfreelancer - 1 month ago 5
SQL Question

how to strip all html tags and special characters from string using sql server

I work as asp.net developer using C#, I receive text like this from the client:

> <p><a
> href="http://www.vogue.co.uk/person/kate-winslet">KATE
> WINSLET</a> has given birth to a 9lb baby boy. The
> Oscar-winning actress welcomed the baby with her husband Ned Rocknroll
> at a hospital in Sussex.</p>
>
> <p>"Kate had 'Baby Boy Winslet' on
> Saturday at an NHS Hospital," Winslet's spokeswoman
> said, adding that the family were "thrilled to
> bits".</p>
>
> <p>The announcement suggests that the child might bear his
> mother's surname, rather than his father's slightly
> more unusual moniker.</p>
>
> <p>The baby is Winslet's third - she is already mother
> to Mia, 13, and Joe, eight,  from previous relationships -
> and her husband's first. They met on Necker Island, owned by
> Rocknroll's uncle, Richard Branson, and<a
> href="http://www.vogue.co.uk/news/2013/kate-winslet-married-to-ned-rocknroller---wedding-details">married almost a year ago</a> in New York.</p>


I need a way to extract the real text without tags and special characters using sql server 2008 or above ??

gbn gbn
Answer

The best I can suggest is to use a .net HTML parser or such which is wrapped in a SQL CLR function. Or to wrap the regex in SQL CLR if you want.

Note regex limitations: http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html

Raw SQL language won't do it: it is not a string (or HTML) processing language