I've got data in SQL Server 2005 that contains html tags and I'd like to strip all that out, leaving just the text between the tags. Ideally also replacing things like < with <, etc.
Is there an easy way to do this or has someone already got some sample sql code?
I don't have the ability to add extended stored procs and the like, so would prefer a pure sql approach (preferably one backwards compatible with sql 2000). I want to retrieve the data with stripped out html, not update it, so ideally it would be written as a function to make for easy reuse.
So for example converting this:
<B>Some useful text</B>
<A onclick="return openInfo(this)" href="http://there.com/3ce984e88d0531bac5349" target=globalhelp>
<IMG title="Source Description" height=15 alt="Source Description" src="/ri/new_info.gif" width=15 align=top border=0>
</A>> <b>more text</b></TD></TR>
Some useful text > more text
There is a UDF that will do that described here:
CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX)) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @Start INT DECLARE @End INT DECLARE @Length INT SET @Start = CHARINDEX('<',@HTMLText) SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText)) SET @Length = (@End - @Start) + 1 WHILE @Start > 0 AND @End > 0 AND @Length > 0 BEGIN SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'') SET @Start = CHARINDEX('<',@HTMLText) SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText)) SET @Length = (@End - @Start) + 1 END RETURN LTRIM(RTRIM(@HTMLText)) END GO
Edit: note this is for SQL Server 2005, but if you change the keyword MAX to something like 4000, it will work in SQL Server 2000 as well.