user6798160 user6798160 - 2 months ago 9
SQL Question

SQL XML parsing split the string on letters 'TH'

I want to split a string into multiple values based on a special symbol. e.g., Here is the string


JdwnrhþTHIMPHUþOTHþþ10/1991þ02/02/2011þBHUTAN


I want it to be:


Jdwnrh THIMPHU OTH 10/1991 02/02/2011 BHUTAN


I am using the following SQL:

DECLARE @delimiter VARCHAR(50)
SET @delimiter='þ'
;WITH CTE AS
(
SELECT
CAST('<M>' + REPLACE(REPLACE(CAST(DATA as nvarchar(MAX)), @delimiter , '</M><M>'), '&', '&amp;') + '</M>' AS XML)
AS BDWCREGPREVADDR_XML
FROM [JACS_RAVEN_DATA_OLD].dbo.BDWCREGPREVADDR
)

SELECT
BDWCREGPREVADDR_XML.value('/M[1]', 'varchar(50)') As streetNo,
BDWCREGPREVADDR_XML.value('/M[2]', 'varchar(50)') As suburb,
BDWCREGPREVADDR_XML.value('/M[3]', 'varchar(3)') As stateCode,
BDWCREGPREVADDR_XML.value('/M[4]', 'varchar(10)') As postalCode,
BDWCREGPREVADDR_XML.value('/M[7]', 'varchar(50)') As country,
BDWCREGPREVADDR_XML.value('/M[5]', 'varchar(50)') As dateFrom,
BDWCREGPREVADDR_XML.value('/M[6]', 'varchar(50)') As dateTo
FROM CTE
GO


The query works well on all the strings other than the one provided as an example. For above the string, the query returns the following:


'Jdwnrh' ' ' 'IMPHU' 'O' ' ' '10/1991' '02/02/2011' 'BHUTAN'


It seems the code takes letters 'TH' as a new attribute and split the string on it. Does anyone know how to resolve this issue?

Answer

This seems to be related to your collation. In Latin1_General_CS_AS, the þ character is considered equivalent to th (because it's an Old English letter that sounds like "th" when pronounced).

print replace('thornþ' collate Latin1_General_CS_AS,'þ','1') 
' output: 1orn1

This is not the case for all collations; for example, in Latin1_General_BIN they are separate:

print replace('thornþ' collate Latin1_General_BIN,'þ','1') 
' output: thorn1

So perhaps you could look at changing the collation of the column which contains the þ characters.

Comments