I have an app written in C# that does a lot of string comparison. The strings are pulled in from a variety of sources (including user input) and are then compared. However I'm running into problems when comparing space '32' to non-breaking space '160'. To the user they look the same and so they expect a match. But when the app does the compare, there is no match.
What is the best way to go about this? Am I going to have to go to all parts of the code that do a string compare and manually normalize non-breaking spaces to spaces? Does .NET offer anything to help with that? (I've tried all the compare options but none seem to help.)
It has been suggested that I normalize the strings upon receipt and then let the string compare method simply compare the normalized strings. I'm not sure it would be straight-forward to do that because what is a normalized string in the first place. What do I normalize it too? Sure, for now I can convert non-breaking spaces to breaking spaces. But what else can show up? Can there potentially be very many of these rules? Might they even be conflicting. (In one case I want to use a rule and in another I don't.)
If it were me, I would 'normalize' the strings as I 'pulled them in'; probably with a string.Replace(). Then you won't need to change your comparisons anywhere else.
Edit: Mark, that's a tough one. Its really up to you, or you clients, as to what is a 'normalized' string. I've been in a similar situation where the customer demanded that strings like:
I have 4 apples. I have four apples.
were actually equal. You may need separate normalizers for different situations. Either way, I would still do the normalization upon retrieval of the original strings.