In my C# code, I am extracting text from a PDF document. When I do that, I get a string that's in UTF-8 or Unicode encoding (I'm not sure which). When I use
[67, 76, 69, 194, 160, 65, 99, 116, 105, 111, 110]
194 160 is the UTF-8 encoding of a NO-BREAK SPACE codepoint (U+00A0, the same codepoint that HTML calls &nbsp;).
So it's really not a space, even though it looks like one. (You'll see it won't word-wrap, for instance.) A regular expression match for
\s would match it, but a plain comparison with a space won't.
To simply replace NO-BREAK spaces you can do the following:
src = src.Replace('\u00A0', ' ');
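To check this against the byte sequence from the question, here is a minimal sketch. The class name NbspDemo and the helper NormalizeSpaces are made up for illustration, and it assumes the bytes really are UTF-8, as the 194 160 pair suggests.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class NbspDemo
{
    // Hypothetical helper: swaps every NO-BREAK SPACE (U+00A0)
    // for an ordinary space (U+0020).
    public static string NormalizeSpaces(string src) => src.Replace('\u00A0', ' ');

    public static void Main()
    {
        // The byte values quoted in the question, decoded as UTF-8 (an assumption).
        byte[] bytes = { 67, 76, 69, 194, 160, 65, 99, 116, 105, 111, 110 };
        string text = Encoding.UTF8.GetString(bytes);

        // A plain comparison with a regular space does not find the NO-BREAK SPACE...
        Console.WriteLine(text.Contains(" "));            // False

        // ...but the regex whitespace class \s does.
        Console.WriteLine(Regex.IsMatch(text, @"\s"));    // True

        // After normalizing, the string compares equal to the expected text.
        Console.WriteLine(NormalizeSpaces(text) == "CLE Action"); // True
    }
}
```

If the PDF text can contain other unusual whitespace as well, something like Regex.Replace(src, @"\s", " ") may be a broader alternative, though note that it would also turn tabs and line breaks into spaces.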