glenstorey - 3 months ago
Swift Question

NSString.rangeOfString returns unusual result with non-latin characters

I need to get the range of two words in a string, for example:

ยัฟิแก ไฟหก


(This is literally me typing PYABCD WASD; it's a nonsensical test string, since I don't speak Thai.)

//Find all the ranges of each word
var words: [String] = []
var ranges: [NSRange] = []

//Convert to nsstring first because otherwise you get stuck with Ranges and Strings.
let nstext = backgroundTextField.stringValue as NSString //contains "ยัฟิแก ไฟหก"
words = nstext.componentsSeparatedByString(" ")
var nstextLessWordsWeHaveRangesFor = nstext //if you have two identical words this prevents just getting the first word's range

for word in words
{

    let range:NSRange = nstextLessWordsWeHaveRangesFor.rangeOfString(word)
    Swift.print(range)
    ranges.append(range)

    //create a string the same length as word
    var fillerString:String = ""

    for i in 0..<word.characters.count{
        //for var i=0;i<word.characters.count;i += 1{
        Swift.print("i: \(i)")
        fillerString = fillerString.stringByAppendingString(" ")
    }

    //remove duplicate words / letters so that we get correct range each time.
    if range.length <= nstextLessWordsWeHaveRangesFor.length
    {
        nstextLessWordsWeHaveRangesFor = nstextLessWordsWeHaveRangesFor.stringByReplacingCharactersInRange(range, withString: fillerString)
    }
}


outputs:

(0,6)
(5,4)


Those ranges are overlapping.

This causes problems down the road, where I'm trying to use NSLayoutManager.enumerateEnclosingRectsForGlyphRange, since the ranges are inconsistent.

How can I get the correct range (or in this specific case, non-overlapping ranges)?

Answer

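The characters of a Swift String are "extended grapheme clusters", whereas NSString counts UTF-16 code units, so the length of a string differs depending on which representation you use. In your example, the first word has a range length of 6 (UTF-16 code units) but only 4 Characters, because each of its two combining vowel marks forms a single grapheme cluster with the preceding consonant. The filler string is therefore two units too short, every later offset shifts left by two, and the second word is found at 5 instead of 7.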
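To make the difference concrete, here is a minimal sketch that measures the first word from the question both ways. It uses current Swift spellings (`count`, `utf16.count`) rather than the Swift 2 names in the question (`characters.count`), which is my assumption about what you would compile against today.

```swift
import Foundation

let word = "ยัฟิแก"

// Swift counts extended grapheme clusters (user-perceived characters).
let characterCount = word.count

// NSString counts UTF-16 code units; String exposes the same units via `utf16`.
let utf16Count = word.utf16.count
let nsLength = NSString(string: word).length

print(characterCount, utf16Count, nsLength) // 4 6 6
```

Any NSRange handed to an NSString API is expressed in the second kind of unit, so a length taken from the Character count cannot be mixed with it safely.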

The simplest solution is to stick to one representation, either String or NSString, wherever possible. Since you are already working with NSString, changing

for i in 0..<word.characters.count {

to

for i in 0..<range.length {

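should solve the problem, because the filler string then has the same UTF-16 length as the range it replaces, so later offsets no longer shift.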
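Putting that together, the whole loop might look something like the sketch below. `wordRanges(in:)` is a hypothetical helper name, not something from the question, and the code uses current Swift method names (`range(of:)`, `replacingCharacters(in:with:)`, `components(separatedBy:)`) instead of the Swift 2 ones. It keeps the question's "blank out what was already matched" idea but sizes the filler by `range.length`, and it skips empty components, which is an extra guard I added for consecutive spaces.

```swift
import Foundation

// Hypothetical helper (not from the question): returns one NSRange per
// space-separated word, measured in UTF-16 code units throughout.
func wordRanges(in text: String) -> [NSRange] {
    var remaining = NSString(string: text)
    var ranges: [NSRange] = []

    for word in remaining.components(separatedBy: " ") {
        let range = remaining.range(of: word)
        // Empty components (from consecutive spaces) are reported as not found; skip them.
        guard range.location != NSNotFound else { continue }
        ranges.append(range)

        // Blank out the match with exactly range.length UTF-16 units so that
        // later offsets do not shift and a repeated word finds its next occurrence.
        let filler = String(repeating: " ", count: range.length)
        remaining = NSString(string: remaining.replacingCharacters(in: range, with: filler))
    }
    return ranges
}

let ranges = wordRanges(in: "ยัฟิแก ไฟหก")
print(ranges.map { "(\($0.location),\($0.length))" }) // ["(0,6)", "(7,4)"]
```

With the filler matched to the NSString length, the second word is found at location 7, directly after the space that follows the first word, so the two ranges no longer overlap.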