Jean-Denis Muys Jean-Denis Muys - 10 months ago 63
Objective-C Question

Efficient character processing of NSString

I need to do some character processing of huge strings in Cocoa (from Objective-C or Swift), where:

  • The input string passed in an
    has n characters

  • The result should be returned in an

  • It's OK to make simplifying assumptions on the characters in the string. I mean we can assume they are all ASCII, or all single code unit UTF8, or even all
    (so as to make indexing and length computation O(1))

For the sake of the example, let's say the processing is a rot13 obfuscation.

I want to do it space and time efficiently:

  • I want to get a mutable buffer of characters from the source string (probably a copy)

  • I want to alter that buffer in place

  • I want to return the altered buffer in a constructed
    without doing another copy.

I want space complexity ≤ 2*n+ O(1).

I want time complexity O(n) - with as small a constant as possible.

API allows for that easily, but is too inefficient, with plenty of back and forth conversion from character to string. I am shooting for C-level efficient processing of characters here.

API also allows to get a buffer of character with methods such as
. But I can't find a way to use the API where I copy the characters for processing no more than once.

Answer Source

Allocate a buffer of unichar. Copy into the buffer w/ getCharacters(range:). Manipulate. Convert back using init(charactersNoCopy:length:freeWhenDone:).

unichar is UTF-16. If you're willing to assume that nothing require surrogate characters (for example, if you assume it's ASCII), then you can just allocate your buffer based on length (it'll be 2 * length). If you want to be more flexible, but still O(1) at the cost of 2-3x memory requirement, then use maximumLengthOfBytes. If you want to be more flexible, but are willing to accept an O(n) step (I assume you aren't), then use lengthOfBytes.

It is somewhat common for NSString to be stored internally as UTF-16, so this tends to be a very fast conversion. That said, if you have enough knowledge about your strings, and are willing to write additional code to manipulate encodings directly, then look at fastestEncoding.