Maury Markowitz - 4 days ago
Objective-C Question

JSON data has "bad" characters that cause NSJSONSerialization to die

I am using the ATV version of TVH Client. If you haven't looked at TVH, it's worth a look to glimpse madness in the face. It has a JSON API that sends back data, including the electronic program guide. Sometimes the channels put accented characters in their data. Here is an example as returned in Postman; note the ? char in the description:

{
"eventId": 14277,
"episodeId": 14278,
"channelName": "49.3 CometTV",
"channelUuid": "02fe96403d58d53d71fde60649bf2b9a",
"channelNumber": "49.3",
"start": 1480266000,
"stop": 1480273200,
"title": "The Brain That Wouldn't Die",
"description": "Dr. Bill Cortner and his fianc´┐Że, Jan Compton , are driving to his lab when they get into a horrible car accident. Compton is decapitated. But Cortner is not fazed by this seemingly insurmountable hurdle. His expertise is in transplants, and he is excited to perform the first head transplant. Keeping Compton's head alive in his lab, Cortner plans the groundbreaking yet unorthodox surgery. First, however, he needs a body."
},


If this data is fed into NSJSONSerialization, it returns an error. So to avoid this, the data is first fed into this function:

+ (NSDictionary*)convertFromJsonToObjectFixUtf8:(NSData*)responseData error:(__autoreleasing NSError**)error {
    NSMutableData *FileData = [NSMutableData dataWithLength:[responseData length]];
    for (int i = 0; i < [responseData length]; ++i) {
        char *a = &((char*)[responseData bytes])[i];
        if ( (int)*a >0 && (int)*a < 0x20 ) {
            ((char*)[FileData mutableBytes])[i] = 0x20;
        } else {
            ((char*)[FileData mutableBytes])[i] = ((char*)[responseData bytes])[i];
        }
    }
    NSDictionary* json = [NSJSONSerialization JSONObjectWithData:FileData //1
                                                         options:kNilOptions
                                                           error:error];
    if( *error ) {
        NSLog(@"[JSON Error (2nd)] output - %@", [[NSString alloc] initWithData:responseData encoding:NSUTF8StringEncoding]);
        NSDictionary *userInfo = @{ NSLocalizedDescriptionKey:[NSString stringWithFormat:NSLocalizedString(@"Tvheadend returned malformed JSON - check your Tvheadend's Character Set for each mux and choose the correct one!", nil)] };
        *error = [[NSError alloc] initWithDomain:@"Not ready" code:NSURLErrorBadServerResponse userInfo:userInfo];
        return nil;
    }
    return json;
}


This cleans up the case when there is a control character in the data, but not an accent like the case above. When I feed in that data I get the "Tvheadend returned malformed JSON" error.

One problem is that the user can change the character set among a limited number of selections, and the server does not tell the client what it is. So one channel might use UTF-8 and another ISO-8859-1, and there is no way to know which to use on the client side.
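
To illustrate why the charset matters, here is a minimal sketch. It assumes (this is a guess, not something confirmed for this server) that the accented letter arrives as the single ISO-8859-1 byte 0xE9. The helper names TVHSampleLatin1Data and TVHDecode are hypothetical and exist only for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical sample: "fiancée" with the é sent as the single
// ISO-8859-1 byte 0xE9 (an assumed reconstruction of the bad data).
static NSData *TVHSampleLatin1Data(void) {
    const unsigned char bytes[] = { 'f', 'i', 'a', 'n', 'c', 0xE9, 'e' };
    return [NSData dataWithBytes:bytes length:sizeof(bytes)];
}

// Hypothetical helper: decode raw bytes with the given encoding.
// initWithData:encoding: returns nil when the bytes are not valid
// for that encoding.
static NSString *TVHDecode(NSData *data, NSStringEncoding encoding) {
    return [[NSString alloc] initWithData:data encoding:encoding];
}
```

In UTF-8, 0xE9 is the lead byte of a three-byte sequence, and the 'e' that follows is not a continuation byte. So TVHDecode(TVHSampleLatin1Data(), NSUTF8StringEncoding) should return nil, while the same call with NSISOLatin1StringEncoding should return "fiancée".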

So: can anyone offer a suggestion on how to process this data so we feed clean strings into NSJSONSerialization?

Answer

I still do not know the root cause of the problem I am seeing. The server is sending not only high-bit characters like the one I noted above; I also found that the data contained control characters too! Looking over other threads, it appears I am not the only one seeing this problem, so hopefully others will find this useful...

The basic trick is to convert the original data from the server to a string using UTF-8. If there are any of these "bad" chars in it, the conversion will fail and return nil. So you check whether the resulting string is nil and, if it is, try another charset. Eventually you'll get a string back. Next, you take that string and strip out any control chars. Finally, you take that result, which is now "clean", and convert it back to NSData using UTF-8. That will pass through the JSON conversion without error. Phew!

Here is the solution I finally used:

// ... the original data from the URL is in responseData
NSError *error = nil;
// Try UTF-8 first; this returns nil if the bytes are not valid UTF-8
NSString *str = [[NSString alloc] initWithData:responseData encoding:NSUTF8StringEncoding];
if ( str == nil ) {
    // Fall back to ISO Latin-1
    str = [[NSString alloc] initWithData:responseData encoding:NSISOLatin1StringEncoding];
}
if ( str == nil ) {
    // Last resort
    str = [[NSString alloc] initWithData:responseData encoding:NSASCIIStringEncoding];
}
// Strip control characters, then re-encode as UTF-8 for the JSON parser
NSCharacterSet *controls = [NSCharacterSet controlCharacterSet];
NSString *stripped = [[str componentsSeparatedByCharactersInSet:controls] componentsJoinedByString:@""];
NSData *data = [stripped dataUsingEncoding:NSUTF8StringEncoding];
NSDictionary* json = [NSJSONSerialization JSONObjectWithData:data options:kNilOptions error:&error];
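
For reuse, the same steps could be wrapped in a function along these lines. This is only a sketch: the name TVHParseLenientJSON, the nil guards, and the choice of error code are my own assumptions and not part of TVH Client.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper around the steps above. Returns the parsed JSON
// object, or nil on failure (with *error set when a pointer is supplied).
static id TVHParseLenientJSON(NSData *responseData, NSError **error) {
    // Try UTF-8 first, then fall back to ISO Latin-1.
    NSString *str = [[NSString alloc] initWithData:responseData
                                          encoding:NSUTF8StringEncoding];
    if (str == nil) {
        str = [[NSString alloc] initWithData:responseData
                                    encoding:NSISOLatin1StringEncoding];
    }

    // Remove control characters, then re-encode as UTF-8.
    NSCharacterSet *controls = [NSCharacterSet controlCharacterSet];
    NSString *stripped = [[str componentsSeparatedByCharactersInSet:controls]
                             componentsJoinedByString:@""];
    NSData *clean = [stripped dataUsingEncoding:NSUTF8StringEncoding];

    if (clean == nil) {
        // Nothing decodable; report it rather than passing nil to the parser.
        if (error) {
            *error = [NSError errorWithDomain:NSCocoaErrorDomain
                                         code:NSFileReadInapplicableStringEncodingError
                                     userInfo:nil];
        }
        return nil;
    }

    // The caller should check the return value for nil, not the error.
    return [NSJSONSerialization JSONObjectWithData:clean
                                           options:kNilOptions
                                             error:error];
}
```

A caller would pass the raw response data and test the result for nil before looking at the error. Note that stripping control characters also removes tabs and newlines, which is harmless between JSON tokens but does drop them from inside string values.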

I hope someone finds this useful!
