JSON data has "bad" characters that cause NSJSONSerialization to die

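I am working on the ATV version of the TVH Client - if you haven't looked at this yet, it's worth looking at TVH to glimpse madness in the face. It has a JSON API that sends back data, including the electronic program guide. Sometimes the channels put accented characters in their data. Here is an example, taken from Postman; note the "?" replacement character (�) in the description: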

{
      "eventId": 14277,
      "episodeId": 14278,
      "channelName": "49.3 CometTV",
      "channelUuid": "02fe96403d58d53d71fde60649bf2b9a",
      "channelNumber": "49.3",
      "start": 1480266000,
      "stop": 1480273200,
      "title": "The Brain That Wouldn't Die",
      "description": "Dr. Bill Cortner and his fianc�e, Jan Compton , are driving to his lab when they get into a horrible car accident. Compton is decapitated. But Cortner is not fazed by this seemingly insurmountable hurdle. His expertise is in transplants, and he is excited to perform the first head transplant. Keeping Compton's head alive in his lab, Cortner plans the groundbreaking yet unorthodox surgery. First, however, he needs a body."
    },

If this data is fed into NSJSONSerialization, it returns an error. To avoid that, the data is first fed into this function:

+ (NSDictionary*)convertFromJsonToObjectFixUtf8:(NSData*)responseData error:(__autoreleasing NSError**)error {
    // Copy the response and replace ASCII control bytes (0x01-0x1F) with spaces.
    NSMutableData *fileData = [responseData mutableCopy];
    unsigned char *bytes = (unsigned char *)[fileData mutableBytes];
    for (NSUInteger i = 0; i < [fileData length]; ++i) {
        if (bytes[i] > 0 && bytes[i] < 0x20) {
            bytes[i] = 0x20;
        }
    }
    NSError *parseError = nil;
    NSDictionary *json = [NSJSONSerialization JSONObjectWithData:fileData
                                                         options:kNilOptions
                                                           error:&parseError];
    // Test the return value rather than *error: the caller may pass NULL for error.
    if (json == nil) {
        NSLog(@"[JSON Error (2nd)] output - %@", [[NSString alloc] initWithData:responseData encoding:NSUTF8StringEncoding]);
        NSDictionary *userInfo = @{ NSLocalizedDescriptionKey: NSLocalizedString(@"Tvheadend returned malformed JSON - check your Tvheadend's Character Set for each mux and choose the correct one!", nil) };
        if (error) {
            *error = [[NSError alloc] initWithDomain:@"Not ready" code:NSURLErrorBadServerResponse userInfo:userInfo];
        }
        return nil;
    }
    return json;
}

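This cleans up the case where the data contains a control character, but not an accented character like the one above. When I feed in that data, I get the "Tvheadend returned malformed JSON" error.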

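One problem is that the user can change the character set from a limited list of choices, and the server does not tell the client which one is in use. So one channel might use UTF-8 and another ISO-8859-1, and there is no way to know on the client side which to use.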

So: can anyone offer a suggestion on how to process this data so that we feed clean strings into NSJSONSerialization?

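I still don't know the root cause of the problem I'm seeing - the server is not only sending the high-bit characters I noted above, but I also found that it contains control characters too! Looking at other threads, it seems I am not the only one seeing this problem, so hopefully others will find this useful...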

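The basic trick is to convert the raw data from the server to a string using UTF-8. If there are any "bad" characters in it, the conversion fails. So you check whether the resulting string is nil, and if it is, try another character set. Eventually you get a string back. Next you take that string and strip out all control characters. Finally you take that result, which is now "clean", and convert it back to UTF-8 NSData. That passes through the JSON conversion without error. Phew!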
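The failure this trick relies on can be seen in isolation. The following is a minimal sketch, assuming the description was sent as ISO-8859-1, where "é" is the single byte 0xE9. That byte followed by a plain "e" is not a valid UTF-8 sequence, so the UTF-8 decode returns nil while the Latin-1 decode succeeds. The helper names `TVHDecodesAs` and `TVHSampleLatin1Bytes` and the sample bytes are made up for illustration; they are not part of the TVH client.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `data` can be decoded using `encoding`.
static BOOL TVHDecodesAs(NSData *data, NSStringEncoding encoding) {
    return [[NSString alloc] initWithData:data encoding:encoding] != nil;
}

// Assumed sample data: "fiancée" as ISO-8859-1 bytes, with é as the single byte 0xE9.
// The literal is split so the hex escape does not swallow the following "e".
static NSData *TVHSampleLatin1Bytes(void) {
    const char bytes[] = "fianc\xE9" "e";
    return [NSData dataWithBytes:bytes length:sizeof(bytes) - 1];
}
```

Calling `TVHDecodesAs(TVHSampleLatin1Bytes(), NSUTF8StringEncoding)` should give NO, and the same call with `NSISOLatin1StringEncoding` should give YES, which is why the nil check works as a cue to try the next character set.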

Here is the solution I finally used:

// ... the original data from the URL is in responseData
NSError *error = nil;
// Try UTF-8 first; initWithData:encoding: returns nil if the bytes are not valid UTF-8.
NSString *str = [[NSString alloc] initWithData:responseData encoding:NSUTF8StringEncoding];
if ( str == nil ) {
    // Fall back to ISO-8859-1 (Latin-1).
    str = [[NSString alloc] initWithData:responseData encoding:NSISOLatin1StringEncoding];
}
if ( str == nil ) {
    // Last resort: plain ASCII.
    str = [[NSString alloc] initWithData:responseData encoding:NSASCIIStringEncoding];
}
// Remove every control character from the decoded string.
NSCharacterSet *controls = [NSCharacterSet controlCharacterSet];
NSString *stripped = [[str componentsSeparatedByCharactersInSet:controls] componentsJoinedByString:@""];
// Re-encode as UTF-8 so NSJSONSerialization gets well-formed input.
NSData *data = [stripped dataUsingEncoding:NSUTF8StringEncoding];
NSDictionary* json = [NSJSONSerialization JSONObjectWithData:data options:kNilOptions error:&error];
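One possible variation, offered as a sketch rather than as the code I shipped: `controlCharacterSet` covers Unicode categories Cc and Cf, so it also removes format characters such as the soft hyphen or zero-width joiner. If only the ASCII control range is a problem, a narrower set can be built by hand. The function name `TVHParseLenientJSON` and the choice of error code are assumptions made for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decode leniently, strip ASCII control characters, then parse.
static id TVHParseLenientJSON(NSData *responseData, NSError **error) {
    // Try UTF-8 first, then fall back to ISO-8859-1.
    NSString *str = [[NSString alloc] initWithData:responseData encoding:NSUTF8StringEncoding];
    if (str == nil) {
        str = [[NSString alloc] initWithData:responseData encoding:NSISOLatin1StringEncoding];
    }
    if (str == nil) {
        // Report a decoding failure; the error code here is an illustrative choice.
        if (error) {
            *error = [NSError errorWithDomain:NSCocoaErrorDomain
                                         code:NSFileReadInapplicableStringEncodingError
                                     userInfo:nil];
        }
        return nil;
    }
    // Only U+0000..U+001F and U+007F, instead of the broader controlCharacterSet.
    NSMutableCharacterSet *controls = [NSMutableCharacterSet characterSetWithRange:NSMakeRange(0, 0x20)];
    [controls addCharactersInRange:NSMakeRange(0x7F, 1)];
    NSString *stripped = [[str componentsSeparatedByCharactersInSet:controls] componentsJoinedByString:@""];
    NSData *clean = [stripped dataUsingEncoding:NSUTF8StringEncoding];
    return [NSJSONSerialization JSONObjectWithData:clean options:kNilOptions error:error];
}
```

Usage would look like `NSDictionary *json = TVHParseLenientJSON(responseData, &error);`. Note that, like the code above, this deletes tabs and newlines as well, so two words separated only by a tab inside a string end up joined.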

I hope someone finds this useful!