iOS Objective-C: NSData to NSString returns nil, how to ignore invalid UTF-8

The data is downloaded from a website. With

NSString * html = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];

html is nil, but

NSString * html = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];

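does return content. The website contains Chinese characters, so decoding as ASCII cannot display them correctly. I suspect the page contains some invalid UTF-8 byte sequences, which is why the first snippet fails.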

Is there a way to keep decoding as UTF-8 but ignore the invalid sequences?

I think I found a solution.

Vincent Guerci's answer

Add libiconv to your project and let it strip the invalid UTF-8. After cleaning, the NSData can be safely passed to [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];

The concrete steps are:

  1. Add "libiconv.2.dylib" to your target under "Link Binary With Libraries".
  2. #include "iconv.h"
  3. Add this method:

Objective-C:

- (NSData *)cleanUTF8:(NSData *)data {
    if (data.length == 0) {
        return data; // nothing to clean
    }
    iconv_t cd = iconv_open("UTF-8", "UTF-8"); // convert to UTF-8 from UTF-8
    if (cd == (iconv_t)-1) {
        return nil; // the conversion descriptor could not be created
    }
    int one = 1;
    iconvctl(cd, ICONV_SET_DISCARD_ILSEQ, &one); // discard invalid byte sequences
    size_t inbytesleft, outbytesleft;
    inbytesleft = outbytesleft = data.length;
    char *inbuf  = (char *)data.bytes;
    // the output can never be longer than the input, since bytes are only dropped
    char *outbuf = malloc(data.length);
    if (outbuf == NULL) {
        iconv_close(cd);
        return nil;
    }
    char *outptr = outbuf;
    NSData *result = nil;
    if (iconv(cd, &inbuf, &inbytesleft, &outptr, &outbytesleft)
        == (size_t)-1) {
        NSLog(@"iconv failed while cleaning UTF-8");
    } else {
        result = [NSData dataWithBytes:outbuf length:data.length - outbytesleft];
    }
    // release the descriptor and buffer on both the success and failure paths
    iconv_close(cd);
    free(outbuf);
    return result;
}
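
If linking libiconv is not an option, the same idea (drop whatever is not well-formed UTF-8, keep the rest) can be sketched in plain C, which compiles unchanged inside an Objective-C file. This is not the libiconv approach from the answer above but a hand-written validator. The function name utf8_strip_invalid is hypothetical, and the exact bytes it drops may differ from what iconv discards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the original answer.
   Copies `len` bytes from `in` to `out`, skipping every byte that is not part
   of a well-formed UTF-8 sequence. `out` must have room for `len` bytes.
   Returns the number of bytes written to `out`. */
size_t utf8_strip_invalid(const unsigned char *in, size_t len, unsigned char *out)
{
    assert(len == 0 || (in != NULL && out != NULL));

    size_t i = 0, o = 0;
    while (i < len) {
        unsigned char b = in[i];
        size_t need;                        /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of the first one */

        if (b < 0x80) {                     /* ASCII: copy as-is */
            out[o++] = b;
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            need = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            need = 2;
            if (b == 0xE0) lo = 0xA0;       /* reject overlong encodings */
            if (b == 0xED) hi = 0x9F;       /* reject UTF-16 surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 3;
            if (b == 0xF0) lo = 0x90;       /* reject overlong encodings */
            if (b == 0xF4) hi = 0x8F;       /* reject code points above U+10FFFF */
        } else {                            /* stray continuation or invalid lead byte */
            i++;
            continue;
        }

        /* The sequence is valid only if all continuation bytes are present
           and each one falls in its allowed range. */
        int ok = (len - i > need) && in[i + 1] >= lo && in[i + 1] <= hi;
        for (size_t k = 2; ok && k <= need; k++) {
            ok = (in[i + k] & 0xC0) == 0x80;
        }

        if (ok) {
            for (size_t k = 0; k <= need; k++) {
                out[o++] = in[i + k];
            }
            i += need + 1;
        } else {
            i++;                            /* drop the lead byte and rescan */
        }
    }
    return o;
}
```

From Objective-C, one way to call it would be to pass data.bytes and data.length together with an output buffer of the same length (for example the mutableBytes of an NSMutableData), then trim that buffer to the returned count before handing it to initWithData:encoding:.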