iconv library improperly converts UTF-8 to KOI8-R
I am trying to use the GNU iconv library to convert a UTF-8 encoded string to KOI8-R. My minimal example is
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    /* The letter П in UTF-8. */
    char* buffer = "\xd0\x9f";
    size_t len = 2;
    /* Note: since KOI8-R is an 8-bit encoding, the buffer should only need a length of 1, but
     * iconv returns -1 if the buffer is any smaller than 4 bytes.
     */
    size_t len_in_koi = 4;
    char* buffer_in_koi = malloc(len_in_koi+1);
    /* A throwaway copy to give to iconv. */
    char* buffer_in_koi_copy = buffer_in_koi;
    iconv_t cd = iconv_open("UTF-8", "KOI8-R");
    if (cd == (iconv_t) -1) {
        fputs("Error while initializing iconv_t handle.\n", stderr);
        return 2;
    }
    if (iconv(cd, &buffer, &len, &buffer_in_koi_copy, &len_in_koi) != (size_t) -1) {
        /* Expecting f0 but get d0. */
        printf("Conversion successful! The byte is %x.\n", (unsigned char)(*buffer_in_koi));
    } else {
        fputs("Error while converting buffer to KOI8-R.\n", stderr);
        return 3;
    }
    iconv_close(cd);
    free(buffer_in_koi);
    return 0;
}
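Apart from failing when my KOI8-R buffer is smaller than four bytes (even though it should only need one), this incorrectly prints d0. The correct encoding of 'П' in KOI8-R is f0.
iconv gives the correct answer from the command line (e.g. echo П | iconv -t KOI8-R | hexdump), so what am I doing wrong when using its C interface?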
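You have swapped the "to" and "from" character set arguments to iconv_open: the destination encoding comes first, so the call should be iconv_open("KOI8-R", "UTF-8"). It just so happens that the character in slot D0 of KOI8-R has D0 as the first byte of its UTF-8 encoding, which is why you see d0. The swap most likely explains the four-byte minimum as well: each of your two input bytes is read as a separate KOI8-R character, and each of those takes two bytes in UTF-8.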
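As a minimal sketch of the corrected call order, the conversion can be wrapped in a small helper. The name utf8_to_koi8r and its return convention are hypothetical, chosen only for illustration; iconv_open, iconv and iconv_close are the standard calls already used in the question.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of iconv): converts in_len bytes of UTF-8
 * from `in` into KOI8-R in `out`, which has room for out_cap bytes.
 * Returns the number of bytes written, or (size_t)-1 on any failure. */
size_t utf8_to_koi8r(const char *in, size_t in_len, char *out, size_t out_cap) {
    /* iconv_open takes (tocode, fromcode): the destination comes first. */
    iconv_t cd = iconv_open("KOI8-R", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv's prototype takes char **; the cast only drops const. */
    char *in_ptr = (char *)in;
    char *out_ptr = out;
    size_t out_left = out_cap;

    size_t rc = iconv(cd, &in_ptr, &in_len, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;

    /* iconv decrements out_left by the number of bytes it produced. */
    return out_cap - out_left;
}
```

With the arguments in this order, passing "\xd0\x9f" and a one-byte output buffer should be enough, and the resulting byte should be the f0 that the question expects.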