Output buffer empty in iconv while converting from ISO-8859-1 to UTF-8
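On Linux, I created a file containing Turkish characters and changed the file's character set to "ISO-8859-9". With the C++ program below I am trying to convert it to UTF-8, but the output buffer comes back empty after the iconv call, even though iconv reports "inbytesleft" as 0, which should mean the whole input was converted. What could be wrong here?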
My file format on Linux:
[root@osst212 cod]# file test.txt
test.txt: ISO-8859 text
[root@osst212 cod]# cat test.txt --> here my PuTTY character set setting is ISO-8859-9
fıstıkçışahap
#include <string>
#include <iostream>
#include <locale>
#include <cstdlib>
#include <fstream>
#include <string>
#include <sstream>
#include <iconv.h>
#include <cstring>
#include <cerrno>
#include <csignal>
using namespace std;
int main()
{
const char* lna = getenv("LANG");
cout << "LANG is " << lna << endl;
setlocale(LC_ALL, "tr_TR.ISO8859-9");
ifstream fsl("test.txt",ios::in);
string myString;
if ( fsl.is_open() ) {
getline(fsl,myString); }
size_t ret;
size_t inby = sizeof(myString); /*inbytesleft for iconv */
size_t outby = 2 * inby; /*outbytesleft for iconv*/
char* input = new char [myString.length()+1]; /* input buffer to be translated to UTF-8 */
strcpy(input,myString.c_str());
char* output = (char*) calloc(outby,sizeof(char)); /* output buffer */
iconv_t iconvcr = iconv_open("UTF-8", "ISO-8859-9");
if ((ret = iconv(iconvcr,&input,&inby,&output,&outby)) == (size_t) -1) {
fprintf(stderr,"Could not convert to UTF-8 and error detail is \n",strerror(errno)); }
cout << output << endl;
raise(SIGINT);
iconv_close(iconvcr);
}
The local variables after the iconv call are shown below; I ran the program under gdb. As you can see, the output is empty.
(gdb) bt
#0 0x00007ffff7224387 in raise () from /lib64/libc.so.6
#1 0x0000000000401155 in main () at stack.cpp:41
(gdb) frame 1
#1 0x0000000000401155 in main () at stack.cpp:41
41 raise(SIGINT);
(gdb) info locals
lna = 0x7fffffffef72 "en_US.UTF-8"
fsl = <incomplete type>
ret = 0
inby = 0
outby = 4
myString = "f5st5k75 6ahap"
input = 0x606268 " 6ahap"
output = 0x60628c ""
iconvcr = 0x606a00
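From man 3 iconv:
The iconv() function converts one multibyte character at a time, and for each character conversion it increments *inbuf and decrements *inbytesleft by the number of converted input bytes, it increments *outbuf and decrements *outbytesleft by the number of converted output bytes.
So after the call, output has been updated to point to the next unused byte in the originally allocated buffer. The converted text lies before that position, and because the buffer was zero-filled by calloc, printing output shows an empty string.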
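Correct usage:
char* nextoutput = output; /* cursor that iconv is allowed to advance */
if ((ret = iconv(iconvcr, &input, &inby, &nextoutput, &outby)) == (size_t) -1) {
fprintf(stderr, "Could not convert to UTF-8 and error detail is %s\n", strerror(errno)); }
/* output still points to the start of the converted text */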
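To put the fix in context, here is a minimal sketch of the whole conversion wrapped in a helper. The function name latin5_to_utf8 is hypothetical and not part of the original post. The sketch assumes the glibc prototype of iconv(), which takes char** for the input buffer. It also takes the input size from the string's length rather than sizeof(myString), since sizeof gives the size of the std::string object and not the number of bytes read from the file. The 2x output size is an assumption that holds for ISO-8859-9, where every character encodes to at most two UTF-8 bytes.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): converts an
// ISO-8859-9 byte string to UTF-8. Assumes the glibc iconv() prototype.
std::string latin5_to_utf8(const std::string& in)
{
    if (in.empty())
        return std::string();

    iconv_t cd = iconv_open("UTF-8", "ISO-8859-9");
    if (cd == (iconv_t)-1)
        throw std::runtime_error(std::string("iconv_open: ") + std::strerror(errno));

    // iconv wants mutable buffers, so copy the input bytes.
    std::vector<char> inbuf(in.begin(), in.end());
    // Assumption: each ISO-8859-9 byte needs at most 2 bytes in UTF-8.
    std::vector<char> outbuf(in.size() * 2);

    char*  inptr   = inbuf.data();
    size_t inleft  = inbuf.size();   // the real byte count, not sizeof(std::string)
    char*  outptr  = outbuf.data();  // cursor that iconv advances
    size_t outleft = outbuf.size();

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    int saved_errno = errno;         // keep errno before iconv_close can change it
    iconv_close(cd);

    if (rc == (size_t)-1)
        throw std::runtime_error(std::string("iconv: ") + std::strerror(saved_errno));

    // outbuf.data() still marks the start; the bytes written are
    // the original capacity minus what iconv reports as left over.
    return std::string(outbuf.data(), outbuf.size() - outleft);
}
```

With this helper, the line read by getline could be converted with something like cout << latin5_to_utf8(myString) << endl;. Keeping the start of the buffer in outbuf and handing iconv a separate cursor is the same idea as nextoutput above.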