libXML2 cannot properly read its own UTF-8 XML format
I want to parse a UTF-8 XML file with libXML2.
My code is written in C and I use libXML2 v2.9.3.
My code is as follows:
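#include <stdio.h>
#include <libxml/xmlreader.h>
#include <libxml/xmlwriter.h>
int main(void)
{
    xmlTextReaderPtr reader;
    xmlTextWriterPtr writer;
    /* Write test.xml with a single element whose name contains "é" */
    writer = xmlNewTextWriterFilename("test.xml", 0);
    xmlTextWriterStartDocument(writer, NULL, "UTF-8", NULL);
    xmlTextWriterStartElement(writer, BAD_CAST "node_with_é_character");
    xmlTextWriterEndElement(writer);
    xmlTextWriterEndDocument(writer);
    xmlFreeTextWriter(writer);
    /* Read the file back and trace every node name */
    reader = xmlReaderForFile("test.xml", "UTF-8", XML_PARSE_RECOVER);
    int ret = 1;
    while (ret == 1) {
        /* Before the first xmlTextReaderRead() there is no current node, so the name is NULL */
        const xmlChar *nameT = xmlTextReaderConstName(reader);
        printf("\n ---> %s\n", nameT ? (const char *)nameT : "(null)");
        ret = xmlTextReaderRead(reader);
    }
    xmlFreeTextReader(reader);
    return 0;
}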
The output is:
---> (null)
---> node_with_é_character
The problem is that the name is traced with a garbled "é" instead of as "node_with_é_character".
My command prompt is set to code page 1252 ("chcp 1252").
I don't understand why libXML2 cannot store/read the "é" character.
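As you mentioned in the comments, you are on Windows, so I guess your source file is most likely not saved as UTF-8, and therefore the C string "node_with_é_character" is not UTF-8 encoded in your executable.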
I don't know the libxml2 interface, but its code examples make it clear that it expects input arguments in UTF-8. See http://xmlsoft.org/examples/testWriter.c:
/* Write a comment as child of EXAMPLE.
 * Please observe, that the input to the xmlTextWriter functions
 * HAS to be in UTF-8, even if the output XML is encoded
 * in iso-8859-1 */
tmp = ConvertInput("This is a comment with special chars: <\xE4\xF6\xFC>",
                   MY_ENCODING);
Saving your source file as UTF-8 will help you solve the problem.