How can I get rid of the EOFException when using vtd-xml in C++?

I am writing a program in C++ to process an old dataset. I have already converted the files from SGML to XML using the sx tool from James Clark. Since I have past experience using vtd-xml with Matlab (which is Java based), and since vtd-xml has a C++ port, I decided to use it for this project. I am using vtd-xml version 2.12, since that was the newest version of the C++ port I could find. I got it to compile with Visual Studio 2019 by changing all calls of wcsdup to _wcsdup and by adding the _CRT_SECURE_NO_WARNINGS preprocessor definition.

My program below appears to give correct output, but an exception is also thrown while the XML file is parsed (a test XML file is included below as well). The exception is an EOFException. I can't see anything obviously wrong with my XML file; the test XML below reproduces the error, and it is not one of the files I converted from SGML. My hunch is that if this were a bug in the C++ port, information about it would be easier to find when googling "vtd-xml EOFException". So it seems to me that the changes I made to get the library to compile may be the culprit, but I can't figure out how to get rid of the exception. Any ideas are welcome. If it comes to that, I am willing to use a different XML library for my program, as long as it is free.

My code:

#include <iostream>
#include <fstream>
#include "VTDGen.h"
#include "autoPilot.h"
#include "customTypes.h"

using namespace std;
using namespace com_ximpleware;

int main() {
    // Open at the end (ios::ate) so tellg() below yields the file size.
    ifstream xml(".\\cd_catalog_short.xml", ios::binary | ios::ate);
    ifstream::pos_type pos = xml.tellg();
    long int length = static_cast<long int>(pos);
    char* pChars = new char[length];
    xml.seekg(0, ios::beg);
    xml.read(pChars, pos);
    xml.close();

    UCSChar node_path[] = L"/CATALOG/CD/TITLE";
    UCSChar* title;
    VTDGen vg;
    vg.setDoc(pChars, length);
    vg.parse(false);
    AutoPilot ap;
    ap.selectXPath(node_path);
    VTDNav* vn = vg.getNav();
    ap.bind(vn);
    while (ap.evalXPath() != -1) {
        int ind = vn->getText();
        if (ind != -1) {
            title = vn->toNormalizedString(ind);
            wcout << title << endl;
            delete[] title;
        }
    }
    return 0;
}

Test XML file:

<?xml version="1.0" encoding="UTF-8"?>
<CATALOG>
  <CD>
    <TITLE>For the good times</TITLE>
    <ARTIST>Kenny Rogers</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Mucik Master</COMPANY>
    <PRICE>8.70</PRICE>
    <YEAR>1995</YEAR>
  </CD>
  <CD>
    <TITLE>Big Willie style</TITLE>
    <ARTIST>Will Smith</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1997</YEAR>
  </CD>
  <CD>
    <TITLE>Tupelo Honey</TITLE>
    <ARTIST>Van Morrison</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Polydor</COMPANY>
    <PRICE>8.20</PRICE>
    <YEAR>1971</YEAR>
  </CD>
</CATALOG>

Output of my program:

Exception thrown at 0x00007FF96A36A839 in em.exe: Microsoft C++ exception: com_ximpleware::EOFException at memory location 0x0000005498B6F350.

For the good times

Big Willie style

Tupelo Honey

C:\Users\Joe\source\repos\em\x64\Release\em.exe (process 16308) exited with code 0.

To automatically close the console when debugging stops, enable Tools->Options->Debugging-> Automatically close the console when debugging stops.

Press any key to close this window . . .

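It seems that vtd-xml uses EOFException more as a signal than as a real error condition. I ruled out the possibility that the exception comes from the changes I made to get the library to compile in Visual Studio (C++) by running a Java version of the program. It uses the latest Java version of vtd-xml (2.13-4-java), and jdb still catches an EOFException. Had I run the C++ program from the console instead of from the Visual Studio IDE, I probably would never have known about the exception.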

Here is the Java code:

/* 
 * Copyright (C) 2002-2011 XimpleWare, info@ximpleware.com
 */
import com.ximpleware.*;
import com.ximpleware.xpath.*;
import java.io.*;

public class Tester {

  public static void main(String argv[]) {
    VTDGen vg = new VTDGen();

    if (vg.parseFile("./cd_catalog_short.xml", false)) {
      try {
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath("/CATALOG/CD/TITLE");
        int result = -1;
        int count = 0;
        while ((result = ap.evalXPath()) != -1) {
          System.out.print("" + result + "  ");
          System.out.print("Element name ==> " + vn.toString(result));
          int t = vn.getText(); // get the index of the text (char data or CDATA)
          if (t != -1)
            System.out.println(" Text  ==> " + vn.toNormalizedString(t));
          System.out.println("\n ============================== ");
          count++;
        }
        System.out.println("Total # of element " + count);
      } catch (NavException e) {
        System.out.println(" Exception during navigation " + e);
      } catch (XPathParseException e) {
        System.out.println(" Exception during parse " + e);
      } catch (XPathEvalException e) {
        System.out.println(" Exception during xpath evaluation " + e);
      }
    }
  }
}

Here is the program output in jdb:

jdb -classpath .;ximpleware-2.13-4-java Tester

Initializing jdb ...

catch com.ximpleware.EOFException

Deferring all com.ximpleware.EOFException. It will be set after the class is loaded.

run

run Tester

Set uncaught java.lang.Throwable Set deferred uncaught java.lang.Throwable

VM Started: Set deferred all com.ximpleware.EOFException

Exception occurred: com.ximpleware.EOFException (to be caught at: com.ximpleware.VTDGen.parse(), line=2,663 bci=1,597)"thread=main", com.ximpleware.VTDGen$UTF8Reader.getChar(), line=774 bci=24 774 throw e;

main[1] cont

7 Element name ==> TITLE Text ==> For the good times

==============================

20 Element name ==> TITLE Text ==> Big Willie style

==============================

33 Element name ==> TITLE Text ==> Tupelo Honey

==============================

Total # of element 3

The application exited
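
The jdb trace above shows the EOFException being thrown in getChar() and caught again inside VTDGen.parse(), which is why the program still finishes normally. To illustrate what "a signal rather than an error" can look like, here is a minimal C++ sketch of that pattern. It does not use vtd-xml at all: EndOfInput, ByteReader and countBytes are hypothetical names made up for this example, and whether the C++ port handles its EOFException in exactly this way is an assumption based on the Java trace.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <utility>

// Hypothetical "end of input" signal; a stand-in, not vtd-xml's EOFException.
struct EndOfInput : std::exception {
    const char* what() const noexcept override { return "end of input"; }
};

// Hypothetical reader that reports exhaustion by throwing instead of
// returning a sentinel value.
class ByteReader {
public:
    explicit ByteReader(std::string data) : data_(std::move(data)) {}

    char next() {
        if (pos_ >= data_.size()) {
            throw EndOfInput{};  // the "signal": there is nothing left to read
        }
        return data_[pos_++];
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Consumes the whole input and returns the number of bytes read.
// The exception is caught here, so it never reaches the caller.
std::size_t countBytes(const std::string& input) {
    ByteReader reader(input);
    std::size_t n = 0;
    try {
        for (;;) {
            reader.next();
            ++n;
        }
    } catch (const EndOfInput&) {
        // Expected: the reader ran out of bytes, so the loop is finished.
    }
    return n;
}
```

Calling countBytes("<a/>") returns 4 and no exception reaches the caller. A debugger that reports every thrown exception would still show the EndOfInput throw, in the same way the Visual Studio output window and jdb report the EOFException here even though the program exits with code 0.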