How to parse XML using vtd-xml without hardcoding attribute names?
This is my sample XML file; the real one is over 2 GB. Here is what I have achieved so far using vtd-xml:
Current code:
https://gist.github.com/shadow-fox/21d1d4f30cbed0909f403c3ac0e1fa4d
public void reader() throws IOException, ParseException, NavException, XPathParseExceptionHuge, NavExceptionHuge,
        XPathEvalExceptionHuge {
    VTDGenHuge vg = new VTDGenHuge();
    if (vg.parseFile("sku_extract_main.xml", true, VTDGenHuge.MEM_MAPPED)) {
        VTDNavHuge vnh = vg.getNav();
        AutoPilotHuge aph = new AutoPilotHuge(vnh);
        aph.selectElementNS("*", "*");
        int i = 0;
        while (aph.iterate()) {
            int t = vnh.getText();
            if (t != -1) {
                System.out.println(vnh.toString(vnh.getCurrentIndex()) + "|||" + vnh.toNormalizedString(t));
                i++;
            }
        }
    }
}
Current result:
PVAL|||298374234
PVAL|||1231
PVAL|||brown
PVAL|||medium
PVAL|||7
PVAL|||solid
PVAL|||brown
What I want:
Sku_ID|||298374234
LotNum|||1231
COLOR|||brown
WIDTH|||medium
SIZE|||7
Pattern|||solid
Color Family|||brown
Sample XML:
<?xml version="1.0" encoding="UTF-8" ?>
<RECORDS>
    <RECORD>
        <PROP NAME="Sku_ID">
            <PVAL>298374234</PVAL>
        </PROP>
        <PROP NAME="LotNum">
            <PVAL>1231</PVAL>
        </PROP>
        <PROP NAME="COLOR">
            <PVAL>brown</PVAL>
        </PROP>
        <PROP NAME="WIDTH">
            <PVAL>medium</PVAL>
        </PROP>
        <PROP NAME="SIZE">
            <PVAL>7</PVAL>
        </PROP>
        <PROP NAME="Pattern">
            <PVAL>solid</PVAL>
        </PROP>
        <PROP NAME="Color Family">
            <PVAL>brown</PVAL>
        </PROP>
    </RECORD>
</RECORDS>
I also don't want to hardcode the attribute names. I want to retrieve them as I visit each element. How can I do that?
Below is my edit to your code that prints out the attribute names and values. It is XPath-based.
// Assumes the vtd-xml "huge" classes: import com.ximpleware.extended.*;
public static void main(String[] s) throws Exception {
    VTDGenHuge vg = new VTDGenHuge();
    // Backslashes must be escaped inside a Java string literal.
    if (vg.parseFile("d:\\xml\\sku_extract_main.xml", true, VTDGenHuge.MEM_MAPPED)) {
        VTDNavHuge vnh = vg.getNav();
        AutoPilotHuge aph = new AutoPilotHuge(vnh);
        AutoPilotHuge aph2 = new AutoPilotHuge(vnh);
        aph.selectElementNS("*", "*");
        aph2.selectXPath("@*");
        int i = 0;   // counts elements that carry text
        int attr;    // VTD index of the current attribute name
        while (aph.iterate()) {
            System.out.println(vnh.toString(vnh.getCurrentIndex()));
            int t = vnh.getText();
            if (t != -1) {
                System.out.println(vnh.toString(vnh.getCurrentIndex()) + "|||" + vnh.toNormalizedString(t));
                i++;
            }
            // Below is my addition.
            // It evaluates the attribute axis on the current element.
            // push()/pop() keep the node iteration of the outer while loop consistent.
            // resetXPath() is key here: without it, the XPath works only for the
            // first node returned by aph.iterate().
            vnh.push();
            while ((attr = aph2.evalXPath()) != -1) {
                System.out.println(" attr name " + vnh.toString(attr));
                System.out.println("attr val " + vnh.toString(attr + 1));
            }
            aph2.resetXPath();
            vnh.pop();
        }
    }
}
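The code above prints attribute names and values on separate lines from the element text. To illustrate only the pairing step that yields the `Sku_ID|||298374234` format, here is a minimal sketch. Note that it deliberately swaps in the JDK's built-in StAX reader (`javax.xml.stream`) rather than vtd-xml, so that it runs without extra dependencies; `PropPairs` and `pair` are hypothetical names, and the sketch assumes each labeled element has one attribute and a text-bearing child, as in the sample XML. The same idea (remember the attribute value seen on the parent, emit it when the child's text is reached) should carry over to the `AutoPilotHuge` loop.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, NOT part of vtd-xml: pairs the value of an element's
// first attribute with the text of the child element nested inside it.
class PropPairs {

    static List<String> pair(Reader xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            String label = null;                      // last attribute value seen
            StringBuilder text = new StringBuilder(); // text of the current element
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        // Take whatever attribute is present; its name is never spelled out.
                        if (r.getAttributeCount() > 0) {
                            label = r.getAttributeValue(0);
                        }
                        text.setLength(0);
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS: {
                        text.append(r.getText());
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT: {
                        // Only elements with non-blank text produce an output line.
                        String value = text.toString().trim();
                        if (label != null && !value.isEmpty()) {
                            out.add(label + "|||" + value);
                        }
                        text.setLength(0);
                        break;
                    }
                    default:
                        break;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read XML", e);
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "<RECORDS><RECORD>"
                + "<PROP NAME=\"Sku_ID\"><PVAL>298374234</PVAL></PROP>"
                + "<PROP NAME=\"Color Family\"><PVAL>brown</PVAL></PROP>"
                + "</RECORD></RECORDS>";
        for (String line : pair(new StringReader(sample))) {
            System.out.println(line);
        }
    }
}
```

For the real file one would pass a `FileReader` or `InputStreamReader` instead of the `StringReader`; since StAX reads the document as a stream, the size of the input should not by itself be a problem, though this has not been measured against a 2 GB file here.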