XML解析VBAexcel(函数行,&MSXML2.DOMDocument)

XML parse VBA excel (function trip, & MSXML2.DOMDocument)

我需要解析数百个 XML 个结构完全相同的文件,如下所示:

<?xml version="1.0" encoding="UTF-8"?>
  <Concepts>
    <ConceptModel name="food">
      <Filters>
        <Filter type="CC"/>
      </Filters>
      <Queries>
        <Query lang="EN">(cheese, bread, wine)</Query>
        <Query lang="DE">(Käse, Brot, Wein)</Query>
        <Query lang="FR">(fromaige, pain, vin)</Query>
      </Queries>
    </ConceptModel>
  </Concepts>

我在互联网上阅读了几篇文章和帖子,如下所示,但我无法想出解决方案:

到目前为止我在做:

Dim oXml As MSXML2.DOMDocument
Set oXml = New MSXML2.DOMDocument
oXml.LoadXML ("C:\folder\folder\name.xml")

Dim Queries As IXMLDOMNodeList
Dim Query As IXMLDOMNode

ThisWorkbook.Sheets(3).Cells(i, 1) = "before loop"

Set Queries = oXml.SelectNodes("/concepts/Queries")

MsgBox "how many Queries " &  Queries.Length

For Each Query In Queries
    ThisWorkbook.Sheets(3).Cells(i, 1) = "Works"
    ThisWorkbook.Sheets(3).Cells(i, 2) = Query.SelectNodes("Query").iTem(0).Text
    i = i + 1
Next

这段代码好像VBA看得懂,但是看不懂里面的内容。循环没有被读取,这意味着(我猜)查询根本没有循环。 Msgbox "how many queries" 给出 0 作为结果这一事实证实了这一点。但实际上有三个查询。有人可以帮帮我吗?

作为第二期我想问一下

 Dim oXml As MSXML2.DOMDocument

相同
 Dim oXml As MSXML2.DOMDocument60

自从我入住后 tools/references "Microsof XML, v6.0"

我认为查询带有标签 可能会导致问题。我添加了以下行:

Dim childs As IXMLDOMNodeList
Set childs = oXml.SelectNodes("/concepts")

MsgBox "childs " & childs.Length

结果也为 0。我希望有 3 个,因为概念有三个子项,即 ConceptModel、FilterQueries。于是,我更加疑惑了。

这是特殊字符(德语字母表),这意味着您需要对 XML 文件执行类似批量替换的操作,因此开头行不是这样的:

<?xml version="1.0" encoding="UTF-8"?>

但是这个:

<?xml version="1.0" encoding="iso-8859-1" ?>

之后要测试的代码:

Option Explicit
Public Sub test()
    Dim xmlDoc As Object
    Set xmlDoc = CreateObject("MSXML2.DOMDocument") 'New MSXML2.DOMDocument60
    With xmlDoc
        .validateOnParse = True
        .setProperty "SelectionLanguage", "XPath"
        .async = False
        If Not .Load("C:\Users\User\Desktop\Test.xml") Then
            Err.Raise .parseError.ErrorCode, , .parseError.reason
        End If
    End With
    Debug.Print xmlDoc.SelectNodes("//Query").Length
End Sub

这是我正在使用的XML:

<?xml version="1.0" encoding="iso-8859-1" ?>
  <Concepts>
      <ConceptModel name="food">
    <Filters>
      <Filter type="CC"/>
    </Filters>
    <Queries>
      <Query lang="EN">(cheese, bread, wine)</Query>
      <Query lang="DE">(Käse, Brot, Wein)</Query>
      <Query lang="FR">(fromaige, pain, vin)</Query>
   </Queries>
  </ConceptModel>
</Concepts>

尽可能接近你的OP

我提请您注意几个错误或误解:

  • [1] 无效 .LoadXML 语法

What is then the difference between .LoadXML ("C:\folder\folder\name.xml") and .Load ("C:\folder\folder\name.xml") ?

Load 需要一个 文件路径 然后将文件内容加载到 oXML 对象中。

LoadXML 不需要文件参数,但它的实际 XML 文本内容 必须是格式正确的字符串。

  • [2] XML区分小写和大写,因此节点需要通过它们的确切字面名称来寻址: <Query> 节点不会被 "query" 识别,"ConceptModel""conceptmodel".

As second issue I would like to ask if Dim oXml As MSXML2.DOMDocument would be the same as Dim oXml As MSXML2.DOMDocument60, since I checked in tools/references "Microsof XML, v6.0"?

不,不是。 - 请注意,以前的声明默认会加载 3.0 版。 然而,获得 6.0 版本绝对更可取(现在任何其他版本都已过时!)

由于您正在使用所谓的 early binding(引用 "Microsoft XML, v6.0"),我也会这样做,但我指的是当前版本 6.0:

Dim oXml As MSXML2.DOMDocument60        ' declare the xml doc object
Set oXml = New MSXML2.DOMDocument60     ' set an instance of it to memory
  • [3] 误解了一些 XPath 表达式

XPath 表达式中的起始斜杠“/” 始终指代 DocumentElement(此处为 <Concepts>), 您可以将 .DocumentElement 添加到文档对象中。起始双斜杠“//xyz”将找到任何 "xyz" 节点(如果存在)。

例如

    oXml.SelectNodes("//Query").Length 

return 与

相同的子节点数(此处:3)
    oXml.DocumentElement.SelectNodes("//Query").Length   ' or 
    oXml.SelectSingleNode("//Queries").ChildNodes.Length ' or even       
    oXml.SelectNodes("/*/*/*/Query").Length`.

代码示例参考XML版本6.0

当然,您必须遍历多个 xml 个文件,该示例仅使用一个文件(从第 2 行开始)。

只是针对格式不正确的xml文件的情况,我添加了一个详细的错误例程,使您能够识别假定的错误位置。 LoadLoadXML 都是 return 布尔值(如果加载正确则为 True,否则为 False)。

Sub xmlTest()

Dim ws   As Worksheet: Set ws = ThisWorkbook.Sheets(3)
Dim oXml As MSXML2.DOMDocument60
Set oXml = New MSXML2.DOMDocument60
With oXml
    .validateOnParse = True
    .setProperty "SelectionLanguage", "XPath"   ' necessary in version 3.0, possibly redundant here
    .async = False

    If Not .Load(ThisWorkbook.Path & "\xml\" & "name.xml") Then
        Dim xPE        As Object    ' Set xPE = CreateObject("MSXML2.IXMLDOMParseError")
        Dim strErrText As String
        Set xPE = .parseError
        With xPE
           strErrText = "Load error " & .ErrorCode & " xml file " & vbCrLf & _
           Replace(.URL, "file:///", "") & vbCrLf & vbCrLf & _
          xPE.reason & _
          "Source Text: " & .srcText & vbCrLf & vbCrLf & _
          "Line No.:    " & .Line & vbCrLf & _
          "Line Pos.: " & .linepos & vbCrLf & _
          "File Pos.:  " & .filepos & vbCrLf & vbCrLf
        End With
        MsgBox strErrText, vbExclamation
        Set xPE = Nothing
        Exit Sub
    End If

    ' Debug.Print "|" & oXml.XML & "|"

    Dim Queries  As IXMLDOMNodeList, Query As IXMLDOMNode
    Dim Searched As String
    Dim i&, ii&
    i = 2       ' start row
  ' start XPath  
    Searched = "ConceptModel/Queries/Query"                     ' search string
    Set Queries = oXml.DocumentElement.SelectNodes(Searched)    ' XPath
  ' 
    ws.Cells(i, 1) = IIf(Queries.Length = 0, "No items", Queries.Length & " items")
    ii = 1
    For Each Query In Queries
        ii = ii + 1
        ws.Cells(i, ii) = Query.Text
    Next

End With

End Sub

其他提示

您可能还对如何通过 XMLDOM 列出 所有子节点 obtain attribute names from XML using VBA 的示例感兴趣。

由于稍后的评论(感谢@barrowc),我包含了进一步的提示

"A further issue with using MSXML, v3.0 is that the default selection language is XSLPatterns instead of XPath. Details on some of the differences between MSXML versions are here and the differences between the two selection languages are discussed here."

在当前的 MSXML2 版本 6.0 中完全支持 XPath 1.0。所以看来XSL Patterns早前微软已经实现了,基本上可以看作是XPath在W3C标准化之前的一个简化的XPath表达式子集。

MSXML2 版本 3.0 允许至少通过显式选择语言设置集成 XPath 1.0:

oXML.setProperty "SelectionLanguage", "XPath"   ' oXML being the DOMDocument object as used in original post