根据 .csv 文件更新 .xml 文档

Updating an .xml document according to a .csv file

我是 VBA 的新手,我正在努力寻找解决问题的方法。基本上,我需要做的是根据 .csv 文档的内容编辑 .xml 文件中的一些节点。

特别是,每当我遍历 XML 文档(即“C:\Users\xxx\Desktop\ppp.xml”)时,我偶然发现了一个特定的节点(假设它是 thing),我需要读取该节点的文本并在 CSV 文件中查找它(即 C:\Users\xxx\Desktop\mycopy.csv")。然后编辑另一个节点的文本(让它成为qt) 在同一个 XML 文件中。我在考虑以下理由:

  1. 由于XML文件编辑(下面我用的是微软XML,v3.0)需要根据.csv内容来完成
  2. 我首先将 CSV 转换为 Excel 工作簿 (.xlsx)(我不太了解管理 CSV 文件,所以这种方式对我来说更易于管理)。
  3. 然后在 VBA.
  4. 中执行某种 Vlookup 版本

这很好用,如果我运行单独将这部分代码显示如下。由于我知道VBA中的一些XML,所以我对如何编辑节点和属性有了基本的了解。但是,我很难将 link XML 文件添加到 Excel 工作簿。我看了很多VBA中的XML编辑示例,但编辑是根据相同的XML执行的,而不是在不同的文件中寻找值。我会 post 我的代码示例,它显然不起作用,希望它足够清楚。谢谢。

Option Explicit
    
Sub editxml()
    
    Dim Obj As DOMDocument  
    Dim xmlpath As String
    Dim loadcheck As Boolean
    Dim Node As IXMLDOMNodeList  
    Dim Nm As IXMLDOMNode 
    Dim thing As Object, q As Object
    
    Dim wb As Workbook         
    Dim ws As Worksheet
    Dim mycsvfile As String 
    Dim i As Integer, numcol As Integer
    Dim line As String
    Dim row As Integer 
    Dim matrix As Variant  
    
    Dim rngSearch As Range, rngLast As Range, rngFound As Range
    Dim strFirstAddress As String
    
    Set Obj = New DOMDocument
    Obj.async = False: Obj.validateOnParse = False
    
    xmlpath = "C:\Users\xxx\Desktop\ppp.xml"
    Obj.SetProperty "SelectionNamespaces", "xmlns:ns0='http://update.DocumentTypes.Schema.ppp.Xml'"
    
    loadcheck = Obj.Load(xmlpath)
    If loadcheck = True Then
        MsgBox "File XML uploaded"
    Else
        MsgBox "File XML not uploaded"
    End If
    
    Set Node = Obj.DocumentElement.SelectNodes("AA/BB/CC/DD")
    
    For Each Nm In Node
        Set thing = Nm.SelectSingleNode("thing")
        Set q = Nm.SelectSingleNode("qt")
        
        If thing.Text = rngFound Then
        q.Text = "do somewhat else"
        End If
    Next
        
    Obj.Save (xmlpath)
    
    Set wb = Workbooks.Add
    wb.SaveAs Filename:="csvtoxlsxfind" & ".xlsx"  
    Set ws = wb.Sheets(1)
    
    With ws
        row = 1
    
        mycsvfile = "C:\Users\xxx\Desktop\mycopy.csv"  
    
        Open mycsvfile For Input As #1
        
        Do Until EOF(1)
            Line Input #1, line    
            matrix = Split(line, ",") 
            
            numcol = UBound(matrix) - LBound(matrix) + 1    
       
            For i = 1 To numcol     
                Cells(row, i) = matrix(i - 1)      
            Next i
            row = row + 1
        
        Loop
        Close #1
        
        'set the search range, i.e where I have to find the value:
        Set rngSearch = .Range("AR:AR")
    
        'specify last cell in range:
        Set rngLast = rngSearch.Cells(rngSearch.Cells.Count)
    
        'Find the "thing" in search range, when it first occurrs (rngFound=1st occurrence).
        Set rngFound = rngSearch.find(What:=thing.Text, After:=rngLast, LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False, SearchFormat:=False)
    
        'if the "thing" is found in search range:
        If Not rngFound Is Nothing Then
            'saves the address of the first occurrence of the "thing" in the strFirstAddress variable:
            strFirstAddress = rngFound.Address
        
            Do
            'Find next occurrence of the "thing". 
            
            MsgBox rngFound.Address & " " & rngFound.Offset(0, -29).Value * rngFound.Offset(0, -6)
            
            Set rngFound = rngSearch.FindNext(rngFound)
            rngFound.Font.Color = vbRed
            rngFound.Offset(0, -40).Font.Color = vbRed
            
            Loop Until rngFound.Address = strFirstAddress
            
        Else
            MsgBox "thing not found"
        End If
    End With
    
End Sub 

我很清楚代码中没有意义的部分如下:

    For Each Nm In Node
        Set thing = Nm.SelectSingleNode("thing")
        Set q = Nm.SelectSingleNode("qt")
        
        If thing.Text = rngFound Then
        q.Text = "do somewhat else"
        End If
    Next

因为我还没有定义 rngFound(这将是我的 Vlookup 搜索的结果)。

我遵循的逻辑是否有意义,或者代码需要从头开始重写?是否可以避免 CSV 文件的 Excel .xlsx 转换,从而直接在 CSV 中进行搜索?

更新(回答蒂姆·威廉姆斯的问题) 在代码的以下部分,我需要用 .csv 文件中两个单元格的乘积更新每个节点“事物”的文本,例如

 If thing.Text = rngFound Then
     q.Text = ws.Range("A:A").value*ws.Range("K:K").value
 End If

是否可以对集合对象中的元素应用偏移函数之类的东西?我知道偏移量只能应用于一个范围,所以我认为需要为此创建一个新函数,对吗?

未经测试,但我认为应该是正确的。由于“找到一个范围内的所有匹配单元格”是一项非常常见的任务,我喜欢为此使用一个独立的函数,而不是用该逻辑使主代码混乱。

Sub editxml()
    
    Dim Obj As MSXML2.DOMDocument60
    Dim xmlpath As String
    Dim Node As IXMLDOMNodeList
    Dim Nm As IXMLDOMNode
    Dim thing As Object, q As Object
    Dim wb As Workbook, ws As Worksheet
    Dim matches As Collection
    
    Set Obj = New DOMDocument60
    Obj.async = False
    Obj.validateOnParse = False
    
    xmlpath = "C:\Users\xxx\Desktop\ppp.xml"
    Obj.SetProperty "SelectionNamespaces", "xmlns:ns0='http://update.DocumentTypes.Schema.ppp.Xml'"
    
    If Obj.Load(xmlpath) = True Then
        MsgBox "File XML uploaded"
    Else
        MsgBox "File XML not uploaded"
        Exit Sub
    End If
    
    'open the CSV file
    Set wb = Workbooks.Open("C:\Users\xxx\Desktop\mycopy.csv")
    Set ws = wb.Worksheets(1)
    
    Set Node = Obj.DocumentElement.SelectNodes("AA/BB/CC/DD")
    
    For Each Nm In Node
        Set thing = Nm.SelectSingleNode("thing")
        Set q = Nm.SelectSingleNode("qt")
        
        'moved the Find logic to a standalone function
        Set matches = FindAll(ws.Range("AR:AR"), thing.Text)
        
        'did we get any matches in the range?
        If matches.Count > 0 Then
            'It's not clear what should go here - are you replacing
            ' with some other text from the CSV, or just a fixed value?
            q.Text = "do somewhat else"
            
            'you can apply formatting to the found cells here...
        End If
    Next
        
    Obj.Save xmlpath
    
End Sub

'find all matching cells in a range and return them in a Collection
Public Function FindAll(rng As Range, val As String) As Collection
    Dim rv As New Collection, f As Range, addr As String
    Set f = rng.Find(what:=val, after:=rng.Cells(rng.Cells.Count), _
        LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByRows, _
        SearchDirection:=xlNext, MatchCase:=False)
    If Not f Is Nothing Then addr = f.Address() 'store first cell found
    Do Until f Is Nothing
        rv.Add f
        Set f = rng.FindNext(after:=f)
        If f.Address() = addr Then Exit Do 'exit if we've looped back to first cell
    Loop
    Set FindAll = rv
End Function

任何时候您需要编辑 XML,请考虑 XSLT,用于转换 XML 文件的专用编程语言(可以在 VBA 中分层) .专门针对您的情况,您可以循环访问 CSV 导入以将值传递到参数化 XSLT。

为了演示(直到 OP 包含示例数据),下面使用前三个 vba and xslt Whosebug 用户中的 XML:

数据

XML(通知空<total_rep>个节点待更新)

<?xml version="1.0"?>
<Whosebug>
  <group lang="vba">
    <topusers>
      <user>Siddharth Rout</user>
      <link>https://whosebug.com/users/1140579/siddharth-rout</link>
      <location>Mumbai, India</location>
      <total_rep></total_rep>
      <tag1>excel</tag1>
      <tag2>vba</tag2>
      <tag3>excel-formula</tag3>
    </topusers>
    <topusers>
      <user>Scott Craner</user>
      <link>https://whosebug.com/users/4851590/scott-craner</link>
      <location>Flyover Country</location>
      <total_rep></total_rep>
      <tag1>excel</tag1>
      <tag2>vba</tag2>
      <tag3>excel-formula</tag3>
    </topusers>
    <topusers>
      <user>Tim Williams</user>
      <link>https://whosebug.com/users/478884/tim-williams</link>
      <location>San Francisco, CA, United States</location>
      <total_rep></total_rep>
      <tag1>vba</tag1>
      <tag2>excel</tag2>
      <tag3>arrays</tag3>
    </topusers>
  </group>
  <group lang="xslt">
    <topusers>
      <user>Dimitre Novatchev</user>
      <link>https://whosebug.com/users/36305/dimitre-novatchev</link>
      <location>United States</location>
      <total_rep></total_rep>
      <tag1>xslt</tag1>
      <tag2>xml</tag2>
      <tag3>xpath</tag3>
    </topusers>
    <topusers>
      <user>Martin Honnen</user>
      <link>https://whosebug.com/users/252228/martin-honnen</link>
      <location>Germany</location>
      <total_rep></total_rep>
      <tag1>xslt</tag1>
      <tag2>xml</tag2>
      <tag3>xpath</tag3>
    </topusers>
    <topusers>
      <user>Michael Kay</user>
      <link>https://whosebug.com/users/415448/michael-kay</link>
      <location>Reading, United Kingdom</location>
      <total_rep></total_rep>
      <tag1>xml</tag1>
      <tag2>xslt</tag2>
      <tag3>xpath</tag3>
    </topusers>
  </group>
</Whosebug>

CSV

user total_rep
Siddharth Rout 134,062
Scott Craner 123,313
Tim Williams 116,760
Dimitre Novatchev 227,632
Martin Honnen 134,713
Michael Kay 135,177

XSLT

(下面另存为.xsl文件,特殊的.xml文件,在VBA中加载)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- INITIALIZE PARAMS -->
  <xsl:param name="user_param"/>
  <xsl:param name="total_rep_param"/>
  
  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  
  <!-- CONDITIONALLY UPDATE TEXT BY DIFFERENT NODE MATCH -->
  <xsl:template match="topusers">
    <xsl:copy>
        <xsl:apply-templates select="user|link|location"/>
        <total_rep>
            <xsl:choose>
                <xsl:when test="user = $user_param">
                    <xsl:value-of select="$total_rep_param"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:value-of select="total_rep"/>
                </xsl:otherwise>
            </xsl:choose>
        </total_rep>
        <xsl:apply-templates select="tag1|tag2|tag3"/>
    </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>

VBA

(加载 CSV 和转换的两个子例程 XML,XSLT 中匹配上面的参数名称)

Sub LoadCSV()
    Dim csv_file As String
    
    csv_file = "C:\Path\To\File.csv"
    
    With ThisWorkbook.Worksheets("Sheet1")
        .Columns("A:D").EntireColumn.Delete
        
        With .QueryTables.Add(Connection:="TEXT;" & csv_file, _
                          Destination:=.Range("A1"))
               .TextFileParseType = xlDelimited
               .TextFileCommaDelimiter = True
               .Refresh
        End With
    
       .QueryTables(1).SaveData = False
       .QueryTables.Item(1).Delete
    End With
End Sub
Sub TransformXML()
    ' REFERENCE Microsoct XML, v6.0
    Dim xmlDoc As MSXML2.DOMDocument60
    Dim xslDoc As MSXML2.FreeThreadedDOMDocument60
    Dim xslTemp As MSXML2.XSLTemplate60
    Dim xslProc As Object
    
    Dim i As Long, lastrow As Long
    Dim param1 As Variant, param2 As Variant

    Call LoadCSV                                        ' LOAD CSV FILE
    
    Set xmlDoc = New MSXML2.DOMDocument60               ' LOAD XML FILE
    xmlDoc.Load "C:\Path\To\Input.xml"

    Set xslDoc = New MSXML2.DOMDocument60               ' LOAD XSL SCRIPT
    xslDoc.Load "C:\Path\To\Script.xsl"
    
    ' INITIALIZE NEEDED OBJECTS
    Set xslTemp = New MSXML2.XSLTemplate60
    Set xslTemp.stylesheet = xslDoc
    Set xslProc = xslTemp.createProcessor()
    
    With ThisWorkbook.Worksheets("Sheet1")
        lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row

        ' ITERATE THROUGH CSV DATA
        For i = 2 To lastrow
            param1 = .Range("A" & i).Value
            param2 = .Range("B" & i).Value
            
            With xslProc
               .input = xmlDoc
               .addParameter "user_param", param1      ' ADD PARAMETER(S)
               .addParameter "total_rep_param", param2

               .transform                              ' TRANSFORM XML
                xmlDoc.LoadXML .output                 ' LOAD RESULT TREE
            End With
        Next i
    End With

    xmlDoc.Save "C:\Path\To\Output.xml"                ' SAVE OUTPUT TO FILE
       
    Set xmlDoc = Nothing: Set xslDoc = Nothing
    Set xslTemp = Nothing: Set xslProc = Nothing
End Sub

输出

(注意 <total_rep> 从 CSV 填充)

<?xml version="1.0"?>
<Whosebug>
    <group lang="vba">
        <topusers>
            <user>Siddharth Rout</user>
            <link>https://whosebug.com/users/1140579/siddharth-rout</link>
            <location>Mumbai, India</location>
            <total_rep>134062</total_rep>
            <tag1>excel</tag1>
            <tag2>vba</tag2>
            <tag3>excel-formula</tag3>
        </topusers>
        <topusers>
            <user>Scott Craner</user>
            <link>https://whosebug.com/users/4851590/scott-craner</link>
            <location>Flyover Country</location>
            <total_rep>123313</total_rep>
            <tag1>excel</tag1>
            <tag2>vba</tag2>
            <tag3>excel-formula</tag3>
        </topusers>
        <topusers>
            <user>Tim Williams</user>
            <link>https://whosebug.com/users/478884/tim-williams</link>
            <location>San Francisco, CA, United States</location>
            <total_rep>116760</total_rep>
            <tag1>vba</tag1>
            <tag2>excel</tag2>
            <tag3>arrays</tag3>
        </topusers>
    </group>
    <group lang="xslt">
        <topusers>
            <user>Dimitre Novatchev</user>
            <link>https://whosebug.com/users/36305/dimitre-novatchev</link>
            <location>United States</location>
            <total_rep>227632</total_rep>
            <tag1>xslt</tag1>
            <tag2>xml</tag2>
            <tag3>xpath</tag3>
        </topusers>
        <topusers>
            <user>Martin Honnen</user>
            <link>https://whosebug.com/users/252228/martin-honnen</link>
            <location>Germany</location>
            <total_rep>134713</total_rep>
            <tag1>xslt</tag1>
            <tag2>xml</tag2>
            <tag3>xpath</tag3>
        </topusers>
        <topusers>
            <user>Michael Kay</user>
            <link>https://whosebug.com/users/415448/michael-kay</link>
            <location>Reading, United Kingdom</location>
            <total_rep>135177</total_rep>
            <tag1>xml</tag1>
            <tag2>xslt</tag2>
            <tag3>xpath</tag3>
        </topusers>
    </group>
</Whosebug>