XMLWriter & recursion

Question

I have successfully scripted a directory structure to XML, like this.

function GenerateLibraryMap ($path) {
    function ProcessChildNode { 
        param ( 
            $parentNode, 
            $childPath
        ) 
        $dirInfo = [System.IO.DirectoryInfo]::New($childPath)
        foreach ($directory in $dirInfo.GetDirectories()) {
            $childNode = $xmlDoc.CreateElement('folder')
            $childNode.SetAttribute('name', $directory.Name) > $null
            $parentNode.AppendChild($childNode) > $null
            ProcessChildNode -parentNode:$childNode -childPath:$directory.FullName
        }
        foreach ($file in $dirInfo.GetFiles()) {
            $childNode = $xmlDoc.CreateElement('file')
            $childNode.SetAttribute('name', $file.Name) > $null
            $childNode.SetAttribute('size', $file.Length) > $null
            $childNode.SetAttribute('hash', (Get-FileHash -Path:$file.FullName -Algorithm:MD5).Hash) > $null
            $parentNode.AppendChild($childNode) > $null
        }
    }

    $xmlDoc = [XML]::New()

    $xmlDoc.AppendChild($xmlDoc.CreateProcessingInstruction('xml', 'version="1.0"')) > $null
    $rootNode = $xmlDoc.CreateElement('rootDirectory')
    $rootNode.SetAttribute('path', $path) > $null
    $xmlDoc.AppendChild($rootNode) > $null
    ProcessChildNode -parentNode:$rootNode -childPath:$path

    $xmlDoc.Save("$path\Tree.xml") > $null
    Write-Host "$path\Tree.xml"
}
Measure-Command {
    GenerateLibraryMap 'C:\assets\Revit20'
}

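This works well, but it took over 2 minutes on the file structure I tested, which is probably only 20% of my real data. So I am looking at refactoring to use an XML stream, which from what I understand may be much faster. I found this reference to get me started, but it only covers a root node and says nothing about how to produce a hierarchy. It looks as though you need to keep track of the hierarchy so that you can .WriteEndElement appropriately at each node. But that breaks my simple recursion. I think I just need to .WriteEndElement after ProcessChildNode at line 12, but I am not sure.

So, I guess I have two questions...

1: Before I go down this rabbit hole, is this going to result in noticeably faster code? Especially when dealing with thousands of files in dozens of subfolders? And...

2: Can someone point me to a resource, or provide an example, of how to deal with the recursion issue? I have a feeling I will be banging my head against a wall otherwise.

OK, really three questions...

3: Once this is working, I plan to refactor to a class, both for the performance boost and as an exercise since I am learning OOP. Are there any gotchas I need to look out for when I start down that next rabbit hole?

Edit: With Mathias's response and some digging, I came up with this.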

function GenerateLibraryMap {
    param ( 
        [String]$path
    )
    function ProcessChildNode { 
        param ( 
            [String]$childPath
        ) 
        $dirInfo = [System.IO.DirectoryInfo]::New($childPath)

        foreach ($directory in $dirInfo.GetDirectories()) {
            # Open and close each <folder> in the same iteration so every
            # WriteStartElement is paired with its own WriteEndElement.
            $xmlDoc.WriteStartElement('folder')
            $xmlDoc.WriteAttributeString('name', $directory.Name)
            ProcessChildNode -childPath:$directory.FullName
            $xmlDoc.WriteEndElement()
        }
        foreach ($file in $dirInfo.GetFiles()) {
            $xmlDoc.WriteStartElement('file')
            $xmlDoc.WriteAttributeString('name', $file.Name)
            $xmlDoc.WriteAttributeString('size', $file.Length)
            $xmlDoc.WriteAttributeString('hash', (Get-FileHash -Path:$file.FullName -Algorithm:MD5).Hash)
            $xmlDoc.WriteEndElement()
        }
    }

    $mapFilePath = "$(Split-Path $path -parent)\Tree_Stream.xml"
    $xmlSettings = [System.Xml.XmlWriterSettings]::New()
    $xmlSettings.Indent = $true
    $xmlSettings.IndentChars = '  '
    $xmlSettings.ConformanceLevel = 'Auto'
    # FileMode.Create overwrites an earlier map; Append would add a second document to the same file.
    $fileStream = [System.IO.FileStream]::New($mapFilePath, [System.IO.FileMode]::Create, [System.IO.FileAccess]::Write, [System.IO.FileShare]::Read)
    $xmlDoc = [System.Xml.XmlWriter]::Create($fileStream, $xmlSettings)

    $xmlDoc.WriteStartDocument()
    $xmlDoc.WriteStartElement('rootDirectory')
    $xmlDoc.WriteAttributeString('path', $path)

    ProcessChildNode -childPath:$path

    # Method calls need parentheses; without them PowerShell only returns the method definition.
    $xmlDoc.WriteEndElement()
    $xmlDoc.WriteEndDocument()
    $xmlDoc.Close()      # flushes the writer
    $fileStream.Close()  # the writer does not close the stream by default
    Write-Host $mapFilePath
}
CLS
Measure-Command {
    GenerateLibraryMap 'C:\assets\Revit20'
}

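I tried both using [System.IO.FileStream] and instantiating $xmlDoc directly with the file path (still not 100% sure I understand the difference). In any case, all three approaches came in within a few seconds of each other, at around 2 minutes. So there seems to be no meaningful difference in this situation. If anyone sees an opportunity to improve performance I am all ears, but for now I will move on to refactoring to classes.

Edit #2: Well, I implemented a class-based approach like this...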
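
One possible explanation for all three variants landing at the same time is that the run is dominated by reading and hashing every file rather than by producing XML. That is an assumption, not something measured in this post. A rough way to check it is to time the two steps separately. In the sketch below, Measure-HashVersusXml is a hypothetical helper name, and the XML is written to a null stream so that only the writer's own cost is measured.

```powershell
# Hypothetical helper: times hashing and XML writing separately for one folder tree.
function Measure-HashVersusXml {
    param ([string]$Root)

    # Collect every file below $Root once, so both measurements see the same input.
    $files = [System.IO.DirectoryInfo]::new($Root).GetFiles('*', [System.IO.SearchOption]::AllDirectories)

    # Step 1: hashing only.
    $hashTime = Measure-Command {
        foreach ($file in $files) {
            $null = (Get-FileHash -LiteralPath $file.FullName -Algorithm MD5).Hash
        }
    }

    # Step 2: writing one <file> element per file, with no hashing, to a stream that discards its input.
    $xmlTime = Measure-Command {
        $writer = [System.Xml.XmlWriter]::Create([System.IO.Stream]::Null)
        $writer.WriteStartElement('rootDirectory')
        foreach ($file in $files) {
            $writer.WriteStartElement('file')
            $writer.WriteAttributeString('name', $file.Name)
            $writer.WriteAttributeString('size', $file.Length.ToString())
            $writer.WriteEndElement()
        }
        $writer.WriteEndElement()
        $writer.Close()
    }

    [pscustomobject]@{
        FileCount   = $files.Count
        HashSeconds = $hashTime.TotalSeconds
        XmlSeconds  = $xmlTime.TotalSeconds
    }
}

# Usage, with the folder from the question:
# Measure-HashVersusXml -Root 'C:\assets\Revit20'
```

If HashSeconds turns out to be close to the full 2 minutes, switching between XmlDocument, XmlWriter, and a class would not be expected to change much, and the hashing step is where to look next.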

class GenerateLibraryMap {
    # Properties
    [XML.XMLDocument]$XML = [XML]::New()
    [String]$MapFilePath

    # Constructor
    GenerateLibraryMap ([String]$path) {
        $this.MapFilePath = "$(Split-Path $path -parent)\Tree_Class.xml"

        $this.XML.AppendChild($this.XML.CreateProcessingInstruction('xml', 'version="1.0"')) > $null
        $rootNode = $this.XML.CreateElement('rootDirectory')
        $rootNode.SetAttribute('path', $path) > $null
        $this.XML.AppendChild($rootNode) > $null

        $this.ProcessChildNode($rootNode, $path)

        $this.XML.Save($this.MapFilePath)
    }

    # Method
    [Void] ProcessChildNode([XML.XMLElement]$parentNode, [String]$childPath) {
        $dirInfo = [System.IO.DirectoryInfo]::New($childPath)
        foreach ($directory in $dirInfo.GetDirectories()) {
            $childNode = $this.XML.CreateElement('folder')
            $childNode.SetAttribute('name', $directory.Name)
            $parentNode.AppendChild($childNode)
            $this.ProcessChildNode($childNode, $directory.FullName)
        }
        foreach ($file in $dirInfo.GetFiles()) {
            $childNode = $this.XML.CreateElement('file')
            $childNode.SetAttribute('name', $file.Name)
            $childNode.SetAttribute('size', $file.Length)
            $childNode.SetAttribute('hash', (Get-FileHash -Path:$file.FullName -Algorithm:MD5).Hash)
            $parentNode.AppendChild($childNode)
        }
    }

}


Measure-Command {
    $xml = [GenerateLibraryMap]::New('C:\assets\Revit20')
}

Write-Host "$($xml.MapFilePath)"

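It takes exactly the same amount of time. Educational nonetheless. It looks like the stream-based version is slightly more memory efficient. Hopefully someone finds the results useful.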

Answer 1

is this going to result in noticeably faster code?

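Maybe. The easiest way to find out (short of actually profiling your current approach) is to just go ahead and do it, then compare the results :)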

how to deal with the recursion issue?

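Simple!

Follow this rule:

The recursive function call that opens a node should also be responsible for closing it. In other words, your recursive function should be structured like this (pseudocode):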

function Recurse
{
    WriteStartElement
    if($shouldRecurse){
      Recurse
    }
    WriteEndElement
}

As long as you stick to this form, you'll be fine.
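
As a concrete illustration of that rule, here is a minimal sketch using [System.Xml.XmlWriter]. The function name Write-FolderElement and the throwaway temp folder are made up for the example, and the file hash from the question is left out to keep it short.

```powershell
# Hypothetical helper: writes one <folder> element for $Directory, including everything below it.
function Write-FolderElement {
    param (
        [System.Xml.XmlWriter]$Writer,
        [System.IO.DirectoryInfo]$Directory
    )
    $Writer.WriteStartElement('folder')                        # open this node
    $Writer.WriteAttributeString('name', $Directory.Name)
    foreach ($child in $Directory.GetDirectories()) {
        Write-FolderElement -Writer $Writer -Directory $child  # each child opens and closes itself
    }
    foreach ($file in $Directory.GetFiles()) {
        $Writer.WriteStartElement('file')
        $Writer.WriteAttributeString('name', $file.Name)
        $Writer.WriteEndElement()
    }
    $Writer.WriteEndElement()                                  # close the node this call opened
}

# Build a tiny throwaway tree under the temp folder so the sketch can run anywhere.
$demoRoot = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
$demoSub = Join-Path $demoRoot 'sub'
New-Item -ItemType Directory -Path $demoSub -Force > $null
Set-Content -Path (Join-Path $demoSub 'a.txt') -Value 'demo'

# Write into a string instead of a file so the result is easy to inspect.
$settings = [System.Xml.XmlWriterSettings]::new()
$settings.Indent = $true
$stringWriter = [System.IO.StringWriter]::new()
$writer = [System.Xml.XmlWriter]::Create($stringWriter, $settings)

$writer.WriteStartDocument()
Write-FolderElement -Writer $writer -Directory ([System.IO.DirectoryInfo]::new($demoRoot))
$writer.WriteEndDocument()
$writer.Close()

$xmlText = $stringWriter.ToString()
$xmlText

Remove-Item -Path $demoRoot -Recurse -Force
```

Because every call closes exactly the element it opened, the caller never has to track how deep the recursion currently is.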

I plan to refactor to a class, both for the performance boost and as an exercise since I am learning OOP. Are there any gotchas I need to look out for when I start down that next rabbit hole?

Probably?

Again, the easiest way to find out is to just go ahead and do it. If you hit a wall, Stack Overflow will still be here :)
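
Two class behaviours that tend to surprise people coming from functions can be shown in a few lines: members are reached through $this, and a method only returns what it explicitly returns, so stray output inside a method is discarded rather than sent down the pipeline. The NodeCounter class below is purely illustrative and not part of the question's code.

```powershell
# Illustrative class, not taken from the question.
class NodeCounter {
    [int]$Count = 0

    [void] Add([string]$name) {
        $this.Count++      # properties are accessed through $this
        "added $name"      # discarded: a method emits nothing unless it uses 'return'
    }

    [string] Describe() {
        return "$($this.Count) node(s)"
    }
}

$counter = [NodeCounter]::new()
$leaked = $counter.Add('folder')   # $leaked is $null, so no '> $null' is needed inside methods
$counter.Describe()                # returns "1 node(s)"
```

This is why calls such as SetAttribute and AppendChild can be left unredirected inside a class method, while the same calls inside a function would leak their return values into the output.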

powershell

recursion

xmlwriter