I cannot extract hyperlinks from a Word document with PowerShell

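I am trying to recursively search a directory structure for Word documents and then extract their hyperlinks. When the code runs, it produces the following output: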

processing 2 docs

File Name                Hyperlink
---------                ---------
C:\temp\doc1.docx
C:\temp\doc1.docx
C:\temp\folder\doc2.docx
C:\temp\folder\doc2.docx

Nothing I have tried seems to work. This is the code I have been using:

Clear-Host

$parentFolder = "C:\temp"

$ourDocs = Get-ChildItem -Recurse -LiteralPath $parentFolder -file -include *.doc*
"processing {0} docs" -f $ourDocs.Count


$word = New-Object -ComObject word.application

$word.Visible = $false
$word.ScreenUpdating = $false


$array = New-Object System.Collections.ArrayList

$ourDocs | ForEach-Object{

    $thisDoc = $word.Documents.Open($_.FullName)

    $thisDoc.Hyperlinks | ForEach-Object {

        $array.Add([pscustomobject]@{
        
            "File Name" = $thisDoc.FullName
            "Hyperlink" = $_Address}) | Out-null
        
    }
    $thisDoc.Close()
                
}

$Word.Quit()

$array

# cleanup com objects
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()

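The error is in how you reference the property whose value you want. $_Address is missing the dot, so PowerShell reads it as an undefined variable named _Address rather than the Address property of $_, which is why the Hyperlink column comes back empty.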
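In the question's code the property is written as $_Address, without the dot. The following minimal sketch shows the difference without needing Word installed. The $link object is a hypothetical stand-in for one entry of $thisDoc.Hyperlinks, and the sketch assumes strict mode is off (the default), in which case an undefined variable quietly evaluates to $null.

```powershell
# Hypothetical stand-in for a single Word hyperlink (not a real COM object).
$link = [pscustomobject]@{ Address = 'https://example.com/' }

$result = $link | ForEach-Object {
    [pscustomobject]@{
        # $_Address is parsed as a variable literally named "_Address".
        # It was never assigned, so it evaluates to $null.
        Wrong = $_Address
        # $_.Address reads the Address property of the current pipeline item.
        Right = $_.Address
    }
}

$result
```

With Set-StrictMode -Version Latest enabled, the reference to $_Address would raise an error instead of silently yielding an empty value, which can make this kind of typo easier to spot.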

Try this refactored version:

Clear-Host

$parentFolder = "D:\temp\Word"

$ourDocs = Get-ChildItem -Recurse -LiteralPath $parentFolder -file -include '*.doc*'
"processing {0} docs" -f $ourDocs.Count


$word                = New-Object -ComObject word.application
$word.Visible        = $false
$word.ScreenUpdating = $false

# This really is not needed for your posted use case.
# $array = New-Object System.Collections.ArrayList

$ourDocs | 
ForEach-Object{
    $thisDoc = $word.Documents.Open($PSItem.FullName)

    @($thisDoc.Hyperlinks) | 
    ForEach-Object {
        [pscustomobject]@{
            FileName  = $thisDoc.FullName
            HyperLink = $PSitem.Address
        }
    }
    $thisDoc.Close()
}

$Word.Quit()


# cleanup com objects
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()

# Results
<#
processing 4 docs

FileName                     HyperLink                                        
--------                     ---------                                        
D:\temp\Word\WES - Copy.docx http://stackoverfow.com/                         
D:\temp\Word\WES - Copy.docx https://superuser.com/questions/tagged/powershell
#>

Update, following your comment about CSV output and my reply to it:

...

$ourDocs | 
ForEach-Object{
    $thisDoc = $word.Documents.Open($PSItem.FullName)

    @($thisDoc.Hyperlinks) | 
    ForEach-Object {
        [pscustomobject]@{
            FileName  = $thisDoc.FullName
            HyperLink = $PSitem.Address
        }
    } | 
    Export-Csv -Path 'D:\Temp\WordHyperLinkReport.csv' -Append -NoTypeInformation
    $thisDoc.Close()
}

...

Import-Csv -Path 'D:\Temp\WordHyperLinkReport.csv'
# Results
<#
FileName                     HyperLink                                        
--------                     ---------                                        
D:\temp\Word\WES - Copy.docx http://stackoverfow.com/                         
D:\temp\Word\WES - Copy.docx https://superuser.com/questions/tagged/powershell
#>
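One thing to keep in mind with -Append is that running the script again adds the same rows to the existing report a second time. A possible alternative, sketched below under stated assumptions, is to gather all rows first and call Export-Csv once without -Append, which overwrites the file on each run. The $rows array and the temp-directory report path are hypothetical placeholders for the objects emitted by the hyperlink loop and for the report path used above.

```powershell
# Hypothetical rows standing in for the objects emitted by the hyperlink loop.
$rows = @(
    [pscustomobject]@{ FileName = 'D:\temp\Word\a.docx'; HyperLink = 'https://example.com/one' }
    [pscustomobject]@{ FileName = 'D:\temp\Word\b.docx'; HyperLink = 'https://example.com/two' }
)

# Illustrative report path in the temp directory (an assumption, not the path used above).
$report = Join-Path ([System.IO.Path]::GetTempPath()) 'WordHyperLinkReport-sketch.csv'

# A single Export-Csv without -Append replaces the file on every run.
$rows | Export-Csv -Path $report -NoTypeInformation

# Read the report back; @() keeps the result an array even for a single row.
$imported = @(Import-Csv -Path $report)
$imported
```

In the real script, this would mean assigning the output of the outer ForEach-Object pipeline to a variable and exporting it after $Word.Quit().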