File System Watcher stops working when converting Word doc/docx files to PDF
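I have a PowerShell script that automatically converts .doc/.docx files to *.pdf.
The script works fine for the first file. But if I put another file in the watched folder, the watcher does not fire an event.
Here is the complete script. If I comment out all the $doc variables, the script runs multiple times without any problems. Did I ignore or overlook something?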
$watcher = New-Object System.IO.FileSystemWatcher
$watcher.Path = "$Env:DropboxRoot"
$watcher.Filter = "*.doc*"
$watcher.IncludeSubdirectories = $true
$watcher.EnableRaisingEvents = $true
Add-type -AssemblyName Microsoft.Office.Interop.Word
$action = {
    $name = (get-item $Event.SourceEventArgs.FullPath).BaseName
    ### DON'T PROCESS WORD BACKUP FILES (START WITH A TILDE ~)
    if(!($name.startsWith("~"))){
        write-host Triggered event from $Event.SourceEventArgs.FullPath
        $inputFilePath = $Event.SourceEventArgs.FullPath
        $parentPath = (get-item $inputFilePath).Directory
        $filename = (get-item $inputFilePath).BaseName
        $pdfDir = "$parentPath\PDF"
        if(!(Test-Path -Path $pdfDir)){
            New-Item -ItemType directory -Path $pdfDir
        }
        ###Execute PDF generate script
        write-host Create word object
        $word = New-Object -ComObject "Word.Application"
        ######define the parameters######
        write-host Define parameters
        $wdExportFormat =[Microsoft.Office.Interop.Word.WdExportFormat]::wdExportFormatPDF
        $OpenAfterExport = $false
        $wdExportOptimizeFor = [Microsoft.Office.Interop.Word.WdExportOptimizeFor]::wdExportOptimizeForOnScreen
        $wdExportItem = [Microsoft.Office.Interop.Word.WdExportItem]::wdExportDocumentContent
        $IncludeDocProps = $true
        $KeepIRM = $false #Don't export Information Rights Management information
        $wdExportCreateBookmarks = [Microsoft.Office.Interop.Word.WdExportCreateBookmarks]::wdExportCreateWordBookmarks #Keep bookmarks
        $DocStructureTags = $true #Add additional data for screenreaders
        $BitmapMissingFonts = $true
        $UseISO19005_1 = $true #Export as PDF/A
        $outputFilePath = $pdfDir + "\" + $filename + ".pdf"
        $doc = $word.Documents.Open($inputFilePath)
        $doc.ExportAsFixedFormat($OutputFilePath,$wdExportFormat,$OpenAfterExport,`
            $wdExportOptimizeFor,$wdExportRange,$wdStartPage,$wdEndPage,$wdExportItem,$IncludeDocProps,`
            $KeepIRM,$wdExportCreateBookmarks,$DocStructureTags,$BitmapMissingFonts,$UseISO19005_1)
        $doc.Close()
        $word.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($doc)
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($word)
        [GC]::Collect()
        [GC]::WaitForPendingFinalizers()
    }
}
$created = Register-ObjectEvent $watcher -EventName "Created" -Action $action
$renamed = Register-ObjectEvent $watcher -EventName "Renamed" -Action $action
while($true) {
    sleep 5
}
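Your script has a few issues that more debugging logic would reveal.
In some cases, (Get-Item $Event.SourceEventArgs.FullPath) returns null. For reasons I could not determine, this seems to happen once for every document that is converted. It may have to do with the "~Temp" files.
When that happens, if(!($name.startsWith("~"))) throws an exception, because $name is null.
Also note that Get-Item returns a FileInfo object, whereas $word.Documents.Open($inputFilePath) should be given a plain string. Make sure the value you pass there is the path string itself and not a FileInfo.
Lastly, BaseName is sometimes null. I am not sure why, but the code could test for that, or use other means to dissect FullPath into its name and path parts.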
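One way to dissect FullPath without going through Get-Item is to use the static methods of System.IO.Path, which work on the string alone and do not care whether the file still exists. The following is only a minimal sketch: the function name Split-WatchedPath and the shape of the returned object are assumptions made for illustration, not part of the original script.

```powershell
# Hypothetical helper (not from the original script): splits a watcher path
# into its parts using string operations only, so a file that has already
# disappeared cannot produce a null FileInfo.
function Split-WatchedPath {
    param([string]$FullPath)

    # Nothing usable to dissect: let the caller decide how to react.
    if ([string]::IsNullOrWhiteSpace($FullPath)) {
        return $null
    }

    $fileName = [System.IO.Path]::GetFileName($FullPath)

    [pscustomobject]@{
        FullPath   = $FullPath
        Directory  = [System.IO.Path]::GetDirectoryName($FullPath)
        FileName   = $fileName
        BaseName   = [System.IO.Path]::GetFileNameWithoutExtension($fileName)
        # Word's temporary/owner files start with a tilde.
        IsTempFile = $fileName.StartsWith('~')
    }
}

# Usage: single quotes keep the "$" in the name literal.
$parts = Split-WatchedPath ([System.IO.Path]::Combine('docs', '~$report.docx'))
$parts.IsTempFile   # True
```

Inside the event action, the input would be $Event.SourceEventArgs.FullPath, and a $null result or IsTempFile being true would be the cue to return early.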
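All that said, once you get this working, my personal experience is that calling the Word COM object from PowerShell to do this conversion is unreliable: Word hangs, ~Temp files are left behind, you have to kill Word from Task Manager, and the COM calls in PowerShell never return. My testing shows that calling a C# console application to do the conversion is far more reliable. You could also write the directory watcher and the converter entirely in C# and accomplish the same task.
Assuming you still want to combine the two, a PowerShell watcher and a C# Word-to-PDF converter, below is the solution I came up with. The script runs for about a minute so that you can test it in the ISE or in the console. From the console, press a key to exit. Before exiting, the script cleans up by unregistering the events, which is quite helpful when testing in the ISE. Change this to suit how you intend to run the script.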
PowerShell Watcher
$watcher = New-Object System.IO.FileSystemWatcher
$watcher.Path = "d:\test\docconvert\src"
$watcher.Filter = "*.doc*"
$watcher.IncludeSubdirectories = $true
$watcher.EnableRaisingEvents = $true

# copy this somewhere appropriate
# perhaps in same directory as your script
# put on a read-only share, etc.
$wordToPdf = 'd:\test\docconvert\WordToPdf\WordToPdf\bin\Debug\WordToPdf.exe'

$action = {
    try
    {
        Write-Host "Enter action @ $(Get-Date)"
        $fullPathObject = (Get-Item $Event.SourceEventArgs.FullPath)
        if (!($fullPathObject))
        {
            # $(...) is needed so the whole property path expands inside the string
            Write-Host "(Get-Item $($Event.SourceEventArgs.FullPath)) returned null."
            return
        }
        $fullPath = ($fullPathObject).ToString()
        Write-Host "Triggered event from $fullPath"
        $fileName = Split-Path $fullPath -Leaf
        if ($fileName -and ($fileName.StartsWith("~")))
        {
            Write-Host "Skipping temp file"
            return
        }
        # put pdf in same dir as the file
        # can be changed, but a lot easier to test this way
        $pdfDir = Split-Path $fullPath -Parent
        $baseName = [System.IO.Path]::GetFileNameWithoutExtension($fileName)
        $outputFilePath = Join-Path $pdfDir $($baseName + ".pdf")
        Write-Host "outputFilePath is: '$outputFilePath'"
        # call c# WordToPdf to do conversion because
        # it is way more reliable than similar calls
        # from PowerShell
        & $wordToPdf $fullPath $outputFilePath
        if ($LASTEXITCODE -ne 0)
        {
            Write-Host "Conversion result: FAIL"
        }
        else
        {
            Write-Host "Conversion result: OK"
        }
    }
    catch
    {
        Write-Host "Exception from ACTION:`n$($_ | Select *)"
    }
    finally
    {
        Write-Host "Exit action @ $(Get-Date)"
    }
}

$created = Register-ObjectEvent $watcher -EventName "Created" -Action $action
$renamed = Register-ObjectEvent $watcher -EventName "Renamed" -Action $action

$count = 12
while($count--) {
    Write-Output "run/sleep ($count)..."
    sleep 5
    # will exit from console, not ISE
    if ([console]::KeyAvailable) {
        $key = [console]::ReadKey()
        break
    }
}

# unregister the events so the script exits cleanly
$created | % {Unregister-Event $_.Name}
$renamed | % {Unregister-Event $_.Name}
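The watcher above starts the converter with &, which blocks until the converter exits. If Word hangs inside the converter, the action would presumably block with it. The following is a rough sketch of one way to bound that wait. The function name Invoke-WithTimeout, the 120-second default, and the -1 return value for a timeout are all assumptions made for illustration and are not part of the original answer.

```powershell
# Hypothetical helper (not part of the original answer): runs an executable
# and gives up after $TimeoutSeconds instead of waiting forever.
function Invoke-WithTimeout {
    param(
        [string]$FilePath,
        [string[]]$Arguments = @(),
        [int]$TimeoutSeconds = 120   # assumed limit; tune it for your documents
    )

    $psi = New-Object System.Diagnostics.ProcessStartInfo
    $psi.FileName = $FilePath
    # Quote each argument so that paths containing spaces stay intact.
    $psi.Arguments = ($Arguments | ForEach-Object { '"{0}"' -f $_ }) -join ' '
    $psi.UseShellExecute = $false

    $process = [System.Diagnostics.Process]::Start($psi)
    try {
        # WaitForExit(int) returns $true if the process ended within the limit.
        if ($process.WaitForExit($TimeoutSeconds * 1000)) {
            return $process.ExitCode
        }
        # Timed out: stop the converter itself. A WINWORD.EXE instance it
        # started may still be left behind and need separate cleanup.
        $process.Kill()
        $process.WaitForExit()
        return -1   # assumed sentinel meaning "timed out"
    }
    finally {
        $process.Dispose()
    }
}
```

In the action block, the call might then look like Invoke-WithTimeout -FilePath $wordToPdf -Arguments $fullPath, $outputFilePath, with the returned value checked in place of $LASTEXITCODE. This assumes the function is visible from the scope in which the action runs, and that the converter never returns -1 on its own.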
C# WordToPdf Converter
Add appropriate error checking for the arguments...
Add a reference to the COM library Microsoft.Office.Interop.Word.
using System;
using Microsoft.Office.Interop.Word;

namespace WordToPdf
{
    class Program
    {
        static int Main(string[] args)
        {
            // Minimal guard: both the source and the target path are required.
            if (args.Length < 2)
            {
                Console.WriteLine("Usage: WordToPdf <source> <target>");
                return 1;
            }

            Console.WriteLine($"Converting: {args[0]} to {args[1]}");
            var conversion = new DocumentConversion();
            bool result = conversion.WordToPdf(args[0], args[1]);
            // The exit code is what the PowerShell watcher reads as $LASTEXITCODE.
            if (result)
            {
                return 0;
            }
            else {
                return 1;
            }
        }
    }

    public class DocumentConversion
    {
        private Microsoft.Office.Interop.Word.Application Word;
        private object Unknown = Type.Missing;
        private object True = true;
        private object False = false;

        public bool WordToPdf(object Source, object Target)
        {
            bool ret = true;
            if (Word == null) Word = new Microsoft.Office.Interop.Word.Application();
            try
            {
                Word.Visible = false;
                // open the source document read-only (third argument)
                Word.Documents.Open(ref Source, ref Unknown,
                    ref True, ref Unknown, ref Unknown,
                    ref Unknown, ref Unknown, ref Unknown,
                    ref Unknown, ref Unknown, ref Unknown,
                    ref Unknown, ref Unknown, ref Unknown,
                    ref Unknown, ref Unknown);
                Word.Application.Visible = false;
                Word.WindowState = WdWindowState.wdWindowStateMinimize;
#if false
                object saveFormat = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatPDF;
                Word.ActiveDocument.SaveAs(ref Target, ref saveFormat,
                    ref Unknown, ref Unknown, ref Unknown,
                    ref Unknown, ref Unknown, ref Unknown,
                    ref Unknown, ref Unknown, ref Unknown,
                    ref Unknown, ref Unknown, ref Unknown,
                    ref Unknown, ref Unknown);
#else
                Word.ActiveDocument.ExportAsFixedFormat(
                    (string)Target, WdExportFormat.wdExportFormatPDF,
                    false, WdExportOptimizeFor.wdExportOptimizeForOnScreen,
                    WdExportRange.wdExportAllDocument, 0, 0,
                    WdExportItem.wdExportDocumentContent, true, false,
                    WdExportCreateBookmarks.wdExportCreateWordBookmarks,
                    true, true, true);
#endif
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
                ret = false;
            }
            finally
            {
                if (Word != null)
                {
                    // close the application
                    Word.Quit(ref Unknown, ref Unknown, ref Unknown);
                }
            }
            return ret;
        }
    }
}