How to expand file content with PowerShell

Question

I want to do this:

$content = get-content "test.html"
$template = get-content "template.html"
$template | out-file "out.html"

where template.html contains:

<html>
  <head>
  </head>
  <body>
    $content
  </body>
</html>

and test.html contains:

<h1>Test Expand</h1>
<div>Hello</div>

I get weird characters as the first 2 characters of out.html:

    ��

and the content is not expanded.

How can I fix this?

Answer 1

As for the "weird characters", they are probably a BOM (byte-order mark). Use the -Encoding parameter of Out-File to specify the output encoding explicitly, for example:

$Template |Out-File out.html -Encoding UTF8

For the string expansion, you need to tell PowerShell explicitly to perform it:

$Template = $Template |ForEach-Object {
    $ExecutionContext.InvokeCommand.ExpandString($_)
}
$Template | Out-File out.html -Encoding UTF8

Answer 2

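To complement Answer 1 with a solution that:

- is more efficient,
- ensures that the input files are read as UTF-8, even if they don't have a (pseudo-)BOM (byte-order mark), and
- avoids the "weird characters" problem altogether by writing a UTF-8 output file without a pseudo-BOM: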

# Explicitly read the input files as UTF-8, as a whole.
$content = get-content -raw -encoding utf8 test.html
$template = get-content -raw -encoding utf8 template.html

# Write to output file using UTF-8 encoding *without a BOM*.
[IO.File]::WriteAllText(
  "$PWD/out.html",
  $ExecutionContext.InvokeCommand.ExpandString($template)
)

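get-content -raw (PSv3+) reads a file in as a whole, as a single string (rather than as an array of strings, one per line), which uses more memory but is faster. With HTML files, memory use shouldn't be a concern.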
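A minimal sketch of that difference, assuming a throwaway file in the temp directory (the name raw-demo.txt is hypothetical): the same two-line file is read with and without -raw, and the resulting types are checked.

```powershell
# Create a small two-line test file (hypothetical name, in the temp directory).
$path = Join-Path ([IO.Path]::GetTempPath()) 'raw-demo.txt'
[IO.File]::WriteAllText($path, "<h1>Test Expand</h1>`n<div>Hello</div>")

# Without -Raw: one string per line, collected into an array.
$lines = Get-Content -LiteralPath $path

# With -Raw (PSv3+): the entire file as a single string, newlines included.
$raw = Get-Content -Raw -LiteralPath $path

"Line count: $($lines.Count), is array: $($lines -is [array])"
"Raw result is a single string: $($raw -is [string])"

Remove-Item -LiteralPath $path
```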

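Another advantage of reading files in full is that if the template contains multi-line subexpressions ($(...)), the expansion still works correctly.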
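The following sketch illustrates that point with a hypothetical inline template (no files involved): the same text is expanded once as a whole and once line by line, the latter mimicking what happens when the template is read without -raw.

```powershell
# Hypothetical template whose $(...) subexpression spans three lines.
# The single-quoted here-string keeps PowerShell from expanding it here.
$template = @'
<p>$(
  'multi' + '-line'
)</p>
'@

# Expanding the whole string at once: the subexpression is parsed intact.
$whole = $ExecutionContext.InvokeCommand.ExpandString($template)
$whole   # <p>multi-line</p>

# Expanding line by line tears the subexpression apart; the first line,
# '<p>$(', is expected to fail to parse on its own.
$lineByLineFailed = $false
try {
  foreach ($line in $template -split '\r?\n') {
    $null = $ExecutionContext.InvokeCommand.ExpandString($line)
  }
} catch {
  $lineByLineFailed = $true
}
"Line-by-line expansion failed: $lineByLineFailed"
```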

get-content -encoding utf8 ensures that the input files are interpreted as UTF-8, the character encoding typically used on the web today.

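This is crucial, because UTF-8-encoded HTML files typically do not have the 3-byte pseudo-BOM that Windows PowerShell needs in order to correctly recognize a file as UTF-8-encoded (see below).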

Because the template is a single string, a single $ExecutionContext.InvokeCommand.ExpandString() call is sufficient to perform the template expansion.

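In Windows PowerShell, Out-File -Encoding utf8 invariably creates a file with a pseudo-BOM, which is undesired.
Instead, [IO.File]::WriteAllText() is used, taking advantage of the fact that the .NET Framework creates UTF-8-encoded files without a BOM by default.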
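To check the BOM-less behavior for yourself, the sketch below (the temp file name nobom-demo.html is hypothetical) writes a short HTML snippet and inspects the first byte of the result. It also shows the explicit variant, passing a UTF8Encoding constructed with $false, in case you prefer not to rely on the default.

```powershell
$path = Join-Path ([IO.Path]::GetTempPath()) 'nobom-demo.html'

# File.WriteAllText(path, string) uses UTF-8 *without* a BOM by default.
[IO.File]::WriteAllText($path, '<h1>Test Expand</h1>')
$firstByteDefault = [IO.File]::ReadAllBytes($path)[0]

# Same, but explicit: a UTF8Encoding constructed with $false emits no BOM.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[IO.File]::WriteAllText($path, '<h1>Test Expand</h1>', $utf8NoBom)
$firstByteExplicit = [IO.File]::ReadAllBytes($path)[0]

# 0x3C is '<'; with a UTF-8 BOM the first byte would be 0xEF instead.
'First byte (default):  0x{0:X2}' -f $firstByteDefault
'First byte (explicit): 0x{0:X2}' -f $firstByteExplicit

Remove-Item -LiteralPath $path
```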

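Note the use of $PWD/ before out.html, which is necessary to ensure that the file is written to PowerShell's current location (directory); unfortunately, what the .NET Framework considers the current directory is not in sync with PowerShell's.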
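A sketch of the mismatch and one way around it, assuming the temp directory as a stand-in working location. It resolves the path via (Get-Location -PSProvider FileSystem).ProviderPath rather than $PWD, which may also help when the current location is on a PowerShell-only drive that .NET cannot resolve; treat that as a design choice for illustration, not a requirement.

```powershell
# Move PowerShell's current location somewhere else (the temp directory).
Push-Location ([IO.Path]::GetTempPath())
try {
  # These two can differ: changing PowerShell's location does not
  # necessarily change the process-wide directory that .NET uses.
  "PowerShell location:    $PWD"
  ".NET current directory: $([Environment]::CurrentDirectory)"

  # Build a full native path from PowerShell's file-system location, so
  # .NET never needs to consult its own notion of the current directory.
  $outPath = Join-Path (Get-Location -PSProvider FileSystem).ProviderPath 'out.html'
  [IO.File]::WriteAllText($outPath, '<div>Hello</div>')
  $written = Test-Path -LiteralPath $outPath
  Remove-Item -LiteralPath $outPath
} finally {
  Pop-Location
}
"Written to PowerShell's current location: $written"
```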

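Finally, the obligatory security warning: use this expansion technique only on input you trust, because arbitrary embedded commands may get executed.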
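If the input cannot be trusted, one alternative, a different technique from ExpandString swapped in here purely for illustration, is literal placeholder substitution, which never executes anything. The {{content}} placeholder syntax below is a hypothetical convention, not something PowerShell defines.

```powershell
# Hypothetical template that uses a literal placeholder instead of $content.
$template = @'
<html>
  <body>
    {{content}}
  </body>
</html>
'@

# Untrusted content that *looks* like a subexpression.
$content = '<div>$([Environment]::MachineName)</div>'

# String.Replace() performs literal substitution: no parsing, no execution,
# so the $(...) text ends up in the output verbatim.
$result = $template.Replace('{{content}}', $content)
$result
```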

Optional background information

In Windows PowerShell, Out-File, > and >> use UTF-16 LE character encoding with a BOM (byte-order mark) by default (the "weird characters" mentioned above).
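To find out which BOM, if any, a given file starts with, you can inspect its first bytes. The helper below, Get-BomKind, is hypothetical and only recognizes the common UTF-8 and UTF-16 BOMs (a UTF-32 LE BOM, which also begins with FF FE, would be reported as UTF-16 LE). The two weird characters in the question plausibly correspond to the two bytes FF FE of a UTF-16 LE BOM.

```powershell
# Hypothetical helper: report which BOM (if any) a file starts with.
function Get-BomKind {
  param([Parameter(Mandatory)] [string] $LiteralPath)
  $b = [IO.File]::ReadAllBytes($LiteralPath)
  if ($b.Length -ge 3 -and $b[0] -eq 0xEF -and $b[1] -eq 0xBB -and $b[2] -eq 0xBF) { return 'UTF-8 BOM' }
  if ($b.Length -ge 2 -and $b[0] -eq 0xFF -and $b[1] -eq 0xFE) { return 'UTF-16 LE BOM' }
  if ($b.Length -ge 2 -and $b[0] -eq 0xFE -and $b[1] -eq 0xFF) { return 'UTF-16 BE BOM' }
  'no BOM'
}

# Usage: write the same text with three encodings and classify each file.
$path = Join-Path ([IO.Path]::GetTempPath()) 'bom-demo.txt'
[IO.File]::WriteAllText($path, 'x', [Text.Encoding]::Unicode)  # UTF-16 LE, with BOM
$utf16 = Get-BomKind -LiteralPath $path
[IO.File]::WriteAllText($path, 'x', [Text.Encoding]::UTF8)     # UTF-8, with BOM
$utf8Bom = Get-BomKind -LiteralPath $path
[IO.File]::WriteAllText($path, 'x')                            # UTF-8, no BOM
$noBom = Get-BomKind -LiteralPath $path
Remove-Item -LiteralPath $path

"$utf16 / $utf8Bom / $noBom"
```

Running Get-BomKind on a file produced by a plain `... | Out-File` is a quick way to see which default your PowerShell version uses.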

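While Out-File -Encoding utf8 allows you to create UTF-8 output files, Windows PowerShell invariably prefixes the output file with a 3-byte pseudo-BOM, which some utilities, notably those with a Unix heritage, have problems with, so you would still get "weird characters" (albeit different ones).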

If you want a more PowerShell-like way of creating BOM-less UTF-8 files, see this answer of mine, which defines an Out-FileUtf8NoBom function that emulates the core functionality of Out-File.
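Conversely, when reading files, you must use Get-Content -Encoding utf8 to ensure that BOM-less UTF-8 files are recognized as such.
In the absence of the UTF-8 pseudo-BOM, Windows PowerShell's Get-Content assumes that the file uses the single-byte, extended-ASCII encoding specified by the system's legacy code page (e.g., Windows-1252 on English-language systems), an encoding that PowerShell calls Default.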

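Note that while Windows-only editors such as Notepad create UTF-8 files with a pseudo-BOM (if you explicitly choose to save as UTF-8; the default is the legacy code-page encoding, "ANSI"), increasingly popular cross-platform editors such as Visual Studio Code, Atom, and Sublime Text by default do not use a pseudo-BOM when they create files.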
