Remove unnecessary strings and special characters using PowerShell

I have this string:

$html = @'
<div><span style="display:inline !important;">As an Inventory and OperationsAgent</span></div><div><span style="display:inline !important;">I want to be able to raise internal EPL Nominations for Buildings that are going through a Shared Installation journey</span></div><div><span style="display:inline !important;">So that internal EPLNominations can be submitted when the TDF cost is too expensive for both P2P and Shared</span></div>
'@

...how can I remove all the HTML tags?

You can use the -replace regex operator to remove all HTML tags:

$html -replace '<[^>]+>'

To also replace the <div> boundaries with ', ':

$html -replace '</div>\s*<div>',', ' -replace '<[^>]+>'

This outputs the following string:

As an Inventory and OperationsAgent, I want to be able to raise internal EPL Nominations for Buildings that are going through a Shared Installation journey, So that internal EPLNominations can be submitted when the TDF cost is too expensive for both P2P and Shared
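To see what the chained -replace does step by step, here is a minimal sketch on a shortened, made-up input (the `$sample` string and the variable names are illustrative assumptions, not part of the question). When the replacement operand is omitted, -replace substitutes an empty string, and chained operators are applied left to right:

```powershell
# Hypothetical shortened input, for illustration only
$sample = '<div><span>first</span></div><div><span>second</span></div>'

# Step 1: turn each </div><div> boundary into ', '
$step1 = $sample -replace '</div>\s*<div>', ', '

# Step 2: remove every remaining tag (no replacement operand means empty string)
$result = $step1 -replace '<[^>]+>'
$result   # first, second
```

The order matters: if the tags were stripped first, the </div><div> boundaries would already be gone and no separators could be inserted.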

You can decode your HTML first and then replace the HTML tags:

Add-Type -AssemblyName System.Web   # make sure the assembly that provides HttpUtility is loaded
$DecodedHtml = [System.Web.HttpUtility]::HtmlDecode($html)
$DecodedHtml -replace '<[^>]*>'
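[System.Web.HttpUtility] lives in the System.Web assembly, which is not necessarily loaded in every session. As a possible alternative, [System.Net.WebUtility] offers an HtmlDecode method that should be available without loading anything extra. A minimal sketch, assuming a made-up `$sample` string that contains an HTML entity:

```powershell
# Hypothetical sample input containing an HTML entity
$sample = '<div>P2P &amp; Shared</div>'

# Decode entities such as &amp; into their literal characters
$decoded = [System.Net.WebUtility]::HtmlDecode($sample)

# Strip the tags from the decoded text
$plain = $decoded -replace '<[^>]*>'
$plain   # P2P & Shared
```

Decoding only makes a difference when the string contains entities; for the string in the question, the -replace alone gives the same result.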