Extract multiple values from string

We use this approach to find a single keyword:

Get-Content $SourceFile | Select-String -Pattern "search keyword value"

However, we need to extract 4 values: the embedded pound (£) values (variable currency amounts) and the literal substrings, as shown below:

# Sample input
$String =' in the case of a single acquisition the Total Purchase Price of which (less the amount
funded by Acceptable Funding Sources (Excluding Debt)) exceeds £5,000,000 (or its
equivalent) but is less than or equal to £10,000,000 or its equivalent, the Parent shall
supply to the Agent for the Lenders not later than the date a member of the Group
legally commits to make the relevant acquisition, a copy of any financial due diligence
reports obtained by the Group in relation to the Acquisition Target, on a non-reliance
basis (subject to the Agent and any other relevant Reliance Party signing any required
hold harmless letter) and a copy of the acquisition agreement under which the
Acquisition Target is to be acquired;'

# Values to extract

$Value1 = ' in the case of a single acquisition the Total Purchase Price '

$Value2 = ' £5,000,000'

$Value3 = ' £10,000,000'

$Value4 = ' a copy of any financial due diligence
reports obtained by the Group in relation to the Acquisition Target, on a non-reliance
basis (subject to the Agent and any other relevant Reliance Party signing any required
hold harmless letter) and a copy of the acquisition agreement under which the
Acquisition Target is to be acquired;'
# Define the regex patterns to search for individually, as elements of an array.
$patterns = 
    # A string literal; escape it, to be safe.
    [regex]::Escape(' in the case of a single acquisition the Total Purchase Price '),     
    # A regex that matches a currency amount in pounds.
    # (Literal ' £', followed by at least one ('+') non-whitespace char ('\S');
    # this could be made more stringent by matching digits and commas only.)
    ' £\S+',     
    # A string literal that *needs* escaping due to use of '(' and ')'
    # Note the use of a literal here-string (@'<newline>...<newline>'@)
    [regex]::Escape(@'
a copy of any financial due diligence
reports obtained by the Group in relation to the Acquisition Target, on a non-reliance
basis (subject to the Agent and any other relevant Reliance Party signing any required
hold harmless letter) and a copy of the acquisition agreement under which the
Acquisition Target is to be acquired;
'@)

# - Use Get-Content -Raw to read the file *as a whole*
# - Use Select-String -AllMatches to find *multiple* matches (per input string)
# - ($patterns -join '|') joins the individual regexes with an alternation (|)
#   so that matches of any one of them are returned.
Get-Content -Raw $SourceFile | Select-String -AllMatches -Pattern ($patterns -join '|') |
  ForEach-Object {
    # Loop over the matches, each of which exposes the captured substring
    # via its .Value property, and collect them in an *array*, $capturedSubstrings.
    # Note: You could use `Set-Variable` to create individual variables $Variable1, ...
    #       but it's usually easier to work with an array.
    $capturedSubstrings = foreach ($match in $_.Matches) { $match.Value }
    # Output the array elements in diagnostic form.
    $capturedSubstrings | ForEach-Object { "[$_]" }
  }

Note that -Pattern normally accepts an array of values, so -Pattern $patterns should work (albeit with slightly different behavior), but as of PowerShell Core 6.1.0 it doesn't, due to a bug.

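Caveat: This assumes that your script uses the same newline style (CRLF vs. LF-only) as $SourceFile; if the two differ, more work is needed, and the symptom would be that the last pattern (the multi-line one) doesn't match.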
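One possible way to sidestep the newline caveat is to build the multi-line pattern so that it accepts either newline style. The following is only a sketch: the variable names ($literal, $newlineAgnosticPattern) and the shortened sample text are made up for illustration. It splits the literal into lines, escapes each line separately, and rejoins the lines with the regex \r?\n, which matches both CRLF and LF.

```powershell
# Hypothetical, shortened stand-in for the multi-line literal used above.
$literal = @'
a copy of any financial due diligence
reports obtained by the Group (on a non-reliance basis);
'@

# Split into lines regardless of the script file's own newline style,
# escape each line as a literal, then rejoin with a newline-agnostic regex.
$newlineAgnosticPattern =
  ($literal -split '\r?\n' | ForEach-Object { [regex]::Escape($_) }) -join '\r?\n'

# Simulated file contents, once with LF-only and once with CRLF newlines.
$lfText   = "supply`na copy of any financial due diligence`nreports obtained by the Group (on a non-reliance basis); end"
$crlfText = $lfText -replace "`n", "`r`n"

$matchesLf   = $lfText   -match $newlineAgnosticPattern
$matchesCrlf = $crlfText -match $newlineAgnosticPattern
"LF input matched: $matchesLf; CRLF input matched: $matchesCrlf"
```

The resulting pattern could then take the place of the last element of $patterns; the other elements contain no newlines and need no such treatment.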

With a file containing the contents of $String above, this yields:

[ in the case of a single acquisition the Total Purchase Price ]
[ £5,000,000]
[ £10,000,000]
[a copy of any financial due diligence
reports obtained by the Group in relation to the Acquisition Target, on a non-reliance
basis (subject to the Agent and any other relevant Reliance Party signing any required
hold harmless letter) and a copy of the acquisition agreement under which the
Acquisition Target is to be acquired;]
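
As the comment next to the currency pattern mentions, ' £\S+' could be made more stringent. A minimal sketch of one way to do that follows; the pattern, the variable names and the sample sentence are assumptions for illustration, not part of the original solution. It accepts only digits, comma-separated groups of three digits, and an optional decimal part, so trailing punctuation is no longer captured.

```powershell
# Hypothetical sample text; note the punctuation directly after two of the amounts.
$sample = 'exceeds £5,000,000 (or its equivalent) but is less than or equal to £10,000,000, or £250.50.'

# '£', one or more digits, any number of ',ddd' groups, optional decimal part.
$strictPattern = '£\d+(?:,\d{3})*(?:\.\d+)?'

# Collect the matched amounts as an array of strings.
$amounts = @([regex]::Matches($sample, $strictPattern) | ForEach-Object { $_.Value })

# Output the array elements in diagnostic form.
$amounts | ForEach-Object { "[$_]" }
```

With ' £\S+', the second and third amounts in this sample would include the trailing ',' and '.'; the stricter pattern stops at the last digit. To use it in $patterns above, prepend the leading space again if the space should be part of the match.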