How to extract multiple substrings surrounded by double-quotes from a longer string

I'm trying to process the output of another script, which looks something like this:

xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]

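What I want to do is find the first quoted substring, confirm its value (i.e. "ABCD"), and then take all the remaining quoted substrings (their number varies) and put them into an array.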

I've been searching around for an answer, but the references I could find only cover extracting a single substring, not several.

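With " as the awk field separator, test the text between the first pair of " characters and print the text between each subsequent pair: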

awk -v q="ABCD" -F'"' '$2==q{for (i=4; i<=NF; i+=2) print $i}'
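To see why the loop starts at field 4 and steps by 2, it can help to print every field awk produces when the separator is a double quote. This is a small illustrative sketch (the variable names `line` and `fields` are just for the demo): the first quoted substring always lands in $2, and the remaining ones in the even-numbered fields $4, $6, and so on, with the separators between them (`,`) in the odd fields.

```shell
#!/usr/bin/env bash
# Illustrative sketch: dump each field awk sees when splitting on double quotes.
line='xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]'

# Print "$N=[field]" for every field so leading/trailing spaces are visible.
fields=$(awk -F'"' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }' <<< "$line")
printf '%s\n' "$fields"
```

For the sample line this yields 15 fields: $2=[ABCD], then $4=[EFGH], $6=[IJKL] and so on up to $14=[YZ12], with $15=[]] holding the closing bracket.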

To populate a bash array, you can use mapfile with process substitution:

mapfile -t arr < <( … )

Test:

mapfile -t arr < <(
  awk -v q="ABCD" -F'"' '$2==q{for (i=4; i<=NF; i+=2) print $i}' \
  <<< 'xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]'
)
printf '%s\n' "${arr[@]}"
EFGH
IJKL
MNOP
QRST
UVWX
YZ12
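The process substitution matters here. In bash, every part of a pipeline normally runs in a subshell (unless `shopt -s lastpipe` is enabled), so `awk ... | mapfile -t arr` fills an array that vanishes as soon as the pipeline ends. The sketch below contrasts the two forms; `extract` is a hypothetical helper that just wraps the awk command from the answer.

```shell
#!/usr/bin/env bash
# Sketch: pipe vs. process substitution when feeding mapfile.
# Assumes the default setting where `lastpipe` is not enabled.
input='xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]'

# Hypothetical helper: the awk one-liner from the answer.
extract() {
    awk -v q="ABCD" -F'"' '$2==q{for (i=4; i<=NF; i+=2) print $i}' <<< "$input"
}

piped=()
extract | mapfile -t piped           # mapfile runs in a subshell; result is lost
echo "via pipe: ${#piped[@]} elements"

substituted=()
mapfile -t substituted < <(extract)  # mapfile runs in the current shell
echo "via process substitution: ${#substituted[@]} elements"
```

Note also that if the first quoted substring is not "ABCD", the awk program prints nothing, so the array simply ends up empty rather than triggering an error.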

This Shellcheck-clean demo program shows one way to do it using Bash's own regular expression matching ([[ str =~ regex ]]):

#! /bin/bash -p

input='xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]'

# Regular expression to match strings with double quoted substrings.
# The first parenthesized subexpression matches the first string in quotes.
# The second parenthesized subexpression matches the entire portion of the
# string after the first quoted substring.
quotes_rx='^[^"]*"([^"]*)"(.*)$'

if [[ $input =~ $quotes_rx ]]; then
    if [[ ${BASH_REMATCH[1]} == ABCD ]]; then
        tmpstr=${BASH_REMATCH[2]}
    else
        echo "First quoted substring is not 'ABCD'" >&2
        exit 1
    fi
else
    echo 'Input does not contain any quoted substrings' >&2
    exit 1
fi

quoted_strings=()
while [[ $tmpstr =~ $quotes_rx ]]; do
    quoted_strings+=( "${BASH_REMATCH[1]}" )
    tmpstr=${BASH_REMATCH[2]}
done

declare -p quoted_strings
  • For information about Bash's regular expression matching, see
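If the extraction is needed in more than one place, the same regex loop can be wrapped in a function. The sketch below is one possible arrangement, not part of the original answer: `extract_after_first` is a hypothetical name, and it prints the remaining quoted substrings one per line (so it assumes they contain no newlines), returning 1 when the input has no quoted substring or the first one differs from the expected value. Its output is then loaded into an array with mapfile.

```shell
#!/usr/bin/env bash
# Sketch: the =~ loop from the demo, wrapped in a reusable function.
# extract_after_first STRING EXPECTED
#   Prints each quoted substring after the first one, one per line.
#   Returns 1 if STRING has no quoted substring or the first is not EXPECTED.
extract_after_first() {
    local str=$1 expected=$2
    local rx='^[^"]*"([^"]*)"(.*)$'
    [[ $str =~ $rx ]] || return 1
    [[ ${BASH_REMATCH[1]} == "$expected" ]] || return 1   # quoted: literal match
    str=${BASH_REMATCH[2]}
    while [[ $str =~ $rx ]]; do
        printf '%s\n' "${BASH_REMATCH[1]}"
        str=${BASH_REMATCH[2]}
    done
}

input='xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]'
quoted_strings=()
if out=$(extract_after_first "$input" ABCD); then
    # Skip mapfile when there is nothing after the first substring;
    # otherwise the here-string would produce one empty element.
    [[ -n $out ]] && mapfile -t quoted_strings <<< "$out"
    declare -p quoted_strings
else
    echo "First quoted substring is not 'ABCD'" >&2
fi
```

Passing the expected value as a parameter keeps the check and the collection in one place; the quoted "$expected" on the right of == makes it a literal comparison rather than a glob pattern.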