How to trim a field by index from the right of a string in Bash?

Question

I want to remove "(field 5)" from the following string:

test_string="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"

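Problems:

Sometimes (field 4) is not even there.
In any case, I only want to keep (field 6) at the end of the string.
Sometimes there are no fields at all after "field 3", in which case I just keep the string as is, i.e. [field 1 (field 2)] field 3

The only approach I have so far is dirty: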

$ first_fields="$(printf '%s' "$test_string" | cut -d'(' -f -2)"

$ echo "$first_fields"
> [field 1 (field 2)] field 3

$ last_field="(${test_string##*\(}"

$ echo "$last_field"
> (field 6)

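Here is the problem:

If the number of fields varies, I can't hard-code the field number for cut -f, or I will lose (field 4).
I only need to keep the last (field) at the right end of the string, whatever it is.

Question: how can I count fields starting from the right end of the string? Or am I beyond the limits of what the Unix shell can do?

I tried the following, but I always get just one field, which is the whole string itself: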

IFS="("
for i in "${test_string[@]}";
do
    echo "field is: $i"
done
> [field 1 (field 2)] field 3 (field 4) (field 5) (field 6)

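Note: the fields are always between parentheses and contain completely random characters each time (worse, they are in foreign languages encoded in Unicode).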

Answer 1

You can use a regular expression anchored to the end.

#!/bin/bash
test_string="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"
rgx_field="[(].*[)]"
rgx_space="[[:space:]]*"
if
  [[ $test_string =~ (.*)$rgx_field$rgx_space($rgx_field)$rgx_space$ ]]
then
  result="${BASH_REMATCH[1]}${BASH_REMATCH[2]}" # Removed
else
  result=$test_string # No match... Buggy data?
fi
echo "$result"

This assumes the fields are enclosed in parentheses, as in your sample code.

The key line is this one:

[[ $test_string =~ (.*)$rgx_field$rgx_space($rgx_field)$rgx_space$ ]]

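The =~ operator tries to match the string on its left against the extended regular expression on its right. The parenthesized parts of the line tell the regex engine to "remember" those parts (they are then available in the BASH_REMATCH array). The trailing $ means the regular expression must match up to the end of the string, so that it works "backwards" from the last field. The leading fields are all matched by the initial (.*).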
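
As a rough sketch of the same idea, the match can be wrapped in a function and run against the three cases from the question. The function name drop_second_last_field is hypothetical, and the field pattern is tightened to [(][^)]*[)] on the assumption that fields never contain nested parentheses:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the answer above): drop the second-to-last
# "(...)" field and keep the last one. Assumes no nested parentheses.
drop_second_last_field() {
  local s=$1
  local field='[(][^)]*[)]'     # one parenthesized field
  local space='[[:space:]]*'    # optional whitespace around fields
  local re="^(.*)${field}${space}(${field})${space}\$"
  if [[ $s =~ $re ]]; then
    # BASH_REMATCH[1] = everything before the dropped field
    # BASH_REMATCH[2] = the last field
    printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
  else
    # Fewer than two adjacent trailing fields: leave the string as is
    printf '%s\n' "$s"
  fi
}

drop_second_last_field "[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"
drop_second_last_field "[field 1 (field 2)] field 3 (field 5) (field 6)"
drop_second_last_field "[field 1 (field 2)] field 3"
```

Keeping the regular expression in a variable and expanding it unquoted on the right of =~ avoids quoting surprises, since a quoted pattern would be matched as a literal string.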

Answer 2

You can use sed:

$> test_string="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"
$> sed -E 's/^(.*)\([^)]*\) (\([^)]*\))$/\1\2/' <<< "$test_string"
[field 1 (field 2)] field 3 (field 4) (field 6)

$> test_string="[field 1 (field 2)] field 3 (field 5) (field 6)"
$> sed -E 's/^(.*)\([^)]*\) (\([^)]*\))$/\1\2/' <<< "$test_string"
[field 1 (field 2)] field 3 (field 6)

This sed command uses a regular expression to remove the second-to-last (...) value from the input.
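
The question also requires that a string with no trailing fields stays untouched. A minimal sketch of how that falls out of the substitution: when the pattern does not match, sed prints the line unchanged. The wrapper name strip_penultimate is hypothetical, and the single space is relaxed to [[:space:]]* on the assumption that the spacing between fields may vary:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the sed substitution from this answer.
# \1 = everything before the second-to-last field, \2 = the last field.
strip_penultimate() {
  sed -E 's/^(.*)\([^)]*\)[[:space:]]*(\([^)]*\))[[:space:]]*$/\1\2/' <<< "$1"
}

# Two trailing fields: the second-to-last one is removed
strip_penultimate "[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"

# No trailing fields: no match, so the line is printed as is
strip_penultimate "[field 1 (field 2)] field 3"
```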
