Check if any substring is contained in an array in Bash

Question

Suppose I have a string,

a="This is a string"

and an array,

b=("This is my" "sstring")

I want to execute an if condition if any substring of a is in b. That is true here, because "This is" is a substring of the first element of b.

With two strings, I know how to check whether $x is a substring of $y, using

if [[ $y == *$x* ]]; then
 #Something
fi

But since $x is an array of strings, I don't know how to do this without explicitly looping over the array.

Answer 1

You can split $a into an array, then loop over both arrays to look for matches:

a="this is a string"
b=( "this is my" "string")

# Make an array by splitting $a on spaces
IFS=' ' read -ra aarr <<< "$a"

for i in "${aarr[@]}"
do 
  for j in "${b[@]}"
  do
    if [[ $j == *"$i"* ]]; then
      echo "Match: $i : $j"
      break
    fi
  done
done

# Match: this : this is my
# Match: is : this is my
# Match: string : string

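If you need to handle multi-word substrings of $a (e.g. this is, is a, etc.), then you will need to loop over the array, generating all possible runs of consecutive words: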

for (( length=1; length <= "${#aarr[@]}"; ++length )); do
  for (( start=0; start + length <= "${#aarr[@]}"; ++start )); do
    substr="${aarr[*]:start:length}"
    for j in "${b[@]}"; do
      if [[ $j == *"${substr}"* ]]; then
        echo "Match: $substr : $j"
        break
      fi
    done
  done
done

# Match: this : this is my
# Match: is : this is my
# Match: string : string
# Match: this is : this is my
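
The loops above print every match, whereas the question only needs a yes/no answer for an if condition. A minimal sketch of that, assuming $a fits on a single line: any_word_in is a hypothetical helper name, not something from the answer above. It only checks single words, because whenever a run of several words occurs in an element, each word of that run occurs in it too.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not from the answer above): succeeds if any
# whitespace-separated word of the string in $1 occurs inside one of the
# remaining arguments.
any_word_in() {
  local str=$1
  shift
  local -a words
  local word elem
  # Split the string into words on whitespace
  read -ra words <<< "$str"
  for word in "${words[@]}"; do
    for elem in "$@"; do
      # Stop at the first element that contains the word
      [[ $elem == *"$word"* ]] && return 0
    done
  done
  return 1
}

a="This is a string"
b=("This is my" "sstring")

if any_word_in "$a" "${b[@]}"; then
  echo "match"
else
  echo "no match"
fi
```

Note that this is a plain substring test, so "string" also counts as contained in "sstring".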

Answer 2

Here is how to match the largest possible number of leading words of string a against the entries of array b:

#!/usr/bin/env bash

a="this is a string"
b=("this is my" "string" )

# tokenize the words of a into an array
read -ra a_words <<<"$a"

match()
{
  # iterate entries of array b
  for e in "${b[@]}"; do

    # tokenize entry words into an array
    read -ra e_words <<<"$e"

    # initialize the counter to the smaller of the two word counts
    i=$(( ${#a_words[@]} < ${#e_words[@]} ? ${#a_words[@]} : ${#e_words[@]} ))

    # iterate matching decreasing number of words
    while [ 0 -lt "$i" ]; do

      # return true if the first i words of both match
      [ "${e_words[*]::$i}" = "${a_words[*]::$i}" ] && return

      # decrease number of words to match
      i=$(( i - 1 ))
    done
  done

  # reaching here means no match found, return false
  return 1
}

if match; then
  printf '%s\n' 'It matches!'
fi

Answer 3

This may be all you need:

$ printf '%s\n' "${b[@]}" | grep -wFf <(tr ' ' $'\n' <<<"$a")
This is my
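
To drive an if condition with that pipeline, grep's -q option can be used so that only the exit status matters. A minimal sketch, assuming the a and b values from the question; the result variable is only there for illustration:

```bash
#!/usr/bin/env bash

a="This is a string"
b=("This is my" "sstring")

# -q: print nothing, only set the exit status
# -w: match whole words only
# -F: treat patterns as fixed strings, not regular expressions
# -f: read the patterns (one word of $a per line) from the given file
if printf '%s\n' "${b[@]}" | grep -qwFf <(tr ' ' '\n' <<<"$a"); then
  result="match"
else
  result="no match"
fi
echo "$result"
```

Because of -w, the word "string" does not match the element "sstring" here; the match comes from "This" and "is" in the first element.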

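Otherwise, a shell is a tool for manipulating files and processes and for sequencing calls to other tools. The people who invented the shell also invented awk for the shell to call to manipulate text. What you are trying to do is manipulate text, so there is a good chance you should be using awk instead of the shell for whatever larger job this task is a part of.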

$ printf '%s\n' "${b[@]}" | 
awk -v a="$a" '
    BEGIN { split(a,words) }
    { for (i in words) if (index($0,words[i])) { print; f=1; exit} }
    END { exit !f }
'
This is my

The above assumes a doesn't contain any backslashes. If it can, use this instead:

printf '%s\n' "${b[@]}" | a="$a" awk 'BEGIN{split(ENVIRON["a"],words)} ...'
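
The reason for the switch is that awk interprets escape sequences in values assigned with -v, while values read from ENVIRON are taken literally. A small sketch of the difference, using a made-up value for a:

```bash
#!/usr/bin/env bash

a='C:\temp'   # hypothetical value containing a backslash sequence (\t)

# -v assignment: awk processes escape sequences, so \t becomes a tab
via_v=$(awk -v a="$a" 'BEGIN { print a }')

# environment variable: awk sees the value exactly as the shell passed it
via_env=$(a="$a" awk 'BEGIN { print ENVIRON["a"] }')

printf 'with -v:      %s\n' "$via_v"
printf 'with ENVIRON: %s\n' "$via_env"
```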

If any element of b can contain newlines, then:

printf '%s\0' "${b[@]}" | a="$a" awk -v RS='\0' 'BEGIN{split(ENVIRON["a"],words)} ...'
