How to iterate over the characters of a string in a POSIX shell script?

Question

A POSIX-compliant shell provides a mechanism like this for iterating over a collection of strings:

for x in $(seq 1 5); do
    echo $x
done

But how do I iterate over each character of a word?

Answer 1

It's a little roundabout, but I think this will work in any POSIX-compliant shell. I've tried it in dash, but I don't have busybox handy to test with.

var='ab * cd'

tmp="$var"    # The loop will consume the variable, so make a temp copy first
while [ -n "$tmp" ]; do
    rest="${tmp#?}"    # All but the first character of the string
    first="${tmp%"$rest"}"    # Remove $rest, and you're left with the first character
    echo "$first"
    tmp="$rest"
done

Output:

a
b

*

c
d
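
In the output above, the two lines that look blank are the space characters from the string. As a minimal sketch, the same loop can be wrapped in a function that brackets each character so whitespace stays visible. The name each_char is a hypothetical helper, not part of the original answer:

```shell
# each_char is a hypothetical helper name, not part of the original answer.
# It prints every character of its first argument on its own line,
# wrapped in brackets so that spaces remain visible.
each_char() {
    tmp="$1"
    while [ -n "$tmp" ]; do
        rest="${tmp#?}"            # everything after the first character
        first="${tmp%"$rest"}"     # strip $rest, leaving the first character
        printf '[%s]\n' "$first"
        tmp="$rest"
    done
}

each_char 'ab * cd'
```

Using printf instead of echo also avoids echo's implementation-specific handling of arguments such as -n or backslashes.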

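Note that the double quotes on the right-hand side of the assignments are not needed; I just prefer to put double quotes around all expansions rather than trying to keep track of where it's safe to leave them off. On the other hand, the double quotes in [ -n "$tmp" ] are absolutely necessary, and the inner double quotes in first="${tmp%"$rest"}" are needed if the string contains "*".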
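
To see why the inner quotes matter, here is a small sketch comparing the quoted and unquoted forms on a string containing "*". The function names are hypothetical and exist only for this comparison. Unquoted, the value of $rest is treated as a pattern, so "*b" matches the shortest suffix ending in b instead of the literal text:

```shell
# first_char_quoted and first_char_unquoted are hypothetical helper names
# used only to contrast the two forms of the suffix removal.
first_char_quoted() {
    rest="${1#?}"
    printf '%s\n' "${1%"$rest"}"   # $rest is matched literally
}

first_char_unquoted() {
    rest="${1#?}"
    printf '%s\n' "${1%$rest}"     # $rest is interpreted as a pattern
}

first_char_quoted 'a*b'     # prints: a
first_char_unquoted 'a*b'   # prints: a*
```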

Answer 2

This works in dash and busybox:

echo 'ab * cd' | grep -o .

Output:

a
b

*

c
d
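
One caveat: -o is not among the grep options that POSIX specifies, even though the grep implementations tested above provide it. A possible alternative is fold, which POSIX does specify. The following is only a sketch for plain printable text; split_chars is a hypothetical name, fold treats tab and backspace specially when counting columns, and how multibyte input is counted depends on the implementation:

```shell
# split_chars is a hypothetical helper name.
# fold -w 1 breaks each input line after every column, so each character
# of a plain ASCII string ends up on its own line.
split_chars() {
    printf '%s\n' "$1" | fold -w 1
}

split_chars 'ab * cd'
```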

Answer 3

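Use getopts to process the input one character at a time. The optstring : tells getopts to ignore illegal options and set OPTARG. The leading - in the input makes getopts treat the string as options.

If getopts encounters a colon, it does not set OPTARG, so the script uses parameter expansion to return : when OPTARG is unset or empty.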

#!/bin/sh
IFS='
'
split_string () {
  OPTIND=1;
  while getopts ":" opt "-$1"
    do echo "'${OPTARG:-:}'"
  done
}

while read -r line;do
  split_string "$line"
done
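
To make the getopts behaviour described above more concrete, this sketch prints what getopts stores on each iteration. The name show_getopts_state is hypothetical, and the example deliberately sticks to plain letters:

```shell
# show_getopts_state is a hypothetical helper name.
# With an optstring of ":" no option letter is valid, and the leading colon
# selects silent error reporting: for each unrecognised character getopts
# sets opt to "?" and OPTARG to the character itself.
show_getopts_state() {
    OPTIND=1    # restart option parsing on every call
    while getopts ":" opt "-$1"; do
        printf 'opt=%s OPTARG=%s\n' "$opt" "$OPTARG"
    done
}

show_getopts_state 'ab'
```

Since opt is always "?", the script only ever needs to look at OPTARG.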

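Like the accepted answer, this processes the string byte by byte rather than character by character, which corrupts multibyte code points. The trick is to detect multibyte code points, concatenate their bytes, and then print them: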

#!/bin/sh
IFS='
'
split_string () {
  OPTIND=1;
  while getopts ":" opt "$1";do
    case "${OPTARG:=:}" in
      ([[:print:]])
        [ -n "$multi" ] && echo "$multi" && multi=
        echo "$OPTARG" && continue
    esac
    multi="$multi$OPTARG"
    case "$multi" in
      ([[:print:]]) echo "$multi" && multi=
    esac
  done
  [ -n "$multi" ] && echo "$multi"
}
while read -r line;do
  split_string "-$line"
done

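The extra case "$multi" here is used to detect when the multi buffer contains a printable character. This works in shells such as Bash and Zsh, but Dash and busybox ash do not pattern-match multibyte code points; they ignore the locale.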

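This degrades fairly gracefully: Dash/ash treat a run of consecutive multibyte code points as a single character, but handle multibyte characters surrounded by single-byte characters well.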

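Depending on your requirements, it may actually be preferable not to split consecutive multibyte code points, since the next code point could be a combining character that modifies the character before it.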

This will not handle the case where a single-byte character is followed by a combining character.
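
Whether a given shell matches patterns per character or per byte can be probed at run time. The sketch below is only a rough indicator, and it rests on two assumptions: a UTF-8 locale is in effect, and the ? pattern in parameter expansion behaves like the shell's other pattern matching. The name pattern_unit is hypothetical:

```shell
# pattern_unit is a hypothetical helper name.
# It builds the two-byte UTF-8 encoding of "é" (octal 303 251) and removes
# one "?" match from the front. If nothing is left, "?" consumed the whole
# character; otherwise it consumed only a single byte.
pattern_unit() {
    s=$(printf '\303\251')
    rest="${s#?}"
    if [ -z "$rest" ]; then
        echo character
    else
        echo byte
    fi
}

pattern_unit
```

The result depends on both the shell and the current locale, so no fixed output is shown here.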
