Comparing strings for alphabetical order in Bash, test vs. double bracket syntax

I am working on a Bash scripting project in which I need to delete one of two files if they have identical contents. I should delete the one that comes last in an alphabetical sort, and in the example output my professor has provided, apple.dat is deleted when the choices are apple.dat and Apple.dat.

if [[ "apple" > "Apple" ]]; then
    echo apple
else
    echo Apple
fi

prints Apple

echo $(echo -e "Apple\napple" | sort | tail -n1)

prints Apple

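The ASCII value of a is 97 and that of A is 65, so why does the test say A is greater?

The weird thing is that I get the opposite result with the older syntax: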

if [ "apple" \> "Apple" ]; then
    echo apple
else
    echo Apple
fi

prints apple

If we try to use \> inside the [[ ]] syntax, it is a syntax error.

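How can we correct this for the double bracket syntax? I have tested this on the school Debian server, on my local machine, and on my Digital Ocean droplet. On my local Ubuntu 20.04 machine and on the school server, I get the output shown above. Interestingly, on my Ubuntu 20.04 Digital Ocean droplet, I get "apple" with both the double and the single bracket syntax. We are allowed to use either syntax, the double bracket condition or the single bracket test call, but I prefer the newer double bracket syntax and would rather learn how to make it work than convert my mostly finished script to the older, more POSIX-compliant syntax.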

Hint:

$ (LC_COLLATE=C; if [ "apple" \> "Apple" ]; then echo apple; else echo Apple; fi)
apple
$ (LC_COLLATE=en_US; if [ "apple" \> "Apple" ]; then echo apple; else echo Apple; fi)
apple

But:

$ (LC_COLLATE=C; if [[ "apple" > "Apple" ]]; then echo apple; else echo Apple; fi)
apple
$ (LC_COLLATE=en_US; if [[ "apple" > "Apple" ]]; then echo apple; else echo Apple; fi)
Apple

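The difference is that the Bash-specific [[ ]] test compares strings using the collation rules of the current locale, whereas the POSIX-style [ ] test compares them by ASCII value.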
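This may also explain why different machines give different answers: each one can have a different collation locale in effect. A minimal sketch for checking which setting wins, based on the standard precedence LC_ALL, then LC_COLLATE, then LANG. The function name effective_collation is hypothetical, and the fallback to C when nothing is set is an assumption about typical systems.

```bash
#!/bin/bash
# effective_collation (hypothetical helper): print the locale name that governs
# string collation, following the precedence LC_ALL > LC_COLLATE > LANG.
# Assumption: when none of them is set, the system falls back to "C".
effective_collation() {
    printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

effective_collation
```

If this prints C or POSIX on one machine and something like en_US.UTF-8 on another, [[ "apple" > "Apple" ]] can be expected to differ between them.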

From the bash man page:

When used with [[, the < and > operators sort lexicographically using the current locale.

When used with test or [, the < and > operators sort lexicographically using ASCII ordering.
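If you want to keep the double bracket syntax but get byte-order results, one option is to pin the locale inside a function. This is a minimal sketch, not a complete answer to the assignment: later_bytewise is a hypothetical name, and it sets LC_ALL rather than LC_COLLATE because LC_ALL, if present in the environment, overrides LC_COLLATE.

```bash
#!/bin/bash
# later_bytewise (hypothetical helper): print whichever of two words sorts
# last in byte (ASCII) order, using [[ ]] with the locale forced to C.
later_bytewise() {
    [ $# -eq 2 ] || return 1
    local LC_ALL=C            # local: the caller's locale is restored on return
    if [[ "$1" > "$2" ]]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

later_bytewise apple Apple    # prints apple
later_bytewise apple Banana   # prints apple, since every lowercase letter follows every uppercase one
```

Note the second call: pure byte order puts apple after Banana, so on its own this does not give a dictionary-style ordering.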

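Change the syntax: if [[ "Apple" -gt "apple" ]] appears to work as expected. Be aware, though, that -gt is an arithmetic operator: inside [[ ]] both operands are evaluated as arithmetic expressions, so two unset variable names both become 0 and the test is false whatever the strings contain.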

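I have come up with my own solution, but first I must thank @GordonDavisson and @LéaGris for their help and for what I have learned from them, as it has been invaluable to me.

Whether a computer locale or a human one is used, if apple comes after Apple in an alphabetical sort, then it also comes after Banana, and if Banana comes after apple, then Apple comes after apple. So I came up with the following: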

# A function which sorts two words alphabetically with lower case coming after upper case.
# The last word in the sort will be printed twice to demonstrate that this works for both
# the POSIX compliant single bracket test call and the newer double bracket condition
# syntax.
# arg 1: One of two words to sort
# arg 2: One of two words to sort
# Return: 0 upon completion, 1 if incorrect number of args is given
sort_alphabetically() {
    [ $# -ne 2 ] && return 1

    word_1_val=0
    word_2_val=0

    # Sum the character codes of the first word, one character at a time.
    while read -n1 letter; do
        (( word_1_val += $(printf '%d' "'$letter") ))
    done < <(echo -n "$1")

    # Do the same for the second word.
    while read -n1 letter; do
        (( word_2_val += $(printf '%d' "'$letter") ))
    done < <(echo -n "$2")

    if [ $word_1_val -gt $word_2_val ]; then
        echo "$1"
    else
        echo "$2"
    fi

    if [[ $word_1_val -gt $word_2_val ]]; then
        echo "$1"
    else
        echo "$2"
    fi

    return 0
}

sort_alphabetically "apple" "Apple"
sort_alphabetically "Banana" "apple"
sort_alphabetically "aPPle" "applE"

prints:

apple
apple
Banana
Banana
applE
applE

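This works by using process substitution to redirect the output of echo into the while loop, which reads one character at a time, and then using printf to get the decimal ASCII value of each character. It is like creating a temporary file from the string, one that is destroyed automatically, and then reading it one character at a time. The -n option stops echo from appending a trailing newline, so no \n character reaches the loop.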

From the bash man page:

Process Substitution

Process substitution allows a process's input or output to be referred to using a filename. It takes the form of <(list) or >(list). The process list is run asynchronously, and its input or output appears as a filename. This filename is passed as an argument to the current command as the result of the expansion. If the >(list) form is used, writing to the file will provide input for list. If the <(list) form is used, the file passed as an argument should be read to obtain the output of list. Process substitution is supported on systems that support named pipes (FIFOs) or the /dev/fd method of naming open files.

When available, process substitution is performed simultaneously with parameter and variable expansion, command substitution, and arithmetic expansion.

From a Stack Overflow post about printf:

If the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote.
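As a quick illustration of the behaviour quoted above, here is a minimal sketch (char_code is a hypothetical helper name, not part of my script):

```bash
#!/bin/bash
# char_code (hypothetical helper): print the numeric code of the first
# character of the argument, using the leading-quote form of printf.
char_code() {
    printf '%d\n' "'$1"
}

char_code a    # prints 97
char_code A    # prints 65
```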

Note: process substitution is not POSIX compliant, but Bash supports it in the way described in the bash man page.


Update: the above does not work in all cases!


The solution above works in many cases, but we ran into some anomalies.

first word    second word    last alphabetically    result
apple         Apple          apple                  correct
Apple         apple          apple                  correct
apPLE         Apple          Apple                  incorrect
apple         Banana         Banana                 correct
apple         BANANA         apple                  incorrect
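To see where the anomalies come from, here is a minimal sketch that prints the character-code sum the first version compares (code_sum is a hypothetical helper written for this illustration). The sum ignores the position of each letter, and uppercase letters have smaller codes, so a word with more capitals gets a smaller total regardless of where it belongs alphabetically.

```bash
#!/bin/bash
# code_sum (hypothetical helper): add up the character codes of a word,
# the same quantity the first sort_alphabetically compares.
code_sum() {
    local total=0 ch
    while IFS= read -r -n1 ch; do
        (( total += $(printf '%d' "'$ch") ))
    done < <(printf '%s' "$1")
    printf '%d\n' "$total"
}

for word in apple BANANA apPLE Apple; do
    printf '%s %s\n' "$word" "$(code_sum "$word")"
done
# apple 530
# BANANA 417
# apPLE 434
# Apple 498
```

BANANA (417) loses to apple (530), and apPLE (434) loses to Apple (498), which matches the two incorrect rows.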

The following solution gets the desired results:

#!/bin/bash

sort_alphabetically() {
    [ $# -ne 2 ] && return 1

    local WORD_1="$1"
    local WORD_2="$2"
    local WORD_1_LOWERED="$(echo -n "$1" | tr '[:upper:]' '[:lower:]')"
    local WORD_2_LOWERED="$(echo -n "$2" | tr '[:upper:]' '[:lower:]')"

    # WORD_1 is a candidate if it sorts last either as given or lowercased.
    if [ "$(echo -e "$WORD_1\n$WORD_2" | sort | tail -n1)" = "$WORD_1" ] ||\
       [ "$(echo -e "$WORD_1_LOWERED\n$WORD_2_LOWERED" | sort | tail -n1)" =\
         "$WORD_1_LOWERED" ]; then

        if [ "$WORD_1_LOWERED" = "$WORD_2_LOWERED" ]; then

            ASCII_VAL_WORD_1=0
            ASCII_VAL_WORD_2=0
            read -n1 FIRST_CHAR_1 < <(echo -n "$WORD_1")
            read -n1 FIRST_CHAR_2 < <(echo -n "$WORD_2")

            while read -n1 character; do
                (( ASCII_VAL_WORD_1 += $(printf '%d' "'$character") ))
            done < <(echo -n "$WORD_1")

            while read -n1 character; do
                (( ASCII_VAL_WORD_2 += $(printf '%d' "'$character") ))
            done < <(echo -n "$WORD_2")

            if [ $ASCII_VAL_WORD_1 -gt $ASCII_VAL_WORD_2 ] &&\
               [ "$FIRST_CHAR_1" \> "$FIRST_CHAR_2" ]; then

                echo "$WORD_1"
            elif [ $ASCII_VAL_WORD_2 -gt $ASCII_VAL_WORD_1 ] &&\
                 [ "$FIRST_CHAR_2" \> "$FIRST_CHAR_1" ]; then

                echo "$WORD_2"
            elif [ "$FIRST_CHAR_1" \> "$FIRST_CHAR_2" ]; then
                echo "$WORD_1"
            else
                echo "$WORD_2"
            fi
        else
            echo "$WORD_1"
        fi
    else
        echo "$WORD_2"
    fi

    return 0
}

sort_alphabetically "apple" "Apple"
sort_alphabetically "Apple" "apple"
sort_alphabetically "apPLE" "Apple"
sort_alphabetically "Apple" "apPLE"
sort_alphabetically "apple" "Banana"
sort_alphabetically "apple" "BANANA"

exit 0

prints:

apple
apple
apPLE
apPLE
Banana
BANANA
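For comparison, here is a shorter sketch that appears to give the same results for the six test pairs above. It is not the approach used in my script: it compares the lowercased words first and falls back to a byte-wise comparison only when they are equal. It assumes Bash 4.0 or later for the ${var,,} lowercase expansion and plain ASCII names, it pins LC_ALL=C inside the function so the result does not depend on the machine's locale, and last_alphabetically is a hypothetical name.

```bash
#!/bin/bash
# last_alphabetically (hypothetical name): print the word that sorts last,
# ignoring case first and placing lowercase after uppercase on a tie.
last_alphabetically() {
    [ $# -eq 2 ] || return 1
    local LC_ALL=C                        # byte-wise [[ ]] comparison on every machine
    local a_lower=${1,,} b_lower=${2,,}   # lowercase copies (Bash 4.0+)

    if [[ "$a_lower" > "$b_lower" ]]; then
        printf '%s\n' "$1"
    elif [[ "$a_lower" < "$b_lower" ]]; then
        printf '%s\n' "$2"
    elif [[ "$1" > "$2" ]]; then          # same letters: lowercase bytes sort later
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

last_alphabetically "apple" "Apple"     # prints apple
last_alphabetically "apPLE" "Apple"     # prints apPLE
last_alphabetically "apple" "BANANA"    # prints BANANA
```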