Linux IFS environment variable is not used by gcc wordexp Posix C library function for splitting words

Question

Environment

OS: Ubuntu 20.04, CentOS 8, macOS Catalina 10.15.7
Language: C, C++
Compiler: gcc (latest version for each OS)

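Issue

I'm using the wordexp Posix library function to get shell-like expansion of strings.
The expansion works fine with one exception: when I set the $IFS environment variable to something other than whitespace, for example ':', it does not seem to affect the splitting of words, which continues to be done on whitespace only, regardless of the IFS value.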

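bash test

The man page for wordexp on Linux (https://man7.org/linux/man-pages/man3/wordexp.3.html) states:

"The function wordexp() performs a shell-like expansion of the string..."
"Field splitting is done using the environment variable $IFS. If it is not set, the field separators are space, tab and newline."

This is why I expected wordexp to behave the same as bash in this respect.
On all the listed OSes, bash gives exactly the correct and expected result when the character set used for splitting is changed:
Using the default (IFS not set)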

    read -a words <<<"1 2:3 4:5"
    for word in "${words[@]}"; do echo "$word";  done

correctly splits on spaces and produces:

    1
    2:3
    4:5

While setting IFS to ':'

    IFS=':' read -a words <<<"1 2:3 4:5"
    for word in "${words[@]}"; do echo "$word";  done

correctly splits on ':' and produces:

    1 2
    3 4
    5

C code test

But running the code below produces the same result whether or not the IFS environment variable is set:

C code:

    #include <stdio.h>
    #include <wordexp.h>
    #include <stdlib.h>
    
    static void expand(char const *title, char const *str)
    {
        printf("%s input: %s\n", title, str);
        wordexp_t exp;
        int rcode = 0;
        if ((rcode = wordexp(str, &exp, WRDE_NOCMD)) == 0) {
            printf("output:\n");
            for (size_t i = 0; i < exp.we_wordc; i++)
                printf("%s\n", exp.we_wordv[i]);
            wordfree(&exp);
        } else {
            printf("expand failed %d\n", rcode);
        }
    }
    
    int main()
    {
        char const *str = "1 2:3 4:5";
        
        expand("No IFS", str);
    
        int rcode = setenv("IFS", ":", 1);
        if ( rcode != 0 ) {
            perror("setenv IFS failed: ");
            return 1;
        }
    
        expand("IFS=':'", str);
    
        return 0;
    }

The result is the same on all OSes:

    No IFS input: 1 2:3 4:5
    output:
    1
    2:3
    4:5
    IFS=':' input: 1 2:3 4:5
    output:
    1
    2:3
    4:5

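Note that the snippet above was created for this post. I tested with more complex code that verified the environment variable was indeed set correctly.

Source code review

I looked at the source code of the wordexp function implementation, available at https://code.woboq.org/userspace/glibc/posix/wordexp.c.html, and it does seem to use $IFS, but perhaps not consistently, or perhaps this is a bug.
Specifically:
In the body of wordexp, which starts at line 2229, it does get the IFS environment variable value and processes it:
Lines 2273 - 2276: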

     /* Find out what the field separators are.
       * There are two types: whitespace and non-whitespace.
       */
      ifs = getenv ("IFS");

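But later in the function it does not seem to use the $IFS value for word separation.
This looks like a bug, unless "field separators" on line 2273 and "word separator" on line 2396 mean different things.
Lines 2395 - 2398: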

          default:
            /* Is it a word separator? */
            if (strchr (" \t", words[words_offset]) == NULL)
            {

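In any case, the code seems to use only space or tab as separators, unlike bash, which respects the splitter values set in IFS.

Questions

Am I missing something, and is there a way to get wordexp to split on characters other than whitespace?
If the split is only on whitespace, is this a bug in
- the gcc library implementation, or
- the Linux man page for wordexp, where they claim that $IFS can be used to define the splitters?

Many thanks for all your comments and insights!

Answer summary and workaround

The accepted answer contains a hint on how to split on the non-whitespace characters from $IFS: you have to set $IFS and also store the string you want to split as the value of a temporary environment variable, then call wordexp on that temporary variable. This is demonstrated in the updated code below.
While this behavior, visible in the source code, may not actually be a bug, to me it definitely looks like a questionable design decision...
Updated code: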

    #include <stdio.h>
    #include <wordexp.h>
    #include <stdlib.h>
    
    static void expand(char const *title, char const *str)
    {
        printf("%s input: %s\n", title, str);
        wordexp_t exp;
        int rcode = 0;
        if ((rcode = wordexp(str, &exp, WRDE_NOCMD)) == 0) {
            printf("output:\n");
            for (size_t i = 0; i < exp.we_wordc; i++)
                printf("%s\n", exp.we_wordv[i]);
            wordfree(&exp);
        } else {
            printf("expand failed %d\n", rcode);
        }
    }
    
    int main()
    {
        char const *str = "1 2:3 4:5";
        
        expand("No IFS", str);
    
        int rcode = setenv("IFS", ":", 1);
        if ( rcode != 0 ) {
            perror("setenv IFS failed: ");
            return 1;
        }
    
        expand("IFS=':'", str);
        
        rcode = setenv("FAKE", str, 1);
        if ( rcode != 0 ) {
            perror("setenv FAKE failed: ");
            return 2;
        }
    
        expand("FAKE", "${FAKE}");    
    
        return 0;
    }

Produces:

    No IFS input: 1 2:3 4:5
    output:
    1
    2:3
    4:5
    IFS=':' input: 1 2:3 4:5
    output:
    1
    2:3
    4:5
    FAKE input: ${FAKE}
    output:
    1 2
    3 4
    5
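
The workaround can be wrapped in a small helper so that callers do not have to manage the temporary variable themselves. The following is only a sketch under stated assumptions: the function name `split_with_ifs` and the variable name `WORDEXP_SPLIT_TMP` are made up for this illustration and are not part of any library, and it assumes a wordexp implementation that reads IFS from the environment on every call, as the glibc source quoted above does. It also restores the previous IFS value afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper (not part of any library): splits `str` on the
 * characters in `ifs` by routing it through a temporary environment
 * variable, as in the workaround above. Returns wordexp()'s result code,
 * or -1 if the environment could not be modified. On success the caller
 * must release `out` with wordfree(). */
static int split_with_ifs(const char *str, const char *ifs, wordexp_t *out)
{
    assert(str != NULL && ifs != NULL && out != NULL);

    /* Copy the old IFS (if any) so it can be restored afterwards. */
    const char *old = getenv("IFS");
    char *saved_ifs = old ? strdup(old) : NULL;
    if (old && !saved_ifs)
        return -1;

    int rcode = -1;
    if (setenv("IFS", ifs, 1) == 0 &&
        setenv("WORDEXP_SPLIT_TMP", str, 1) == 0) {
        /* Unquoted parameter expansion: its result is field-split on IFS. */
        rcode = wordexp("${WORDEXP_SPLIT_TMP}", out, WRDE_NOCMD);
    }

    /* Undo the environment changes made above. */
    unsetenv("WORDEXP_SPLIT_TMP");
    if (saved_ifs) {
        setenv("IFS", saved_ifs, 1);
        free(saved_ifs);
    } else {
        unsetenv("IFS");
    }
    return rcode;
}
```

With the input from the question, `split_with_ifs("1 2:3 4:5", ":", &w)` should fill `w.we_wordv` with `1 2`, `3 4` and `5`, matching the FAKE output above. Because the helper modifies the process environment, it is not safe to call from several threads at once.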

Answer 1

Let's naively assume that POSIX is understandable and try to use it. Let's take wordexp() from posix:

The words argument is a pointer to a string containing one or more words to be expanded. The expansions shall be the same as would be performed by the command line interpreter if words were the part of a command line representing the arguments to a utility. [...]

So let's go to the "command line interpreter". From posix shell command language:

2.1 Shell Introduction

[...]

The shell breaks the input into tokens: words and operators; see Token Recognition. [.......]

2.3 Token Recognition

[...]

If the current character is an unquoted <blank>, any token containing the previous character is delimited and the current character shall be discarded.

If the previous character was part of a word, the current character shall be appended to that word.

[...]

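Basically the whole 2.3 Token Recognition section applies here - this is what wordexp() does - token recognition plus some expansions. There is also the most important part, about field splitting, emphasis mine: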

After parameter expansion (Parameter Expansion), command substitution (Command Substitution), and arithmetic expansion (Arithmetic Expansion), the shell shall scan the results of expansions and substitutions that did not occur in double-quotes for field splitting and multiple fields can result.

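IFS affects field splitting: it affects how the result of other expansions is split into words. IFS does not affect how the string is split into tokens, which is still done using <blank> - tab or space. Hence the behavior you are seeing.

In other words, when you type IFS=: in your terminal, you do not start separating tokens by IFS, as in echo:Hello:World, but still continue to separate the parts of a command using spaces.

Anyway, the man page is correct... :p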

Am I missing something and there is a way to get wordexp to split on characters other than whitespace?

No. If you want spaces inside words, quote the arguments as you would in the shell: "a b" "c d" "e".
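
As a minimal sketch of the quoting advice, the helper below counts how many words wordexp() produces for a string. The name `count_words` is hypothetical and exists only for this illustration. Quoting is part of token recognition, so it works regardless of the IFS value.

```c
#include <stddef.h>
#include <wordexp.h>

/* Hypothetical helper: returns the number of words wordexp() produces
 * for `s`, or 0 if the expansion fails. */
static size_t count_words(const char *s)
{
    wordexp_t w;
    if (wordexp(s, &w, WRDE_NOCMD) != 0)
        return 0;
    size_t n = w.we_wordc;
    wordfree(&w);
    return n;
}
```

Passing the C string `"\"a b\" \"c d\" \"e\""` gives three words, each quoted group staying together, while the unquoted `"a b c d e"` gives five.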

If the split is only on whitespace, is this a bug in the

None:p

Answer 2

You are comparing apples to oranges. wordexp() splits a string into individual tokens the same way the shell does. The shell builtin read does not follow the same algorithm; it just does word splitting. You should compare wordexp() to how the arguments of a script or shell function are parsed:

    #!/bin/sh

    printwords() {
        for arg in "$@"; do
            printf "%s\n" "$arg"
        done
    }

    echo "No IFS input: 1 2:3 4:5"
    printwords 1 2:3 4:5
    echo "IFS=':' input: 1 2:3 4:5"
    IFS=:
    printwords 1 2:3 4:5

This produces

    No IFS input: 1 2:3 4:5
    1
    2:3
    4:5
    IFS=':' input: 1 2:3 4:5
    1
    2:3
    4:5

just like the C program.

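Now, the interesting bit. I can't find it explicitly mentioned in the POSIX documentation from a quick scan, but the bash manual has this to say about word splitting:

Note that if no expansion occurs, no splitting is performed.

Let's try a version that does parameter expansion in its arguments:

    #!/bin/sh

    printwords() {
        for arg in "$@"; do
            printf "%s\n" "$arg"
        done
    }

    foo=2:3
    printf "foo = %s\n" "$foo"
    printf "No IFS input: 1 $foo 4:5\n"
    printwords 1 $foo 4:5
    printf "IFS=':' input: 1 $foo 4:5\n"
    IFS=:
    printwords 1 $foo 4:5

When run through a shell like dash, ksh93 or bash (but not zsh unless you turn on the SH_WORD_SPLIT option), this produces

    foo = 2:3
    No IFS input: 1 $foo 4:5
    1
    2:3
    4:5
    IFS=':' input: 1 $foo 4:5
    1
    2
    3
    4:5

As you can see, the argument with a parameter was subject to field splitting, but the literal ones were not. Making the same change to the string in the C program and running foo=2:3 ./wordexp prints out the same thing.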
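
For reference, the same experiment can be sketched in C roughly as follows. The function name `expand_literal_and_param` is made up for this illustration, and the sketch sets foo and IFS from inside the program instead of on the command line. It assumes a wordexp implementation that honors IFS from the environment, such as glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <wordexp.h>

/* Illustrative helper (hypothetical name): sets foo=2:3 and IFS=: in the
 * environment, then expands a string that mixes literal words with an
 * unquoted parameter. Returns wordexp()'s result code, or -1 if setenv()
 * fails. On success the caller releases `out` with wordfree(). */
static int expand_literal_and_param(wordexp_t *out)
{
    if (setenv("foo", "2:3", 1) != 0 || setenv("IFS", ":", 1) != 0)
        return -1;
    /* "1" and "4:5" are literal tokens; only the result of $foo is
     * subject to field splitting on ':'. */
    return wordexp("1 $foo 4:5", out, WRDE_NOCMD);
}
```

On success `out->we_wordv` should hold `1`, `2`, `3` and `4:5`, the same four words the shell script prints after IFS=: is set.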
