Linux setxattr: possible to use Unicode string?

Question

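I wrote the following code in VS Code and ran it to set a file attribute. It appeared to run successfully, but when I checked the value, the text was wrong. Do extended file attributes support Unicode strings? If so, how do I fix the code below?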

#include <stdio.h>
#include <sys/xattr.h>

int main()
{
    printf("ねこ\n");
    ssize_t res = setxattr("/mnt/cat/test.txt", "user.dog"
    , "ねこ", 2, 0); /*also tested 4 and 8*/
    printf("Result = %lu\n", (unsigned long)res);
    return 0;    
}

Program output

ねこ
Result = 0

Reading the attribute

$ getfattr test.txt  -d
# file: test.txt
user.dog=0s44E=

Answer 1

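Obviously ねこ can't be stored in 2 bytes. The size argument of setxattr counts bytes, not characters. The characters are U+306D and U+3053, which are encoded in UTF-8 as E3 81 AD E3 81 93, so the length must be set to 6. If you do that, getfattr test.txt -d will output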

user.dog=0s44Gt44GT
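
Rather than hard-coding 6, the byte count can be taken from the string itself. The following is a minimal sketch, not a definitive fix: value_size and set_text_xattr are hypothetical helper names introduced here for illustration, and the sketch assumes the source file is saved as UTF-8 so that the literal "ねこ" really is the six bytes listed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

/* Number of bytes (not characters) in a NUL-terminated string.
   Hypothetical helper, named here only for illustration. */
size_t value_size(const char *value)
{
    assert(value != NULL);
    return strlen(value);
}

/* Hypothetical helper: store a NUL-terminated string as an xattr value.
   The terminating NUL is not stored. Returns 0 on success and -1 on
   failure with errno set, the same convention as setxattr itself. */
int set_text_xattr(const char *path, const char *name, const char *value)
{
    int res = setxattr(path, name, value, value_size(value), 0);
    if (res != 0)
        perror("setxattr");
    return res;
}
```

With this, the call in the question would become set_text_xattr("/mnt/cat/test.txt", "user.dog", "ねこ"), and the size passed to setxattr would be 6 under the UTF-8 assumption.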

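getfattr prints the value this way because -d doesn't know the format of the data and simply dumps it as binary. The 0s prefix indicates that the data is base64-encoded, as described in the manpage: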

-d, --dump

Dump the values of all matched extended attributes.

-e en, --encoding=en

Encode values after retrieving them. Valid values of en are "text", "hex", and "base64". Values encoded as text strings are enclosed in double quotes ("), while strings encoded as hexidecimal and base64 are prefixed with 0x and 0s, respectively.
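
To see where the 0s values come from, here is a small sketch of standard base64 encoding. It is an illustration only (getfattr's actual implementation may differ), and base64_encode is a name made up for this example. Fed the two bytes E3 81 that the question's code stored, it yields 44E=; fed all six bytes, it yields 44Gt44GT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode `len` bytes from `in` as base64 into `out` and return `out`.
   `out` must have room for 4 * ((len + 2) / 3) + 1 bytes. */
char *base64_encode(const unsigned char *in, size_t len, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);

    /* Full 3-byte groups become 4 output characters each. */
    while (i + 2 < len) {
        unsigned v = ((unsigned)in[i] << 16) | ((unsigned)in[i + 1] << 8) | in[i + 2];
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
        i += 3;
    }

    /* A trailing group of 1 or 2 bytes is padded with '='. */
    if (len - i == 1) {
        unsigned v = (unsigned)in[i] << 16;
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = '=';
        out[o++] = '=';
    } else if (len - i == 2) {
        unsigned v = ((unsigned)in[i] << 16) | ((unsigned)in[i + 1] << 8);
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = '=';
    }

    out[o] = '\0';
    return out;
}
```

The = padding in 44E= is itself a hint that the stored value was not a multiple of three bytes long, which fits a value truncated to 2 bytes.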

Just paste 44Gt44GT into any base64 decoder, or run echo 44Gt44GT | base64 --decode, and you'll see the correct string printed. To see the string directly from getfattr, specify the format with -e text:

$ getfattr -n user.dog -e text test.txt
# file: test.txt
user.dog="ねこ"
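
The value can also be read back from C with getxattr. The sketch below uses the usual two-step pattern: query the size first, then read into a buffer of that size. get_text_xattr is a hypothetical helper name, and appending a NUL terminator assumes the value is text, as it is in this question.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: return a malloc'd, NUL-terminated copy of an
   attribute value, or NULL on failure. The caller frees the result. */
char *get_text_xattr(const char *path, const char *name)
{
    assert(path != NULL && name != NULL);

    /* A size of 0 asks getxattr for the current length of the value in bytes. */
    ssize_t size = getxattr(path, name, NULL, 0);
    if (size < 0)
        return NULL;

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL)
        return NULL;

    /* Read the raw bytes; this fails if the value grew since the size query. */
    ssize_t got = getxattr(path, name, buf, (size_t)size);
    if (got < 0) {
        free(buf);
        return NULL;
    }

    buf[got] = '\0';  /* the stored value carries no terminator of its own */
    return buf;
}
```

Assuming the attribute was written with all 6 bytes, printing the result of get_text_xattr("/mnt/cat/test.txt", "user.dog") with %s on a UTF-8 terminal should show the original string.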

