C file encoded in UTF-16 is not read properly by gcc

Question

While running some encoding tests, I saved a C file encoded as 'UTF-16 LE' (using Sublime Text).

The C file contains the following:

#include <stdio.h>

int main(void) {
    const char *letter = "é";
    printf("%s\n", letter);
    return 0;
}

Compiling this file with gcc returns the error:

test.c:1:3: error: invalid preprocessing directive #i; did you mean #if?
    1 | # i n c l u d e   < s t d i o . h >

It is as if gcc inserted a space before each character when reading the C file.

My question is: can we give gcc C files encoded in something other than UTF-8? Why can't gcc detect my file's encoding and read it correctly?

Answer 1

Because of a design choice.

From the GNU manual, Character sets:

At present, GNU CPP does not implement conversion from arbitrary file encodings to the source character set. Use of any encoding other than plain ASCII or UTF-8, except in comments, will cause errors. Use of encodings that are not strict supersets of ASCII, such as Shift JIS, may cause errors even if non-ASCII characters appear only in comments. We plan to fix this in the near future.

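GCC was created as part of the GNU project, so it comes from the Unix world, where UTF-16 is not an accepted encoding for ordinary text files. GNU tools pass source files between separate programs (for example the CPP preprocessor and the GCC compiler proper).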

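Also, who uses UTF-16 for source files? It is a poor fit for C in particular, which treats \0 as the string terminator, while UTF-16 text is full of \0 bytes. Note too that the source encoding is independent of what the program does at run time: reading files, printing strings, and so on follow the default locale.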

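If this is a problem for you, add a preprocessing step (which is not that unusual) that converts your source into a form gcc accepts. The step can stay hidden from you, so you can keep editing in UTF-16.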
