How to have a warning when casting `int_least8_t` to `char`?

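I am building a string library that supports both ASCII and UTF-8.
I created two typedefs, t_ascii and t_utf8. Reading ASCII as UTF-8 is safe, but reading UTF-8 as ASCII is not.
Is there any way to get a warning when implicitly casting from t_utf8 to t_ascii, but not when implicitly casting from t_ascii to t_utf8?

Ideally, I would like these warnings (and only these warnings) to be issued: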

#include <stdint.h>

typedef char           t_ascii;
typedef uint_least8_t  t_utf8;

int main()
{
    t_ascii const* asciistr = "Hello world"; // Ok
    t_utf8 const*   utf8str = "你好世界";    // Ok

    asciistr = utf8str; // Warning: utf8 to ascii is not safe
    utf8str = asciistr; // Ok: ascii to utf8 is safe

    t_ascii asciichar = 'A';
    t_utf8   utf8char = 'B';

    asciichar = utf8char; // Warning: utf8 to ascii is not safe
    utf8char = asciichar; // Ok: ascii to utf8 is safe
}

Currently, when building with -Wall (and even with -funsigned-char), I get the following warnings:

gcc main.c -Wall -Wextra                          
main.c: In function ‘main’:
main.c:10:35: warning: pointer targets in initialization of ‘const t_utf8 *’ {aka ‘const unsigned char *’} from ‘char *’ differ in signedness [-Wpointer-sign]
   10 |         t_utf8 const*   utf8str = "你好世界";    // Ok
      |                                   ^~~~~~~~~~
main.c:12:18: warning: pointer targets in assignment from ‘const t_utf8 *’ {aka ‘const unsigned char *’} to ‘const t_ascii *’ {aka ‘const char *’} differ in signedness [-Wpointer-sign]
   12 |         asciistr = utf8str; // Warning: utf8 to ascii is not safe
      |                  ^
main.c:16:17: warning: pointer targets in assignment from ‘const t_ascii *’ {aka ‘const char *’} to ‘const t_utf8 *’ {aka ‘const unsigned char *’} differ in signedness [-Wpointer-sign]
   16 |         utf8str = asciistr; // Ok: ascii to utf8 is safe
      |                 ^

Compile with -Wall. Always compile with -Wall.

<user>@squall:~/src/p1$ gcc -Wall -c test2.c
test2.c: In function ‘main’:
test2.c:9:31: warning: pointer targets in initialization of ‘const t_utf8 *’ {aka ‘const signed char *’} from ‘char *’ differ in signedness [-Wpointer-sign]
    9 |     t_utf8  const*  utf8str = "你好世界";
      |                               ^~~~~~~~~~~~~~
test2.c:11:13: warning: pointer targets in assignment from ‘const t_ascii *’ {aka ‘const char *’} to ‘const t_utf8 *’ {aka ‘const signed char *’} differ in signedness [-Wpointer-sign]
   11 |     utf8str = asciistr; // Ok: ascii to utf8 is safe
      |             ^
test2.c:12:14: warning: pointer targets in assignment from ‘const t_utf8 *’ {aka ‘const signed char *’} to ‘const t_ascii *’ {aka ‘const char *’} differ in signedness [-Wpointer-sign]
   12 |     asciistr = utf8str; // Should issue warning: utf8 to ascii is not safe
      |              ^

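You want the conversion from t_ascii to t_utf8 to be safe, but it is not. The signedness differs.

The warning has nothing to do with the fact that valid UTF-8 is sometimes not valid ASCII; the compiler knows nothing about that. The warning is about signedness.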

If you want an unsigned char, compile with -funsigned-char. But then no warnings will be issued.

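(By the way, if you think the type int_least8_t can hold a multibyte character or the full UTF-8 encoding of a code point, it cannot. Every int_least8_t, and therefore every t_utf8, has exactly the same size within a single compilation unit.)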

Simply compile with a standard-conforming C compiler. See "What compiler options are recommended for beginners learning C?"

Result:

<source>: In function 'main':
<source>:9:31: error: pointer targets in initialization of 'const t_utf8 *' {aka 'const unsigned char *'} from 'char *' differ in signedness [-Wpointer-sign]
    9 |     t_utf8 const*   utf8str = "你好世界";    // Ok
      |                               ^~~~~~~~~~
<source>:11:14: error: pointer targets in assignment from 'const t_utf8 *' {aka 'const unsigned char *'} to 'const t_ascii *' {aka 'const char *'} differ in signedness [-Wpointer-sign]
   11 |     asciistr = utf8str; // Warning: utf8 to ascii is not safe
      |              ^
<source>:12:13: error: pointer targets in assignment from 'const t_ascii *' {aka 'const char *'} to 'const t_utf8 *' {aka 'const unsigned char *'} differ in signedness [-Wpointer-sign]
   12 |     utf8str = asciistr; // Ok: ascii to utf8 is safe
      |             ^

but not when implicitly casting t_ascii to t_utf8?

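No, you cannot have that in standard C, because it is an invalid pointer conversion. You can silence the compiler with an explicit cast, but if you do so you invoke undefined behavior.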


Apart from that, you can use C11 _Generic to find out which type uint_least8_t boils down to:

#include <stdint.h>
#include <stdio.h>

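/* Prints which character type obj has. No trailing semicolon here,
   so each use site supplies its own. */
#define what_type(obj) printf("%s is same as %s\n", #obj, \
  _Generic ((obj),                                        \
            char: "char",                                 \
            unsigned char: "unsigned char",               \
            signed char: "signed char") )
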
int main (void)
{
    typedef char           t_ascii;
    typedef uint_least8_t  t_utf8;

    t_ascii ascii;
    t_utf8  utf8;

    what_type(ascii);
    what_type(utf8);
}

Output on gcc x86 Linux:

ascii is same as char
utf8 is same as unsigned char