Why is © (the copyright symbol) replaced with (C) when using wprintf?
When I print the copyright symbol © with printf or write, it works fine:
#include <stdio.h>

int main(void)
{
    printf("©\n");
}

#include <unistd.h>

int main(void)
{
    write(1, "©\n", 3);
}
Output:
©
But when I print it with wprintf, I get (C):
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    wprintf(L"©\n");
}
Output:
(C)
It is fixed when I add a call to setlocale, though:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "");
    wprintf(L"©\n");
}
Output:
©
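Why does the original behavior occur, and why is it fixed when I call setlocale? Also, where does this conversion take place? And how can I make the behavior I get after setlocale the default?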
Compilation command:
gcc test.c
locale:
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
echo $LC_CTYPE: (no output)
uname -a:
Linux penguin 4.19.79-07511-ge32b3719f26b #1 SMP PREEMPT Mon Nov 18 17:41:41 PST 2019 x86_64 GNU/Linux
file test.c (the same for all examples):
test.c: C source, UTF-8 Unicode text
gcc --version:
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
/lib/x86_64-linux-gnu/libc-2.24.so (glibc version):
GNU C Library (Debian GLIBC 2.24-11+deb9u4) stable release version 2.24, by Roland McGrath et al.
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 6.3.0 20170516.
Available extensions:
crypt add-on version 2.1 by Michael Glad and others
GNU Libidn by Simon Josefsson
Native POSIX Threads Library by Ulrich Drepper et al
BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.
cat /etc/debian_version:
9.12
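A new process does not automatically adopt the locale of the process that started it. The LANG and LC_* environment variables are inherited, but they have no effect until the program calls setlocale.
When a program first starts, it is in the C locale. The man page for setlocale(3) says the following: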
On startup of the main program, the portable "C" locale is selected
as default. A program may be made portable to all locales by calling:
setlocale(LC_ALL, "");
...
The locale "C" or "POSIX" is a portable locale; its LC_CTYPE part corresponds to the 7-bit ASCII character set.
So, as the output shows, any multibyte or non-ASCII character is converted to one or more ASCII characters.
The locale can be set like this:
setlocale(LC_ALL, "");
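The LC_ALL category means that every locale category is changed at once. An empty string as the locale name means the locale is taken from the relevant environment variables. Once that is done, you should see the characters of your shell's locale.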
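#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    /* Copy the name: a later setlocale call may overwrite the returned string. */
    char before[128];
    strncpy(before, setlocale(LC_ALL, NULL), sizeof before - 1);
    before[sizeof before - 1] = '\0';

    setlocale(LC_ALL, "");   /* adopt the locale named by the environment */
    const char *after = setlocale(LC_ALL, NULL);

    wprintf(L"before locale: %s\n", before);
    wprintf(L"after locale: %s\n", after);
    wprintf(L"©\n");         /* literal character from the UTF-8 source file */
    wprintf(L"\u00A9\n");    /* the same character as a universal character name */
    return 0;
}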
Output:
before locale: C
after locale: en_US.utf8
©
©