Which tolower in C++?

Question

Given string foo, I've written answers on how to use cctype's tolower to convert the characters to lowercase:

transform(cbegin(foo), cend(foo), begin(foo), static_cast<int (*)(int)>(tolower))

But there is also locale's tolower, which can be used like this:

use_facet<ctype<char>>(cout.getloc()).tolower(data(foo), next(data(foo), foo.size()));

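Is there a reason to prefer one of these over the other?
Does their functionality differ at all?
I mean, other than the fact that cctype's tolower accepts and returns an int, which I assume is just some antiquated C stuff?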

Answer 1

In the first case (cctype), the locale is set implicitly:

Converts the given character to lowercase according to the character conversion rules defined by the currently installed C locale.

http://en.cppreference.com/w/cpp/string/byte/tolower

In the second (locale's) case, you have to set the locale explicitly:

Converts parameter c to its lowercase equivalent if c is an uppercase letter and has a lowercase equivalent, as determined by the ctype facet of locale loc. If no such conversion is possible, the value returned is c unchanged.

http://www.cplusplus.com/reference/locale/tolower/
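To make the implicit/explicit distinction concrete, here is a minimal sketch that puts the two quoted functions side by side. The helper names lower_implicit and lower_explicit are hypothetical and exist only for illustration. The one-argument std::tolower reads the global C locale, while the two-argument overload from <locale> uses the locale passed in.

```cpp
#include <cassert>  // only needed for the checks in the usage note
#include <cctype>   // std::tolower(int): consults the currently installed C locale
#include <locale>   // std::tolower(charT, const std::locale&)

// Implicit: the rules come from whatever setlocale() last installed.
// (lower_implicit is a hypothetical helper name.)
inline char lower_implicit(char c) {
    // Go through unsigned char so a negative char value is never passed as int.
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Explicit: the caller names the locale whose ctype facet is consulted.
// (lower_explicit is a hypothetical helper name.)
inline char lower_explicit(char c, const std::locale& loc) {
    return std::tolower(c, loc);
}
```

Usage, assuming the program has not called setlocale: assert(lower_implicit('A') == 'a'); and assert(lower_explicit('A', std::locale::classic()) == 'a');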

Answer 2

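Unfortunately, both are equally bad. Although std::string pretends to be a UTF-8 encoded string, none of its methods/functions (including tolower) are really UTF-8 aware. So tolower / tolower + locale may work for single-byte (= ASCII) characters, but they will not work for all the other language sets.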

On Linux I'd use the ICU library. On Windows, I'd use the CharUpper function.
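The limitation can be seen with a small sketch that assumes the input is UTF-8 and uses the classic "C" locale. The helper name lower_bytes is hypothetical. The facet converts byte by byte, so the ASCII letters are lowercased while the two bytes that encode "É" (0xC3 0x89) pass through untouched.

```cpp
#include <cassert>  // only needed for the checks in the usage note
#include <locale>
#include <string>

// Lowercases each byte of s with the ctype<char> facet of the classic "C" locale.
// (lower_bytes is a hypothetical helper name.)
inline std::string lower_bytes(std::string s) {
    const auto& facet = std::use_facet<std::ctype<char>>(std::locale::classic());
    if (!s.empty()) {
        facet.tolower(&s[0], &s[0] + s.size());  // in-place, one char at a time
    }
    return s;
}
```

Usage: lower_bytes(std::string("\xC3\x89") + "COLE") should come back as the same two leading bytes followed by "cole", i.e. the accented capital is not converted.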

Answer 3

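It should be noted that cctype's tolower already existed when locale's tolower was created. The locale version improves on it in two primary ways: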

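First, as another answer notes, the locale version allows the use of a ctype facet, even a user-modified one, without having to shuffle in a new LC_CTYPE via setlocale and restore the previous LC_CTYPE afterwards.

Second, from section 7.1.6.2 [dcl.type.simple]3: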

It is implementation-defined whether objects of char type are represented as signed or unsigned quantities. The signed specifier forces char objects to be signed

The cctype version of tolower can produce undefined behavior if its argument:

Is not representable as unsigned char and does not equal EOF

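So the cctype version of tolower needs an extra conversion on input and on output. Here the lambda's unsigned char parameter performs the input conversion implicitly, in place of an explicit static_cast, yielding: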

transform(cbegin(foo), cend(foo), begin(foo), [](const unsigned char i){ return tolower(i); });

Since the locale version operates directly on chars, no type conversion is needed.

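So if you don't need to perform the conversion with a different ctype facet, it simply becomes a style question of whether you prefer the transform with the lambda that the cctype version requires, or the locale version's: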

use_facet<ctype<char>>(cout.getloc()).tolower(data(foo), next(data(foo), size(foo)));
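For comparison, both styles can be wrapped as complete functions. This is only a sketch: the names lower_cctype and lower_facet are hypothetical, and the locale is passed as a parameter (defaulting to the global locale) rather than taken from cout.getloc() as in the snippet above.

```cpp
#include <algorithm>
#include <cassert>  // only needed for the checks in the usage note
#include <cctype>
#include <locale>
#include <string>

// cctype style: the unsigned char parameter keeps negative char values
// away from tolower(int). (lower_cctype is a hypothetical helper name.)
inline std::string lower_cctype(std::string s) {
    std::transform(s.cbegin(), s.cend(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// locale style: the ctype facet converts a char range in place.
// (lower_facet is a hypothetical helper name; loc is an assumed parameter.)
inline std::string lower_facet(std::string s, const std::locale& loc = std::locale()) {
    const auto& facet = std::use_facet<std::ctype<char>>(loc);
    if (!s.empty()) {
        facet.tolower(&s[0], &s[0] + s.size());
    }
    return s;
}
```

Usage: assert(lower_cctype("Hello, World") == "hello, world"); and assert(lower_facet("Hello, World", std::locale::classic()) == "hello, world"); For plain ASCII input in the default "C" locale the two are expected to agree, which is why the choice comes down to style.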

c++

string

locale

ctype

tolower