Process text as UTF-16 via a perl one-liner?

Question

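perl has the -C option for handling UTF-8. Is it possible to tell a perl one-liner that its input is UTF-16 encoded? A BEGIN block could be used to change the encoding explicitly, but is there an easier way?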

Answer 1

Can Encode do what you want? You would then probably have to call encode() and decode() in your script, so it might not be any shorter than:

    perl -nE 'BEGIN {binmode STDIN, ":encoding(utf16)" } ; ...'

There is a PERL_UNICODE environment variable, but it is fairly limited: if I recall correctly, it simply mimics -C.

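I once tried to find out why there is no -C switch for the other "popular" forms of UTF. It seemed to boil down to whether they are frequently used; whether they are well understood (byte order sometimes matters, who knew?); and whether they are, or should be, obsolete. In other words, it is not as simple as it seems.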

perl -MEncode -E 'say for Encode->encodings(":all")' will show about 9 different UTF encodings.
Besides the usual suspects (perlrun, perlunitut, perlunicode, etc.), one of the most interesting Perl resources on Unicode is right here on Stack Overflow, and it is a fun read.

c.f. @Leon Timmerman's example; perldoc open is also fairly thorough:

    % perl -Mopen=":std,:encoding(utf-16)" -E 'print <>' UTF16.txt > other.txt
    % file other.txt
    other.txt: Big-endian UTF-16 Unicode text, with CRLF line terminators

Edit: Another recent discussion, "UTF-16 perl input output", touches on PerlIO and "layers" and has a neat solution that might lend itself to a one-liner.

I will try to find a real example that uses Encode and still fits in a one-liner. It would "round trip" something like this. For example:

    % file UTF16.txt
    UTF16.txt: Little-endian UTF-16 Unicode text, with CRLF, CR line terminators

... slurp it up and redirect it to another file:

    % perl -0777 -MEncode="encode,decode" -E '
      $text = decode("UTF-16LE", <>);
      print encode("UTF-16LE", $text)' UTF16.txt > other.txt
    % file other.txt
    other.txt: Little-endian UTF-16 Unicode text, with CRLF, CR line terminators

diff the two files and print their sizes in bytes:

    % diff UTF16.txt other.txt
    % perl -E 'say [stat]->[7] for @ARGV' UTF16.txt other.txt
    2220
    2220

Answer 2

You can use perl -Mopen=":std,IN,:encoding(utf-16)" -e '...'
