Using Term::ReadLine with Unicode input

I am trying to figure out how to use Term::ReadLine to read Unicode input from the terminal. It turns out that if I enter a Unicode character at the prompt, the returned string varies depending on various settings. (I am running Ubuntu 14.10 and have installed Term::ReadLine::Gnu.) For example (p.pl):

use open qw( :std :utf8 );
use strict;
use warnings;

use Devel::Peek;
use Term::ReadLine;

my $term   = Term::ReadLine->new('ProgramName');
$term->ornaments( 0 );
my $ans = $term->readline("Enter message: ");
Dump ( $ans );

Running p.pl and entering å at the prompt gives the output:

Enter message: å
SV = PV(0x83a5a0) at 0x87c080
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  PV = 0x917500 "\303\245"\0
  CUR = 2
  LEN = 10

So the returned string $ans does not have the UTF-8 flag set. However, if I run the program with perl -CS p.pl, the output is:

Enter message: å
SV = PVMG(0x24c12e0) at 0x23050a0
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x248faf0 "\303\245"\0 [UTF8 "\x{e5}"]
  CUR = 2
  LEN = 10

The UTF-8 flag is now set correctly on $ans. So the first question is: why does the command line option -CS behave differently from the pragma use open qw( :std :utf8 )?

Next, I tested Term::ReadLine::Stub with the -CS option:

$ PERL_RL=Stub perl -CS p.pl

The output is now:

Enter message: å
SV = PV(0xf97260) at 0xfd90c8
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x10746e0 "\303\203\302\245"\0 [UTF8 "\x{c3}\x{a5}"]
  CUR = 4
  LEN = 10

Now the string $ans has been double-encoded, so the output is corrupted. Is this a bug, or expected behavior?

Term::ReadLine does not read from STDIN; it opens new filehandles. So use open qw(:std :utf8); has no effect on them.

You need to do something like this:

my $term = Term::ReadLine->new('name');
binmode($term->IN, ':utf8');
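To see what putting a :utf8 layer on an input handle changes, here is a minimal non-interactive sketch. It reads the same two bytes from an in-memory filehandle, once without a layer and once with binmode(..., ':utf8'). The helper name read_line_from_bytes is made up for this demo, and the in-memory handle only stands in for the handle that $term->IN returns; it is an assumption that the real terminal handle behaves the same way.

```perl
use strict;
use warnings;

# Hypothetical helper for this demo: read one line from a string of raw
# bytes, optionally through an I/O layer such as ':utf8'.
sub read_line_from_bytes {
    my ($bytes, $layer) = @_;
    open my $fh, '<', \$bytes or die "Cannot open in-memory handle: $!";
    binmode($fh, $layer) if defined $layer;
    my $line = <$fh>;
    close $fh;
    chomp $line;
    return $line;
}

# The two UTF-8 bytes for å followed by a newline, as a UTF-8 terminal sends them.
my $input = "\xC3\xA5\n";

my $raw     = read_line_from_bytes($input);            # no layer: two bytes
my $decoded = read_line_from_bytes($input, ':utf8');   # layer: one character

printf "without layer: length %d\n", length $raw;
printf "with :utf8:    length %d, code point U+%04X\n",
    length $decoded, ord $decoded;
```

With the layer in place the line should come back as the single character U+00E5 rather than two separate bytes.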

Update regarding -CS:

The -C option sets bits in the magic variable ${^UNICODE}. The -CS (or -CI) option makes the expression ${^UNICODE} & 0x0001 true. If ${^UNICODE} & 0x0001 is true, Term::ReadLine sets the UTF-8 flag on the input string.
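A quick way to check this bit yourself is sketched below. The constant UNICODE_STDIN_BIT and the helper stdin_bit_set are names invented for this example; only ${^UNICODE} and the 0x0001 mask come from Perl itself.

```perl
use strict;
use warnings;

# Bit 0x0001 of ${^UNICODE} corresponds to the "I" (STDIN) letter of -C.
# The constant name is made up for this example.
use constant UNICODE_STDIN_BIT => 0x0001;

# Hypothetical helper: test the STDIN bit in the given value, or in the
# live ${^UNICODE} when no argument is passed.
sub stdin_bit_set {
    my ($bits) = @_;
    $bits = ${^UNICODE} unless defined $bits;
    return ($bits & UNICODE_STDIN_BIT) ? 1 : 0;
}

printf "\${^UNICODE} = %d, STDIN bit %s\n",
    ${^UNICODE}, stdin_bit_set() ? 'set' : 'not set';
```

Run it once as perl script.pl and once as perl -CS script.pl to compare. According to perlrun, S is shorthand for I + O + E, so -CS should give ${^UNICODE} the value 7.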

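Note that the -CS option and binmode($term->IN, ':utf8') are different. The first only sets the UTF-8 flag on the returned string, whereas the second puts a :utf8 layer on the input filehandle, so the conversion happens as the data is read.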
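The sketch below shows why a string with the UTF-8 flag set is not necessarily a correctly decoded one. It takes the two bytes that a UTF-8 terminal sends for å and treats them in two ways: decoded as UTF-8, and upgraded as if each byte were a Latin-1 character. This is only an illustration of how a four-byte result like the one in the Stub dump can arise; it does not claim to reproduce what Term::ReadLine::Stub does internally.

```perl
use strict;
use warnings;

my $bytes = "\xC3\xA5";    # UTF-8 encoding of å (U+00E5)

# Case 1: interpret the two bytes as UTF-8 text.
my $decoded = $bytes;
utf8::decode($decoded);    # one character, U+00E5

# Case 2: treat each byte as its own character and only change the
# internal representation; the string still has two characters.
my $upgraded = $bytes;
utf8::upgrade($upgraded);  # characters U+00C3 and U+00A5

# Encoding the two-character string as UTF-8 yields four bytes.
my $reencoded = $upgraded;
utf8::encode($reencoded);

printf "decoded:   %d character(s)\n", length $decoded;
printf "upgraded:  %d character(s)\n", length $upgraded;
printf "reencoded: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $reencoded;
```

Both $decoded and $upgraded carry the UTF-8 flag, but only the first holds the character the user typed. The re-encoded bytes are C3 83 C2 A5, which is the same sequence as the "\303\203\302\245" in the question's last dump.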

As Denis Ibaev explained in his answer, the problem is that Term::ReadLine does not read STDIN; it opens a new input filehandle. As an alternative to calling binmode($term->IN, ':utf8'), it turns out that either the command line option -CS or use open qw( :std :utf8 ) can be made to work out of the box with Term::ReadLine by supplying STDIN as an argument to Term::ReadLine->new(), as explained in the answer to this question: Term::Readline: encoding-question.

For example:

use strict;
use utf8;
use open qw( :std :utf8 );
use warnings;
use Term::ReadLine;

my $term   = Term::ReadLine->new('Test', \*STDIN, \*STDOUT);
my $answer = $term->readline( 'Enter input: ' );