如何在perl中打开包含CJK字符的文件
How to open a file contains CJK characters in perl
如何使用 perl 脚本打开包含 CJK
个字符的文件:
use utf8;
use open ':encoding(utf8)';
binmode STDOUT, ':utf8';
上面的代码我用来打开文件,我找不到 CJK
个字符并且它是空的。输入文件包含文本:
\para[para]Details of the electronic structure of the metal the loss to
electron-hole-pair excitations \characters{刘安雯} only
depends weakly on the metal. The metals exhibit large variation in the
work function, yet the translational in elasticity is similar in all
cases. This suggests electron transfer forming a transient
H\textsuperscript{{\textminus}} is not important. The simulation allows
us to construct \characters{胡水明} a universal sticking
function for H and D on metals, which depends only on the H atom
incidence translational energy and incidence angle as well as the mass of
the solid's atoms.\endp
我是这样找到的:
while($str=m/(\p{InCJK_Unified_Ideographs})/xg)
{
print "Char: --> $&\n";
}
有人可以指导我的代码哪里做错了吗:谢谢。
更新:
I don't know but this program works fine and printing the CJK
characters
use utf8;
my $str = "\characters{刘安雯胡水明}";
while($str=~m/(\p{InCJK_Unified_Ideographs}){1,}/xg) { print ":: $&\n"; }
use utf8;
这一行告诉 Perl 源代码包含 UTF-8,因此它与从文件读取无关。
use open ':encoding(utf8)';
这相当于
use open IO => 'encoding(utf8)';
设置输入和输出的编码streams,即它不会改变标准输入和输出的编码。为此,您需要添加 :std
:
use open IO => ':utf8', ':std';
显示的最后一行,
binmode STDOUT, ':utf8';
设置 STDOUT 的编码,如果它使用 :std
.
,则它已经被上一行覆盖了
你没有说明你是如何打开文件的。如果您使用 <>
或 readline 而不指定文件句柄,则需要为标准输入设置编码,如上所示。如果您使用文件句柄,我就没主意了 - 它对我有用。
如何使用 perl 脚本打开包含 CJK
个字符的文件:
use utf8;
use open ':encoding(utf8)';
binmode STDOUT, ':utf8';
上面的代码我用来打开文件,我找不到 CJK
个字符并且它是空的。输入文件包含文本:
\para[para]Details of the electronic structure of the metal the loss to
electron-hole-pair excitations \characters{刘安雯} only
depends weakly on the metal. The metals exhibit large variation in the
work function, yet the translational in elasticity is similar in all
cases. This suggests electron transfer forming a transient
H\textsuperscript{{\textminus}} is not important. The simulation allows
us to construct \characters{胡水明} a universal sticking
function for H and D on metals, which depends only on the H atom
incidence translational energy and incidence angle as well as the mass of
the solid's atoms.\endp
我是这样找到的:
while($str=m/(\p{InCJK_Unified_Ideographs})/xg)
{
print "Char: --> $&\n";
}
有人可以指导我的代码哪里做错了吗:谢谢。
更新:
I don't know but this program works fine and printing the
CJK
characters
use utf8;
my $str = "\characters{刘安雯胡水明}";
while($str=~m/(\p{InCJK_Unified_Ideographs}){1,}/xg) { print ":: $&\n"; }
use utf8;
这一行告诉 Perl 源代码包含 UTF-8,因此它与从文件读取无关。
use open ':encoding(utf8)';
这相当于
use open IO => 'encoding(utf8)';
设置输入和输出的编码streams,即它不会改变标准输入和输出的编码。为此,您需要添加 :std
:
use open IO => ':utf8', ':std';
显示的最后一行,
binmode STDOUT, ':utf8';
设置 STDOUT 的编码,如果它使用 :std
.
你没有说明你是如何打开文件的。如果您使用 <>
或 readline 而不指定文件句柄,则需要为标准输入设置编码,如上所示。如果您使用文件句柄,我就没主意了 - 它对我有用。