Benchmarking utf8 file read - explanation of the differences
I have this code:
#!/usr/bin/env perl
use 5.016;
use warnings;
use autodie;
use Path::Tiny;
use Encode;
use Benchmark qw(:all);

my $cnt = 10_000;
my $utf = 'utf8.txt';

my $res = timethese($cnt, {
    'open-UTF-8' => sub {
        open my $fhu, '<:encoding(UTF-8)', $utf;
        my $stru = do { local $/; <$fhu>};
        close $fhu;
    },
    'open-utf8' => sub {
        open my $fhu, '<:utf8', $utf;
        my $stru = do { local $/; <$fhu>};
        close $fhu;
    },
    'decode-utf8' => sub {
        open my $fhu, '<', $utf;
        my $stru = decode('utf8', do { local $/; <$fhu>});
        close $fhu;
    },
    'decode-UTF-8' => sub {
        open my $fhu, '<', $utf;
        my $stru = decode('UTF-8', do { local $/; <$fhu>});
        close $fhu;
    },
    'ptiny' => sub {
        my $stru = path($utf)->slurp_utf8;
    },
});
cmpthese $res;
The utf8.txt file (approximately 175 kB) contains 1000 lines of UTF-8 encoded and ASCII characters, such as:
9áäčďéěíĺľňóôöőŕřšťúůüűýž ÁÄČĎÉĚÍĹĽŇÓÔÖŐŔŘŠŤÚŮÜŰÝŽ aáäbcčdďeéěfghiíjkľĺmnňoóôöőpqrŕřsštťuúůüűvwxyýzž
Running the above on my notebook gives:
Benchmark: timing 10000 iterations of decode-UTF-8, decode-utf8, open-UTF-8, open-utf8, ptiny...
decode-UTF-8: 47 wallclock secs (46.83 usr + 0.87 sys = 47.70 CPU) @ 209.64/s (n=10000)
decode-utf8: 48 wallclock secs (46.62 usr + 0.90 sys = 47.52 CPU) @ 210.44/s (n=10000)
open-UTF-8: 60 wallclock secs (57.82 usr + 1.20 sys = 59.02 CPU) @ 169.43/s (n=10000)
open-utf8: 7 wallclock secs ( 6.57 usr + 0.70 sys = 7.27 CPU) @ 1375.52/s (n=10000)
ptiny: 7 wallclock secs ( 5.98 usr + 0.52 sys = 6.50 CPU) @ 1538.46/s (n=10000)
                Rate open-UTF-8 decode-UTF-8 decode-utf8 open-utf8 ptiny
open-UTF-8     169/s         --         -19%        -19%      -88%  -89%
decode-UTF-8   210/s        24%           --         -0%      -85%  -86%
decode-utf8    210/s        24%           0%          --      -85%  -86%
open-utf8     1376/s       712%         556%        554%        --  -11%
ptiny         1538/s       808%         634%        631%       12%    --
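This was unexpected for me, so the questions:
- First, is something wrong with the above code?
If it is OK:
- Why is there a huge difference between the explicit UTF-8 and the relaxed utf8, but only at the IO-layer level (<:utf8 and <:encoding(UTF-8))?
- Why is there hardly any difference between decode('UTF-8', ...) and decode('utf8', ...)?
- Why is the lax decoding at the IO-layer level (<:utf8) so much faster than the explicit lax decode('utf8', ...)?
- And what is the "danger" of using the relaxed (fast) "utf8" versus the precise (slow) 'UTF-8'?
Finally, not a real question: I must check the Path::Tiny code to see how it manages to be the fastest...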
Environment:
- perl v5.22.0 - perlbrew (threaded)
- OS X - Darwin Kernel Version 14.4.0 (Yosemite)
- old notebook - MacBook Pro (13-inch, Mid 2010) - Core 2 Duo, 2.4 GHz, 8 GB, slow HDD
:utf8
The PerlIO :utf8 layer is a pseudo layer; it is just a flag on the PerlIO handle that the OPs check for. Behaviour differs depending on the OP used:
read(), sysread() and recv():
The implementation performs no validation of the utf8 sequences. It only checks the prefix octet to count the number of utf8 sequences read.
readline():
The implementation validates the read octets if the warnings category 'utf8' is in effect, and issues a warning if the read octets contain ill-formed utf8. The validation procedure is the same as the one used in utf8::decode().
Never use the :utf8 flag/layer for reading unless you are willing to accept malformed UTF-X, which can lead to security issues or segfaults.
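:encoding
The PerlIO :encoding layer is provided by PerlIO::encoding, which implements an incremental decoder framework for subclasses of Encode::Encoding. The implementation works by calling out to a Perl/XS subclass, invoking a method call for each incremental decode. Buffers are copied between the layer and the subclass.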
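To make the difference between the two layers visible, here is a minimal sketch that opens the same in-memory octets both ways and asks PerlIO::get_layers() what is on each handle. The sample string and variable names are illustrative assumptions, not part of the benchmark above.

```perl
use strict;
use warnings;

# "café\n" as UTF-8 octets (illustrative sample data).
my $octets = "caf\xC3\xA9\n";

# In-memory filehandles need a reference to the scalar.
open my $fh_flag, '<:utf8', \$octets
    or die "Could not open :utf8 fh: $!";
open my $fh_enc, '<:encoding(UTF-8)', \$octets
    or die "Could not open :encoding fh: $!";

# :utf8 only sets a flag; :encoding pushes a real decoding layer.
my @flag_layers = PerlIO::get_layers($fh_flag);
my @enc_layers  = PerlIO::get_layers($fh_enc);

print ":utf8            => @flag_layers\n";
print ":encoding(UTF-8) => @enc_layers\n";

# Both handles yield the same characters for well-formed input.
my $from_flag = do { local $/; <$fh_flag> };
my $from_enc  = do { local $/; <$fh_enc> };
printf "characters read: %d and %d\n", length $from_flag, length $from_enc;
```

Only the second handle should report an encoding(...) entry, which is the layer that calls into Encode for every chunk it decodes.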
utf8 vs UTF-8
The utf8 encoding form is a superset of the UTF-8 encoding form designated by the Unicode Consortium. The utf8 encoding form accepts encoded code points that are ill-formed in the UTF-8 encoding form, such as surrogates and code points above U+10FFFF. Non-characters should also be avoided, even though Unicode recently changed their mind about that. The utf8 encoding should not be used for interchange; it is Perl's internal encoding. Use the UTF-8 encoding form instead.
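The practical effect of the lax and strict forms can be sketched with Encode alone. The snippet below is an illustration under my own choice of input: it feeds both decoders the octets that encode U+110000, a code point above U+10FFFF, and uses FB_CROAK so that a rejection surfaces as an exception.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Encode resolves the two spellings to different encoding objects.
printf "%-5s resolves to %s\n", $_, find_encoding($_)->name
    for qw(UTF-8 utf8);

# U+110000 in the UTF-8 bit pattern: one past the last Unicode code point.
my $octets = "\xF4\x90\x80\x80";
my $check  = Encode::FB_CROAK | Encode::LEAVE_SRC;

# Lax form: accepts the out-of-range code point.
my $lax = decode('utf8', $octets, $check);
printf "utf8 decoded to U+%X\n", ord $lax;

# Strict form: croaks, so trap the exception.
my $strict = eval { decode('UTF-8', $octets, $check) };
my $strict_error = $@;
print "UTF-8 rejected the input\n" if $strict_error;
```

For data that comes from outside the program, the strict form with a croaking check is the conservative choice, since ill-formed input is reported instead of being passed along.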
Benchmark of slurping a UTF-8 encoded file
Modules used in the benchmark:
PerlIO::encoding, PerlIO::utf8_strict, Encode and Unicode::UTF8.
The following code is also available on gist.github.com.
#!/usr/bin/perl
use strict;
use warnings;

use Benchmark           qw[];
use Config              qw[%Config];
use IO::Dir             qw[];
use IO::File            qw[SEEK_SET];
use Encode              qw[];
use Unicode::UTF8       qw[];
use PerlIO::encoding    qw[];
use PerlIO::utf8_strict qw[];

# https://github.com/chansen/p5-unicode-utf8/tree/master/benchmarks/data
my $dir  = 'benchmarks/data';
my @docs = do {
    my $d = IO::Dir->new($dir)
      or die qq/Could not open directory '$dir': $!/;
    sort grep { /^[a-z]{2}\.txt/ } $d->read;
};

printf "perl: %s (%s %s)\n", $], @Config{qw[osname osvers]};
printf "Encode: %s\n", Encode->VERSION;
printf "Unicode::UTF8: %s\n", Unicode::UTF8->VERSION;
printf "PerlIO::encoding: %s\n", PerlIO::encoding->VERSION;
printf "PerlIO::utf8_strict: %s\n", PerlIO::utf8_strict->VERSION;

foreach my $doc (@docs) {

    # Read the whole file once as raw octets.
    my $octets = do {
        open my $fh, '<:raw', "$dir/$doc" or die $!;
        local $/; <$fh>;
    };

    my $string = Unicode::UTF8::decode_utf8($octets);

    # Count code points per UTF-8 sequence length (1 to 4 octets).
    my @ranges = (
        [ 0x00,    0x7F,     qr/[\x{00}-\x{7F}]/        ],
        [ 0x80,    0x7FF,    qr/[\x{80}-\x{7FF}]/       ],
        [ 0x800,   0xFFFF,   qr/[\x{800}-\x{FFFF}]/     ],
        [ 0x10000, 0x10FFFF, qr/[\x{10000}-\x{10FFFF}]/ ],
    );

    my @out;
    foreach my $r (@ranges) {
        my ($start, $end, $regexp) = @$r;
        my $count = () = $string =~ m/$regexp/g;
        push @out, sprintf "U+%.4X..U+%.4X: %d", $start, $end, $count
          if $count;
    }

    printf "\n\n%s: Size: %d Code points: %d (%s)\n",
      $doc, length $octets, length $string, join ' ', @out;

    # In-memory filehandles: open() needs a reference to the scalar,
    # so the benchmark measures decoding rather than disk I/O.
    open my $fh_raw, '<:raw', \$octets
      or die qq/Could not open a :raw fh: '$!'/;

    open my $fh_encoding, '<:encoding(UTF-8)', \$octets
      or die qq/Could not open a :encoding fh: '$!'/;

    open my $fh_utf8_strict, '<:utf8_strict', \$octets
      or die qq/Could not open a :utf8_strict fh: '$!'/;

    # Run each variant for at least 10 CPU seconds.
    Benchmark::cmpthese( -10, {
        ':encoding(UTF-8)' => sub {
            my $data = do { local $/; <$fh_encoding> };
            seek($fh_encoding, 0, SEEK_SET)
              or die qq/Could not rewind fh: '$!'/;
        },
        ':utf8_strict' => sub {
            my $data = do { local $/; <$fh_utf8_strict> };
            seek($fh_utf8_strict, 0, SEEK_SET)
              or die qq/Could not rewind fh: '$!'/;
        },
        'Encode' => sub {
            my $data = Encode::decode('UTF-8', do { local $/; scalar <$fh_raw> }, Encode::FB_CROAK|Encode::LEAVE_SRC);
            seek($fh_raw, 0, SEEK_SET)
              or die qq/Could not rewind fh: '$!'/;
        },
        'Unicode::UTF8' => sub {
            my $data = Unicode::UTF8::decode_utf8(do { local $/; scalar <$fh_raw> });
            seek($fh_raw, 0, SEEK_SET)
              or die qq/Could not rewind fh: '$!'/;
        },
    });
}
Results:
$ perl benchmarks/slurp.pl
perl: 5.023001 (darwin 14.4.0)
Encode: 2.75
Unicode::UTF8: 0.60
PerlIO::encoding: 0.21
PerlIO::utf8_strict: 0.006

ar.txt: Size: 25918 Code points: 14308 (U+0000..U+007F: 2698 U+0080..U+07FF: 11610)
                    Rate :encoding(UTF-8) Encode :utf8_strict Unicode::UTF8
:encoding(UTF-8)  3058/s               --   -19%         -73%          -87%
Encode            3754/s              23%     --         -67%          -84%
:utf8_strict     11361/s             272%   203%           --          -52%
Unicode::UTF8    23620/s             672%   529%         108%            --

el.txt: Size: 103974 Code points: 58748 (U+0000..U+007F: 13560 U+0080..U+07FF: 45150 U+0800..U+FFFF: 38)
                    Rate :encoding(UTF-8) Encode :utf8_strict Unicode::UTF8
:encoding(UTF-8)   780/s               --   -19%         -73%          -86%
Encode             958/s              23%     --         -66%          -83%
:utf8_strict      2855/s             266%   198%           --          -48%
Unicode::UTF8     5498/s             605%   474%          93%            --

en.txt: Size: 82171 Code points: 82055 (U+0000..U+007F: 81988 U+0080..U+07FF: 18 U+0800..U+FFFF: 49)
                    Rate :encoding(UTF-8) Encode :utf8_strict Unicode::UTF8
:encoding(UTF-8)  1111/s               --   -16%         -90%          -96%
Encode            1327/s              19%     --         -88%          -95%
:utf8_strict     11446/s             931%   763%           --          -60%
Unicode::UTF8    28635/s            2478%  2058%         150%            --

ja.txt: Size: 180109 Code points: 64655 (U+0000..U+007F: 6913 U+0080..U+07FF: 30 U+0800..U+FFFF: 57712)
                    Rate :encoding(UTF-8) Encode :utf8_strict Unicode::UTF8
:encoding(UTF-8)   553/s               --   -27%         -72%          -91%
Encode             757/s              37%     --         -61%          -87%
:utf8_strict      1960/s             254%   159%           --          -67%
Unicode::UTF8     5915/s             970%   682%         202%            --

lv.txt: Size: 138397 Code points: 127160 (U+0000..U+007F: 117031 U+0080..U+07FF: 9021 U+0800..U+FFFF: 1108)
                    Rate :encoding(UTF-8) Encode :utf8_strict Unicode::UTF8
:encoding(UTF-8)   605/s               --   -19%         -80%          -91%
Encode             746/s              23%     --         -75%          -88%
:utf8_strict      3043/s             403%   308%           --          -53%
Unicode::UTF8     6453/s             967%   765%         112%            --

ru.txt: Size: 151633 Code points: 85266 (U+0000..U+007F: 19263 U+0080..U+07FF: 65639 U+0800..U+FFFF: 364)
                    Rate :encoding(UTF-8) Encode :utf8_strict Unicode::UTF8
:encoding(UTF-8)   542/s               --   -19%         -73%          -86%
Encode             673/s              24%     --         -66%          -83%
:utf8_strict      2001/s             269%   197%           --          -50%
Unicode::UTF8     4010/s             640%   496%         100%            --

sv.txt: Size: 96449 Code points: 92894 (U+0000..U+007F: 89510 U+0080..U+07FF: 3213 U+0800..U+FFFF: 171)
                    Rate :encoding(UTF-8) Encode :utf8_strict Unicode::UTF8
:encoding(UTF-8)   923/s               --   -17%         -85%          -93%
Encode            1109/s              20%     --         -82%          -92%
:utf8_strict      5998/s             550%   441%           --          -56%
Unicode::UTF8    13604/s            1374%  1127%         127%            --

zh.txt: Size: 62891 Code points: 24519 (U+0000..U+007F: 5317 U+0080..U+07FF: 32 U+0800..U+FFFF: 19170)
                    Rate :encoding(UTF-8) Encode :utf8_strict Unicode::UTF8
:encoding(UTF-8)  1630/s               --   -23%         -75%          -87%
Encode            2104/s              29%     --         -68%          -83%
:utf8_strict      6549/s             302%   211%           --          -48%
Unicode::UTF8    12630/s             675%   500%          93%            --
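
In these runs, reading raw octets and decoding them in one call was faster than going through :encoding(UTF-8). A minimal sketch of that pattern using only core modules follows; slurp_utf8_strict is a hypothetical helper name of my own, and the temporary file exists only to make the example self-contained.

```perl
use strict;
use warnings;
use Encode     ();
use File::Temp qw(tempfile);

# Hypothetical helper: read the file as raw octets, then decode strictly.
# Croaks on ill-formed UTF-8 instead of passing it through.
sub slurp_utf8_strict {
    my ($path) = @_;
    open my $fh, '<:raw', $path
        or die "Could not open '$path': $!";
    my $octets = do { local $/; <$fh> };
    close $fh
        or die "Could not close '$path': $!";
    return Encode::decode('UTF-8', $octets, Encode::FB_CROAK);
}

# Demo: write a few UTF-8 octets to a temporary file and read them back.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':raw';
print {$tmp_fh} "\xC5\xBElu\xC5\xA5ou\xC4\x8Dk\xC3\xBD\n";    # "žluťoučký\n"
close $tmp_fh
    or die "Could not close temporary file: $!";

my $text = slurp_utf8_strict($tmp_path);
printf "octets on disk: %d, characters decoded: %d\n",
    -s $tmp_path, length $text;
```

If Unicode::UTF8 is installed, its decode_utf8() can stand in for the Encode::decode() call, as the 'Unicode::UTF8' variant of the benchmark does.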