Perl6 IO::Socket::Async 截断数据
Perl6 IO::Socket::Async truncates data
我正在使用 IO::Socket::Async 在 P6 中重写我的 P5 套接字服务器,但是接收到的数据在末尾被截断了 1 个字符,并且在下一个连接中收到了 1 个字符。来自 Perl6 Facebook 组的人 (Jonathan Worthington) 指出,这可能是由于字符串的性质和字节在 P6 中的处理方式非常不同。引用:
In Perl 6, strings and bytes are handled very differently. Of note, strings work at grapheme level. When receiving Unicode data, it's not only possible that a multi-byte sequence will be split over packets, but also a multi-codepoint sequence. For example, one packet might have the letter "a" at the end, and the next one would be a combining acute accent. Therefore, it can't safely pass on the "a" until it's seen how the next packet starts.
我的 P6 在 MoarVM
上 运行
use Data::Dump;
use experimental :pack;
my $socket = IO::Socket::Async.listen('0.0.0.0', 7000);
react {
whenever $socket -> $conn {
my $line = '';
whenever $conn {
say "Received --> "~$_;
$conn.print: &translate($_) if $_.chars ge 100;
$conn.close;
}
}
CATCH {
default {
say .^name, ': ', .Str;
say "handled in $?LINE";
}
}
}
sub translate($raw) {
my $rawdata = $raw;
$raw ~~ s/^\s+|\s+$//; # remove heading/trailing whitespace
my $minus_checksum = substr($raw, 0, *-2);
my $our_checksum = generateChecksum($minus_checksum);
my $data_checksum = ($raw, *-2);
# say $our_checksum;
return $our_checksum;
}
sub generateChecksum($minus_checksum) {
# turn string into Blob
my Blob $blob = $minus_checksum.encode('utf-8');
# unpack Blob into ascii list
my @array = $blob.unpack("C*");
# perform bitwise operation for each ascii in the list
my $dec +^= $_ for $blob.unpack("C*");
# only take 2 digits
$dec = sprintf("%02d", $dec) if $dec ~~ /^\d$/;
$dec = '0'.$dec if $dec ~~ /^[a..fA..F]$/;
$dec = uc $dec;
# convert it to hex
my $hex = sprintf '%02x', $dec;
return uc $hex;
}
结果
Received --> $16AA861013034151986|10001000181123062657411200000000000010235444112500000000.600000000345.4335N10058.8249E00015
Received --> 0
Received --> $16AA861013037849727|1080100018112114435541120000000000000FBA00D5122500000000.600000000623.9080N10007.8627E00075
Received --> D
Received --> $08AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
Received --> $08AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
Received --> $08AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
Received --> $08AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
首先,TCP 连接是流,因此无法保证发送的 "messages" 将在接收端接收为等效的 "messages"。发送的内容可以作为正常 TCP 行为的一部分进行拆分或合并,甚至在考虑 Perl 6 行为之前。任何需要 "messages" 抽象的东西都需要在 TCP 流之上构建它(例如,通过将数据作为行发送,或者通过发送以字节为单位的大小,然后是数据)。
在 Perl 6 中,通过套接字到达的数据公开为 Supply
。 whenever $conn { }
是 whenever $conn.Supply { }
的缩写(whenever
会强制将给定的任何内容转换为 Supply
)。默认 Supply
是字符 1,解码为 UTF-8 为 Perl 6 Str
流。正如您已经收到的答复中所述,Perl 6 中的字符串在字素级别工作,因此它会保留一个字符,以防下一个通过网络到达的是组合字符。这就是您正在经历的 "truncation"。 (有些东西 永远 不能组合。例如, \n
永远不能在其上放置组合字符。这意味着面向行的协议不会遇到这种行为,可以简单地实现为 whenever $conn.Supply.lines { }
.)
有几个选项可用:
- 执行
whenever $conn.Supply(:bin) { }
,这将传送二进制 Blob
对象,这将对应于 OS 传递给 VM 的内容。然后可以根据需要 .decode
'。这可能是您最好的选择。
- 指定不支持组合字符的编码,例如
whenever $conn.Supply(:enc('latin-1')) { }
。 (但是,请注意,由于 \r\n
是 1 个字素,因此如果消息以 \r
结尾,那么它将被阻止以防下一个数据包与 \n
一起出现)。
在这两种情况下,消息在传输过程中仍有可能被拆分,但这些将(分别完全和大部分)避免字素规范化所需要的保持一个后退的要求。
我正在使用 IO::Socket::Async 在 P6 中重写我的 P5 套接字服务器,但是接收到的数据在末尾被截断了 1 个字符,并且在下一个连接中收到了 1 个字符。来自 Perl6 Facebook 组的人 (Jonathan Worthington) 指出,这可能是由于字符串的性质和字节在 P6 中的处理方式非常不同。引用:
In Perl 6, strings and bytes are handled very differently. Of note, strings work at grapheme level. When receiving Unicode data, it's not only possible that a multi-byte sequence will be split over packets, but also a multi-codepoint sequence. For example, one packet might have the letter "a" at the end, and the next one would be a combining acute accent. Therefore, it can't safely pass on the "a" until it's seen how the next packet starts.
我的 P6 在 MoarVM
上 运行use Data::Dump;
use experimental :pack;
my $socket = IO::Socket::Async.listen('0.0.0.0', 7000);
react {
whenever $socket -> $conn {
my $line = '';
whenever $conn {
say "Received --> "~$_;
$conn.print: &translate($_) if $_.chars ge 100;
$conn.close;
}
}
CATCH {
default {
say .^name, ': ', .Str;
say "handled in $?LINE";
}
}
}
sub translate($raw) {
my $rawdata = $raw;
$raw ~~ s/^\s+|\s+$//; # remove heading/trailing whitespace
my $minus_checksum = substr($raw, 0, *-2);
my $our_checksum = generateChecksum($minus_checksum);
my $data_checksum = ($raw, *-2);
# say $our_checksum;
return $our_checksum;
}
sub generateChecksum($minus_checksum) {
# turn string into Blob
my Blob $blob = $minus_checksum.encode('utf-8');
# unpack Blob into ascii list
my @array = $blob.unpack("C*");
# perform bitwise operation for each ascii in the list
my $dec +^= $_ for $blob.unpack("C*");
# only take 2 digits
$dec = sprintf("%02d", $dec) if $dec ~~ /^\d$/;
$dec = '0'.$dec if $dec ~~ /^[a..fA..F]$/;
$dec = uc $dec;
# convert it to hex
my $hex = sprintf '%02x', $dec;
return uc $hex;
}
结果
Received --> $16AA861013034151986|10001000181123062657411200000000000010235444112500000000.600000000345.4335N10058.8249E00015
Received --> 0
Received --> $16AA861013037849727|1080100018112114435541120000000000000FBA00D5122500000000.600000000623.9080N10007.8627E00075
Received --> D
Received --> $08AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
Received --> $08AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
Received --> $08AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
Received --> $08AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
首先,TCP 连接是流,因此无法保证发送的 "messages" 将在接收端接收为等效的 "messages"。发送的内容可以作为正常 TCP 行为的一部分进行拆分或合并,甚至在考虑 Perl 6 行为之前。任何需要 "messages" 抽象的东西都需要在 TCP 流之上构建它(例如,通过将数据作为行发送,或者通过发送以字节为单位的大小,然后是数据)。
在 Perl 6 中,通过套接字到达的数据公开为 Supply
。 whenever $conn { }
是 whenever $conn.Supply { }
的缩写(whenever
会强制将给定的任何内容转换为 Supply
)。默认 Supply
是字符 1,解码为 UTF-8 为 Perl 6 Str
流。正如您已经收到的答复中所述,Perl 6 中的字符串在字素级别工作,因此它会保留一个字符,以防下一个通过网络到达的是组合字符。这就是您正在经历的 "truncation"。 (有些东西 永远 不能组合。例如, \n
永远不能在其上放置组合字符。这意味着面向行的协议不会遇到这种行为,可以简单地实现为 whenever $conn.Supply.lines { }
.)
有几个选项可用:
- 执行
whenever $conn.Supply(:bin) { }
,这将传送二进制Blob
对象,这将对应于 OS 传递给 VM 的内容。然后可以根据需要.decode
'。这可能是您最好的选择。 - 指定不支持组合字符的编码,例如
whenever $conn.Supply(:enc('latin-1')) { }
。 (但是,请注意,由于\r\n
是 1 个字素,因此如果消息以\r
结尾,那么它将被阻止以防下一个数据包与\n
一起出现)。
在这两种情况下,消息在传输过程中仍有可能被拆分,但这些将(分别完全和大部分)避免字素规范化所需要的保持一个后退的要求。