解包第一个字节指示长度的数据结构

unpacking a data structure whose first byte indicates length

我正在尝试从 IBM AFP 格式文件中解压 TLE(标记逻辑元素)。

规范 (http://www.afpcinc.org/wp-content/uploads/2017/12/MODCA-Reference-09.pdf) 表示这是两个三元组(尽管有四个值),其结构如下(具有 字节偏移量 ):

0:长度 | 1:Tid | 2-n:参数(=2:类型+3:格式+ 4-n: EBCDIC 编码字符串)

示例(有两个三元组,一个表示名称,一个表示值):

0C 02  0B  00   C3 A4 99 99 85 95 83 A8    07 36  00 00    C5 E4 D9
12 KEY UID CHAR  C  u  r  r  e  n  c  y     7 VAL RESERVED  E  U  R

我用Perl解析如下(并成功):

            if ($key eq 'Data') {
                my $tle = $member->{struct}->{$key};
                my $k_length = hex(unpack('H2', substr($tle, 0, 1)));
                my $key = decode('cp500', substr($tle, 4, $k_length - 4));
                my $v_length = hex(unpack('H2', substr($tle, $k_length, 1)));
                my $value = decode('cp500', substr($tle, $k_length + 4, $v_length - 4));
                print("'$key' => '$value'\n");
            }

结果:

'Currency' => 'EUR'

虽然上面是成功的,但我觉得我的方法有点太复杂了,还有更有效的方法来做到这一点。例如。 pack 模板是否支持读取第一个 n 字节以用作要解压的连续字节数的量词?我阅读了 Perl 包教程,但似乎无法找到相关内容。

如果长度字段不包含自身,您可以执行以下操作:

(my $record, $unparsed) = unpack("C/a a*", $unparsed);
my $key = decode("cp500", unpack("x3 a*", $record));

但是长度字段包括它自己。

(my $length, $unparsed) = unpack("C a*", $unparsed);
(my $record, $unparsed) = unpack("a".($length-1)." a*", $unparsed);
my $key = decode("cp500", unpack("x3 a*", $record));

请查看以下演示代码是否满足您的要求。

这个代码

defines hash decoder subroutines

reads hex representation of bytes provided by OP from DATA block

converts read data into binary representation $data utilizing pack

extracts length, key/tid, type by utilizing unpack

call decoder subroutine for this particular type

gets back hash consisting two arrays keys and vals

forms new hash %data with provided keys and vals

outputs keys and values (returned keys are used to preserve byte/field order)

注意:Encode 'from_to' is utilized to decode EBCDIC -- alternative

use strict;
use warnings;
use feature 'say';

use utf8;
use Encode 'from_to';

my $debug = 1;

my %decoder = ( 
                1 => \&decode_type1,
                2 => \&decode_currency,
                3 => \&decode_type3,
                4 => \&decode_type4,
                5 => \&decode_type5
            );

my $bytes = read_bytes();
my($len,$key,$type) = unpack('C3',$bytes);

my $data = $decoder{$type}($bytes);

my %data;
@data{@{$data->{keys}}} = @{$data->{vals}};

say '
 Unpacked data
---------------';
printf "%-8s => %s\n", $_, $data{$_} for @{$data->{keys}};

sub read_bytes {
    my $hex_bytes = <DATA>;

    chomp $hex_bytes;

    my $bytes = pack('H*',$hex_bytes);

    return $bytes;
}

sub show_bytes {
    my $data = shift;

    print "Bytes: ";
    printf "%02X ", $_ for unpack 'C*', $data;
    print "\n";
}

sub decode_type1 {
    my $bytes = shift;

    return { keys => 'type1', vals => 'vals1' };
}

sub decode_currency {
    my $bytes = shift;

    show_bytes($bytes) if $debug;

    my @keys = qw/length_1 key uid char data_1 length_2 val reserved data_2/;
    my @vals = unpack('C4A8C2SA3',$bytes);

    from_to($vals[4], 'cp37', 'latin1');
    from_to($vals[8], 'cp37', 'latin1');
    
    return { keys => \@keys, vals => \@vals};
}

sub decode_type3 {
    my $bytes = shift;

    return { keys => 'type3', vals => 'vals3' };
}

sub decode_type4 {
    my $bytes = shift;

    return { keys => 'type4', vals => 'vals4' };
}

sub decode_type5 {
    my $bytes = shift;

    return { keys => 'type5', vals => 'vals5' };
}

__DATA__
0C020B00C3A49999859583A807360000C5E4D9

输出

Bytes: 0C 02 0B 00 C3 A4 99 99 85 95 83 A8 07 36 00 00 C5 E4 D9

 Unpacked data
---------------
length_1 => 12
key      => 2
uid      => 11
char     => 0
data_1   => Currency
length_2 => 7
val      => 54
reserved => 0
data_2   => EUR

注:

val 只占用一个字节,这看起来很可疑,它给出了欧元金额的 0..255 范围。也许 reserved 字节可能是 val 欧元数量的一部分。