Parse JSON-like markup in perl
To be honest, I'm not sure what markup this is. I first assumed it was JSON, but parse_json from JSON::Parse
failed with: JSON error at line 2, byte 15/1380740: Unexpected character '{' parsing initial state: expecting whitespace: '\n', '\r', '\t', ' ' at ....
This is the file I'm trying to parse into a hash: https://steamcdn-a.akamaihd.net/apps/730/scripts/items/items_game.d8a302f03758b99ab65b60b3a4a11d73ca4738bd.txt
What I tried:
use strict;
use warnings;
use LWP::UserAgent;
use JSON::Parse 'parse_json';

my $ua = LWP::UserAgent->new;
my $response = $ua->get( "https://steamcdn-a.akamaihd.net/apps/730/scripts/items/items_game.d8a302f03758b99ab65b60b3a4a11d73ca4738bd.txt" );

if ( $response->is_success ) {
    my $game_items = parse_json( $response->content );
    # ... do stuff
}
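Am I doing something wrong? Is this JSON, or do I have to write some hack-ish solution to parse it?
I couldn't find anything useful among the "Questions that may already have your answer" suggestions, but I think it would be easier if I knew the name of this markup.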
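The file is not JSON. It appears to be Valve's KeyValues text format (often called VDF): each quoted key is followed either by a quoted value or by a brace-delimited block, with no colons or commas between entries, which is why a JSON parser gives up at the first {. The sketch below reproduces that failure on a small hand-written excerpt modeled on the keys in the answer's output. It uses the core JSON::PP module in place of JSON::Parse, an assumption made only because JSON::PP ships with Perl.

```perl
use strict;
use warnings;
use JSON::PP;

# Hand-written excerpt in the same style as items_game.txt
# (illustrative only; the keys are copied from the answer's dump).
my $sample = <<'END';
"items_game"
{
    "alternate_icons2"
    {
        "weapon_icons"
        {
            "65604"
            {
                "icon_path"    "econ/default_generated/weapon_deagle_hy_ddpat_urb_light"
            }
        }
    }
}
END

# Try to decode it as JSON; eval traps the exception decode() throws.
my $parsed_as_json = eval { JSON::PP->new->decode($sample); 1 };
my $json_error     = $@;

my $message = $parsed_as_json ? "Parsed as JSON\n" : "Not JSON: $json_error";
print $message;
```

Whatever the exact wording of the error, the point is the same: there is no JSON option that makes this syntax valid, so it needs its own small parser.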
This will handle your data. It's a bit clunky, but it gets the job done!
use strict;
use warnings;
use autodie;
use Data::Dump;
use LWP::Simple qw/ mirror /;

use constant URL    => 'https://steamcdn-a.akamaihd.net/apps/730/scripts/items/items_game.d8a302f03758b99ab65b60b3a4a11d73ca4738bd.txt';
use constant MIRROR => 'steamcdn.txt';

# Download the file (only if it has changed) and slurp it into a string
my $data = do {
    mirror URL, MIRROR;
    open my $fh, '<', MIRROR;
    local $/;
    <$fh>;
};

my ($hash, $key);    # current hash and a key waiting for its value
my @stack;           # saved [ parent hash, parent key ] pairs

while (1) {
    # A quoted string is a key, or the value if a key is already pending
    if ( $data =~ / \G \s* " ([^"]*) " /gcx ) {
        if ( defined $key ) {
            $hash->{$key} = $1;
            $key = undef;
        }
        else {
            $key = $1;
        }
    }
    # An opening brace starts a nested hash for the pending key
    elsif ( $data =~ / \G \s* \{ /gcx ) {
        push @stack, [ $hash, $key ];
        $key = $hash = undef;
    }
    # A closing brace attaches the finished hash to its parent
    elsif ( $data =~ / \G \s* \} /gcx ) {
        die "Structure unbalanced" if defined $key or @stack == 0;
        my ($parent, $key) = @{ pop @stack };
        $parent->{$key} = $hash;
        $hash = $parent;
    }
    else {
        last;
    }
}

die "Structure unbalanced" if @stack;
dd $hash;
Output
{
items_game => {
alternate_icons2 => {
weapon_icons => {
65604 => {
icon_path => "econ/default_generated/weapon_deagle_hy_ddpat_urb_light",
},
65605 => {
icon_path => "econ/default_generated/weapon_deagle_hy_ddpat_urb_medium",
},
65606 => {
icon_path => "econ/default_generated/weapon_deagle_hy_ddpat_urb_heavy",
},
65684 => {
icon_path => "econ/default_generated/weapon_deagle_aa_flames_light",
},
65685 => {
icon_path => "econ/default_generated/weapon_deagle_aa_flames_medium",
},
65686 => {
icon_path => "econ/default_generated/weapon_deagle_aa_flames_heavy",
},
65696 => {
icon_path => "econ/default_generated/weapon_deagle_so_night_light",
},
65697 => {
icon_path => "econ/default_generated/weapon_deagle_so_night_medium",
},
65698 => {
icon_path => "econ/default_generated/weapon_deagle_so_night_heavy",
},
65780 => {
icon_path => "econ/default_generated/weapon_deagle_aa_vertigo_light",
},
65781 => {
icon_path => "econ/default_generated/weapon_deagle_aa_vertigo_medium",
},
65782 => {
icon_path => "econ/default_generated/weapon_deagle_aa_vertigo_heavy",
},
65896 => {
icon_path => "econ/default_generated/weapon_deagle_hy_mottled_sand_light",
},
65897 => {
icon_path => "econ/default_generated/weapon_deagle_hy_mottled_sand_medium",
},
65898 => {
icon_path => "econ/default_generated/weapon_deagle_hy_mottled_sand_heavy",
},
66276 => {
icon_path => "econ/default_generated/weapon_deagle_am_scales_bravo_light",
},
66277 => {
icon_path => "econ/default_generated/weapon_deagle_am_scales_bravo_medium",
},
66278 => {
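The core trick in this answer is matching with \G and the /gc modifiers: \G anchors each match at pos($data), /g advances pos() on success, and /c leaves pos() unchanged on failure, so the if/elsif chain can try each token type in turn at the same offset. Below is a minimal sketch of the same approach wrapped in a reusable sub that takes a string instead of downloading the file. The name parse_keyvalues is hypothetical, and unlike the answer's script it deliberately stores {} for empty blocks and reports unrecognized input (for example a // comment) explicitly rather than only through the final balance check.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a KeyValues-style string into a nested hashref
# using the same \G ... /gc tokenizing loop as the answer above.
sub parse_keyvalues {
    my ($text) = @_;
    my ( $hash, $key );    # hash being filled, and a key awaiting its value
    my @stack;             # saved [ parent_hash, parent_key ] pairs

    while (1) {
        if ( $text =~ / \G \s* " ([^"]*) " /gcx ) {
            if ( defined $key ) {    # second string of a pair: the value
                $hash->{$key} = $1;
                undef $key;
            }
            else {                   # first string of a pair: the key
                $key = $1;
            }
        }
        elsif ( $text =~ / \G \s* \{ /gcx ) {
            die "Block without a key\n" unless defined $key;
            push @stack, [ $hash, $key ];     # where to attach this block later
            ( $hash, $key ) = ( {}, undef );  # start an empty nested hash
        }
        elsif ( $text =~ / \G \s* \} /gcx ) {
            die "Structure unbalanced\n" if defined $key or !@stack;
            my ( $parent, $parent_key ) = @{ pop @stack };
            $parent->{$parent_key} = $hash;   # autovivifies $parent if undef
            $hash = $parent;
        }
        else {
            last;    # no token matched here; /c left pos() unchanged
        }
    }

    # Stopping anywhere other than trailing whitespace means unknown syntax
    my $stopped_at = pos($text) // 0;
    die "Unexpected input at offset $stopped_at\n"
        if $text =~ / \G \s* \S /gcx;
    die "Structure unbalanced\n" if @stack or defined $key;
    return $hash // {};
}

my $kv = parse_keyvalues(<<'END');
"items_game"
{
    "alternate_icons2"
    {
        "weapon_icons"
        {
            "65604"
            {
                "icon_path"    "econ/default_generated/weapon_deagle_hy_ddpat_urb_light"
            }
        }
    }
}
END

# prints econ/default_generated/weapon_deagle_hy_ddpat_urb_light
print $kv->{items_game}{alternate_icons2}{weapon_icons}{65604}{icon_path}, "\n";
```

To use it on the real file, pass it the slurped contents (the $data string from the script above) instead of the inline sample.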
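use strict;
use warnings;

# Slurp the whole document (from the files named on the command line, or STDIN)
my $file = do { local $/; <> };

# Each stack level collects a flat list of alternating keys and values
my @stack = [];

my %handlers = (
    # A quoted string: the opening quote has already been consumed
    '"' => sub {
        /\G ([^"]*) " /xgc
            or die("Unterminated \"\n");
        push(@{ $stack[-1] }, $1);
    },
    # An opening brace must follow a key
    '{' => sub {
        die("Expected string\n") if @{ $stack[-1] } % 2 == 0;
        push(@stack, []);
    },
    # A closing brace turns the finished level into a hash
    '}' => sub {
        die("Unmatched \"}\"\n") if @stack == 1;
        my $hash = pop(@stack);
        die("Missing value\n") if @$hash % 2 == 1;
        push(@{ $stack[-1] }, { @$hash });
    },
);

my $data;
for ($file) {    # alias $_ to $file so the handlers' regexes share its pos()
    while (1) {
        my $next_char = /\G \s* (\S) /gcx ? $1 : last;
        my $handler = $handlers{$next_char}
            or die("Unrecognized character \"$next_char\"\n");
        $handler->();
    }
    die("Unmatched \"{\"\n") if @stack > 1;
    my $hash = pop(@stack);
    die("Missing value\n") if @$hash % 2 == 1;
    $data = { @$hash };
}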
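Besides using a simpler stack than Borodin's and a dispatch table instead of a long chain of if statements, this version also does proper error detection: it catches truncated documents as well as unsupported features.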
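To see that error detection in action, here is a sketch that wraps the dispatch-table parser in a sub so it can be run on strings rather than on <>, then feeds it one complete and one truncated document. The name parse_kv_strict is hypothetical; the handlers are the same as in the answer, just created inside the sub so each call gets its own @stack.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the dispatch-table parser above
sub parse_kv_strict {
    my ($text) = @_;
    my @stack = ( [] );    # each level collects a flat key/value list

    my %handlers = (
        '"' => sub {
            /\G ([^"]*) " /xgc or die "Unterminated \"\n";
            push @{ $stack[-1] }, $1;
        },
        '{' => sub {
            die "Expected string\n" if @{ $stack[-1] } % 2 == 0;
            push @stack, [];
        },
        '}' => sub {
            die "Unmatched \"}\"\n" if @stack == 1;
            my $level = pop @stack;
            die "Missing value\n" if @$level % 2 == 1;
            push @{ $stack[-1] }, { @$level };
        },
    );

    for ($text) {    # alias $_ so the handlers' regexes see the same pos()
        while (1) {
            my $next_char = /\G \s* (\S) /gcx ? $1 : last;
            my $handler = $handlers{$next_char}
                or die "Unrecognized character \"$next_char\"\n";
            $handler->();
        }
    }
    die "Unmatched \"{\"\n" if @stack > 1;
    my $top = pop @stack;
    die "Missing value\n" if @$top % 2 == 1;
    return { @$top };
}

# Prints:
# ok
# error: Unmatched "{"
for my $doc ( '"items_game" { "a" "1" }', '"items_game" { "a" "1"' ) {
    my $result  = eval { parse_kv_strict($doc) };
    my $message = defined $result ? "ok\n" : "error: $@";
    print $message;
}
```

Feeding it a // comment would likewise stop with Unrecognized character "/", which is the "unsupported features" case.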