Parse single-quoted strings using Marpa::R2 in Perl

Question

How can I parse single-quoted strings with Marpa::R2? In my code below, a single-quoted string comes back with '\' attached when it is parsed.

Code:

use strict;
use Marpa::R2;
use Data::Dumper;


my $grammar = Marpa::R2::Scanless::G->new(
   {  default_action => '[values]',
      source         => \(<<'END_OF_SOURCE'),
  lexeme default = latm => 1

:start ::= Expression

# include begin

Expression ::= Param
Param ::= Unquoted                                         
        | ('"') Quoted ('"') 
        | (') Quoted (')

:discard      ~ whitespace 
whitespace    ~ [\s]+

Unquoted      ~ [^\s\/\(\),&:\"~]+
Quoted        ~ [^\s&:\"~]+

END_OF_SOURCE
   });

my $input1 = 'foo';
#my $input2 = '"foo"';
#my $input3 = '\'foo\'';

my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });

print "Trying to parse:\n$input1\n\n";
$recce->read($input1);
my $value_ref = ${$recce->value};
print "Output:\n".Dumper($value_ref);

Output:

Trying to parse:
foo

Output:
$VAR1 = [
          [
            'foo'
          ]
        ];

Trying to parse:
"foo"

Output:
$VAR1 = [
          [
            'foo'
          ]
        ];

Trying to parse:
'foo'

Output:
$VAR1 = [
          [
            '\'foo\''
          ]
        ]; (don't want it to be parsed like this)

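Above is the output for all the inputs. I don't want the third input to come back with "\" and the single quotes attached; I want it parsed like OUTPUT2. Please advise.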

Ideally, it should pick up the content between the single quotes, according to Param ::= (') Quoted (')

Answer 1

Your result does not contain \', it contains '. Dumper just formats the result that way so that it is clear what is part of the string and what is not.

You can test this behavior yourself:

use Data::Dumper;

my $tick = chr(39);
my $back = chr(92);

print "Tick Dumper: " . Dumper($tick);
print "Tick Print:  " . $tick . "\n";
print "Backslash Dumper: " . Dumper($back);
print "Backslash Print:  " . $back . "\n";

You can see a demo here: https://ideone.com/d1V8OE
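To confirm that the parsed value holds a plain quote character and no backslash, you can inspect the string directly instead of trusting the default Dumper layout. In this sketch a literal stands in for the parser's output (an assumption for illustration), and $Data::Dumper::Useqq switches Dumper to double-quoted output, where the single quotes need no escaping.

```perl
use strict;
use warnings;
use Data::Dumper;

# Stand-in for the value the parser returned for input 3 (not real parser output).
my $value = q{'foo'};

# Five characters: quote, f, o, o, quote. No backslash anywhere.
printf "length: %d\n", length $value;
printf "contains backslash: %s\n", ( index( $value, "\\" ) >= 0 ? 'yes' : 'no' );

# Default style: single-quoted output, so the embedded quotes are shown escaped.
print Dumper($value);    # $VAR1 = '\'foo\'';

# Double-quoted style: the same string, shown without any escaping.
{
    local $Data::Dumper::Useqq = 1;
    local $Data::Dumper::Terse = 1;
    print Dumper($value);    # "'foo'"
}
```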

If you don't want the output to contain the single quotes, you may need to strip them from the input yourself.

Answer 2

I'm not very familiar with Marpa::R2, but could you try adding an action to the Expression rule:

Expression ::= Param action => strip_quotes

Then implement a simple quote stripper, such as:

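# Assumes the recognizer is created with semantics_package => 'MyActions'
sub MyActions::strip_quotes {
    # $_[1] is the Param value, an array ref because of default_action => '[values]'
    return $_[1][0] =~ s/^'|'$//gr;    # /r needs Perl 5.14+
}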
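Because the action is an ordinary Perl sub, it can be checked in isolation. This sketch calls it with the arguments it would roughly receive here, assuming the question's default_action => '[values]' makes the Param value an array ref such as ["'foo'"]; the first argument (the per-parse object) is unused. It also shows a limitation of the alternation: an unbalanced quote is stripped as well.

```perl
use strict;
use warnings;

# Answer 2's action, with the array-ref element written as $_[1][0].
sub MyActions::strip_quotes {
    # $_[1] is assumed to be the Param value: an array ref holding the lexeme text.
    return $_[1][0] =~ s/^'|'$//gr;    # /r returns a modified copy (Perl 5.14+)
}

# Simulated action arguments: (per-parse object, Param value).
for my $text ( q{'foo'}, q{foo}, q{foo'} ) {
    my $out = MyActions::strip_quotes( undef, [$text] );
    print "$text -> $out\n";
}
# 'foo' -> foo
# foo -> foo
# foo' -> foo    (the unbalanced quote is removed too)
```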

Answer 3

The other answer about the Data::Dumper output is correct. However, your grammar does not work the way you expect.

When you parse the input 'foo', Marpa considers the three Param alternatives. The lexemes predicted at that position are:

Unquoted ~ [^\s\/\(\),&:\"~]+
'"'
') Quoted ('

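Yes, the last one is literally ) Quoted (, without any single quotes. In the SLIF DSL a single-quoted literal runs from one ' to the next, so (') Quoted (') is read as a parenthesized literal whose text is ) Quoted (.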
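The pairing of the quotes can be reproduced with a naive scan. This is a rough plain-Perl emulation, not Marpa's actual DSL parser: it treats a literal as everything between one ' and the next, and reports what ends up inside.

```perl
use strict;
use warnings;

# The right-hand side of the third Param alternative, as written in the question.
my $rhs = q{(') Quoted (')};

# Naive scan: a quoted literal is everything between a ' and the next '.
my @literals = $rhs =~ /'([^']*)'/g;

print scalar(@literals), " literal(s)\n";
print "[$_]\n" for @literals;
# 1 literal(s)
# [) Quoted (]
```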

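Even with ([']) Quoted (['']) spelled correctly as ([']) Quoted ([']), the Unquoted lexeme would match the entire input, single quotes included, because of longest-token matching.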
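The length comparison behind longest-token matching can be approximated with plain Perl regexes: at the start of 'foo', check how much each candidate lexeme could consume. This only mimics the comparison; it is not how Marpa's lexer is implemented.

```perl
use strict;
use warnings;

my $input = q{'foo'};

# Character class copied from the question's Unquoted lexeme; it does not exclude '.
my $unquoted = qr/[^\s\/\(\),&:\"~]+/;
# A single-quote literal, as in ([']) Quoted ([']).
my $tick = qr/'/;

my ($u) = $input =~ /\A($unquoted)/;
my ($t) = $input =~ /\A($tick)/;

printf "Unquoted matches %d chars: %s\n", length $u, $u;
printf "[']      matches %d chars: %s\n", length $t, $t;
# Unquoted matches 5 chars: 'foo'
# [']      matches 1 chars: '
```

The longer Unquoted match wins, so the quotes stay in the token.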

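What happens with an input like " foo " (with double quotes)? Only the '"' lexeme matches; then any whitespace is discarded, then the Quoted lexeme matches, then any whitespace is discarded again, and finally the closing " is matched.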

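To prevent this whitespace skipping, and to keep the Unquoted rule from being preferred because of LATM, it makes sense to describe quoted strings as lexemes. For example: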

Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*
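Because Quoted is now a single lexeme, the delimiters and any whitespace between them are consumed as one token, so :discard never sees the inner spaces. The regex below is a rough plain-Perl approximation of that lexeme (not Marpa itself), showing what text such a token would hold.

```perl
use strict;
use warnings;

# Approximation of: Quoted ~ DQ | SQ, with DQ ~ '"' [^"]* '"' and SQ ~ ['] [^']* [']
my $quoted = qr/"[^"]*"|'[^']*'/;

for my $input ( q{'foo'}, q{" foo "}, q{foo} ) {
    if ( $input =~ /\A($quoted)\z/ ) {
        print "<$input> is one Quoted token: <$1>\n";
    }
    else {
        print "<$input> is not a Quoted token\n";
    }
}
```

Note that the captured token still includes its delimiters.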

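These lexemes will contain any quotes and escapes, so you need to post-process the lexeme contents. You can either use the event system (conceptually clean, but somewhat cumbersome to implement) or add an action that does this processing during parse evaluation.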

Since lexemes cannot have actions, it is usually best to add a proxy production:

Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ::= Quoted_Lexeme action => process_quoted
Quoted_Lexeme ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*

The action can then do something like this:

sub process_quoted {
  my (undef, $s) = @_;
  # remove delimiters from double-quoted string
  return $1 if $s =~ /^"(.*)"$/s;
  # remove delimiters from single-quoted string
  return $1 if $s =~ /^'(.*)'$/s;
  die "String was not delimited with single or double quotes";
}
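A possible compact variant, offered as an alternative rather than as the answer's code: a single regex whose backreference \1 requires the closing delimiter to be the same character as the opening one. The name process_quoted_compact is hypothetical, and the second argument is assumed to be the raw lexeme text, as in the action above.

```perl
use strict;
use warnings;

# Hypothetical variant of process_quoted: one regex, with \1 requiring the
# closing delimiter to match the opening one.
sub process_quoted_compact {
    my ( undef, $s ) = @_;
    return $2 if $s =~ /\A(["'])(.*)\1\z/s;
    die "String was not delimited with single or double quotes";
}

for my $lexeme ( q{'foo'}, q{" foo "}, q{"it's"} ) {
    print '[', process_quoted_compact( undef, $lexeme ), "]\n";
}
# [foo]
# [ foo ]
# [it's]

# A mismatched pair is rejected.
my $ok = eval { process_quoted_compact( undef, q{'foo"} ); 1 };
print $ok ? "accepted\n" : "rejected\n";    # rejected
```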
