Superpower parser for nested string representation of object tree
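I'm having a hard time understanding how recursive parsing works in Superpower. I have studied the blog posts and the examples on GitHub, but I still don't get it.
Can someone show me how to build an AST with the proposed structure (see below) from the tokenizer I wrote?
Here is my goal:
I am working with a KUKA robot. Through a TCP client I can read the contents of variables on the robot controller. The contents of a variable are returned to me as a single string. I want to parse this string and populate a custom AST that fits the robot language.
KUKA Robot Language (KRL):
The robot language has the following primitive types: BOOL, INT, CHAR, REAL
I can also create custom enumerations. Enum values are prefixed with "#": ENUM
Strings are represented as CHAR arrays: CHAR[]
In addition, composite structures called STRUC can be created. A structure aggregates field-value data, where each value can be a BOOL, INT, CHAR, STRING, REAL, ENUM or STRUC: STRUC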
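Sample of the data to parse:
Here is a typical example of the data I want to parse. When I ask the robot for the variable progLogDb[1], I get the first item of progLogDb, the robot's program log array, in which every item is a PROGLOG structure: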
{PROGLOG: ProgName[] "any ascii string {[]%,&}", StartDate {DATE: CSEC 0.124, SEC -22, MIN 36, HOUR 16, DAY 4, MONTH 1, YEAR 2019}, EndDate {DATE: CSEC 0, SEC 36, MIN 36, HOUR 16, DAY 4, MONTH 1, YEAR 2019}, QuitDate {DATE: CSEC 0, SEC 36, MIN 36, HOUR 16, DAY 4, MONTH 1, YEAR 2019}, ActiveTime 11.00000, MyInt 10, MyReal -1.091e-24, MyCHAR "A", MyBool False, MyEnum #EnumValue}
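In this example you can see how structures are nested. A struc is written as {Type: field value, field value, ...}, where each value is a BOOL, INT, REAL, ENUM, STRING or STRUC. When the value is a primitive data type, its type has to be inferred during parsing.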
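The inference rule in the last sentence can be stated precisely. The following is a minimal sketch, assuming the raw text of a single value has already been isolated; it uses only the .NET base library, and the names `KrlValueKind` and `KrlValueInference` are hypothetical (they are not part of Superpower or KRL).

```csharp
using System;
using System.Globalization;

// Hypothetical names: KrlValueKind and KrlValueInference are illustrative only.
public enum KrlValueKind { Bool, Int, Real, String, Enum, Struc }

public static class KrlValueInference
{
    // Classifies one raw field value, e.g. "-22", "0.124", "#EnumValue" or "{DATE: ...}".
    public static KrlValueKind Infer(string raw)
    {
        if (raw.StartsWith("{")) return KrlValueKind.Struc;   // nested {TYPE: ...}
        if (raw.StartsWith("\"")) return KrlValueKind.String; // CHAR[] and single CHAR values
        if (raw.StartsWith("#")) return KrlValueKind.Enum;    // enum literal, e.g. #EnumValue
        if (raw.Equals("TRUE", StringComparison.OrdinalIgnoreCase) ||
            raw.Equals("FALSE", StringComparison.OrdinalIgnoreCase))
            return KrlValueKind.Bool;
        // An optionally signed whole number is INT ...
        if (int.TryParse(raw, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture, out _))
            return KrlValueKind.Int;
        // ... anything else numeric (decimal point and/or exponent) is REAL.
        if (double.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out _))
            return KrlValueKind.Real;
        throw new FormatException($"Cannot infer the KRL type of '{raw}'");
    }
}
```

With a Superpower tokenizer, this decision is made at the token level instead (Integer vs. Real tokens and so on), so the parser only has to match token kinds.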
This is the tree I want to build:
```
progLogDb[1] (PROGLOG:)
- ProgName[] "any ascii string {[]%,&}"
- StartDate (DATE:)
  - CSEC 0.124
  - SEC -22
  - MIN 36
  - HOUR 16
  - DAY 4
  - MONTH 1
  - YEAR 2019
- EndDate (DATE:)
  - CSEC 0
  - SEC 36
  - MIN 36
  - HOUR 16
  - DAY 4
  - MONTH 1
  - YEAR 2019
- QuitDate (DATE:)
  - CSEC 0
  - SEC 36
  - MIN 36
  - HOUR 16
  - DAY 4
  - MONTH 1
  - YEAR 2019
- ActiveTime 11.00000
- MyInt 10
- MyReal -1.091e-24
- MyCHAR "A"
- MyBool False
- MyEnum #EnumValue
```
Tokenization
So far I have (I believe) successfully implemented the tokenization with the following code:
```csharp
using Superpower;
using Superpower.Display;
using Superpower.Model;
using Superpower.Parsers;
using Superpower.Tokenizers;

// public, so that public parser fields of type TokenListParser<KrlToken, ...> compile
public enum KrlToken
{
    // struct delimiters
    [Token(Example = "{")]
    LBracket,
    [Token(Example = "}")]
    RBracket,
    // field delimiters
    [Token(Example = ",")]
    Comma,
    // data
    Type,
    Boolean,
    Integer,
    Real,
    String,
    Enum,
    Identifier,
}

static class KrlTokenizer
{
    #region TokenParser
    static TextParser<Unit> KrlBooleanToken { get; } =
        from content in Span.EqualToIgnoreCase("false")
            .Or(Span.EqualToIgnoreCase("true"))
        select Unit.Value;

    static TextParser<Unit> KrlStringToken { get; } =
        from open in Character.EqualTo('"')
        // an escaped quote (\") or an escaped backslash (\\) does not terminate the string
        from content in Span.EqualTo("\\\"").Value(Unit.Value).Try()
            .Or(Span.EqualTo("\\\\").Value(Unit.Value).Try())
            .Or(Character.Except('"').Value(Unit.Value))
            .IgnoreMany()
        from close in Character.EqualTo('"')
        select Unit.Value;

    static TextParser<Unit> KrlIntegerToken { get; } =
        from sign in Character.EqualTo('-').OptionalOrDefault()
        from first in Character.Digit
        from rest in Character.Digit.IgnoreMany()
        select Unit.Value;

    static TextParser<Unit> KrlRealToken { get; } =
        from sign in Character.EqualTo('-').OptionalOrDefault()
        from first in Character.Digit
        from rest in Character.Digit.Or(Character.In('.', 'e', 'E', '+', '-')).IgnoreMany()
        select Unit.Value;

    static TextParser<Unit> KrlEnumToken { get; } =
        from open in Character.EqualTo('#')
        from first in Character.Letter.Or(Character.In('_', '$'))
        from rest in Character.Letter.Or(Character.Digit).Or(Character.In('_', '$'))
            .IgnoreMany()
        select Unit.Value;

    static TextParser<Unit> KrlTypeToken { get; } =
        from first in Character.Letter.Or(Character.In('_', '$'))
        from rest in Character.Letter.Or(Character.Digit).Or(Character.In('_', '$'))
            .IgnoreMany()
        from close in Character.EqualTo(':')
        select Unit.Value;

    static TextParser<Unit> KrlIdentifierToken { get; } =
        from first in Character.Letter.Or(Character.In('_', '$'))
        from rest in Character.Letter.Or(Character.Digit).Or(Character.In('_', '$', '[', ']'))
            .IgnoreMany()
        select Unit.Value;
    #endregion

    public static Tokenizer<KrlToken> Instance { get; } =
        new TokenizerBuilder<KrlToken>()
            .Ignore(Span.WhiteSpace)
            .Match(Character.EqualTo('{'), KrlToken.LBracket)
            .Match(Character.EqualTo('}'), KrlToken.RBracket)
            .Match(Character.EqualTo(','), KrlToken.Comma)
            .Match(KrlTypeToken, KrlToken.Type)
            .Match(KrlEnumToken, KrlToken.Enum)
            .Match(KrlStringToken, KrlToken.String)
            .Match(KrlBooleanToken, KrlToken.Boolean)
            .Match(KrlIntegerToken, KrlToken.Integer, requireDelimiters: true)
            .Match(KrlRealToken, KrlToken.Real, requireDelimiters: true)
            .Match(KrlIdentifierToken, KrlToken.Identifier, requireDelimiters: true)
            .Build();
}
```
For a slightly different version of the sample, it gives me the following tokens:
LBracket@0 (line 1, column 1): {
Type@1 (line 1, column 2): PROGLOG:
Identifier@10 (line 1, column 11): ProgName[]
String@21 (line 1, column 22): "lgocell_mdi{} {[]%,&}"
Comma@44 (line 1, column 45): ,
Identifier@46 (line 1, column 47): StartDate
LBracket@56 (line 1, column 57): {
Type@57 (line 1, column 58): DATE:
Identifier@63 (line 1, column 64): CSEC
Real@68 (line 1, column 69): 0.124
Comma@73 (line 1, column 74): ,
Identifier@75 (line 1, column 76): SEC
Integer@79 (line 1, column 80): -22
Comma@82 (line 1, column 83): ,
Identifier@84 (line 1, column 85): MIN
Integer@88 (line 1, column 89): 36
Comma@90 (line 1, column 91): ,
Identifier@92 (line 1, column 93): HOUR
Integer@97 (line 1, column 98): 16
Comma@99 (line 1, column 100): ,
Identifier@101 (line 1, column 102): DAY
Integer@105 (line 1, column 106): 4
Comma@106 (line 1, column 107): ,
Identifier@108 (line 1, column 109): MONTH
Integer@114 (line 1, column 115): 1
Comma@115 (line 1, column 116): ,
Identifier@117 (line 1, column 118): YEAR
Integer@122 (line 1, column 123): 2019
RBracket@126 (line 1, column 127): }
Comma@127 (line 1, column 128): ,
Identifier@129 (line 1, column 130): EndDate
LBracket@137 (line 1, column 138): {
Type@138 (line 1, column 139): DATE:
Identifier@144 (line 1, column 145): CSEC
Integer@149 (line 1, column 150): 0
Comma@150 (line 1, column 151): ,
Identifier@152 (line 1, column 153): SEC
Integer@156 (line 1, column 157): 36
Comma@158 (line 1, column 159): ,
Identifier@160 (line 1, column 161): MIN
Integer@164 (line 1, column 165): 36
Comma@166 (line 1, column 167): ,
Identifier@168 (line 1, column 169): HOUR
Integer@173 (line 1, column 174): 16
Comma@175 (line 1, column 176): ,
Identifier@177 (line 1, column 178): DAY
Integer@181 (line 1, column 182): 4
Comma@182 (line 1, column 183): ,
Identifier@184 (line 1, column 185): MONTH
Integer@190 (line 1, column 191): 1
Comma@191 (line 1, column 192): ,
Identifier@193 (line 1, column 194): YEAR
Integer@198 (line 1, column 199): 2019
RBracket@202 (line 1, column 203): }
Comma@203 (line 1, column 204): ,
Identifier@205 (line 1, column 206): QuitDate
LBracket@214 (line 1, column 215): {
Type@215 (line 1, column 216): DATE:
Identifier@221 (line 1, column 222): CSEC
Integer@226 (line 1, column 227): 0
Comma@227 (line 1, column 228): ,
Identifier@229 (line 1, column 230): SEC
Integer@233 (line 1, column 234): 36
Comma@235 (line 1, column 236): ,
Identifier@237 (line 1, column 238): MIN
Integer@241 (line 1, column 242): 36
Comma@243 (line 1, column 244): ,
Identifier@245 (line 1, column 246): HOUR
Integer@250 (line 1, column 251): 16
Comma@252 (line 1, column 253): ,
Identifier@254 (line 1, column 255): DAY
Integer@258 (line 1, column 259): 4
Comma@259 (line 1, column 260): ,
Identifier@261 (line 1, column 262): MONTH
Integer@267 (line 1, column 268): 1
Comma@268 (line 1, column 269): ,
Identifier@270 (line 1, column 271): YEAR
Integer@275 (line 1, column 276): 2019
RBracket@279 (line 1, column 280): }
Comma@280 (line 1, column 281): ,
Identifier@282 (line 1, column 283): ActiveTime
Real@293 (line 1, column 294): 11.00000
Comma@301 (line 1, column 302): ,
Identifier@303 (line 1, column 304): MyEnum
Enum@310 (line 1, column 311): #EnumValue
Comma@320 (line 1, column 321): ,
Identifier@322 (line 1, column 323): MyInt
Integer@328 (line 1, column 329): 10
Comma@330 (line 1, column 331): ,
Identifier@332 (line 1, column 333): MyReal
Real@339 (line 1, column 340): -1.091e-24
Comma@349 (line 1, column 350): ,
Identifier@351 (line 1, column 352): MyChar
String@358 (line 1, column 359): "A"
Comma@361 (line 1, column 362): ,
Identifier@363 (line 1, column 364): MyBool
Boolean@370 (line 1, column 371): False
RBracket@375 (line 1, column 376): }
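Parsing into an AST
Now that my tokenization looks right, I want to parse the tokens into the custom AST below: pair each field with its value, infer the primitive types, and rebuild the correct struc nesting. Any help with this part would be greatly appreciated.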
```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public enum DataType
{
    BOOL,
    INT,
    REAL,
    STRING,
    ENUM,
    STRUC
}

public abstract class Data
{
    private static Regex _array = new Regex(@"\[([\d]+)\]", RegexOptions.IgnoreCase);
    public abstract DataType Type { get; }
    public string Name { get; set; }
    public bool IsScalar { get => Type != DataType.STRUC; }
    public bool IsComposite { get => Type == DataType.STRUC; }

    public bool IsArrayElement(out short index)
    {
        index = 0;
        Match match = _array.Match(Name);
        if (match.Success)
        {
            index = short.Parse(match.Groups[1].Value);
            return true;
        }
        else
        {
            return false;
        }
    }
}

public class BoolData : Data
{
    public override DataType Type => DataType.BOOL;
    public bool Value { get; private set; }
    public BoolData(string name, bool value)
    {
        Name = name;
        Value = value;
    }
}

public class IntData : Data
{
    public override DataType Type => DataType.INT;
    public short Value { get; private set; }
    public IntData(string name, short value)
    {
        Name = name;
        Value = value;
    }
}

public class RealData : Data
{
    public override DataType Type => DataType.REAL;
    public double Value { get; private set; }
    public RealData(string name, double value)
    {
        Name = name;
        Value = value;
    }
}

public class StringData : Data
{
    public override DataType Type => DataType.STRING;
    public string Value { get; private set; }
    public StringData(string name, string value)
    {
        Name = name;
        Value = value;
    }
}

public class EnumData : Data
{
    public override DataType Type => DataType.ENUM;
    public string Value { get; private set; }
    public EnumData(string name, string value)
    {
        Name = name;
        Value = value;
    }
}

public class StrucData : Data
{
    public override DataType Type => DataType.STRUC;
    public List<Data> Value = new List<Data>();
    public StrucData(string name)
    {
        Name = name;
        Value = new List<Data>();
    }
    public void Add(Data data) => Value.Add(data);
}
```
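So you need to create a parser for each Data class you defined. The primitive types are fairly simple, but the StrucData parser is the one that needs recursion. It has to try each primitive parser in turn with Or() and Try(), and if none of them succeeds, it has to try to parse another StrucData recursively. Try() is needed because every field parser consumes the Identifier token before it can fail on the value token; Try() rewinds to the identifier so that the next alternative can run. Because the Data items are separated by commas, ManyDelimitedBy then collects them into an array, which ToList() turns into the List<Data> that StrucData stores.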
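Superpower's combinators hide the call stack, so it may help to see the same recursion written by hand. The sketch below is plain recursive descent using only the .NET base library; it is a swapped-in technique, not Superpower's API, and the regex tokenizer, `KrlNode` and `KrlRecursiveDescent` are hypothetical simplifications. `ParseStruc` calls `ParseField` for every field, and `ParseField` calls `ParseStruc` again whenever a value starts with `{`, which is the same mutual recursion that the StrucParser expresses through its nested-struc alternative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical, minimal AST node: scalars set Raw, strucs set TypeName and Children.
public sealed class KrlNode
{
    public string Name = "";
    public string? TypeName;   // e.g. "DATE" for a struc, null for scalars
    public string? Raw;        // raw scalar text, e.g. "-22" or "#EnumValue"
    public List<KrlNode> Children = new List<KrlNode>();
}

public sealed class KrlRecursiveDescent
{
    // Quoted strings, braces, commas, "TYPE:" headers, then any other bare word/number/enum.
    static readonly Regex TokenRx = new Regex(
        "\"[^\"]*\"|[{},]|[A-Za-z_$][A-Za-z0-9_$]*:|[^\\s{},]+");

    readonly List<string> _tokens;
    int _pos;

    KrlRecursiveDescent(string input) =>
        _tokens = TokenRx.Matches(input).Select(m => m.Value).ToList();

    public static KrlNode Parse(string input)
    {
        var parser = new KrlRecursiveDescent(input);
        KrlNode root = parser.ParseStruc(name: "");
        if (parser._pos != parser._tokens.Count)
            throw new FormatException("Trailing input after the top-level struc");
        return root;
    }

    string Next() => _pos < _tokens.Count
        ? _tokens[_pos++]
        : throw new FormatException("Unexpected end of input");

    void Expect(string expected)
    {
        string actual = Next();
        if (actual != expected)
            throw new FormatException($"Expected '{expected}' but found '{actual}'");
    }

    // struc := '{' TYPE ':' field (',' field)* '}'
    KrlNode ParseStruc(string name)
    {
        Expect("{");
        string header = Next();
        if (!header.EndsWith(":"))
            throw new FormatException($"Expected a type header but found '{header}'");
        var node = new KrlNode { Name = name, TypeName = header.TrimEnd(':') };
        while (true)
        {
            node.Children.Add(ParseField());
            string sep = Next();
            if (sep == "}") return node;
            if (sep != ",") throw new FormatException($"Expected ',' or '}}' but found '{sep}'");
        }
    }

    // field := IDENT value ; value := struc | scalar
    KrlNode ParseField()
    {
        string name = Next();
        if (_pos < _tokens.Count && _tokens[_pos] == "{")
            return ParseStruc(name);   // the recursive step: a nested struc
        return new KrlNode { Name = name, Raw = Next() };
    }
}
```

Scalar values are kept as raw text here (for example `"-22"`); converting them into typed Data objects would be a separate step.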
Try this:
```csharp
using System.Globalization;
using System.Linq;
using Superpower;
using Superpower.Parsers;

public static class KrlParsers
{
    public static TokenListParser<KrlToken, BoolData> BoolParser =
        from id in Token.EqualTo(KrlToken.Identifier)
        from val in Token.EqualTo(KrlToken.Boolean)
        select new BoolData(id.ToStringValue(), bool.Parse(val.ToStringValue()));

    public static TokenListParser<KrlToken, IntData> IntParser =
        from id in Token.EqualTo(KrlToken.Identifier)
        from val in Token.EqualTo(KrlToken.Integer)
        select new IntData(id.ToStringValue(), short.Parse(val.ToStringValue()));

    public static TokenListParser<KrlToken, RealData> RealParser =
        from id in Token.EqualTo(KrlToken.Identifier)
        from val in Token.EqualTo(KrlToken.Real)
        // invariant culture: the controller output uses '.' as decimal separator whatever the PC culture is
        select new RealData(id.ToStringValue(), double.Parse(val.ToStringValue(), CultureInfo.InvariantCulture));

    public static TokenListParser<KrlToken, StringData> StringParser =
        from id in Token.EqualTo(KrlToken.Identifier)
        from val in Token.EqualTo(KrlToken.String)
        select new StringData(id.ToStringValue(), val.ToStringValue());

    public static TokenListParser<KrlToken, EnumData> EnumParser =
        from id in Token.EqualTo(KrlToken.Identifier)
        from val in Token.EqualTo(KrlToken.Enum)
        select new EnumData(id.ToStringValue(), val.ToStringValue());

    public static TokenListParser<KrlToken, StrucData> StrucParser =
        from id in Token.EqualTo(KrlToken.Identifier).Optional()
        from _lb in Token.EqualTo(KrlToken.LBracket)
        from type in Token.EqualTo(KrlToken.Type)
        from data in
            // each alternative consumes an Identifier before it can fail, so Try() backtracks to it
            StringParser.Select(x => (Data)x).Try()
            .Or(IntParser.Select(x => (Data)x).Try())
            .Or(RealParser.Select(x => (Data)x).Try())
            .Or(BoolParser.Select(x => (Data)x).Try())
            .Or(EnumParser.Select(x => (Data)x).Try())
            // RECURSIVE: Parse.Ref defers reading the StrucParser field until parse time
            .Or(Parse.Ref(() => StrucParser).Select(x => (Data)x))
            .ManyDelimitedBy(Token.EqualTo(KrlToken.Comma))
        from _rb in Token.EqualTo(KrlToken.RBracket)
        select new StrucData(id.HasValue ? id.Value.ToStringValue() : "", data.ToList());
}
```
I also added another constructor to the StrucData class that accepts a List<Data>:
```csharp
public StrucData(string name, List<Data> data)
{
    Name = name;
    Value = data;
}
```
Then, to actually parse the input string, run this:
```csharp
string input = @"{PROGLOG: ProgName[] ""any ascii string {[]%,&}"", StartDate {DATE: CSEC 0.124, SEC -22, MIN 36, HOUR 16, DAY 4, MONTH 1, YEAR 2019}, EndDate {DATE: CSEC 0, SEC 36, MIN 36, HOUR 16, DAY 4, MONTH 1, YEAR 2019}, QuitDate {DATE: CSEC 0, SEC 36, MIN 36, HOUR 16, DAY 4, MONTH 1, YEAR 2019}, ActiveTime 11.00000, MyInt 10, MyReal -1.091e-24, MyCHAR ""A"", MyBool False, MyEnum #EnumValue}";
// Tokenize and Parse both throw a Superpower ParseException on invalid input
var tokens = KrlTokenizer.Instance.Tokenize(input);
StrucData data = KrlParsers.StrucParser.Parse(tokens);
```
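To check a parse result against the desired tree from the question, a recursive printer is handy: each nested struc adds one level of indentation. This is a minimal sketch over a hypothetical `TreeNode` stand-in so that it compiles on its own; with the Data classes above, the struc branch would iterate `StrucData.Value` instead. Note that the StrucParser reads the type header (`type`) but does not store it, so printing `(DATE:)` would also require keeping that token in StrucData.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the Data hierarchy: scalars set Value, strucs set TypeName and Children.
public sealed record TreeNode(string Name, string? TypeName, string? Value, List<TreeNode> Children)
{
    public static TreeNode Scalar(string name, string value) =>
        new(name, null, value, new List<TreeNode>());

    public static TreeNode Struc(string name, string typeName, params TreeNode[] children) =>
        new(name, typeName, null, new List<TreeNode>(children));
}

public static class TreePrinter
{
    public static string Print(TreeNode root)
    {
        var sb = new StringBuilder();
        Write(root, 0, sb);
        return sb.ToString();
    }

    // Depth-first walk: the root has no "- " prefix; each level below the first adds two spaces.
    static void Write(TreeNode node, int depth, StringBuilder sb)
    {
        if (depth > 0)
            sb.Append(' ', (depth - 1) * 2).Append("- ");
        sb.Append(node.Name);
        if (node.TypeName != null)
            sb.Append(" (").Append(node.TypeName).Append(":)");
        else
            sb.Append(' ').Append(node.Value);
        sb.Append('\n');
        foreach (TreeNode child in node.Children)
            Write(child, depth + 1, sb); // recursion mirrors the nesting of the input
    }
}
```

For example, a root `progLogDb[1]` of type `PROGLOG` with a scalar `MyInt 10` and a `StartDate` struc containing `SEC -22` prints as `progLogDb[1] (PROGLOG:)`, `- MyInt 10`, `- StartDate (DATE:)`, `  - SEC -22` on successive lines.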