How to parse keywords and strings from a line of text

Question

I have a file keywords.tx with:

Commands:
    keywords = 'this' & 'way'
;
StartWords:
    keywords = 'bag'
;

and a file mygram.tx with:

import keywords

MyModel:
    keyword*=StartWords[' ']
    name+=Word[' ']
;
Word:
    text=STRING
;


My data file has a line with "bag hello soda this way". I would like to see the result have the attributes keyword='bag', name='hello soda' and command='this way'.

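I'm not sure how to write the grammar for the pattern: keyword, words, keywords, while making sure the second keyword is not swallowed by the words. Another way to put it is: startwords, words, commands.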

Answer 1

If I understand your goal, you could do something like this:

from textx import metamodel_from_str

mm = metamodel_from_str('''
File:
    lines+=Line;

Line:
    start=StartWord
    words+=Word
    command=Command;

StartWord:
    'bag' | 'something';

Command:
    'this way' | 'that way';

Word:
    !Command ID;
''')

input = '''
bag hello soda this way
bag hello soda that way
something hello this foo this way
'''

model = mm.model_from_str(input)

assert len(model.lines) == 3
l = model.lines[1]
assert l.start == 'bag'
assert l.words == ['hello', 'soda']
assert l.command == 'that way'

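A few things to note:

You don't have to specify [' '] as a separator in your repetitions, because whitespace is skipped by default.
To specify alternatives, use |.
You can use the syntactic predicate ! to check whether something is ahead and proceed only if it is not. In the rule Word this is used to make sure that a command is not consumed by the Word repetition in the Line rule.
You can add more start words and commands simply by adding more alternatives to these rules.
If you want to be more permissive and capture commands even when the user puts multiple whitespace characters between the command words (e.g. "this" and "way" separated by several spaces), you can either use regex matches or specify the match like this: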

Command:
    'this ' 'way' | 'that ' 'way';

This matches a single space as part of 'this ', and any further whitespace before 'way' is then skipped and discarded.

There is comprehensive documentation with examples on the textX site, so I suggest taking a look and going through some of the provided examples.
