Why does TextX ignore \n in a string literal, but not in a regex?
TL;DR: The issue will be fixed in TextX version 3.0. The workaround is to use a regex match for escaped (\) characters, such as \n.
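Full question: Using TextX, I am parsing a home-grown markup language in which paragraph breaks and line breaks are significant. I think I am missing something fundamental when trying to match newlines: why do "\n" and "\n\n" not work, while their regex counterparts /\n/ and /\n\n/ do?
Note: Whitespace is redefined at the parser level to exclude \n, using ws=" \t".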
import textx as tx
grammar = r"""
Root:
    content*=Content
;
Content:
    Text | ParagraphBreak | LineBreak
;
ParagraphBreak:
    paragraphbreak="\n\n"
    // paragraphbreak=/\n\n/
;
LineBreak:
    linebreak="\n" // Will cause parsing error
    // linebreak=/\n/ // Will parse fine
;
Text[noskipws]: // All text valid
    text=/[^\n]*/
;
"""
parser = tx.metamodel_from_str(grammar, ws=" \t")
source = "Line.\nBreak.\n\n"
parsed_source = parser.model_from_str(source)
print(parsed_source.content)
When I run the above code on my system, using
- Python 3.10.1
- Poetry version 1.1.12, with the following in poetry.lock:
  - [[package]] name = "arpeggio", version = "1.10.2", ..., python-versions = "*"
  - [[package]] name = "textx", version = "2.3.0", ..., python-versions = "*", [package.dependencies] Arpeggio = ">=1.9.0"
I get the following result (root of the paths: /Users/[redacted]/Library/Caches/pypoetry/virtualenvs):
File ".../[redacted]-py3.10/lib/python3.10/site-packages/textx/model.py", line 291, in _parse
return self.parser_model.parse(self)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 291, in parse
result = self._parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 370, in _parse
result = e.parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 789, in parse
result = self._parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 945, in _parse
parser._nm_raise(self, c_pos, parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 1718, in _nm_raise
raise self.nm
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 485, in _parse
result = p(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 291, in parse
result = self._parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 423, in _parse
parser._nm_raise(self, c_pos, parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 1718, in _nm_raise
raise self.nm
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 409, in _parse
result = e.parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 291, in parse
result = self._parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 370, in _parse
result = e.parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 291, in parse
result = self._parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 370, in _parse
result = e.parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 789, in parse
result = self._parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 898, in _parse
parser._nm_raise(self, c_pos, parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 1718, in _nm_raise
raise self.nm
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 409, in _parse
result = e.parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 291, in parse
result = self._parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 370, in _parse
result = e.parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 291, in parse
result = self._parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 370, in _parse
result = e.parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 789, in parse
result = self._parse(parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 898, in _parse
parser._nm_raise(self, c_pos, parser)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 1718, in _nm_raise
raise self.nm
arpeggio.NoMatch: Expected '\n\n' or '\n' or EOF at position (1, 6) => 'Line.* Break. '.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/[redacted]/scratchpad/TextX/linebreaks.py", line 31, in <module>
parsed_source = parser.model_from_str(source)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/textx/metamodel.py", line 615, in model_from_str
model = self._parser_blueprint.clone().get_model_from_str(
File ".../[redacted]-py3.10/lib/python3.10/site-packages/textx/model.py", line 332, in get_model_from_str
self.parse(model_str, file_name=file_name)
File ".../[redacted]-py3.10/lib/python3.10/site-packages/arpeggio/__init__.py", line 1516, in parse
self.parse_tree = self._parse()
File ".../[redacted]-py3.10/lib/python3.10/site-packages/textx/model.py", line 294, in _parse
raise TextXSyntaxError(message=text(e),
textx.exceptions.TextXSyntaxError: None:1:6: error: Expected '\n\n' or '\n' or EOF at position (1, 6) => 'Line.* Break. '.
I was expecting the same result as with the regex versions, namely:
[<textx:Text instance at 0x10129bc40>, <textx:LineBreak instance at 0x101298040>, <textx:Text instance at 0x101298130>, <textx:ParagraphBreak instance at 0x10129aec0>]
This is an issue that has been resolved in the current development version. See this textX issue.
The fix will be included in the upcoming textX 3.0 release.
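To make the difference between the two kinds of match concrete, here is a small standard-library sketch. It does not use textX at all, and it rests on an assumption that this thread does not confirm: that textX 2.x compares the text of a string match verbatim, without interpreting escape sequences, while a regex match is passed on to Python's re module. Under that assumption, "\n" in a raw-string grammar is the two characters backslash and n, which never occur in the input.

```python
import re

# Inside a raw Python string, the grammar text "\n" is two characters:
# a backslash followed by the letter n.
grammar_literal = r"\n"

source = "Line.\nBreak.\n\n"
pos = 5  # index of the first newline in `source`

# ASSUMPTION about textX 2.x: a string match compares the grammar text
# verbatim, so it looks for backslash + n and finds nothing here.
literal_hit = source.startswith(grammar_literal, pos)

# A regex match gives the same two characters to `re`, which interprets
# the escape sequence itself and matches a real newline.
regex_hit = re.compile(grammar_literal).match(source, pos) is not None

print(len(grammar_literal), literal_hit, regex_hit)
```

If this is what happens, it would also explain why the commented-out /\n/ and /\n\n/ alternatives in the grammar above parse fine on 2.3.0.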