Using QuotedString in pyparsing
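I am having conceptual difficulty understanding how to build a pyparsing parser. The steps are: 1) build the parser by combining subclasses of ParserElement, and 2) use the parser to parse a string.
The following example works fine: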
from pyparsing import Word, Literal, alphas, alphanums, delimitedList, QuotedString
name = Word(alphas+"_", alphanums+"_")
field = name
fieldlist = delimitedList(field)
doc = Literal('<Begin>') + fieldlist + Literal('**End**')
dstring = '<Begin>abc,de34,f_o_o**End**'
print(doc.parseString(dstring))
It produces the expected token sequence:
['<Begin>', 'abc', 'de34', 'f_o_o', '**End**']
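However, the QuotedString class (for example) does not take a ParserElement as an argument, so it seems it cannot be used when building a parser. I would like to use it in the example above, like this: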
name = Word(alphas+"_", alphanums+"_")
field = QuotedString(name) ### Wrong: doesn't allow "name" as an argument
fieldlist = delimitedList(field)
to parse a document of the form:
dstring = '<Begin>"abc", "de34", "f_o_o"**End**'
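But since it cannot be used that way, what is the correct syntax for including QuotedString when building a parser for a list of quoted strings?
======== EDIT ============
See the answers below...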
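I think you are just a little confused about how to use QuotedString. The argument passed to QuotedString is not the string expected inside the quotes - it is the character (or characters) used as the quote. This way you can define quoted strings that use "*" as the quote, or "=" as the quote, or "<" and ">" as the opening and closing quote characters. In your example, just use this definition for field: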
field = QuotedString('"')
Also, don't be afraid to use Python's built-in help() function to read the docstrings of classes, modules, methods, and so on.
EDIT:
QuotedString('X') does not parse "X"; it parses X some characters inside matching characters X.
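To make that distinction concrete, here is a minimal sketch. The variable names and the sample input are made up for illustration; only QuotedString, ParseException, parseString and asList come from pyparsing.

```python
from pyparsing import ParseException, QuotedString

x_quoted = QuotedString('X')  # 'X' is the quote character, not the content

# The text between the two X delimiters is returned, without the delimiters.
result = x_quoted.parseString('Xsome charactersX').asList()
print(result)

# A bare "X" has no closing delimiter, so it is not a match.
try:
    x_quoted.parseString('X')
    matched = True
except ParseException:
    matched = False
print(matched)
```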
Here is your complete (working) example program:
from pyparsing import QuotedString, delimitedList, Group
dstring = '<Begin>"abc", "de34", "f_o_o"**End**'
field = QuotedString('"')
parser = "<Begin>" + Group(delimitedList(field)) + "**End**"
print(parser.parseString(dstring))
For me this prints:
['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']
If you get an exception after copying, pasting, and running this example, please post the complete exception.
More examples:
starQuoteString = QuotedString('*')
eqQuoteString = QuotedString('=')
tildeQuoteString = QuotedString('~')
angleQuoteString = QuotedString('<', endQuoteChar='>')
fullSample = starQuoteString + eqQuoteString + tildeQuoteString + angleQuoteString
print(fullSample.parseString("""
*a string quoted with stars*
=a very long quoted string, contained within equal signs=
~not a very long string at all~<another quoted string on the same line>
"""))
Prints:
['a string quoted with stars', 'a very long quoted string, contained within equal signs', 'not a very long string at all', 'another quoted string on the same line']
You are not even limited to single characters. You could use QuotedString('**') to parse your ending **End**, but this would also accept **The End**, or **Finis**, or **That's all folks!**.
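QuotedString cannot be used for this task. However, the Or operator (^) can achieve the same effect: it allows different forms of quotes while keeping the ability to check that the string inside the quotes is valid. The following code does this: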
from pyparsing import Word, Literal, alphas, alphanums, delimitedList
from pyparsing import Group, QuotedString, ParseException, Suppress
name = Word(alphas+"_", alphanums+"_")
# Parentheses replace the backslash continuations, which cannot be followed by comments.
field = (Suppress('"') + name + Suppress('"')        # double quote
         ^ Suppress("'") + name + Suppress("'")      # single quote
         ^ Suppress("<") + name + Suppress(">")      # html tag
         ^ Suppress("{{") + name + Suppress("}}"))   # django template variable
fieldlist = Group(delimitedList(field))
doc = Literal('<Begin>') + fieldlist + Literal('**End**')
dstring = [
'<Begin>"abc","de34","f_o_o"**End**', # Good
'<Begin><abc>,{{de34}},\'f_o_o\'**End**', # Good
'<Begin>"abc",\'de34","f_o_o\'**End**', # Bad - mismatched quotes
'<Begin>"abc","de34","f_o#o"**End**', # Bad - invalid identifier
]
for ds in dstring:
    print(ds)
    try:
        print(' ', doc.parseString(ds))
    except ParseException as err:
        print(" "*(err.column-1) + "^")
        print(err)
This produces the desired output, accepting the two good test strings and rejecting the two bad ones:
<Begin>"abc","de34","f_o_o"**End**
['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']
<Begin><abc>,{{de34}},'f_o_o'**End**
['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']
<Begin>"abc",'de34","f_o_o'**End**
            ^
Expected "**End**" (at char 12), (line:1, col:13)
<Begin>"abc","de34","f_o#o"**End**
                   ^
Expected "**End**" (at char 19), (line:1, col:20)
Thanks to Paul for all the help and for making such a great package.