pyPEG - data identified by the `flag()` function is returned incorrectly by the `compose()` function
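I am in a situation where I need to parse a legacy format. What I want to do is write a parser that recognizes the format and converts it into an object that is easier to work with.
I managed to parse the input; the problem is converting it back to a string. In short: when I pass the result of `parse()` as an argument to `compose()`, it does not return the correct string.
Here are the output and the source code. I'm a beginner with PEGs; is there something I have misunderstood? Note that my initial string contains `(126000-147600,3);`, while in the composed string it is preceded by a `-`.
Output: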
********************************************************************************
-t gmt+1 -n GB_EN -p '39600-61200,0; (126000-147600,3); -(212400-234000,5); 298800; (320400); 385200-406800,0; 471600-493200,0; 558000-579600,0'
********************************************************************************
gmt+1 GB_EN
********************************************************************************
[{'end': '61200', 'interval': '0', 'start': '39600'},
{'end': '147600', 'interval': '3', 'start': '126000'},
{'end': '234000', 'interval': '5', 'inverted': True, 'start': '212400'},
{'start': '298800'},
{'start': '320400'},
{'end': '406800', 'interval': '0', 'start': '385200'},
{'end': '493200', 'interval': '0', 'start': '471600'},
{'end': '579600', 'interval': '0', 'start': '558000'}]
-t gmt+1 -n GB_EN -p '39600-61200,0; -(126000-147600,3); -(212400-234000,5); 298800; -(320400); 385200-406800,0; 471600-493200,0; 558000-579600,0'
Python source code:
from pypeg2 import *
from pprint import pprint

Timezone = re.compile(r"(?i)gmt[\+\-]\d")
TimeValue = re.compile(r"[\d]+")


class ObjectSerializerMixin(object):
    def get_as_object(self):
        obj = {}
        for attr in ['start', 'end', 'interval', 'inverted']:
            if getattr(self, attr, None):
                obj[attr] = getattr(self, attr)
        return obj


class TimeFixed(str, ObjectSerializerMixin):
    grammar = attr('start', TimeValue)


class TimePeriod(Namespace, ObjectSerializerMixin):
    grammar = attr('start', TimeValue), '-', attr('end', TimeValue), ',', attr('interval', TimeValue)


class TimePeriodWrapped(Namespace, ObjectSerializerMixin):
    grammar = flag("inverted", '-'), "(", attr('start', TimeValue), '-', attr('end', TimeValue), ',', attr('interval', TimeValue), ")"


class TimeFixedWrapped(Namespace, ObjectSerializerMixin):
    grammar = flag("inverted", '-'), "(", attr('start', TimeValue), ")"


class TimeList(List):
    grammar = csl([TimePeriod, TimeFixed, TimePeriodWrapped, TimeFixedWrapped], separator=";")

    def __str__(self):
        for a in self:
            print(a.get_as_object())
        return ''


class AlertExpression(List):
    grammar = '-t', blank, attr('timezone', Timezone), blank, '-n', blank, attr('locale'), blank, "-p", optional(blank), "'", attr('timelist', TimeList), "'"

    def get_time_objects(self):
        for item in self.timelist:
            yield item.get_as_object()

    def __str__(self):
        return '{} {}'.format(self.timezone, self.locale)


if __name__ == '__main__':
    s = """-t gmt+1 -n GB_EN -p '39600-61200,0; (126000-147600,3); -(212400-234000,5); 298800; (320400); 385200-406800,0; 471600-493200,0; 558000-579600,0'"""
    p = parse(s, AlertExpression)
    print("*"*80)
    print(s)
    print("*"*80)
    print(p)
    print("*"*80)
    pprint(list(p.get_time_objects()))
    print(compose(p))
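I'm pretty sure this is a bug in `pypeg2`.
You can verify this with a simplified version of the pypeg2 example given here, using values similar to the ones you are working with: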
>>> from pypeg2 import *
>>> class AddNegation:
...     grammar = flag("inverted", '-'), blank, "(1000-5000,3)"
...
>>> t = AddNegation()
>>> t.inverted = False
>>> compose(t)
'- (1000-5000,3)'
>>> t.inverted = True
>>> compose(t)
'- (1000-5000,3)'
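This minimal example shows that the value of the flag variable (`inverted`) has no effect on the composed output. As you found yourself, your `parse` is working as you intended.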
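A small round-trip check makes this kind of discrepancy easy to spot across many inputs. The helper below is only a sketch: `round_trip_mismatches` is a hypothetical name, and it takes the parse and compose functions as arguments so it does not depend on pypeg2 itself. It also assumes that composing is expected to reproduce the input exactly, including whitespace.

```python
def round_trip_mismatches(samples, parse_fn, compose_fn):
    """Return (original, recomposed) pairs where compose(parse(s)) != s."""
    mismatches = []
    for text in samples:
        recomposed = compose_fn(parse_fn(text))
        if recomposed != text:
            mismatches.append((text, recomposed))
    return mismatches
```

With the grammar from the question, it could be called as `round_trip_mismatches([s], lambda t: parse(t, AlertExpression), compose)`; an empty list means every sample survived the round trip.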
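I took a quick look at the code; this is where the compose is. The module is all written within the one `__init__.py` file, and this function is recursive. As far as I can tell, the problem is that when the flag is False, the `-` object is still passed into compose (at the bottom level of recursion) as a `str` type and simply added to the composed string here.
Update: I have isolated the bug to this line (1406), which incorrectly unpacks the flag attribute and sends the string `'-'` back into `compose()`, so it is appended whatever the value of the attribute, which is of type `bool`.
A partial workaround is to replace that line with `text.append(self.compose(thing, g))`, similar to the clause above it (so that `Attribute` types are handled the same way they normally are once unpacked from the tuple). However, you then hit this bug, where optional attributes (a flag is just a special case of type `Attribute`) are not composed correctly when they are missing from the object.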
As a workaround for that, you can go to line 1350 of the same file and replace
if grammar.subtype == "Flag":
    if getattr(thing, grammar.name):
        result = self.compose(thing, grammar.thing, attr_of=thing)
    else:
        result = terminal_indent()
with
if grammar.subtype == "Flag":
    try:
        if getattr(thing, grammar.name):
            result = self.compose(thing, grammar.thing, attr_of=thing)
        else:
            result = terminal_indent()
    except AttributeError:
        # if the attribute is missing, insert nothing
        result = terminal_indent()
I'm not sure this is a fully robust fix, but it is a workaround that will let you keep going.
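The behaviour the patched branch aims for can be shown in isolation. The sketch below does not use pypeg2; `compose_flag`, `Period` and `compose_period` are hypothetical names made up for illustration. It relies on `getattr` with a default, which has the same effect here as catching `AttributeError`: a missing attribute is treated like a False flag, and the literal is emitted only when the flag is truthy.

```python
def compose_flag(obj, name, literal):
    """Emit `literal` only if obj.<name> exists and is truthy, else nothing."""
    return literal if getattr(obj, name, False) else ""


class Period:
    """Hypothetical stand-in for a parsed object; not a pypeg2 class."""

    def __init__(self, body, inverted=None):
        self.body = body
        # Leave the attribute unset when no flag was given, mimicking an
        # object on which the optional flag attribute is missing.
        if inverted is not None:
            self.inverted = inverted


def compose_period(period):
    """Compose '-(body)' when inverted, '(body)' otherwise."""
    return compose_flag(period, "inverted", "-") + "(" + period.body + ")"
```

Here `compose_period(Period("212400-234000,5", inverted=True))` yields the `-` prefix, while both `inverted=False` and a missing attribute yield the plain parenthesised form.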
Output
After applying both of these workarounds/fixes to the `pypeg2` module file, the output you get from `print(compose(p))` is
-t gmt+1 -n GB_EN -p '39600-61200,0; (126000-147600,3); -(212400-234000,5); 298800; (320400); 385200-406800,0; 471600-493200,0; 558000-579600,0'
as desired, and you can continue using the `pypeg2` module.