Pyparsing Write in External File
Update (I got some more information...)
So my goal is to write a parser for a script in an odd format that looks like XML but is not XML.
<[file][][]
<[cultivation][][]
<[string8][coordinate_system][lonlat]>
<[list_vegetation_map_exclusion_zone][vegetation_map_exclusion_zone_list][]
>
<[string8][buildings_texture_folder][]>
<[list_plant][plant_list][]
>
<[list_building][building_list][]
<[building][element][0]
<[vector3_float64][position][7.809637 46.182262 0]>
<[float32][direction][-1.82264196872711]>
<[float32][length][25.9434452056885]>
<[float32][width][17.4678573608398]>
<[int32][floors][3]>
<[stringt8c][roof][gable]>
<[stringt8c][usage][residential]>
> ...
This is what I have so far:
import pyparsing as pp
from pyparsing import (Combine, Literal, OneOrMore, Optional, SkipTo,
                       Suppress, Word, alphas, nums)

def toc_parser(file_path):
    # save complete file in variable
    with open(file_path, "r") as f:
        toc = f.read()
    parser = OneOrMore(Word(alphas))
    # ignore // comments
    parser.ignore('//' + pp.restOfLine())
    # suppress the <> and [] punctuation
    klammern = Suppress("<")
    klammernzu = Suppress(">")
    eckig = Suppress("[")
    eckigzu = Suppress("]")
    element = Suppress("[element]")
    leer = Suppress("[]")
    # grammar:
    nameBuilding = "building"
    namePosition = "position"
    nameDirection = "direction"
    nameLength = "length"
    nameWidth = "width"
    nameFloors = "floors"
    nameRoof = "roof"
    nameUsage = "usage"
    buildingzahl = klammern + eckig + nameBuilding + eckigzu + element +eckig + Word(nums) +eckigzu
    pos = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + namePosition + eckigzu + eckig + Combine(Word(nums)+"."+Word(nums))+ Combine(Word(nums)+"."+Word(nums))+ Word(nums)+ eckigzu + klammernzu
    direc = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + nameDirection + eckigzu + eckig + Combine(Optional("-")+Word(nums)+Optional("."+Word(nums)))+ eckigzu + klammernzu
    leng = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + nameLength + eckigzu+eckig + Combine(Word(nums)+Optional("."+Word(nums)))+ eckigzu + klammernzu
    widt = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + nameWidth + eckigzu+eckig+Combine(Word(nums)+Optional("."+Word(nums)))+ eckigzu + klammernzu
    floors = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + nameFloors + eckigzu+eckig+Word(nums)+ eckigzu + klammernzu
    roof = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + nameRoof + eckigzu +eckig+Word(alphas)+ eckigzu + klammernzu
    usag = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + nameUsage+ eckigzu+eckig+Word(alphas)+ eckigzu + klammernzu
    building = buildingzahl + pos +direc +leng + widt + floors + roof + usag + klammernzu
    file = klammern + eckig + Literal("file") + eckigzu + leer + leer + klammern + eckig+ Literal("cultivation") +eckigzu + leer + leer
    vegexcl = Literal("<[list_vegetation_map_exclusion_zone][vegetation_map_exclusion_zone_list][]") + klammernzu
    coordsis = Literal("<[string8][coordinate_system][lonlat]>")
    textures = Literal("<[string8][buildings_texture_folder][]>")
    listPlants = Literal("<[list_plant][plant_list][]") + klammernzu
    listBuildings = Literal("<[list_building][building_list][]") + OneOrMore(building) + klammernzu
    listLights = Literal("<[list_light][light_list][]") + klammernzu
    listAirportLights = Literal("<[list_airport_light][airport_light_list][]") + klammernzu
    listXref = Literal("<[list_xref][xref_list][]") + klammernzu
    fileganz = file + coordsis + vegexcl + textures + listPlants + listBuildings + listLights + listAirportLights + listXref + klammernzu + klammernzu
    print(fileganz.parseString(toc))
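Problem:
I need to be able to overwrite certain values in the external script. I figured out (here) that this is how you do it, but it always ends up in the "else" branch.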
#define Values to be updated
valuesToUpdate = {
    "building":"home"
    ""
    }
def updateSelectedDefinitions(tokens):
    if tokens.name in valuesToUpdate:
        newVal = valuesToUpdate[tokens.name]
        return "%" % tokens.name, newVal
    else:
        raise ParseException(print("no Update definded"))
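Thanks a lot for your help :)
Here is a quick run-through.
First, we should try to describe this format in words:
"Each entry is enclosed in '<>' characters and contains 3 values in '[]' characters, followed by zero or more nested entries. The 3 values in '[]' contain a data type, an optional name, and an optional value. The values can be numeric or string, and can be parsed as scalar or list values depending on the data type."
Converting this to a quasi-BNF, where '*' is used for "zero or more":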
entry ::= '<' subentry subentry subentry entry* '>'
subentry ::= '[' value* ']'
value ::= number | alphanumeric word
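We can see that this is a recursive grammar, since an entry can contain elements that are themselves entries. So when we convert to pyparsing, we will use a pyparsing Forward to define entry as a placeholder, and then define its structure after all the other expressions are defined.
Converting this short BNF to pyparsing: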
import pyparsing as pp

# define some basic punctuation - useful at parse time, but we will
# suppress them since we don't really need them after parsing is done
# (we'll use pyparsing Groups to capture the structure that these
# characters represent)
LT, GT, LBRACK, RBRACK = map(pp.Suppress, "<>[]")
# define our placeholder for the nested entry
entry = pp.Forward()
# work bottom-up through the BNF
value = pp.pyparsing_common.number | pp.Word(pp.alphas, pp.alphanums+"_")
subentry = pp.Group(LBRACK - value[...] + RBRACK)
type_name_value = subentry*3
entry <<= pp.Group(LT
                   - type_name_value("type_name_value")
                   + pp.Group(entry[...])("contents") + GT)
At this point, you can use entry to parse your sample text (after adding enough closing '>' characters to make it a valid nested expression):
result = entry.parseString(sample)
result.pprint()
Prints:
[[['file'],
[],
[],
[[['cultivation'],
[],
[],
[[['string8'], ['coordinate_system'], ['lonlat'], []],
[['list_vegetation_map_exclusion_zone'],
['vegetation_map_exclusion_zone_list'],
[],
[]],
[['string8'], ['buildings_texture_folder'], [], []],
[['list_plant'], ['plant_list'], [], []],
[['list_building'],
['building_list'],
[],
[[['building'],
['element'],
[0],
[[['vector3_float64'], ['position'], [7.809637, 46.182262, 0], []],
[['float32'], ['direction'], [-1.82264196872711], []],
[['float32'], ['length'], [25.9434452056885], []],
[['float32'], ['width'], [17.4678573608398], []],
[['int32'], ['floors'], [3], []],
[['stringt8c'], ['roof'], ['gable'], []],
[['stringt8c'], ['usage'], ['residential'], []]]]]]]]]]]
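So that's a start. We can see that the values are parsed, and that numeric values are converted to the correct Python types (int or float) while words stay as strings.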
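To check the point about types in isolation, here is a minimal self-contained sketch. It repeats the grammar from above and parses a shortened copy of the building entry; the names `sample` and `parsed_values` are made up for this illustration, and it assumes a pyparsing version that supports the `[...]` repetition notation.

```python
import pyparsing as pp

# same grammar as above, repeated so this snippet runs on its own
LT, GT, LBRACK, RBRACK = map(pp.Suppress, "<>[]")
entry = pp.Forward()
value = pp.pyparsing_common.number | pp.Word(pp.alphas, pp.alphanums + "_")
subentry = pp.Group(LBRACK - value[...] + RBRACK)
entry <<= pp.Group(
    LT
    - (subentry * 3)("type_name_value")
    + pp.Group(entry[...])("contents")
    + GT
)

# shortened sample (an assumption for this sketch, not the full file)
sample = """
<[building][element][0]
    <[vector3_float64][position][7.809637 46.182262 0]>
    <[float32][direction][-1.82264196872711]>
    <[int32][floors][3]>
    <[stringt8c][roof][gable]>
>
"""

# the outermost entry is wrapped in a Group, so take element 0
building = entry.parseString(sample, parseAll=True)[0]

# collect the third bracket of every nested entry, keyed by its name
parsed_values = {}
for child in building.contents:
    data_type, name, val = child.type_name_value
    parsed_values[name[0]] = list(val)

for name, vals in parsed_values.items():
    print(name, [type(v).__name__ for v in vals])
```

The conversion comes from `pp.pyparsing_common.number`, which is tried before the plain `Word`, so anything that looks like a number never reaches the string branch.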
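To convert these pieces into a more coherent structure, we can attach a parse action to entry, which will be a parse-time callback that runs as each entry is parsed.
In this case, we'll write a parse action that processes the type/name/value triple and then captures the nested contents, if present. We'll try to infer from the data type string how to structure the value or contents.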
def convert_entry_to_dict(tokens):
    # entry is wrapped in a Group, so ungroup to get the parsed elements
    parsed = tokens[0]

    # unpack data type, optional name and optional value
    data_type, name, value = parsed.type_name_value
    data_type = data_type[0] if data_type else None
    name = name[0] if name else None

    # save type and name in dict to be returned from the parse action
    ret = {'type': data_type, 'name': name}

    # if there were contents present, save them as the value; otherwise,
    # get the value from the third element in the triple (use the
    # parsed data type as a hint as to whether the value should be a
    # scalar, a list, or a str)
    if parsed.contents:
        ret["value"] = list(parsed.contents)
    else:
        if data_type.startswith(("vector", "list")):
            ret["value"] = [*value]
        else:
            ret["value"] = value[0] if value else None
            if ret["value"] is None and data_type.startswith("string"):
                ret["value"] = ""

    return ret

entry.addParseAction(convert_entry_to_dict)
Now when we parse the sample, we get this structure:
[{'name': None,
'type': 'file',
'value': [{'name': None,
'type': 'cultivation',
'value': [{'name': 'coordinate_system',
'type': 'string8',
'value': 'lonlat'},
{'name': 'vegetation_map_exclusion_zone_list',
'type': 'list_vegetation_map_exclusion_zone',
'value': []},
{'name': 'buildings_texture_folder',
'type': 'string8',
'value': ''},
{'name': 'plant_list',
'type': 'list_plant',
'value': []},
{'name': 'building_list',
'type': 'list_building',
'value': [{'name': 'element',
'type': 'building',
'value': [{'name': 'position',
'type': 'vector3_float64',
'value': [7.809637,
46.182262,
0]},
{'name': 'direction',
'type': 'float32',
'value': -1.82264196872711},
{'name': 'length',
'type': 'float32',
'value': 25.9434452056885},
{'name': 'width',
'type': 'float32',
'value': 17.4678573608398},
{'name': 'floors',
'type': 'int32',
'value': 3},
{'name': 'roof',
'type': 'stringt8c',
'value': 'gable'},
{'name': 'usage',
'type': 'stringt8c',
'value': 'residential'}]}]}]}]}]
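If you need to rename any of the field names, you can add that behavior in the parse action.
This should give you a good start on processing your markup.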
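For the original goal of overwriting values and writing the script back out, one possible approach is to edit the nested dicts and then serialize them again. The following is only a sketch under assumptions: `update_values` and `format_entry` are hypothetical helpers, not part of pyparsing, and they work on plain dicts shaped like the structure shown above. Note that the parse action above drops the third bracket when an entry has nested contents (the `0` in `[building][element][0]`), so this sketch writes an empty third bracket for such entries.

```python
def update_values(entry, new_values):
    """Replace the value of every leaf entry whose name is in new_values.

    Hypothetical helper: modifies the nested dicts in place.
    """
    value = entry["value"]
    if isinstance(value, list) and value and isinstance(value[0], dict):
        # container entry: recurse into the nested entries
        for child in value:
            update_values(child, new_values)
    elif entry["name"] in new_values:
        entry["value"] = new_values[entry["name"]]
    return entry


def format_entry(entry, depth=0):
    """Render one entry dict back into the <[type][name][value] ...> text form."""
    pad = "  " * depth
    data_type = entry["type"] or ""
    name = entry["name"] or ""
    value = entry["value"]

    if isinstance(value, list) and any(isinstance(v, dict) for v in value):
        # nested entries: open the tag, indent the children, close with '>'
        lines = [f"{pad}<[{data_type}][{name}][]"]
        lines.extend(format_entry(child, depth + 1) for child in value)
        lines.append(f"{pad}>")
        return "\n".join(lines)

    if isinstance(value, list):
        text = " ".join(str(v) for v in value)  # vector-style values
    elif value is None:
        text = ""
    else:
        text = str(value)
    return f"{pad}<[{data_type}][{name}][{text}]>"


# small hand-written structure in the same shape as the parsed output
building = {
    "type": "building",
    "name": "element",
    "value": [
        {"type": "int32", "name": "floors", "value": 3},
        {"type": "stringt8c", "name": "roof", "value": "gable"},
        {"type": "vector3_float64", "name": "position",
         "value": [7.809637, 46.182262, 0]},
    ],
}

update_values(building, {"floors": 5, "roof": "flat"})
text = format_entry(building)
print(text)
```

The resulting string can then be written to a file with an ordinary `open(path, "w")` call. If the third bracket of container entries matters for your files, the parse action would need to keep it under a separate key so that `format_entry` can emit it again.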