Split a line into a dictionary with multiple layers of key-value pairs
I have a file containing lines in this format.
Example 1:
nextline = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4 };"
Example 2:
nextline = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };"
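I started by splitting the line on ":", which gave me a list with 2 items.
I want to split the line into a dictionary of keys and values, but the Score key has multiple sub-keys with values of their own.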
Hole 1
Par 4
Index 2
Distance 459
Score
    Player1 4
    Player2 6
    Player3 4
So I am using something like this...
split_line_by_semicolon = nextline.split(":")
dictionary_of_line = dict((k.strip(), v.strip()) for k,v in (item.split('=')
    for item in split_line_by_semicolon.split(';')))
for keys,values in dictionary_of_line.items():
    print("{0} {1}".format(keys,values))
But I get this error on the Score element of the line:
ValueError: too many values to unpack (expected 2)
I can adjust the split on "=" like this, so that it stops after the first "=":
dictionary_of_line = dict((k.strip(), v.strip()) for k,v in (item.split('=',1)
    for item in split_line_by_semicolon.split(';')))
for keys,values in dictionary_of_line.items():
    print("{0} {1}".format(keys,values))
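But then I lose the sub-values inside the curly braces. Does anyone know how I can build this multi-level dictionary?
A simpler approach (though I don't know whether it is acceptable in your case) would be: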
import re
nextline = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };"
# compiles the regular expression to get the info you want
my_regex = re.compile(r'\w+ \= \w+')
# builds the structure of the dict you expect to get
final_dict = {'Hole':0, 'Par':0, 'Index':0, 'Distance':0, 'Score':{}}
# uses the compiled regular expression to filter out the info you want from the string
filtered_items = my_regex.findall(nextline)
for item in filtered_items:
    # for each filtered item (string in the form key = value)
    # splits out the 'key' and handles it to fill your final dictionary
    key = item.split(' = ')[0]
    if key.startswith('Player'):
        final_dict['Score'][key] = int(item.split(' = ')[1])
    else:
        final_dict[key] = int(item.split(' = ')[1])
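For reference, here is a minimal sketch of the same idea wrapped in a function, using capture groups so each match does not have to be split a second time. The names `parse_flat`, `nested_key` and `nested_prefix` are made up for illustration, and it assumes, as above, that every value is an integer and that only the sub-keys start with "Player".

```python
import re

# Same pattern as above, but with capture groups for key and value.
PAIR_RE = re.compile(r'(\w+) = (\w+)')

def parse_flat(line, nested_key='Score', nested_prefix='Player'):
    """Hypothetical helper: collect 'key = value' pairs from one line.

    Keys starting with nested_prefix go into the inner dict stored
    under nested_key; all other keys go into the top-level dict.
    """
    result = {nested_key: {}}
    for key, value in PAIR_RE.findall(line):
        target = result[nested_key] if key.startswith(nested_prefix) else result
        target[key] = int(value)  # assumes every value is an integer
    return result

nextline = ("DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; "
            "Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };")
parsed = parse_flat(nextline)
print(parsed['Hole'], parsed['Score'])
```

"Score = {" itself is not matched, because "{" is not a word character, which is why the Score dict has to be created up front.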
lines = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4 };", "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };"
import json
import pprint
import re

def lines_to_dict(nextline):
    # cut up to Hole
    nextline = nextline[nextline.index("Hole"):]
    # convert to dict format
    string_ = re.sub(r'\s+=',':',nextline)
    string_ = re.sub(r';',',',string_)
    # json likes double quotes, so wrap every word in them
    string_ = re.sub(r'(\b\w+)',r'"\1"',string_)
    # drop the trailing comma
    string_ = re.sub(r',$',r'',string_)
    # make dict for Hole
    mo = re.search(r'(\"Hole.+?),\W+Score.*',string_)
    if mo:
        d_hole = json.loads("{" + mo.groups()[0] + "}")
    # make dict for Score
    mo = re.search(r'(\"Score.*)',string_)
    if mo:
        d_score = json.loads("{" + mo.groups()[0] + "}")
    # combine dicts
    d_hole.update(d_score)
    return d_hole

for d in lines:
    pprint.pprint(lines_to_dict(d))
{'Distance': '459',
'Hole': '1',
'Index': '2',
'Par': '4',
'Score': {'Player1': '4'}}
{'Distance': '459',
'Hole': '1',
'Index': '2',
'Par': '4',
'Score': {'Player1': '4', 'Player2': '6', 'Player3': '4'}}
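Note that all values in the output above are strings, because every word was quoted before being handed to json. If integers are wanted, one possible follow-up is a small recursive conversion. This is only a sketch; `to_int` is a made-up helper name, and it assumes that any value consisting purely of digits should become an int.

```python
def to_int(value):
    """Hypothetical helper: turn digit-only strings into ints, recursing into dicts."""
    if isinstance(value, dict):
        return {k: to_int(v) for k, v in value.items()}
    if isinstance(value, str) and value.isdigit():
        return int(value)
    # anything else (non-numeric strings, existing ints, ...) is left alone
    return value

parsed = {'Distance': '459', 'Hole': '1', 'Index': '2', 'Par': '4',
          'Score': {'Player1': '4', 'Player2': '6', 'Player3': '4'}}
converted = to_int(parsed)
print(converted)
```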
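I would use regular expressions the same way maccinza did (I like his answer), with one small difference: data that contains inner dictionaries can be processed recursively: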
#example strings:
nextline1 = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4 };"
nextline2 = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };"
import re
lineRegexp = re.compile(r'.+\'WeeklyMedal:(.+)\'?')  # this regexp returns WeeklyMedal record.
weeklyMedalRegexp = re.compile(r'(\w+) = (\{.+\}|\w+)')  # this regexp parses WeeklyMedal
# helper recursive function to process WeeklyMedal record. returns dictionary
parseWeeklyMedal = lambda r, info: { k: (int(v) if v.isdigit() else parseWeeklyMedal(r, v)) for (k, v) in r.findall(info)}
parsedLines = []
for line in [nextline1, nextline2]:
    info = lineRegexp.search(line)
    if info:
        # process WeeklyMedal record (group 1 is the part after 'WeeklyMedal:')
        parsedLines.append(parseWeeklyMedal(weeklyMedalRegexp, info.group(1)))
        # or do something with parsed dictionary in place
# do something here with entire result, print for example
print(parsedLines)
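One caveat with the pattern above: `\{.+\}` is greedy, so if a line ever contained two brace groups at the same level, they would be captured as a single value. If that can happen in your data, a small hand-written recursive parser is one alternative. The following is only a sketch under assumptions: the names `TOKEN_RE`, `parse_block` and `parse_line` are made up, keys and values are assumed to be plain words or numbers, and everything after the "WeeklyMedal:" marker is assumed to be `key = value;` pairs.

```python
import re

# Tokens are words/numbers or one of the punctuation marks = { } ;
TOKEN_RE = re.compile(r'\w+|[={};]')

def parse_block(tokens, pos=0):
    """Parse 'key = value;' pairs from tokens[pos] up to a '}' or the end.

    Returns (dict, index of the token where parsing stopped).
    """
    result = {}
    while pos < len(tokens) and tokens[pos] != '}':
        if tokens[pos] == ';':  # separator between pairs
            pos += 1
            continue
        key = tokens[pos]
        if pos + 2 >= len(tokens) or tokens[pos + 1] != '=':
            raise ValueError("expected '= value' after key %r" % key)
        pos += 2
        if tokens[pos] == '{':
            # nested dictionary: recurse, then step over the closing '}'
            value, pos = parse_block(tokens, pos + 1)
            pos += 1
        else:
            value = int(tokens[pos]) if tokens[pos].isdigit() else tokens[pos]
            pos += 1
        result[key] = value
    return result, pos

def parse_line(line, marker="WeeklyMedal:"):
    """Parse everything after the marker into a (possibly nested) dict."""
    payload = line.split(marker, 1)[1]
    record, _ = parse_block(TOKEN_RE.findall(payload))
    return record

nextline2 = ("DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; "
             "Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };")
print(parse_line(nextline2))
```

Because the parser tracks braces explicitly, sibling groups such as `A = { p = 1 }; B = { q = 2 };` end up as two separate inner dictionaries.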