Converting string into dictionary - pythonic Way
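Experts,
I have written a program that converts a string into a dictionary. It produces the expected result, but I doubt that it is written in a Pythonic way. I would like to hear your suggestions.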
txt = '''
name : xxxx
desgination : yyyy
cities :
    LA : Los Angeles
    NY : New York
HeadQuarters :
    LA : LA
    NY : NY
Country : USA
'''
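I split each line on the colon (:) and store the result in a dictionary.
Here, cities and HeadQuarters each contain another dictionary, and I wrote the following code to handle them.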
if k == 'cities' :
    D[k] = {}
    continue
elif k == 'HeadQuarters':
    D[k] = {}
    continue
elif k == 'LA' :
    if D.has_key('cities'):
        if D['cities'].get(k) is None:
            D['cities'][k] = v
    if D.has_key('HeadQuarters'):
        if D['HeadQuarters'].get(k) is None:
            D['HeadQuarters'][k] = v
elif k == 'NY' :
    if D.has_key('cities'):
        if D['cities'].get(k) is None:
            D['cities'][k] = v
    if D.has_key('HeadQuarters'):
        if D['HeadQuarters'].get(k) is None:
            D['HeadQuarters'][k] = v
else:
    D[k]= v
Not sure whether this is Pythonic:
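import re

# Split on ':' and newlines, dropping the empty first and last pieces
x = re.split(r':|\n', txt)[1:-1]
x = list(map(lambda x: x.rstrip(), x))
# Pair each key with its value; list() keeps the pairs indexable
x = list(zip(x[::2], x[1::2]))
d = {}
for i in range(len(x)):
    if not x[i][0].startswith(' '):
        if x[i][1] != '':
            d[x[i][0]] = x[i][1]
        else:
            # Empty value: collect the indented rows that follow into a sub-dictionary
            t = x[i][0]
            tmp = {}
            i += 1
            while i < len(x) and x[i][0].startswith(' '):
                tmp[x[i][0].strip()] = x[i][1]
                i += 1
            d[t] = tmp
print(d)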
Output:
{'Country': ' USA', 'cities': {'NY': ' New York', 'LA': ' Los Angeles'}, 'name': ' xxxx', 'desgination': ' yyyy', 'HeadQuarters': {'NY': ' NY', 'LA': ' LA'}}
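You can use the split method here, with a little recursion for your sub-dictionaries, assuming that the lines of a sub-dictionary begin with a tab (\t) or four spaces: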
def txt_to_dict(txt):
    data = {}
    lines = txt.split('\n')
    i = 0
    while i < len(lines):
        try:
            # Split the current line, not the whole text
            key, val = lines[i].split(':')
        except ValueError:
            # print("Invalid row format")
            i += 1
            continue
        key = key.strip()
        val = val.strip()
        if len(val) == 0:
            i += 1
            sub = ""
            # Gather the indented lines that belong to this key
            while i < len(lines) and (lines[i].startswith('\t') or lines[i].startswith('    ')):
                sub += lines[i] + '\n'
                i += 1
            data[key] = txt_to_dict(sub[:-1])  # remove last newline character
        else:
            data[key] = val
            i += 1
    return data
Then you simply call it on your variable txt:
>>> print(txt_to_dict(txt))
{'Country': 'USA', 'cities': {'NY': 'New York', 'LA': 'Los Angeles'}, 'name': 'xxxx', 'desgination': 'yyyy', 'HeadQuarters': {'NY': 'NY', 'LA': 'LA'}}
The sample output is shown above. The sub-dictionaries are created correctly.
Added some error handling.
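The recursive version above assumes a single level of nesting. As a rough sketch only (not part of the original answer), the same idea can be written with a stack of open dictionaries so that deeper nesting also works. The function name parse_indented and its indent parameter are hypothetical, the code targets Python 3, and it assumes that a key with an empty value always opens a sub-dictionary.

```python
def parse_indented(text, indent=4):
    """Parse 'key : value' lines into nested dicts, using indentation for depth.

    Hypothetical helper: assumes a key with an empty value opens a sub-dictionary.
    """
    root = {}
    stack = [(-1, root)]  # (indentation of the owning key, dict being filled)
    for raw in text.splitlines():
        if not raw.strip() or ':' not in raw:
            continue  # skip blank or malformed lines
        expanded = raw.expandtabs(indent)  # treat a tab like `indent` spaces
        depth = len(expanded) - len(expanded.lstrip())
        key, _, value = (part.strip() for part in raw.partition(':'))
        # Close every dict whose owning key is indented at least as far as this line
        while stack[-1][0] >= depth:
            stack.pop()
        parent = stack[-1][1]
        if value:
            parent[key] = value
        else:
            parent[key] = {}
            stack.append((depth, parent[key]))
    return root


sample = '''
name : xxxx
desgination : yyyy
cities :
    LA : Los Angeles
    NY : New York
HeadQuarters :
    LA : LA
    NY : NY
Country : USA
'''
result = parse_indented(sample)
print(result)
```

Because str.partition splits only on the first colon, a value that itself contains a colon is kept whole instead of being rejected as an invalid row.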
This produces the same output as your code. It was done mostly by refactoring what you had and applying some common Python idioms.
txt = '''
name : xxxx
desgination : yyyy
cities :
    LA : Los Angeles
    NY : New York
HeadQuarters :
    LA : LA
    NY : NY
Country : USA
'''
D = {}  # added to test code
for line in (line for line in txt.splitlines() if line):  # added to test code
    k, _, v = [s.strip() for s in line.partition(':')]  # added to test code
    if k in {'cities', 'HeadQuarters'}:
        D[k] = {}
        continue
    elif k in {'LA', 'NY'}:
        for k2 in (x for x in ('cities', 'HeadQuarters') if x in D):
            if k not in D[k2]:
                D[k2][k] = v
    else:
        D[k]= v
import pprint
pprint.pprint(D)
Output:
{'Country': 'USA',
 'HeadQuarters': {'LA': 'LA', 'NY': 'NY'},
 'cities': {'LA': 'Los Angeles', 'NY': 'New York'},
 'desgination': 'yyyy',
 'name': 'xxxx'}
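This works, although it flattens the nested keys into a single dictionary, as the output in the comment shows: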
txt = '''
name : xxxx
desgination : yyyy
cities :
    LA : Los Angeles
    NY : New York
HeadQuarters :
    LA : LA
    NY : NY
Country : USA
'''
di = {}
for line in txt.split('\n'):
    if len(line) > 1:
        di[line.split(':')[0].strip()] = line.split(':')[1].strip()
print(di) # {'name': 'xxxx', 'desgination': 'yyyy', 'LA': 'LA', 'Country': 'USA', 'HeadQuarters': '', 'NY': 'NY', 'cities': ''}
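To make the limitation of the flat approach concrete, here is a small Python 3 sketch of the same idea written as a dict comprehension with str.partition. The shortened sample text and the homepage line are assumptions added only for illustration; they are not in the original question.

```python
txt = '''
name : xxxx
cities :
    LA : Los Angeles
HeadQuarters :
    LA : LA
homepage : http://example.com
'''

# One flat dictionary: indentation is ignored, so repeated keys overwrite each other
flat = {
    key.strip(): value.strip()
    for key, _, value in (
        line.partition(':') for line in txt.splitlines() if line.strip()
    )
}
print(flat['LA'])        # the later 'LA : LA' line replaced 'Los Angeles'
print(flat['homepage'])  # partition splits on the first colon only

# For comparison, indexing the result of split(':') truncates the URL
truncated = 'homepage : http://example.com'.split(':')[1].strip()
print(truncated)
```

The section keys (cities, HeadQuarters) end up with empty strings, and only the last LA entry survives, which is why a nested result needs one of the indentation-aware approaches.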
You can use an existing YAML parser (the PyYAML package):
import yaml # $ pip install pyyaml
data = yaml.safe_load(txt)
Result:
{'Country': 'USA',
 'HeadQuarters': {'LA': 'LA', 'NY': 'NY'},
 'cities': {'LA': 'Los Angeles', 'NY': 'New York'},
 'desgination': 'yyyy',
 'name': 'xxxx'}
The parser accepts your input as is, but to make it more conformant YAML, it needs small modifications:
---
Country: USA
HeadQuarters:
  LA: LA
  NY: NY
cities:
  LA: "Los Angeles"
  NY: "New York"
desgination: yyyy
name: xxxx