Converting a string into a dictionary - the Pythonic way

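Experts,

I have written a program that converts a string into a dictionary. It produces the expected result, but I doubt that my approach is Pythonic. I would like to hear your suggestions.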

txt = '''
name         : xxxx
desgination  : yyyy
cities       : 
    LA       : Los Angeles
    NY       : New York
HeadQuarters : 
    LA       :  LA
    NY       :  NY    
Country      : USA 
'''

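I split each line on the colon (:) and store the parts in a dictionary. Here cities and HeadQuarters each contain another dictionary, and I wrote the following code to handle that.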

if k == 'cities':
    D[k] = {}
    continue
elif k == 'HeadQuarters':
    D[k] = {}
    continue
elif k == 'LA':
    if 'cities' in D:
        if D['cities'].get(k) is None:
            D['cities'][k] = v
    if 'HeadQuarters' in D:
        if D['HeadQuarters'].get(k) is None:
            D['HeadQuarters'][k] = v
elif k == 'NY':
    if 'cities' in D:
        if D['cities'].get(k) is None:
            D['cities'][k] = v
    if 'HeadQuarters' in D:
        if D['HeadQuarters'].get(k) is None:
            D['HeadQuarters'][k] = v
else:
    D[k] = v

I am not sure whether this is Pythonic:

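import re

# Split on colons and newlines, dropping the empty strings at both ends.
x = re.split(r':|\n', txt)[1:-1]
x = [s.rstrip() for s in x]
# Pair every key with its value; a list is needed because it is indexed below.
x = list(zip(x[::2], x[1::2]))
d = {}
for i in range(len(x)):
    if not x[i][0].startswith('    '):
        if x[i][1] != '':
            d[x[i][0]] = x[i][1]
        else:
            # An empty value starts a nested block of indented entries.
            t = x[i][0]
            tmp = {}
            i += 1
            while i < len(x) and x[i][0].startswith('    '):
                tmp[x[i][0].strip()] = x[i][1]
                i += 1
            d[t] = tmp
print(d)

Output

{'name': ' xxxx', 'desgination': ' yyyy', 'cities': {'LA': ' Los Angeles', 'NY': ' New York'}, 'HeadQuarters': {'LA': '  LA', 'NY': '  NY'}, 'Country': ' USA'}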

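You can use the split method here, with a little recursion for your sub-dictionaries, assuming that the lines of a sub-dictionary start with a tab (\t) or four spaces: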

def txt_to_dict(txt):
    data = {}
    lines = txt.split('\n')
    i = 0
    while i < len(lines):
        try:
            key, val = lines[i].split(':')
        except ValueError:
            # print "Invalid row format"
            i += 1
            continue
        key = key.strip()
        val = val.strip()
        if len(val) == 0:
            i += 1
            sub = ""
            # Collect the indented lines that belong to this key, stopping at the end of the input.
            while i < len(lines) and lines[i].startswith(('\t', '    ')):
                sub += lines[i] + '\n'
                i += 1
            data[key] = txt_to_dict(sub[:-1])  # remove last newline character
        else:
            data[key] = val
            i += 1
    return data

Then you simply call it on your variable txt:

>>> print(txt_to_dict(txt))
{'name': 'xxxx', 'desgination': 'yyyy', 'cities': {'LA': 'Los Angeles', 'NY': 'New York'}, 'HeadQuarters': {'LA': 'LA', 'NY': 'NY'}, 'Country': 'USA'}

The sample output is shown above. The sub-dictionaries are created correctly.

I have also added some error handling.
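
The recursive function above handles nesting by re-parsing each indented block. As a rough sketch of another way to get the same effect, the version below keeps a stack of open dictionaries and uses each line's indentation to decide where a key belongs, so it is not limited to a fixed number of levels. The name parse_indented and the three-level sample text are made up for illustration, and the sketch assumes indentation is consistent (tabs and spaces are not mixed).

```python
def parse_indented(text):
    """Parse 'key : value' lines into nested dicts, using indentation for depth."""
    root = {}
    # Stack of (indent, dict) pairs; the top entry is the dict currently being filled.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip() or ':' not in raw:
            continue  # skip blank or malformed lines
        indent = len(raw) - len(raw.lstrip())
        key, _, value = (part.strip() for part in raw.partition(':'))
        # Close every block that is indented at least as far as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if value:
            parent[key] = value
        else:
            # An empty value opens a new nested dictionary.
            parent[key] = {}
            stack.append((indent, parent[key]))
    return root


txt = '''
name         : xxxx
desgination  : yyyy
cities       :
    LA       : Los Angeles
    NY       : New York
HeadQuarters :
    LA       :  LA
    NY       :  NY
Country      : USA
'''

# Hypothetical input with one more level of nesting than the question uses.
deep = '''
name : xxxx
cities :
    LA : Los Angeles
    east :
        NY : New York
Country : USA
'''

result = parse_indented(txt)
deep_result = parse_indented(deep)
print(result['cities'])               # {'LA': 'Los Angeles', 'NY': 'New York'}
print(deep_result['cities']['east'])  # {'NY': 'New York'}
```

Whether this is preferable to the recursive version is a matter of taste; the stack mainly helps if deeper nesting is expected.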

This produces the same output as your code. It was done mainly by refactoring what you already had and applying some common Python idioms.

txt = '''
name         : xxxx
desgination  : yyyy
cities       :
    LA       : Los Angeles
    NY       : New York
HeadQuarters :
    LA       :  LA
    NY       :  NY
Country      : USA
'''

D = {}                                                    # added to test code
for line in (line for line in txt.splitlines() if line):  #        "
    k, _, v = [s.strip() for s in line.partition(':')]    #        "

    if k in {'cities', 'HeadQuarters'}:
        D[k] = {}
        continue
    elif k in {'LA', 'NY'}:
        for k2 in (x for x in ('cities', 'HeadQuarters') if x in D):
            if k not in D[k2]:
                D[k2][k] = v
    else:
        D[k]= v

import pprint
pprint.pprint(D)

Output:

{'Country': 'USA',
 'HeadQuarters': {'LA': 'LA', 'NY': 'NY'},
 'cities': {'LA': 'Los Angeles', 'NY': 'New York'},
 'desgination': 'yyyy',
 'name': 'xxxx'}
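
One detail in the loop above that may be worth spelling out is the use of partition(':') rather than split(':'). The short illustration below shows the difference on a value that itself contains colons; the homepage line is a made-up example and is not part of the original data.

```python
# Hypothetical line whose value contains extra colons.
line = "homepage     : http://example.com:8080/"

# split(':') cuts at every colon, so unpacking into two names would fail here.
parts = line.split(':')
print(len(parts))  # 4

# partition(':') cuts only at the first colon and always returns three items.
key, sep, value = (s.strip() for s in line.partition(':'))
print(key)    # homepage
print(value)  # http://example.com:8080/

# With no colon at all, partition still returns three items (the last two empty).
no_colon = "no colon here".partition(':')
print(no_colon)  # ('no colon here', '', '')
```

Calling line.split(':', 1) would also stop at the first colon, but it returns a one-item list when there is no colon, so the unpacking would still need a guard.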

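This works, although note from the printed result that it produces a flat dictionary: the nested LA and NY entries end up at the top level, and cities and HeadQuarters are left empty.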

txt = '''
name         : xxxx
desgination  : yyyy
cities       : 
    LA       : Los Angeles
    NY       : New York
HeadQuarters : 
    LA       :  LA
    NY       :  NY    
Country      : USA 
'''
di = {}
for line in txt.split('\n'):
    if len(line) > 1:
        key, value = line.split(':')
        di[key.strip()] = value.strip()

print(di)  # {'name': 'xxxx', 'desgination': 'yyyy', 'cities': '', 'LA': 'LA', 'NY': 'NY', 'HeadQuarters': '', 'Country': 'USA'}

You can use an existing YAML parser (the PyYAML package):

import yaml # $ pip install pyyaml

data = yaml.safe_load(txt)

Result:

{'Country': 'USA',
 'HeadQuarters': {'LA': 'LA', 'NY': 'NY'},
 'cities': {'LA': 'Los Angeles', 'NY': 'New York'},
 'desgination': 'yyyy',
 'name': 'xxxx'}

The parser accepts your input as is, but to make it conform more closely to YAML it needs small modifications:

--- 
Country: USA
HeadQuarters: 
  LA: LA
  NY: NY
cities: 
  LA: "Los Angeles"
  NY: "New York"
desgination: yyyy
name: xxxx
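
As a quick check, and assuming PyYAML is installed, the tidied document above can be loaded the same way and should give the same nested dictionary. The variable names tidy and round_trip are only for illustration.

```python
import yaml  # third-party package: pip install pyyaml

tidy = """\
---
Country: USA
HeadQuarters:
  LA: LA
  NY: NY
cities:
  LA: "Los Angeles"
  NY: "New York"
desgination: yyyy
name: xxxx
"""

data = yaml.safe_load(tidy)
print(data['cities']['LA'])  # Los Angeles

# Dump back to block-style YAML and reload to confirm nothing is lost.
round_trip = yaml.safe_load(yaml.safe_dump(data, default_flow_style=False))
print(round_trip == data)  # True
```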