Handling redundancy in a list
Suppose I have a list of tuples containing states and counties:
stList = [('NJ', 'Burlington County'),
          ('NJ', 'Middlesex County'),
          ('VA', 'Frederick County'),
          ('MD', 'Montgomery County'),
          ('NC', 'Lee County'),
          ('NC', 'Alamance County')]
For each item, I want to zip the state together with its county, like this:
new_list = [{'NJ': 'Burlington County'},
            {'NJ': 'Middlesex County'},
            {'VA': 'Frederick County'},
            {'MD': 'Montgomery County'},
            {'NC': 'Lee County'},
            {'NC': 'Alamance County'}]
I tried something like this, but it doesn't work correctly (it iterates over each letter and zips the letters individually):
new_list = []
for item in stList:
    d1 = dict(zip(item[0], item[1]))
    new_list.append(d1)
Returns:
[{'N': 'B', 'J': 'u'},
 {'N': 'M', 'J': 'i'},
 {'V': 'F', 'A': 'r'},
 {'M': 'M', 'D': 'o'},
 {'N': 'L', 'C': 'e'},
 {'N': 'A', 'C': 'l'}]
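To make things more complicated, my end goal is actually a list of dictionaries, one per state (key), with that state's counties (values) as a list. How do I fix the zipped dictionaries and then collect the counties into a list for each state?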
final_list = [{'NJ': ['Burlington County', 'Middlesex County']},
              {'VA': 'Frederick County'},
              {'MD': 'Montgomery County'},
              {'NC': ['Lee County', 'Alamance County']}]
You get the wrong result because zip treats strings as iterables. This is the expected behavior.
You can get something close to what you want with:
result = dict()
for state, county in stList:
    result.setdefault(state, list()).append(county)
print(result)
The result is a dictionary of lists. Output:
{'NJ': ['Burlington County', 'Middlesex County'], 'VA': ['Frederick County'], 'MD': ['Montgomery County'], 'NC': ['Lee County', 'Alamance County']}
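To see why the original attempt paired letters instead of whole names, here is a minimal sketch of how zip behaves on two strings, plus two ways to pair the whole values. The variable names are only for illustration.

```python
state, county = ('NJ', 'Burlington County')

# zip() walks both strings character by character and stops at the shorter one.
print(list(zip(state, county)))      # [('N', 'B'), ('J', 'u')]

# Wrapping each string in a one-element list pairs the whole values instead.
print(dict(zip([state], [county])))  # {'NJ': 'Burlington County'}

# dict() also accepts an iterable of (key, value) pairs directly.
print(dict([(state, county)]))       # {'NJ': 'Burlington County'}
```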
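Your code is probably broken because of a misunderstanding of zip. It treats each string as a separate iterable and pairs them up character by character, stopping at the end of the shorter one, so only the first two characters of the county name (s[:2]) are used. If you want a mapping from each state to its counties, you can try this: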
mapping = {}
for state, cty in stList:
    if state in mapping:
        mapping[state].append(cty)
    else:
        mapping[state] = [cty]
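That is the simplest way, in any case. However, if you want to use itertools, you can use groupby like this (it only groups adjacent items, so this relies on rows for the same state being next to each other, as they are in stList):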
from itertools import groupby
mapping = dict((k, [gg[1] for gg in g]) for k, g in groupby(stList, key=lambda x: x[0]))
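As a rough illustration of what happens when rows for the same state are not adjacent, here is a small sketch. The input list `unsorted_list` and the helper `group_counties` are hypothetical names made up for this example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input in which the two 'NJ' rows are not adjacent.
unsorted_list = [('NJ', 'Burlington County'),
                 ('VA', 'Frederick County'),
                 ('NJ', 'Middlesex County')]

def group_counties(pairs):
    """Map each state to its counties; groupby() only merges adjacent keys."""
    return {state: [county for _, county in group]
            for state, group in groupby(pairs, key=itemgetter(0))}

# 'NJ' forms two separate groups, and the second overwrites the first.
ungrouped = group_counties(unsorted_list)

# Sorting by state first makes equal keys adjacent; the sort is stable,
# so counties keep their original relative order within each state.
grouped = group_counties(sorted(unsorted_list, key=itemgetter(0)))

print(ungrouped)  # {'NJ': ['Middlesex County'], 'VA': ['Frederick County']}
print(grouped)    # {'NJ': ['Burlington County', 'Middlesex County'], 'VA': ['Frederick County']}
```

The plain dictionary loop does not have this pitfall, since it does not care about the order of the input.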
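I don't think zip() is the right fit here. Here are two possible solutions.
If you must use a list to store the results, you will have to go one step further than this answer. But if using a dict for the result is acceptable, this answer may get you there: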
stList = [('NJ', 'Burlington County'),
          ('NJ', 'Middlesex County'),
          ('VA', 'Frederick County'),
          ('MD', 'Montgomery County'),
          ('NC', 'Lee County'),
          ('NC', 'Alamance County')]
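# Solution 1: one single-key dict per (state, county) tuple
new_list = []
for item in stList:
    new_list.append({item[0]: item[1]})
print("new list: ", new_list)

# Solution 2: one dict mapping each state to a list of its counties
new_dict = {}
for item in stList:
    if item[0] in new_dict:
        new_dict[item[0]].append(item[1])
    else:
        new_dict[item[0]] = [item[1]]
print("new dict: ", new_dict)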
These solutions produce the following results:
new list:  [{'NJ': 'Burlington County'}, {'NJ': 'Middlesex County'}, {'VA': 'Frederick County'}, {'MD': 'Montgomery County'}, {'NC': 'Lee County'}, {'NC': 'Alamance County'}]
new dict:  {'NJ': ['Burlington County', 'Middlesex County'], 'VA': ['Frederick County'], 'MD': ['Montgomery County'], 'NC': ['Lee County', 'Alamance County']}
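If the result really has to be a list of dictionaries, as in the question, one possible extra step is to turn the grouped dict into one single-key dict per state. This is only a sketch. Note that, unlike the final_list in the question, it keeps every value as a list, even for states with a single county, which I am assuming is acceptable.

```python
new_dict = {'NJ': ['Burlington County', 'Middlesex County'],
            'VA': ['Frederick County'],
            'MD': ['Montgomery County'],
            'NC': ['Lee County', 'Alamance County']}

# One single-key dict per state; every value stays a list, even with one county.
final_list = [{state: counties} for state, counties in new_dict.items()]
print(final_list)
```

Keeping the value type uniform means code that consumes final_list does not need to check whether it got a string or a list.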
A list comprehension seems like the simplest approach here:
[{i[0]:i[1]} for i in stList]
Output:
[{'NJ': 'Burlington County'},
 {'NJ': 'Middlesex County'},
 {'VA': 'Frederick County'},
 {'MD': 'Montgomery County'},
 {'NC': 'Lee County'},
 {'NC': 'Alamance County'}]
Poolka's setdefault solution is reasonable, efficient and readable, but it can be made more intuitive with defaultdict:
from collections import defaultdict
result = defaultdict(list)
for state, county in stList:
    result[state].append(county)
If your list contains triples that also include a date, you can do a nested version:
result = defaultdict(lambda: defaultdict(list))
for state, county, date in stList:
    result[state][county].append(date)
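Here is a self-contained sketch of the nested version. The `records` list and its dates are made up purely for illustration, and the final conversion to plain dicts is an optional extra step, not something the approach requires.

```python
from collections import defaultdict

# Hypothetical (state, county, date) triples; the dates are made up for illustration.
records = [('NJ', 'Burlington County', '2020-01-01'),
           ('NJ', 'Burlington County', '2020-02-01'),
           ('NJ', 'Middlesex County', '2020-01-15'),
           ('VA', 'Frederick County', '2020-03-01')]

result = defaultdict(lambda: defaultdict(list))
for state, county, date in records:
    result[state][county].append(date)

# Convert to plain dicts so later lookups of missing keys raise KeyError
# instead of silently creating empty entries.
plain = {state: dict(counties) for state, counties in result.items()}
print(plain)
```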
For a one-liner with none of the above properties, you can use itertools.groupby ;)
from itertools import groupby
{k: [x[1] for x in g] for k, g in groupby(sorted(stList), key=lambda x: x[0])}
# {'MD': ['Montgomery County'],
#  'NC': ['Alamance County', 'Lee County'],
#  'NJ': ['Burlington County', 'Middlesex County'],
#  'VA': ['Frederick County']}
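Algorithmically, this is worse, because it has to sort the initial list first: sorting costs O(n log n), whereas the dictionary-based loops make a single O(n) pass.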