Python itertools - Combining groupby and the recipe tools' grouper
Suppose I have the following data:
data = [['John', 1], ['Ada', 2], ['Ada', 3], ['Paul', 4],
        ['Paul', 5], ['Paul', 6], ['Kat', 7], ['Kat', 8]]
I can use groupby to group the entries by person:
In [37]:
from itertools import groupby, izip_longest
from operator import itemgetter
for name, g in groupby(data, key=itemgetter(0)):
    print name, list(g)
John [['John', 1]]
Ada [['Ada', 2], ['Ada', 3]]
Paul [['Paul', 4], ['Paul', 5], ['Paul', 6]]
Kat [['Kat', 7], ['Kat', 8]]
I can also use the grouper function from the itertools recipes to group the entries two at a time. I'll copy/paste it here for reference:
In [38]:
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

for g in grouper(data, 2):
    print g
(['John', 1], ['Ada', 2])
(['Ada', 3], ['Paul', 4])
(['Paul', 5], ['Paul', 6])
(['Kat', 7], ['Kat', 8])
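But now I want to iterate over the data so that the first element contains the data for John and Ada, and the second element contains the data for Paul and Kat. In other words, I want to combine groupby and grouper like this: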
In [39]:
person_iterator = groupby(data, key=itemgetter(0))
for group_iterator in grouper(person_iterator, 2):
    print [(keyvalue[0], list(keyvalue[1])) for keyvalue in group_iterator]
But the output is not what I expected:
[('John', []), ('Ada', [['Ada', 2], ['Ada', 3]])]
[('Paul', []), ('Kat', [['Kat', 7], ['Kat', 8]])]
Why are the lists for John and Paul empty? How can I fix this?
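Each group iterator produced by itertools.groupby shares the underlying data with groupby itself, so it is exhausted as soon as the next group is produced. grouper (through izip_longest) pulls two (key, group) pairs from groupby before your loop body runs, so by the time you call list() on the first group of each pair, groupby has already moved on and that group is empty.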
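The exhaustion can be reproduced without grouper at all. The following is a minimal sketch, written for Python 3 (so it uses print() as a function) and using a shortened copy of the data; the variable names are only illustrative:

```python
from itertools import groupby
from operator import itemgetter

data = [['John', 1], ['Ada', 2], ['Ada', 3], ['Paul', 4]]

groups = groupby(data, key=itemgetter(0))
first_key, first_group = next(groups)    # John's group, not consumed yet
second_key, second_group = next(groups)  # advancing groupby invalidates John's group

first_items = list(first_group)    # groupby has already moved past John
second_items = list(second_group)  # groupby has not advanced past Ada yet

print(first_key, first_items)
print(second_key, second_items)
```

Only the most recently produced group can still be read, which matches the empty lists seen for John and Paul in the question.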
To prevent this, you need to convert each group iterator to a sequence before passing it on to grouper:
person_iterator = ((key, list(grp)) for key, grp in groupby(data, key=itemgetter(0)))
for group_iterator in grouper(person_iterator, 2):
    print [(key, value) for key, value in group_iterator]
Output:
[('John', [['John', 1]]), ('Ada', [['Ada', 2], ['Ada', 3]])]
[('Paul', [['Paul', 4], ['Paul', 5], ['Paul', 6]]), ('Kat', [['Kat', 7], ['Kat', 8]])]
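The code above is written for Python 2. Under Python 3 the same idea might look like the sketch below, assuming izip_longest is replaced by its Python 3 name zip_longest and print is called as a function; the names people and chunks are only illustrative:

```python
from itertools import groupby, zip_longest
from operator import itemgetter

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

data = [['John', 1], ['Ada', 2], ['Ada', 3], ['Paul', 4],
        ['Paul', 5], ['Paul', 6], ['Kat', 7], ['Kat', 8]]

# Materialize each group as a list while groupby is still positioned on it.
people = ((key, list(grp)) for key, grp in groupby(data, key=itemgetter(0)))

# Collect the (key, items) pairs two people at a time.
chunks = [list(chunk) for chunk in grouper(people, 2)]
for chunk in chunks:
    print(chunk)
```

With an odd number of people, the last chunk would be padded with the fillvalue (None by default), so it may need to be filtered out.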