Finding the count of tuples with the same first and third item in a list of tuples

I have a list of tuples, each containing three items:

z = [(1, 4, 2015), (1, 11, 2015), (1, 18, 2015), (1, 25, 2015), (2, 1, 2015), (2, 8, 2015), (2, 15, 2015), (2, 22, 2015), (3, 1, 2015), (3, 8, 2015), (3, 15, 2015), (3, 22, 2015), (3, 29, 2015), (4, 5, 2015), (4, 12, 2015), (4, 19, 2015), (4, 26, 2015), (5, 3, 2015), (5, 10, 2015), (5, 17, 2015), (5, 24, 2015), (5, 31, 2015), (6, 7, 2015), (6, 14, 2015), (6, 21, 2015), (6, 28, 2015), (7, 5, 2015), (7, 12, 2015), (7, 19, 2015), (7, 26, 2015), (8, 2, 2015), (8, 9, 2015), (8, 16, 2015), (8, 23, 2015), (8, 30, 2015), (9, 6, 2015), (9, 13, 2015), (9, 20, 2015), (9, 27, 2015), (10, 4, 2015), (10, 11, 2015), (10, 18, 2015), (10, 25, 2015), (11, 1, 2015), (11, 8, 2015), (11, 15, 2015), (11, 22, 2015), (11, 29, 2015), (12, 6, 2015), (12, 13, 2015), (12, 20, 2015), (12, 27, 2015), (1, 3, 2016), (1, 10, 2016), (1, 17, 2016), (1, 24, 2016), (1, 31, 2016)]

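I want to count the tuples in the list that share the same first and third items. For example, there are 4 tuples with first item 1 and third item 2015, and 4 tuples with first item 2 and third item 2015.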

I tried:

for tup in z:
    a=tup[0]
    b=tup[2]
    print(len(set({a:b})))

It does not give the expected result. How can I do this?
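For reference, the attempt above appears to print 1 on every iteration: `{a: b}` builds a fresh one-entry dict each time, and `set()` of a dict keeps only its keys. A minimal sketch on a shortened sample of the list (the sample and the `sizes` name are assumptions made for brevity):

```python
# Shortened sample of the list from the question (an assumption for brevity)
z = [(1, 4, 2015), (1, 11, 2015), (2, 1, 2015)]

sizes = []
for tup in z:
    a = tup[0]
    b = tup[2]
    # {a: b} is a new one-entry dict on every pass; set() of a dict holds only its keys
    sizes.append(len(set({a: b})))

print(sizes)  # [1, 1, 1]
```

Nothing is accumulated across iterations, so no per-group count can come out of this loop.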

In pure Python, use Counter with a generator expression (thanks @Felix):

from collections import Counter

out = Counter((x[0], x[2]) for x in z)
print (out)
Counter({(3, 2015): 5, 
         (5, 2015): 5, 
         (8, 2015): 5,
         (11, 2015): 5, 
         (1, 2016): 5, 
         (1, 2015): 4, 
         (2, 2015): 4, 
         (4, 2015): 4, 
         (6, 2015): 4, 
         (7, 2015): 4, 
         (9, 2015): 4, 
         (10, 2015): 4,
         (12, 2015): 4})
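Once the Counter exists, individual counts can be looked up by their (first, third) key. A small sketch on a shortened sample of the data (the sample is an assumption, not the full list from the question):

```python
from collections import Counter

# Shortened sample of the question's data (an assumption for brevity)
z = [(1, 4, 2015), (1, 11, 2015), (1, 18, 2015), (1, 25, 2015),
     (2, 1, 2015), (2, 8, 2015), (1, 3, 2016)]

out = Counter((x[0], x[2]) for x in z)

print(out[(1, 2015)])      # 4
print(out[(3, 2015)])      # 0, a Counter returns zero for missing keys
print(out.most_common(1))  # [((1, 2015), 4)]
```

Because missing keys return 0 instead of raising KeyError, lookups for pairs that never occur need no special handling.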

In pandas, aggregate the counts with GroupBy.size; the output is a Series:

import pandas as pd

s = pd.DataFrame(z).groupby([0, 2]).size()
print(s)
0   2   
1   2015    4
    2016    5
2   2015    4
3   2015    5
4   2015    4
5   2015    5
6   2015    4
7   2015    4
8   2015    5
9   2015    4
10  2015    4
11  2015    5
12  2015    4
dtype: int64

Use collections.defaultdict.

For example:

import collections
d = collections.defaultdict(int)
z = [(1, 4, 2015), (1, 11, 2015), (1, 18, 2015), (1, 25, 2015), (2, 1, 2015), (2, 8, 2015), (2, 15, 2015), (2, 22, 2015), (3, 1, 2015), (3, 8, 2015), (3, 15, 2015), (3, 22, 2015), (3, 29, 2015), (4, 5, 2015), (4, 12, 2015), (4, 19, 2015), (4, 26, 2015), (5, 3, 2015), (5, 10, 2015), (5, 17, 2015), (5, 24, 2015), (5, 31, 2015), (6, 7, 2015), (6, 14, 2015), (6, 21, 2015), (6, 28, 2015), (7, 5, 2015), (7, 12, 2015), (7, 19, 2015), (7, 26, 2015), (8, 2, 2015), (8, 9, 2015), (8, 16, 2015), (8, 23, 2015), (8, 30, 2015), (9, 6, 2015), (9, 13, 2015), (9, 20, 2015), (9, 27, 2015), (10, 4, 2015), (10, 11, 2015), (10, 18, 2015), (10, 25, 2015), (11, 1, 2015), (11, 8, 2015), (11, 15, 2015), (11, 22, 2015), (11, 29, 2015), (12, 6, 2015), (12, 13, 2015), (12, 20, 2015), (12, 27, 2015), (1, 3, 2016), (1, 10, 2016), (1, 17, 2016), (1, 24, 2016), (1, 31, 2016)]
for i in z:
    d[(i[0], i[2])] += 1
print(d)

Output:

defaultdict(<type 'int'>, {(10, 2015): 4, (5, 2015): 5, (2, 2015): 4, (11, 2015): 5, (6, 2015): 4, (8, 2015): 5, (3, 2015): 5, (12, 2015): 4, (7, 2015): 4, (9, 2015): 4, (4, 2015): 4, (1, 2016): 5, (1, 2015): 4})
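The output above comes from Python 2 (under Python 3 the repr shows `<class 'int'>`), and the keys appear in no particular order. If a stable order is wanted, one possible sketch is to sort the items by year and then by the first field; the shortened sample and the `ordered` name are assumptions for illustration:

```python
import collections

# Shortened sample of the question's data (an assumption for brevity)
z = [(1, 4, 2015), (1, 11, 2015), (2, 1, 2015), (1, 3, 2016), (1, 10, 2016)]

d = collections.defaultdict(int)
for i in z:
    d[(i[0], i[2])] += 1

# Sort by (third item, first item): year first, then the first field
ordered = sorted(d.items(), key=lambda kv: (kv[0][1], kv[0][0]))
print(ordered)  # [((1, 2015), 2), ((2, 2015), 1), ((1, 2016), 2)]
```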

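Using itertools.groupby from the standard library (groupby only merges adjacent items with equal keys, which works here because the list is already ordered by year and month):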

from itertools import groupby

for grp, elmts in groupby(z, lambda x: (x[0], x[2])):
    print(grp, len(list(elmts)))

Edit:

A better solution, using operator.itemgetter instead of a lambda:

from operator import itemgetter
from itertools import groupby

for grp, elmts in groupby(z, itemgetter(0, 2)):
    print(grp, len(list(elmts)))

Output:

(1, 2015) 4
(2, 2015) 4
(3, 2015) 5
(4, 2015) 4
(5, 2015) 5
(6, 2015) 4
(7, 2015) 4
(8, 2015) 5
(9, 2015) 4
(10, 2015) 4
(11, 2015) 5
(12, 2015) 4
(1, 2016) 5
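One caveat worth illustrating: itertools.groupby starts a new group every time the key changes, so it only yields one group per key when equal keys are adjacent. A minimal sketch with a hypothetical unsorted sample (not the data from the question) shows the difference and the usual remedy of sorting by the same key first:

```python
from itertools import groupby
from operator import itemgetter

key = itemgetter(0, 2)

# Hypothetical unsorted sample: the two (1, 2015) tuples are not adjacent
z = [(1, 4, 2015), (2, 1, 2015), (1, 11, 2015)]

unsorted_counts = [(grp, len(list(elmts))) for grp, elmts in groupby(z, key)]
sorted_counts = [(grp, len(list(elmts)))
                 for grp, elmts in groupby(sorted(z, key=key), key)]

print(unsorted_counts)  # [((1, 2015), 1), ((2, 2015), 1), ((1, 2015), 1)]
print(sorted_counts)    # [((1, 2015), 2), ((2, 2015), 1)]
```

Sorting costs O(n log n), so for unordered input a Counter or defaultdict is likely the simpler choice.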

Use collections.Counter together with operator.itemgetter:

from collections import Counter
from operator import itemgetter

res = Counter(map(itemgetter(0, 2), z))

print(res)

Counter({(1, 2015): 4,
         (1, 2016): 5,
         (2, 2015): 4,
         (3, 2015): 5,
         (4, 2015): 4,
         (5, 2015): 5,
         (6, 2015): 4,
         (7, 2015): 4,
         (8, 2015): 5,
         (9, 2015): 4,
         (10, 2015): 4,
         (11, 2015): 5,
         (12, 2015): 4})

You can store the counts in a dictionary keyed by a tuple made up of the first and third items of each tuple in the original list, for example:

import collections

z = [(1, 4, 2015), (1, 11, 2015), (1, 18, 2015), (1, 25, 2015), (2, 1, 2015), (2, 8, 2015),
     (2, 15, 2015), (2, 22, 2015), (3, 1, 2015), (3, 8, 2015), (3, 15, 2015), (3, 22, 2015),
     (3, 29, 2015), (4, 5, 2015), (4, 12, 2015), (4, 19, 2015), (4, 26, 2015), (5, 3, 2015),
     (5, 10, 2015), (5, 17, 2015), (5, 24, 2015), (5, 31, 2015), (6, 7, 2015), (6, 14, 2015),
     (6, 21, 2015), (6, 28, 2015), (7, 5, 2015), (7, 12, 2015), (7, 19, 2015), (7, 26, 2015),
     (8, 2, 2015), (8, 9, 2015), (8, 16, 2015), (8, 23, 2015), (8, 30, 2015), (9, 6, 2015),
     (9, 13, 2015), (9, 20, 2015), (9, 27, 2015), (10, 4, 2015), (10, 11, 2015),
     (10, 18, 2015), (10, 25, 2015), (11, 1, 2015), (11, 8, 2015), (11, 15, 2015),
     (11, 22, 2015), (11, 29, 2015), (12, 6, 2015), (12, 13, 2015), (12, 20, 2015),
     (12, 27, 2015), (1, 3, 2016), (1, 10, 2016), (1, 17, 2016), (1, 24, 2016), (1, 31, 2016)]

counter = collections.defaultdict(int)  # Use a dict factory to save some time
for element in z:  # iterate over the tuples
    counter[(element[0], element[2])] += 1  # increase the count for each match

# finally, let's print the results
for k, count in counter.items():
    print("{}: {}".format(k, count))

Which will give you:

(1, 2015): 4
(2, 2015): 4
(3, 2015): 5
(4, 2015): 4
(5, 2015): 5
(6, 2015): 4
(7, 2015): 4
(8, 2015): 5
(9, 2015): 4
(10, 2015): 4
(11, 2015): 5
(12, 2015): 4
(1, 2016): 5

from collections import Counter

tmp = [(x[0], x[2]) for x in z]
print(Counter(tmp))

The output will look like Counter({(5, 2015): 5, (11, 2015): 5, (8, 2015): 5, (3, 2015): 5, (1, 2016): 5, (10, 2015): 4, (2, 2015): 4, (6, 2015): 4, (12, 2015): 4, (7, 2015): 4, (9, 2015): 4, (4, 2015): 4, (1, 2015): 4})

Try this:

z = [(1, 4, 2015), (1, 11, 2015), (1, 18, 2015), (1, 25, 2015), (2, 1, 2015), (2, 8, 2015), (2, 15, 2015), (2, 22, 2015), (3, 1, 2015), (3, 8, 2015), (3, 15, 2015), (3, 22, 2015), (3, 29, 2015), (4, 5, 2015), (4, 12, 2015), (4, 19, 2015), (4, 26, 2015), (5, 3, 2015), (5, 10, 2015), (5, 17, 2015), (5, 24, 2015), (5, 31, 2015), (6, 7, 2015), (6, 14, 2015), (6, 21, 2015), (6, 28, 2015), (7, 5, 2015), (7, 12, 2015), (7, 19, 2015), (7, 26, 2015), (8, 2, 2015), (8, 9, 2015), (8, 16, 2015), (8, 23, 2015), (8, 30, 2015), (9, 6, 2015), (9, 13, 2015), (9, 20, 2015), (9, 27, 2015), (10, 4, 2015), (10, 11, 2015), (10, 18, 2015), (10, 25, 2015), (11, 1, 2015), (11, 8, 2015), (11, 15, 2015), (11, 22, 2015), (11, 29, 2015), (12, 6, 2015), (12, 13, 2015), (12, 20, 2015), (12, 27, 2015), (1, 3, 2016), (1, 10, 2016), (1, 17, 2016), (1, 24, 2016), (1, 31, 2016)]
newz = [(i[0], i[-1]) for i in z]
for i in set(newz):
    print(str(i) + ' ' + str(newz.count(i)))

Output:

(10, 2015) 4
(5, 2015) 5
(2, 2015) 4
(11, 2015) 5
(6, 2015) 4
(8, 2015) 5
(3, 2015) 5
(12, 2015) 4
(7, 2015) 4
(9, 2015) 4
(1, 2016) 5
(4, 2015) 4
(1, 2015) 4

A solution that does not use groupby:

import pprint
import random

from collections import Counter

z = []  # creating random dates, since the user's data spans 2 years; won't work if the year range increases

num_dates = 20
counts_by_month_and_year = Counter()

while len(z) < num_dates:
    # randrange excludes its upper bound, so these draw from 1..30, 1..11 and 2015 only
    new = (random.randrange(1, 31), random.randrange(1, 12), random.randrange(2015, 2016))

    z.append(new)
    counts_by_month_and_year[(new[0], new[2])] += 1  # count by (first item, third item)


pprint.pprint(dict(counts_by_month_and_year))  # formatting the output
{(1, 2015): 1,
 (3, 2015): 1,
 (4, 2015): 1,
 (5, 2015): 1,
 (7, 2015): 1,
 (8, 2015): 2,
 (9, 2015): 1,
 (11, 2015): 1,
 (13, 2015): 1,
 (16, 2015): 1,
 (17, 2015): 1,
 (20, 2015): 1,
 (21, 2015): 2,
 (22, 2015): 1,
 (25, 2015): 1,
 (26, 2015): 1,
 (27, 2015): 2}
