pandas: Dict from groupby.value_counts()

Question

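I have a pandas DataFrame df with the columns user and product. It records which user bought which products, including repeated purchases of the same product. For example, if user 1 bought product 23 three times, df contains the row (user 1, product 23) three times. For each user, I am only interested in the products that user bought at least 3 times. So I run s = df.groupby('user').product.value_counts() and then filter with s = s[s>2] to discard the products that were not bought often enough. After that, s looks like this: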

user     product
3        39190         9
         47766         8
         21903         8
6        21903         5
         38293         5
11       8309          7
         27959         7
         14947         5
         35948         4
         8670          4
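For anyone who wants to reproduce a Series of this shape, here is a minimal sketch. The purchase data is made up for illustration (it is not the data behind the output above), and bracket access ['product'] is used instead of .product only to avoid any confusion with the DataFrame.product method.

```python
import pandas as pd

# Hypothetical toy purchases: one row per purchase (user, product).
df = pd.DataFrame({
    "user":    [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "product": [23, 23, 23, 7, 5, 5, 5, 9, 9],
})

# Count how often each user bought each product.
# The result is a Series with a (user, product) MultiIndex.
s = df.groupby("user")["product"].value_counts()

# Keep only the products bought at least 3 times.
s = s[s > 2]
print(s)
```

The products live in the second level of the index, not in a column, which is why ordinary column access does not reach them.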

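After filtering, I am no longer interested in the frequencies (the right-hand column).

How can I create a dict of the form user:product from s? I cannot figure out how to access the individual columns/index of the Series.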

Answer 1

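Option 0

# name='count' labels the counts column so it cannot clash with the 'product' index level
s.reset_index(name='count').groupby('user')['product'].apply(list).to_dict()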

{3: [39190, 47766, 21903],
 6: [21903, 38293],
 11: [8309, 27959, 14947, 35948, 8670]}

Option 1

s.groupby(level='user').apply(lambda x: x.loc[x.name].index.tolist()).to_dict()

{3: [39190, 47766, 21903],
 6: [21903, 38293],
 11: [8309, 27959, 14947, 35948, 8670]}

Option 2

from collections import defaultdict

d = defaultdict(list)

# s.index is a MultiIndex, so iterating it yields (user, product) tuples
for user, product in s.index:
    d[user].append(product)

dict(d)

{3: [39190, 47766, 21903],
 6: [21903, 38293],
 11: [8309, 27959, 14947, 35948, 8670]}
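As a sanity check, the following self-contained sketch rebuilds the Series from the question by hand and confirms that the reset_index approach and the defaultdict approach give the same dict. The Series name "count" is an assumption made here so that reset_index() has an unambiguous column name for the values; the names flat, by_reset and by_loop are only illustrative.

```python
from collections import defaultdict

import pandas as pd

# Rebuild the filtered Series from the question; the values are purchase counts.
index = pd.MultiIndex.from_tuples(
    [(3, 39190), (3, 47766), (3, 21903),
     (6, 21903), (6, 38293),
     (11, 8309), (11, 27959), (11, 14947), (11, 35948), (11, 8670)],
    names=["user", "product"],
)
s = pd.Series([9, 8, 8, 5, 5, 7, 7, 5, 4, 4], index=index, name="count")

# Option 0 style: turn the index levels into columns, then collect products per user.
flat = s.reset_index()
by_reset = flat.groupby("user")["product"].apply(list).to_dict()

# Option 2 style: walk the (user, product) tuples of the MultiIndex directly.
d = defaultdict(list)
for user, product in s.index:
    d[user].append(product)
by_loop = dict(d)

# Both approaches should agree, including the order of products per user.
assert by_reset == by_loop
print(by_loop)
```

The loop never touches the counts at all, which fits the stated goal of ignoring the frequencies once the filter has been applied.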

python

pandas

pandas-groupby