Table with most frequent combinations with pandas python

Question

Dataset

ID Product
1   A
1   B
2   A 
3   A
3   C 
3   D 
4   A
4   B
5   A
5   C
5   D
.....

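The goal is to find the most frequent combinations of products per ID, regardless of how many string values each combination contains.

The expected result here is: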

[A, C, D]  2
[A, B]     2
[A, C]     2
......

Something like the following, but with a working value in place of the ?:

import itertools

(df.groupby('ID').Product.agg(lambda x: list(set(itertools.combinations(x,**?**))))
                 .explode().str.join('-').value_counts())

Answer 1

IIUC, group by ID, aggregate each group to a frozenset, and count the occurrences with value_counts:

df.groupby('ID')['Product'].agg(frozenset).value_counts()

Output:

(B, A)       2
(D, C, A)    2
(A)          1
Name: Product, dtype: int64
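
To try this on the rows visible in the question, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample shown above only (the rows elided by "....." are left out), and the names baskets, counts and result are illustrative, not from the original answer.

```python
import pandas as pd

# Sample rows copied from the question; the truncated rows are not included.
df = pd.DataFrame({
    "ID":      [1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 5],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B", "A", "C", "D"],
})

# One frozenset per ID: hashable and order-insensitive, so identical
# product sets compare equal no matter how the rows were ordered.
baskets = df.groupby("ID")["Product"].agg(frozenset)

# Count how many IDs share exactly the same set of products.
counts = baskets.value_counts()
print(counts)

# Plain dict view, convenient for lookups by frozenset key.
result = {basket: int(n) for basket, n in counts.items()}
```

A frozenset has no defined element order, so the printed labels may appear as (B, A) rather than (A, B).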

Alternative using a sorted tuple:

df.groupby('ID')['Product'].agg(lambda x: tuple(sorted(x))).value_counts()

Output:

(A, B)       2
(A, C, D)    2
(A,)         1
Name: Product, dtype: int64

Or as a string:

df.groupby('ID')['Product'].agg(lambda x: ','.join(sorted(x))).value_counts()

Output:

A,B      2
A,C,D    2
A        1
Name: Product, dtype: int64
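
The approaches above count each ID's full product set. The expected result in the question also lists [A, C], which only appears as a subset of [A, C, D], so the asker may want every sub-combination counted as well. The following is a rough sketch of that reading, not part of the original answer. It assumes combinations of at least two products (the helper sub_combinations and its min_size parameter are hypothetical) and uses only the rows visible in the question.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample rows copied from the question; the truncated rows are not included.
df = pd.DataFrame({
    "ID":      [1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 5],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B", "A", "C", "D"],
})


def sub_combinations(products, min_size=2):
    """Yield every sorted combination of at least min_size distinct products.

    min_size=2 is an assumption: the expected output shows no single products.
    """
    items = sorted(set(products))
    for r in range(min_size, len(items) + 1):
        yield from combinations(items, r)


# Count each combination once per ID in which it occurs.
combo_counts = Counter(
    combo
    for _, products in df.groupby("ID")["Product"]
    for combo in sub_combinations(products)
)

for combo, n in combo_counts.most_common():
    print(list(combo), n)
```

Trying every size r from min_size up to the group length sidesteps having to pick a single value for the second argument of itertools.combinations. The number of combinations grows exponentially with the number of products per ID, so this is only practical for small groups.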
