How to create a frequency table of all value pair combinations of values in different columns

Question

Suppose we have:

import pandas as pd

data = {'Column 1':     [ 1 , 3 , 4 , 1 , 3 , 2 , 3],
        'Column 2':     [ 3 , 2 , 2 , 3 , 3 , 3 ,''],
        'Column 3':     [ 3 , 2 , 3 , 1 , 3 , '',''],
        'Column 4':     [ 4 , 2 , 6 , 4 , 2 , '',''],
        'Column 5':     [ 1 , '', '', 4 , 2 , '',''],
        'Column 6':     [ '', '', '', '', 2 , '','']}

df = pd.DataFrame(data=data)

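I need to create a frequency table that counts, for each row, the combination of unique items it contains. Rows that contain the same items in a different order must still be counted as the same combination.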

co1 co2 co3 co4 co5 co6
 1   3   3   4   1
 3   2   2   2
 4   2   3   6
 1   3   1   4   4
 3   3   3   2   2   2
 2   3
 3

Expected result:

Combination    frequency
[3]            1
[2,3]          3
[1,3,4]        2
[2,3,4,6]      1

Any help would be greatly appreciated.

Answer 1

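The idea is to remove the empty strings from each row, convert the remaining values to a set to keep only unique values, then sort them and convert to a tuple. Tuples are hashable, so Counter can count them. Finally, convert the tuples back to lists and create a DataFrame: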

from collections import Counter

# One key per row: drop empty strings, deduplicate, sort, and make hashable
L = [tuple(sorted(set(y for y in x if y != ''))) for x in df.values]

# Count how many rows share each key
c = Counter(L)
df = pd.DataFrame({'Combination': list(map(list, c.keys())),
                   'frequency': list(c.values())})
print(df)
    Combination  frequency
0     [1, 3, 4]          2
1        [2, 3]          3
2  [2, 3, 4, 6]          1
3           [3]          1
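
As a minimal sketch, the same steps can be wrapped in a helper so the original df is not overwritten. The function name `combination_frequencies` and its `blank` parameter are made up for illustration and are not part of pandas. The sketch also skips NaN cells, on the assumption that missing values might arrive as NaN rather than ''. It rebuilds the sample data so it can run on its own.

```python
from collections import Counter

import pandas as pd


def combination_frequencies(frame, blank=''):
    """Count rows by their set of unique non-blank values (order ignored).

    Hypothetical helper for illustration only.
    """
    keys = [
        # drop blanks and NaN, deduplicate, then sort so order does not matter
        tuple(sorted({v for v in row if v != blank and not pd.isna(v)}))
        for row in frame.to_numpy()
    ]
    counts = Counter(keys)
    return pd.DataFrame({
        'Combination': [list(k) for k in counts],
        'frequency': list(counts.values()),
    })


# Sample data copied from the question
data = {'Column 1': [1, 3, 4, 1, 3, 2, 3],
        'Column 2': [3, 2, 2, 3, 3, 3, ''],
        'Column 3': [3, 2, 3, 1, 3, '', ''],
        'Column 4': [4, 2, 6, 4, 2, '', ''],
        'Column 5': [1, '', '', 4, 2, '', ''],
        'Column 6': ['', '', '', '', 2, '', '']}
sample = pd.DataFrame(data)

result = combination_frequencies(sample)
# Optional: show the most frequent combinations first
print(result.sort_values('frequency', ascending=False).reset_index(drop=True))
```

Sorting by frequency is optional. Without it, the rows appear in the order in which each combination is first seen, as in the output above.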

python

list

pandas

columnsorting