Creating a list of top values in all columns (pandas)

Question

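I am trying to get a list of the 2 most frequent values (by value count) in every column of a pandas DataFrame. The DataFrame looks like this: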

            column1          column2          column3
 1           apple            red               cat
 2          banana            blue              dog
 3          grapes            yellow            cat
 4           apple            blue              cat
 5          banana            red               tiger
 6          banana            blue              dog

I want the result as a single flat list, like this:

 ['banana', 'apple', 'blue', 'red', 'cat', 'dog']

Can someone help me with this?

Answer 1

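Apply Series.value_counts to every column and slice the first two entries of the resulting index (value_counts sorts by count in descending order, so these are the most frequent values), then convert the result to a list: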

a = df.apply(lambda x: x.value_counts()[:2].index.tolist()).to_numpy().ravel('F').tolist()
print(a)
['banana', 'apple', 'blue', 'red', 'cat', 'dog']

Alternatively, a list comprehension that flattens the values directly:

a = [x for c in df.columns for x in df[c].value_counts()[:2].index]
print(a)
['banana', 'apple', 'blue', 'red', 'cat', 'dog']
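For anyone who wants to try this, here is a minimal self-contained sketch. The DataFrame is reconstructed by hand from the table in the question (the construction code is my assumption, not part of the original post), and `.index[:2]` is used in place of `[:2].index`, which selects the same two labels:

```python
import pandas as pd

# Sample data reconstructed from the table in the question (assumed construction).
df = pd.DataFrame(
    {
        "column1": ["apple", "banana", "grapes", "apple", "banana", "banana"],
        "column2": ["red", "blue", "yellow", "blue", "red", "blue"],
        "column3": ["cat", "dog", "cat", "cat", "tiger", "dog"],
    },
    index=range(1, 7),
)

# value_counts() orders values by descending count, so the first two
# index labels are the two most frequent values of each column.
a = [x for c in df.columns for x in df[c].value_counts().index[:2]]
print(a)  # ['banana', 'apple', 'blue', 'red', 'cat', 'dog']
```

Note that the sample data has no ties among the top two counts in any column; with tied counts, which value comes first may depend on the pandas version.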

Answer 2

You can call value_counts on each column in a simple generator expression and flatten the results with itertools.chain:

from itertools import chain

out = list(chain.from_iterable(df[c].value_counts()[:2].index for c in df))

Output: ['banana', 'apple', 'blue', 'red', 'cat', 'dog']
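If the number of values per column should be configurable, the same idea could be wrapped in a small helper. This is only a sketch: the function name `top_values` and its parameter `n` are hypothetical, and the sample DataFrame is again rebuilt from the question's table.

```python
from itertools import chain

import pandas as pd


def top_values(df: pd.DataFrame, n: int = 2) -> list:
    """Return the n most frequent values of each column as one flat list.

    Hypothetical helper for illustration; not part of pandas.
    """
    # head(n) keeps the first n rows of the count-sorted Series.
    per_column = (df[c].value_counts().head(n).index for c in df.columns)
    return list(chain.from_iterable(per_column))


# Sample data reconstructed from the table in the question (assumed construction).
df = pd.DataFrame(
    {
        "column1": ["apple", "banana", "grapes", "apple", "banana", "banana"],
        "column2": ["red", "blue", "yellow", "blue", "red", "blue"],
        "column3": ["cat", "dog", "cat", "cat", "tiger", "dog"],
    }
)

print(top_values(df))       # ['banana', 'apple', 'blue', 'red', 'cat', 'dog']
print(top_values(df, n=1))  # ['banana', 'blue', 'cat']
```

If a column has fewer than n distinct values, head(n) simply returns what is available, so that column contributes fewer entries to the list.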
