pandas value count of elements in list in column
I have a column containing lists of varying lengths, but with a limited set of possible items.
print(df['channels'].value_counts(), '\n')
Output:
[web, email, mobile, social] 77733
[web, email, mobile] 43730
[email, mobile, social] 32367
[web, email] 13751
So I want the total number of times web, email, mobile and social appear.
These should be:
web = 77733 + 43730 + 13751 = 135,214
email = 77733 + 43730 + 13751 + 32367 = 167,581
mobile = 77733 + 43730 + 32367 = 153,830
social = 77733 + 32367 = 110,100
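The expected totals can be double-checked by weighting each distinct list by its count and summing per channel. A minimal sketch, assuming only the four aggregated rows shown in the value_counts() output; the names `patterns` and `count` are illustrative, not taken from the original data:

```python
import pandas as pd

# Aggregated rows copied from the value_counts() output above:
# each row is one distinct channel list and how often it occurs.
patterns = pd.DataFrame({
    "channels": [
        ["web", "email", "mobile", "social"],
        ["web", "email", "mobile"],
        ["email", "mobile", "social"],
        ["web", "email"],
    ],
    "count": [77733, 43730, 32367, 13751],
})

# explode() turns each list element into its own row (the count is repeated),
# then the counts are summed per channel name.
totals = patterns.explode("channels").groupby("channels")["count"].sum()
print(totals.sort_values(ascending=False))
```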
I have tried the following two approaches:
sum_channels_items = pd.Series([x for item in df['channels'] for x in item]).value_counts()
print(sum_channels_items)
from itertools import chain
test = pd.Series(list(chain.from_iterable(df['channels']))).value_counts()
print(test)
Both fail with the same error (only the second is shown):
Traceback (most recent call last):
File "C:/Users/Mark/PycharmProjects/main/main.py", line 416, in <module>
test = pd.Series(list(chain.from_iterable(df['channels']))).value_counts()
TypeError: 'float' object is not iterable
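This error typically means some cells hold a float (usually NaN for a missing value) instead of a list, and iterating over a float fails. A small reproduction, assuming a hypothetical three-row frame with one NaN cell, that also shows one way to locate the offending rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the middle row is a missing value (NaN, a float)
# instead of a list.
df = pd.DataFrame({"channels": [["web", "email"], np.nan, ["email", "mobile"]]})

# Rows whose value is not a list are the ones that break the iteration.
bad_rows = df[~df["channels"].apply(lambda v: isinstance(v, list))]
print(bad_rows)

# The flattening comprehension from the question fails on the NaN row.
error_message = None
try:
    flat = [x for item in df["channels"] for x in item]
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'float' object is not iterable
```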
One option is to explode the lists and then count the values:
out = df['channels'].explode().value_counts()
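A self-contained sketch of the explode approach on hypothetical toy data (including one NaN cell) standing in for the real `channels` column. Because explode() keeps a NaN cell as a single NaN row and value_counts() drops NaN by default, this version does not need an explicit dropna():

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, including one missing value.
df = pd.DataFrame({
    "channels": [
        ["web", "email", "mobile", "social"],
        ["web", "email"],
        np.nan,
        ["email", "mobile", "social"],
    ]
})

# One row per list element; NaN stays NaN and is skipped by value_counts().
out = df["channels"].explode().value_counts()
print(out)
```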
Another option is to use collections.Counter. Note that your error indicates there are missing values in that column, so drop them first:
from itertools import chain
from collections import Counter
out = pd.Series(Counter(chain.from_iterable(df['channels'].dropna())))
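The same Counter approach as a runnable sketch, again on hypothetical toy data; sorting the result is an optional extra so the most common channel comes first:

```python
from collections import Counter
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical toy data, including one missing value.
df = pd.DataFrame({
    "channels": [
        ["web", "email", "mobile", "social"],
        ["web", "email"],
        np.nan,
        ["email", "mobile", "social"],
    ]
})

# dropna() removes the NaN row, so chain.from_iterable only iterates lists.
counts = Counter(chain.from_iterable(df["channels"].dropna()))
out = pd.Series(counts).sort_values(ascending=False)
print(out)
```

Unlike explode(), this skips building an intermediate exploded Series, but it relies on dropna() having removed every non-list cell first.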