Convert each cell of a pandas column from a list to a word-count dictionary?

Question

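The DataFrame has a column, df['TITLE'], where each row is a book sold at some location, LOCATION_ID. I want to group df by LOCATION_ID and create a new DataFrame with two columns: LOCATION_ID and a title-to-count dictionary of the books sold at each location.

Specifically, I am trying to do the following: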

from collections import Counter
new_df = df.groupby('LOCATION_ID')['TITLE'].apply(lambda x: Counter(x))

I am expecting output like this:

LOCATION_ID  |     TITLES
1                 {'TitleA': 12, 'TitleB': 56, ...}
2                 {'TitleK': 5, 'TitleC': 23, ...}
...

However, what I get is this:

LOCATION_ID                         Title                             
1               TitleA               12
                TitleB               56
...
2               TitleK              5
                TitleG              23
...

Thanks for your help.

Answer 1

Use agg instead of apply:

import numpy as np
import pandas as pd
from collections import Counter
prng = np.random.RandomState(0)
df = pd.DataFrame({'LOCATION_ID': prng.choice([1, 2, 3], 1000), 'TITLE': [''.join(prng.choice(list("abcd"), 3)) for _ in range(1000)]})
df.head()
Out[9]: 
   LOCATION_ID TITLE
0            1   bbb
1            2   bab
2            1   daa
3            2   dcd
4            2   cbc

df.groupby('LOCATION_ID')['TITLE'].apply(lambda x: Counter(x)).head()
Out[10]: 
LOCATION_ID     
1            aaa    2.0
             aab    5.0
             aac    4.0
             aad    3.0
             aba    8.0
dtype: float64

df.groupby('LOCATION_ID')['TITLE'].agg(lambda x: Counter(x))
Out[11]: 
LOCATION_ID
1    {u'cbb': 5, u'cbc': 8, u'cba': 6, u'cda': 8, u...
2    {u'cdd': 5, u'cbc': 7, u'cbb': 1, u'cba': 4, u...
3    {u'cbb': 6, u'cbc': 7, u'cba': 4, u'cda': 6, u...
Name: TITLE, dtype: object

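Your expectation is reasonable. When you group items together, you expect pandas to return one result per group. However, groupby.apply is documented as a flexible apply: it infers how to combine the results based on the object each call returns. Here it sees a dictionary and, trying to give you nicer output, expands it into a multi-index Series. agg, by contrast, treats whatever the function returns as the single aggregated value for the group, so each Counter stays in one cell.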
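To get from the agg result to the two-column DataFrame the question asks for, something like the following should work. It is a minimal sketch on made-up data: the `sales` frame, the `title_counts` name, and the `TITLES` column label are illustrative assumptions, and it assumes a pandas version where agg keeps a dict-like return value as a single cell, as in the output above.

```python
from collections import Counter

import pandas as pd

# Small hypothetical sample: one row per book sold at a location.
sales = pd.DataFrame({
    "LOCATION_ID": [1, 1, 1, 2, 2],
    "TITLE": ["TitleA", "TitleB", "TitleA", "TitleK", "TitleK"],
})

# agg stores each group's Counter as one object in the resulting Series;
# reset_index turns the group key back into a regular column and names
# the value column.
title_counts = (
    sales.groupby("LOCATION_ID")["TITLE"]
    .agg(lambda titles: Counter(titles))
    .reset_index(name="TITLES")
)

print(title_counts)
```

Each cell of `TITLES` is a Counter, which is a dict subclass; wrap the lambda's return value in `dict(...)` if plain dictionaries are preferred.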

python

counter

dictionary

pandas