How can I ignore empty series when using value_counts on a Pandas groupby?

Question

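I have a DataFrame in which each row holds the metadata of a newspaper article. I want to group the rows into monthly chunks and then count the values of one column (called type):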

monthly_articles = articles.groupby(pd.Grouper(freq="M"))
monthly_articles = monthly_articles["type"].value_counts().unstack()

This works fine for yearly groups, but fails when I try to group by month:

ValueError: operands could not be broadcast together with shape (141,) (139,)

I think this is because some monthly groups contain no articles. If I iterate over the groups and print value_counts for each one:

for name, group in monthly_articles:
    print(name, group["type"].value_counts())

I get an empty Series for the January and February 2006 groups:

2005-12-31 00:00:00 positive    1
Name: type, dtype: int64
2006-01-31 00:00:00 Series([], Name: type, dtype: int64)
2006-02-28 00:00:00 Series([], Name: type, dtype: int64)
2006-03-31 00:00:00 negative    6
positive    5
neutral     1
Name: type, dtype: int64
2006-04-30 00:00:00 negative    11
positive     6
neutral      3
Name: type, dtype: int64

How can I ignore empty groups when using value_counts()?

I tried dropna=False, but without success. I think this is the same problem as this question.

Answer 1

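It would help if you gave us a sample of your data; otherwise it is a bit hard to pinpoint the problem. From your code snippet, it looks like the type data is empty for some months. You can use apply on the groupby object and then call unstack. Here is sample code that works for me, with randomly generated data: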

import numpy as np
import pandas as pd

# Draw 150 random labels from the three article types
s = pd.Series(['positive', 'negtive', 'neutral'], index=[0, 1, 2])
atype = s.loc[np.random.randint(3, size=(150,))]

df = pd.DataFrame(dict(atype=atype.values), index=pd.date_range('2017-01-01',  periods=150))

gp = df.groupby(pd.Grouper(freq='M'))
dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()

In [75]: dfx
Out[75]: 
            negtive  neutral  positive
2017-01-31       13        9         9
2017-02-28       11       11         6
2017-03-31       12        6        13
2017-04-30        8       12        10
2017-05-31        9       10        11

If there are null values:

In [76]: df.loc['2017-02-01':'2017-04-01', 'atype'] = np.nan
    ...: gp = df.groupby(pd.Grouper(freq='M'))
    ...: dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()
    ...: 

In [77]: dfx
Out[77]: 
            negtive  neutral  positive
2017-01-31       13        9         9
2017-04-30        8       12         9
2017-05-31        9       10        11
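
As an alternative to apply, here is a minimal sketch that avoids creating empty groups in the first place. It assumes the DataFrame has a DatetimeIndex and a type column as in the question; the sample data below is made up for illustration, and I have not tested it against the asker's real data.

```python
import pandas as pd

# Hypothetical sample data: note there are no rows for Jan/Feb 2006.
articles = pd.DataFrame(
    {"type": ["positive", "negative", "positive", "neutral", "negative"]},
    index=pd.to_datetime(
        ["2005-12-15", "2006-03-02", "2006-03-20", "2006-03-25", "2006-04-10"]
    ),
)

# Convert the timestamps to monthly periods. Grouping by these values only
# yields months that actually occur in the data, so no empty groups appear
# (unlike pd.Grouper, which creates one bin per calendar month in the range).
months = articles.index.to_period("M")

# Count rows per (month, type) pair, then pivot the types into columns.
# Month/type combinations with no articles are filled with 0.
monthly_counts = (
    articles.groupby([months, "type"])
    .size()
    .unstack(fill_value=0)
)
print(monthly_counts)
```

Months without any articles are simply absent from the result. If you would rather keep them as rows of zeros, reindexing monthly_counts against a full pd.period_range with fill_value=0 should do it.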

Thanks.


python

python-3.x

pandas

pandas-groupby