Keep columns after a groupby in an empty dataframe

After a query, the dataframe is an empty df. When I then groupby, it raises a runtime warning, and I get another empty dataframe with no columns. How can I keep the columns?

import pandas as pd

df = pd.DataFrame(columns=["PlatformCategory","Platform","ResClassName","Amount"])
print(df)

Result:

Empty DataFrame
Columns: [PlatformCategory, Platform, ResClassName, Amount]
Index: []

Then group by:

df = df.groupby(["PlatformCategory","Platform","ResClassName"]).sum()
df = df.reset_index(drop=False,inplace=True)
print(df)

Result: sometimes None, sometimes an empty dataframe

Empty DataFrame
Columns: []
Index: []

Why does the empty dataframe have no columns?

Runtime warning:

/data/pyrun/lib/python2.7/site-packages/pandas/core/groupby.py:3672: RuntimeWarning: divide by zero encountered in log
  if alpha + beta * ngroups < count * np.log(count):

/data/pyrun/lib/python2.7/site-packages/pandas/core/groupby.py:3672: RuntimeWarning: invalid value encountered in double_scalars
  if alpha + beta * ngroups < count * np.log(count):

You need as_index=False, group_keys=False:

df = df.groupby(["PlatformCategory","Platform","ResClassName"], as_index=False).count()
df

Empty DataFrame
Columns: [PlatformCategory, Platform, ResClassName, Amount]
Index: []

There is no need to reset the index afterwards.
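
To illustrate why no reset is needed, here is a small sketch on a non-empty frame (the sample rows are made up for illustration). With as_index=False the group keys come back as ordinary columns. It also shows that reset_index(inplace=True) returns None, which is most likely why the question's `df = df.reset_index(drop=False,inplace=True)` line sometimes left df as None.

```python
import pandas as pd

# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "PlatformCategory": ["a", "a", "b"],
    "Platform": ["x", "x", "y"],
    "ResClassName": ["r", "r", "s"],
    "Amount": [1, 2, 5],
})
keys = ["PlatformCategory", "Platform", "ResClassName"]

# as_index=False: the group keys stay ordinary columns.
flat = sample.groupby(keys, as_index=False).sum()
print(list(flat.columns))

# Default as_index=True: the keys move into a MultiIndex instead.
indexed = sample.groupby(keys).sum()
print(list(indexed.columns))

# reset_index(inplace=True) modifies the frame and returns None,
# so assigning its return value back to a variable loses the frame.
returned = indexed.reset_index(drop=False, inplace=True)
print(returned is None)
print(list(indexed.columns))
```

If a reset is still wanted elsewhere, either call it with inplace=True and ignore the return value, or drop inplace and assign the result, but not both.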

Some code that works the same for .sum() whether or not the dataframe is empty:

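import pandas as pd

def groupby_sum(df, groupby_cols):
    # as_index=False keeps the group keys as columns in the aggregated result.
    grouped = df.groupby(groupby_cols, as_index=False)
    summed = grouped.sum()
    # If the sum comes back empty, fall back to count(), which kept the columns above.
    result = grouped.count() if summed.empty else summed
    return result.set_index(groupby_cols)

df = groupby_sum(df, ["PlatformCategory", "Platform", "ResClassName"])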
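
If the only goal is to guarantee the original columns on empty input, another option is to skip the groupby entirely when the frame is empty. The following is a minimal sketch, not the method from the answer above: groupby_sum_keep_columns is a hypothetical helper name, and unlike groupby_sum it returns the keys as columns rather than as the index.

```python
import pandas as pd

def groupby_sum_keep_columns(df, groupby_cols):
    """Hypothetical helper: grouped sum that short-circuits on empty input."""
    if df.empty:
        # Nothing to aggregate: return an empty copy with the same columns.
        return df.iloc[0:0].copy()
    return df.groupby(groupby_cols, as_index=False).sum()

cols = ["PlatformCategory", "Platform", "ResClassName", "Amount"]
keys = cols[:3]

# Empty input: the columns survive because no aggregation runs.
empty = pd.DataFrame(columns=cols)
out_empty = groupby_sum_keep_columns(empty, keys)
print(list(out_empty.columns))

# Non-empty input (made-up rows): a normal grouped sum.
full = pd.DataFrame(
    [["a", "x", "r", 1], ["a", "x", "r", 2], ["b", "y", "s", 5]],
    columns=cols,
)
out_full = groupby_sum_keep_columns(full, keys)
print(out_full)
```

Because the empty branch never calls groupby, it should also avoid the runtime warning from the question. Note that on non-empty input the key columns come first in the result, so the column order matches the input only when the keys are the leading columns, as they are here.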