Keep columns after a groupby in an empty dataframe
After a query, the DataFrame is empty. Calling groupby on it raises a RuntimeWarning and then returns another empty DataFrame with no columns. How can I keep the columns?
import pandas as pd
df = pd.DataFrame(columns=["PlatformCategory","Platform","ResClassName","Amount"])
print(df)
Result:
Empty DataFrame
Columns: [PlatformCategory, Platform, ResClassName, Amount]
Index: []
Then group by:
df = df.groupby(["PlatformCategory","Platform","ResClassName"]).sum()
df = df.reset_index(drop=False,inplace=True)
print(df)
Result: sometimes None, sometimes an empty DataFrame:
Empty DataFrame
Columns: []
Index: []
Why does the empty DataFrame have no columns?
Runtime warnings:
/data/pyrun/lib/python2.7/site-packages/pandas/core/groupby.py:3672: RuntimeWarning: divide by zero encountered in log
if alpha + beta * ngroups < count * np.log(count):
/data/pyrun/lib/python2.7/site-packages/pandas/core/groupby.py:3672: RuntimeWarning: invalid value encountered in double_scalars
if alpha + beta * ngroups < count * np.log(count):
You need as_index=False and group_keys=False:
df = df.groupby(["PlatformCategory","Platform","ResClassName"], as_index=False).count()
df
Empty DataFrame
Columns: [PlatformCategory, Platform, ResClassName, Amount]
Index: []
There is no need to reset the index afterwards.
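As a side note on the "sometimes None" result in the question: `reset_index(..., inplace=True)` modifies the frame in place and returns None, so assigning its return value back to `df` throws the frame away. Here is a minimal sketch of both points. The names `cols`, `keys`, `empty`, `tmp`, `returned` and `counted` are only for illustration, and the exact columns you get back may depend on your pandas version.

```python
import pandas as pd

cols = ["PlatformCategory", "Platform", "ResClassName", "Amount"]
keys = cols[:3]
empty = pd.DataFrame(columns=cols)

# inplace=True mutates the frame and returns None, so assigning the
# return value to a name leaves that name bound to None.
tmp = empty.copy()
returned = tmp.reset_index(drop=False, inplace=True)
print(returned is None)

# as_index=False asks groupby to keep the grouping keys as ordinary
# columns instead of moving them into the index.
counted = empty.groupby(keys, as_index=False).count()
print(list(counted.columns))
```

Dropping either the assignment or `inplace=True` avoids the None.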
Here is some code that works the same for .sum() whether or not the DataFrame is empty:
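def groupby_sum(df, groupby_cols):
    # Keep the grouping keys as columns so they survive an empty result
    groupby = df.groupby(groupby_cols, as_index=False)
    summed = groupby.sum()
    # Fall back to count() when sum() comes back empty, then index by the keys
    return (groupby.count() if summed.empty else summed).set_index(groupby_cols)

df = groupby_sum(df, ["PlatformCategory", "Platform", "ResClassName"])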
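To see the helper on both kinds of input, here is a small usage sketch. The sample rows ("web", "chrome", and so on) are made up for illustration and do not come from the question, and the empty frame is built by slicing zero rows off the sample.

```python
import pandas as pd

def groupby_sum(df, groupby_cols):
    groupby = df.groupby(groupby_cols, as_index=False)
    summed = groupby.sum()
    return (groupby.count() if summed.empty else summed).set_index(groupby_cols)

keys = ["PlatformCategory", "Platform", "ResClassName"]

# Hypothetical sample data, for illustration only
data = pd.DataFrame({
    "PlatformCategory": ["web", "web", "mobile"],
    "Platform": ["chrome", "chrome", "ios"],
    "ResClassName": ["cpu", "cpu", "cpu"],
    "Amount": [1, 2, 5],
})

# Non-empty input: Amount is summed per key combination
result = groupby_sum(data, keys)
print(result)

# Empty input with the same columns: the same call path is used
empty_result = groupby_sum(data.iloc[0:0], keys)
print(empty_result)
```

Both calls return a frame indexed by the grouping keys, so downstream code should not need a special case for the empty query.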