Pandas previous group min/max
In pandas I have a dataset like this:
Value
2005-08-03 23:15:00 10.5
2005-08-03 23:30:00 10.0
2005-08-03 23:45:00 10.0
2005-08-04 00:00:00 10.5
2005-08-04 00:15:00 10.5
2005-08-04 00:30:00 11.0
2005-08-04 00:45:00 10.5
2005-08-04 01:00:00 11.0
...
2005-08-04 23:15:00 14.0
2005-08-04 23:30:00 13.5
2005-08-04 23:45:00 13.0
2005-08-05 00:00:00 13.5
2005-08-05 00:15:00 14.0
2005-08-05 00:30:00 14.0
2005-08-05 00:45:00 14.5
First, I want to group the data by day and store each group's maximum in a new column. I used the following code for this:
df['ValueMaxInGroup'] = df.groupby(pd.TimeGrouper('D'))['Value'].transform(max)
Now I want to create another column that stores the previous group's maximum, so the desired DataFrame looks like this:
                     Value  ValueMaxInGroup  ValueMaxInPrevGroup
2005-08-03 23:15:00   10.5             10.5                  NaN
2005-08-03 23:30:00   10.0             10.5                  NaN
2005-08-03 23:45:00   10.0             10.5                  NaN
2005-08-04 00:00:00   10.5             14.0                 10.5
2005-08-04 00:15:00   10.5             14.0                 10.5
2005-08-04 00:30:00   11.0             14.0                 10.5
2005-08-04 00:45:00   10.5             14.0                 10.5
2005-08-04 01:00:00   11.0             14.0                 10.5
...
2005-08-04 23:15:00   14.0             14.0                 10.5
2005-08-04 23:30:00   13.5             14.0                 10.5
2005-08-04 23:45:00   13.0             14.0                 10.5
2005-08-05 00:00:00   13.5             14.5                 14.0
2005-08-05 00:15:00   14.0             14.5                 14.0
2005-08-05 00:30:00   14.0             14.5                 14.0
2005-08-05 00:45:00   14.5             14.5                 14.0
To simply get the value from the previous row, I used
df['ValueInPrevRow'] = df.shift(1)['Value']
Is there a way to get the min/max/f(x) of another group? I assumed the following would work:
df['ValueMaxInPrevGroup'] = df.groupby(pd.TimeGrouper('D')).shift(1)['Value'].transform(max)
but it doesn't work.
You can get the desired result by using groupby/agg, shift, and merge:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Value': [10.5, 10.0, 10.0, 10.5, 10.5, 11.0, 10.5, 11.0, 14.0, 13.5, 13.0, 13.5, 14.0, 14.0, 14.5]}, index=['2005-08-03 23:15:00', '2005-08-03 23:30:00', '2005-08-03 23:45:00', '2005-08-04 00:00:00', '2005-08-04 00:15:00', '2005-08-04 00:30:00', '2005-08-04 00:45:00', '2005-08-04 01:00:00', '2005-08-04 23:15:00', '2005-08-04 23:30:00', '2005-08-04 23:45:00', '2005-08-05 00:00:00', '2005-08-05 00:15:00', '2005-08-05 00:30:00', '2005-08-05 00:45:00'])
df.index = pd.DatetimeIndex(df.index)
# This is equivalent to
# df['group'] = pd.to_datetime(df.index.date)
# when freq='D', but the version below works with any freq string, not just `'D'`.
# pd.TimeGrouper was removed in pandas 1.0; pd.Grouper(freq='D') is its replacement.
grouped = df.groupby(pd.Grouper(freq='D'))
labels, uniqs, ngroups = grouped.grouper.group_info
df['group'] = grouped.grouper.binlabels[labels]
result = grouped[['Value']].agg('max')
result = result.rename(columns={'Value':'Max'})
result['PreviouMax'] = result['Max'].shift(1)
df = pd.merge(df, result, left_on=['group'], right_index=True)
print(df)
which yields
                     Value       group   Max  PreviouMax
2005-08-03 23:15:00   10.5  2005-08-03  10.5         NaN
2005-08-03 23:30:00   10.0  2005-08-03  10.5         NaN
2005-08-03 23:45:00   10.0  2005-08-03  10.5         NaN
2005-08-04 00:00:00   10.5  2005-08-04  14.0        10.5
2005-08-04 00:15:00   10.5  2005-08-04  14.0        10.5
2005-08-04 00:30:00   11.0  2005-08-04  14.0        10.5
2005-08-04 00:45:00   10.5  2005-08-04  14.0        10.5
2005-08-04 01:00:00   11.0  2005-08-04  14.0        10.5
2005-08-04 23:15:00   14.0  2005-08-04  14.0        10.5
2005-08-04 23:30:00   13.5  2005-08-04  14.0        10.5
2005-08-04 23:45:00   13.0  2005-08-04  14.0        10.5
2005-08-05 00:00:00   13.5  2005-08-05  14.5        14.0
2005-08-05 00:15:00   14.0  2005-08-05  14.5        14.0
2005-08-05 00:30:00   14.0  2005-08-05  14.5        14.0
2005-08-05 00:45:00   14.5  2005-08-05  14.5        14.0
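The main idea here is to use groupby/agg instead of groupby/transform. agg returns one row per group rather than one row per original row, so shifting that table by one position lines each group up with the previous group's value: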
result = grouped[['Value']].agg('max')
result = result.rename(columns={'Value':'Max'})
result['PreviouMax'] = result['Max'].shift(1)
#              Max  PreviouMax
# group
# 2005-08-03  10.5         NaN
# 2005-08-04  14.0        10.5
# 2005-08-05  14.5        14.0
The desired DataFrame can then be expressed as the merge of df with result on the group date.
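For the min/max/f(x) part of the question, the same agg-then-shift idea seems to extend to several statistics at once. The following is only a sketch under some assumptions, not the code above: it uses df.index.floor('D') as the group key (so it covers fixed frequencies such as 'D' only), it maps the shifted per-day values back with Series.map rather than merge, and the names day, stats, prev_stats and ValueMinInPrevGroup are made up for illustration. Note also that grouping on floored dates skips days that have no rows, so "previous group" here means the previous day that actually has data.

```python
import pandas as pd

values = [10.5, 10.0, 10.0, 10.5, 10.5, 11.0, 10.5, 11.0,
          14.0, 13.5, 13.0, 13.5, 14.0, 14.0, 14.5]
timestamps = pd.to_datetime([
    '2005-08-03 23:15:00', '2005-08-03 23:30:00', '2005-08-03 23:45:00',
    '2005-08-04 00:00:00', '2005-08-04 00:15:00', '2005-08-04 00:30:00',
    '2005-08-04 00:45:00', '2005-08-04 01:00:00', '2005-08-04 23:15:00',
    '2005-08-04 23:30:00', '2005-08-04 23:45:00', '2005-08-05 00:00:00',
    '2005-08-05 00:15:00', '2005-08-05 00:30:00', '2005-08-05 00:45:00',
])
df = pd.DataFrame({'Value': values}, index=timestamps)

# Group key (assumed name): each row's timestamp truncated to midnight.
day = pd.Series(df.index.floor('D'), index=df.index)

# One row per day, one column per statistic.
stats = df.groupby(day)['Value'].agg(['min', 'max'])

# Shifting the per-day table moves each day's statistics to the next day.
prev_stats = stats.shift(1)

# Broadcast the per-day values back onto the original rows.
df['ValueMaxInGroup'] = day.map(stats['max'])
df['ValueMaxInPrevGroup'] = day.map(prev_stats['max'])
df['ValueMinInPrevGroup'] = day.map(prev_stats['min'])

print(df)
```

A custom function can be passed to agg in the same way as 'min' and 'max', which would cover the f(x) case.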