Cumulative rate of change for successive rows per group, using 2 columns (x/y)
I am trying to implement a simple function that uses pandas to compute the rate of change of x with respect to y.
My input DataFrame is as follows:
import numpy as np
import pandas as pd
data = [
["id1", "2018-05-14", 9998, 10],
["id1", "2019-05-14", 18000, 5],
["id1", "2020-05-14", 22000, 3],
["id2", "2018-07-3", 458756, 24],
["id2", "2019-02-1", 822565, 3],
["id2", "2020-05-14", 922565, 1],
]
df = pd.DataFrame(data, columns=["id", "date", "x", "y"])
The expected output is as follows:
data = [
["id1", "2018-05-14", 9998, 10, np.nan],
["id1", "2019-05-14", 18000, 5, 1600.4], # (18000-9998) / (10-5)
["id1", "2020-05-14", 22000, 3, 2000], # (22000-18000) / (5-3)
["id2", "2018-07-3", 458756, 24, np.nan],
["id2", "2019-02-1", 822565, 3, 17324.24], # (822565-458756) / (24-3)
["id2", "2020-05-14", 922565, 1, 50000], # (922565-822565) / (3-1)
]
df_expected = pd.DataFrame(data, columns=["id", "date", "x", "y", "rate"])
I think I need to use the rolling function, but I'm not sure how to use it:
def compute_rate(vals):
    return vals["x"] / vals["y"]
df.sort_values(by=["id", "date"], inplace=True)
grp = df.groupby(by=["id"])
df["rate"] = grp.rolling(1).apply(
    lambda x: x["x"] / x["y"]
)
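The most efficient approach is to use groupby + diff to compute the difference between consecutive rows, then compute the ratio with the help of eval (or assign if you want to be more efficient):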
df['rate'] = (
df.sort_values(by=['id', 'date'])
.groupby('id')[['x', 'y']]
.diff().eval('-x/y')
)
Output:
    id        date       x   y          rate
0  id1  2018-05-14    9998  10           NaN
1  id1  2019-05-14   18000   5   1600.400000
2  id1  2020-05-14   22000   3   2000.000000
3  id2   2018-07-3  458756  24           NaN
4  id2   2019-02-1  822565   3  17324.238095
5  id2  2020-05-14  922565   1  50000.000000
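If you would rather avoid the string expression passed to eval, here is a minimal sketch of the same groupby + diff idea using plain column arithmetic. The intermediate name `d` is only illustrative. It also assumes the date strings happen to sort chronologically, which holds for this sample but not for unpadded dates in general, so converting the column to real datetimes first is probably safer on real data.

```python
import numpy as np
import pandas as pd

data = [
    ["id1", "2018-05-14", 9998, 10],
    ["id1", "2019-05-14", 18000, 5],
    ["id1", "2020-05-14", 22000, 3],
    ["id2", "2018-07-3", 458756, 24],
    ["id2", "2019-02-1", 822565, 3],
    ["id2", "2020-05-14", 922565, 1],
]
df = pd.DataFrame(data, columns=["id", "date", "x", "y"])

# Difference between each row and the previous row within the same id.
# The first row of every group has no predecessor, so it becomes NaN.
# Assumption: sorting the raw date strings gives chronological order here.
d = df.sort_values(by=["id", "date"]).groupby("id")[["x", "y"]].diff()

# y decreases from row to row in the sample, so the sign is flipped to
# match the expected (x_new - x_old) / (y_old - y_new).
df["rate"] = -d["x"] / d["y"]

print(df)
```

The assignment aligns on the index, so the result lands on the right rows even if the sort reorders the frame.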