Pandas: taking the ratio of the difference from the row above and storing the value in another column, with multi-index
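I would like to know how to compute the ratio of the difference between two rows of a DataFrame with multi-index columns, and store the result in specific columns.
I have a DataFrame that looks like this.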
>>> df
               A           B           C
           total  diff total  diff total  diff
2020-08-15   100     0   200     0    20     0
Every day I add a new row. The new row looks like this.
>>> df_new
               A           B           C
           total  diff total  diff total  diff
2020-08-16   200     -    50     -    30     -
For the diff columns, I want the relative change of total compared with the row above. So the formula would be ([total of today] - [total of the day before]) / [total of the day before], which should give:
               A           B           C
           total  diff total  diff total  diff
2020-08-15   100     0   200     0    20     0
2020-08-16   200   1.0    50 -0.75    30   0.5
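As a quick check of the formula, the expected diff values for 2020-08-16 can be reproduced from the two rows of totals in plain Python. The dictionary names here are illustrative only and do not appear in the question.

```python
# Totals copied from the two rows shown in the question.
prev_totals = {"A": 100, "B": 200, "C": 20}   # 2020-08-15
today_totals = {"A": 200, "B": 50, "C": 30}   # 2020-08-16

# (today - day before) / day before, per top-level column.
ratios = {
    key: (today_totals[key] - prev_totals[key]) / prev_totals[key]
    for key in prev_totals
}
print(ratios)  # {'A': 1.0, 'B': -0.75, 'C': 0.5}
```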
I know how to add the new row.
day = dt.today()
df.loc[day.strftime("%Y-%m-%d"), :] = df_new.squeeze()
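But I don't know how to take the difference between two rows when the columns are a multi-index... Any help would be appreciated! Thank you.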
Use shift to compute the result, then update the original df:
s = df.filter(like="total").rename(columns={"total":"diff"}, level=1)
res = ((s - s.shift(1))/s.shift(1))
df.update(res)
print(df)
               A           B           C
           total  diff total  diff total  diff
2020-08-15   100   0.0   200  0.00    20   0.0
2020-08-16   200   1.0    50 -0.75    30   0.5
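For readers who want to run this end to end, here is a minimal self-contained sketch. It makes two assumptions: the frame is rebuilt by hand from the printed output (with NaN standing in for the not-yet-computed diff values), and `pct_change` is swapped in for the explicit `(s - s.shift(1)) / s.shift(1)` expression, since it computes the same ratio against the previous row.

```python
import pandas as pd

# Assumed reconstruction of the frame from the printed output above.
cols = pd.MultiIndex.from_product([["A", "B", "C"], ["total", "diff"]])
df = pd.DataFrame(
    [
        [100, 0.0, 200, 0.0, 20, 0.0],
        [200, float("nan"), 50, float("nan"), 30, float("nan")],
    ],
    index=["2020-08-15", "2020-08-16"],
    columns=cols,
)

# Select every "total" column; the result has plain columns A, B, C.
totals = df.xs("total", axis=1, level=1)

# Relative change against the previous row (first row becomes NaN).
ratio = totals.pct_change()

# Relabel the columns as (A, diff), (B, diff), (C, diff) so they
# align with the diff columns of df.
ratio.columns = pd.MultiIndex.from_product([ratio.columns, ["diff"]])

# update() only overwrites with non-NaN values, so the first row's
# existing 0.0 entries are left untouched.
df.update(ratio)
print(df)
```

Because `update` skips NaN values, the first row keeps whatever diff it already had, which matches the behaviour of the answer above.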
You can use df.xs to pick out the total values and pd.IndexSlice to update the MultiIndexed values.
#df
#      A           B           C
#  total  diff total  diff total  diff
#0   100     0   200     0    20     0
#df2
#      A           B           C
#  total  diff total  diff total  diff
#0 200.0   NaN  50.0   NaN  30.0   NaN
# Take last row of current DataFrame i.e. `df`
curr = df.iloc[-1].xs('total', level=1) #Get total values
# Take total values of new DataFrame you get everyday i.e. `df2`
new = df2.iloc[0].xs('total',level=1)
# Calculate diff values
diffs = new.sub(curr).div(curr) # This is equal to `(new-curr)/curr`
idx = pd.IndexSlice
x = pd.concat([df, df2]).reset_index(drop=True)
x.loc[x.index[-1], idx[:,'diff']] = diffs.tolist()
x
      A           B           C
  total  diff total  diff total  diff
0 100.0   0.0 200.0  0.00  20.0   0.0
1 200.0   1.0  50.0 -0.75  30.0   0.5
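The same steps can be wrapped in a small function for the daily update. This is only a sketch: `append_with_diff` is a hypothetical helper name, the sample frames are rebuilt by hand with NaN placeholders, and unlike the snippet above it keeps the date index instead of calling `reset_index`. It assumes the new row's label is not already present in `df`.

```python
import pandas as pd


def append_with_diff(df, df_new):
    """Hypothetical helper: append the one-row frame df_new to df and
    fill its diff columns with (new - previous) / previous."""
    idx = pd.IndexSlice
    prev = df.iloc[-1].xs("total", level=1)      # last known totals
    new = df_new.iloc[0].xs("total", level=1)    # incoming totals
    ratio = new.sub(prev).div(prev)
    out = pd.concat([df, df_new])
    # Write the ratios into the diff columns of the appended row only.
    out.loc[out.index[-1], idx[:, "diff"]] = ratio.tolist()
    return out


# Sample data reconstructed from the question (an assumption).
cols = pd.MultiIndex.from_product([["A", "B", "C"], ["total", "diff"]])
df = pd.DataFrame(
    [[100, 0.0, 200, 0.0, 20, 0.0]], index=["2020-08-15"], columns=cols
)
nan = float("nan")
df_new = pd.DataFrame(
    [[200, nan, 50, nan, 30, nan]], index=["2020-08-16"], columns=cols
)

result = append_with_diff(df, df_new)
print(result)
```

Returning a new frame rather than mutating `df` keeps the helper easy to test; the caller can reassign with `df = append_with_diff(df, df_new)`.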
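If you don't want to create a new DataFrame (x), use DataFrame.append to append the values instead.
Everything up to the step idx = pd.IndexSlice stays the same; rather than creating x, fill in the diff columns of df2 and append it to df. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current versions pd.concat([df, df2]) is the equivalent call.
df2.loc[:, idx[:,'diff']] = diffs.tolist()
df.append(df2)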
      A           B           C
  total  diff total  diff total  diff
0 100.0   0.0 200.0  0.00  20.0   0.0
0 200.0   1.0  50.0 -0.75  30.0   0.5