在多索引 Pandas 数据帧的表达式中使用其他变量创建新变量

Question

我有以下多索引 Pandas 数据框：

toy.to_json()
'{"["ISRG","Price"]":{"2004-12-31":10.35,"2005-01-28":10.35,"2005-03-31":14.15,"2005-04-01":14.15,"2005-04-29":14.15,"2005-06-30":15.51,"2005-07-01":15.51,"2005-07-29":15.51,"2005-09-30":20.77,"2005-10-28":20.77},"["ISRG","Price_high"]":{"2004-12-31":13.34,"2005-01-28":13.34,"2005-03-31":16.27,"2005-04-01":16.27,"2005-04-29":16.27,"2005-06-30":17.35,"2005-07-01":17.35,"2005-07-29":17.35,"2005-09-30":25.96,"2005-10-28":25.96},"["ISRG","Price_low"]":{"2004-12-31":7.36,"2005-01-28":7.36,"2005-03-31":12.03,"2005-04-01":12.03,"2005-04-29":12.03,"2005-06-30":13.67,"2005-07-01":13.67,"2005-07-29":13.67,"2005-09-30":15.58,"2005-10-28":15.58},"["EW","Price"]":{"2004-12-31":9.36,"2005-01-28":9.36,"2005-03-31":10.47,"2005-04-01":10.47,"2005-04-29":10.47,"2005-06-30":11.07,"2005-07-01":11.07,"2005-07-29":11.07,"2005-09-30":10.86,"2005-10-28":10.86},"["EW","Price_high"]":{"2004-12-31":10.56,"2005-01-28":10.56,"2005-03-31":11.07,"2005-04-01":11.07,"2005-04-29":11.07,"2005-06-30":11.69,"2005-07-01":11.69,"2005-07-29":11.69,"2005-09-30":11.56,"2005-10-28":11.56},"["EW","Price_low"]":{"2004-12-31":8.15,"2005-01-28":8.15,"2005-03-31":9.87,"2005-04-01":9.87,"2005-04-29":9.87,"2005-06-30":10.46,"2005-07-01":10.46,"2005-07-29":10.46,"2005-09-30":10.16,"2005-10-28":10.16},"["volatility",""]":{"2004-12-31":null,"2005-01-28":null,"2005-03-31":null,"2005-04-01":null,"2005-04-29":null,"2005-06-30":null,"2005-07-01":null,"2005-07-29":null,"2005-09-30":null,"2005-10-28":null}}'

我想用一行代码在第二层（即在 'ISGR' 和 'EW' 下）创建一个名为 'volatility' 的新列，该列将由以下定义表达式：

(100 * (Price_high - Price_low)/Price).round()

我有两个问题： a) 我无法创建新列 b) 我不能分配它

这是我用来创建列但失败的代码：

idx = pd.IndexSlice

100 *( toy.loc[:, idx[:, 'Price_high']]  - toy.loc[:, idx[:, 'Price_low']].div(toy.loc[:, idx[:, 'Price']])).round()

此代码行 returns NaNs:

Answer 1

对于输出 MultiIndex DataFrame 在选定的 DataFrame 中需要相同的 MultiIndex，所以使用 rename:

idx = pd.IndexSlice

Price_high =  toy.loc[:, idx[:, 'Price_high']].rename(columns={'Price_high':'new'})
Price_low = toy.loc[:, idx[:, 'Price_low']].rename(columns={'Price_low':'new'})
Price = toy.loc[:, idx[:, 'Price']].rename(columns={'Price':'new'})
df4 = (100 * (Price_high - Price_low)/Price).round()
print (df4)
            ISRG    EW
             new   new
2004-12-31  58.0  26.0
2005-01-28  58.0  26.0
2005-03-31  30.0  11.0
2005-04-01  30.0  11.0
2005-04-29  30.0  11.0
2005-06-30  24.0  11.0
2005-07-01  24.0  11.0
2005-07-29  24.0  11.0
2005-09-30  50.0  13.0
2005-10-28  50.0  13.0

另一种方法是使用 DataFrame.xs 来避免第二级，因此不使用 MultiIndex DataFrames:

Price_high =  toy.xs('Price_high', axis=1, level=1)
Price_low = toy.xs('Price_low', axis=1, level=1)
Price = toy.xs('Price', axis=1, level=1)
df4 = (100 * (Price_high - Price_low)/Price).round()
print (df4)
            ISRG    EW
2004-12-31  58.0  26.0
2005-01-28  58.0  26.0
2005-03-31  30.0  11.0
2005-04-01  30.0  11.0
2005-04-29  30.0  11.0
2005-06-30  24.0  11.0
2005-07-01  24.0  11.0
2005-07-29  24.0  11.0
2005-09-30  50.0  13.0
2005-10-28  50.0  13.0

然后如果需要 MultiIndex 添加 MultiIndex.from_product:

df4.columns = pd.MultiIndex.from_product([df4.columns, ['new']])
print (df4)
            ISRG    EW
             new   new
2004-12-31  58.0  26.0
2005-01-28  58.0  26.0
2005-03-31  30.0  11.0
2005-04-01  30.0  11.0
2005-04-29  30.0  11.0
2005-06-30  24.0  11.0
2005-07-01  24.0  11.0
2005-07-29  24.0  11.0
2005-09-30  50.0  13.0
2005-10-28  50.0  13.0

在多索引 Pandas 数据帧的表达式中使用其他变量创建新变量

Creating a new variable using other variables in an expression in a multiindexed Pandas dataframe

multi-index

python-3.x

assign

pandas