Creating a new variable using other variables in an expression in a multiindexed Pandas dataframe
I have the following multi-indexed Pandas DataFrame:
toy.to_json()
'{"["ISRG","Price"]":{"2004-12-31":10.35,"2005-01-28":10.35,"2005-03-31":14.15,"2005-04-01":14.15,"2005-04-29":14.15,"2005-06-30":15.51,"2005-07-01":15.51,"2005-07-29":15.51,"2005-09-30":20.77,"2005-10-28":20.77},"["ISRG","Price_high"]":{"2004-12-31":13.34,"2005-01-28":13.34,"2005-03-31":16.27,"2005-04-01":16.27,"2005-04-29":16.27,"2005-06-30":17.35,"2005-07-01":17.35,"2005-07-29":17.35,"2005-09-30":25.96,"2005-10-28":25.96},"["ISRG","Price_low"]":{"2004-12-31":7.36,"2005-01-28":7.36,"2005-03-31":12.03,"2005-04-01":12.03,"2005-04-29":12.03,"2005-06-30":13.67,"2005-07-01":13.67,"2005-07-29":13.67,"2005-09-30":15.58,"2005-10-28":15.58},"["EW","Price"]":{"2004-12-31":9.36,"2005-01-28":9.36,"2005-03-31":10.47,"2005-04-01":10.47,"2005-04-29":10.47,"2005-06-30":11.07,"2005-07-01":11.07,"2005-07-29":11.07,"2005-09-30":10.86,"2005-10-28":10.86},"["EW","Price_high"]":{"2004-12-31":10.56,"2005-01-28":10.56,"2005-03-31":11.07,"2005-04-01":11.07,"2005-04-29":11.07,"2005-06-30":11.69,"2005-07-01":11.69,"2005-07-29":11.69,"2005-09-30":11.56,"2005-10-28":11.56},"["EW","Price_low"]":{"2004-12-31":8.15,"2005-01-28":8.15,"2005-03-31":9.87,"2005-04-01":9.87,"2005-04-29":9.87,"2005-06-30":10.46,"2005-07-01":10.46,"2005-07-29":10.46,"2005-09-30":10.16,"2005-10-28":10.16},"["volatility",""]":{"2004-12-31":null,"2005-01-28":null,"2005-03-31":null,"2005-04-01":null,"2005-04-29":null,"2005-06-30":null,"2005-07-01":null,"2005-07-29":null,"2005-09-30":null,"2005-10-28":null}}'
I want to create, in one line of code, a new column called 'volatility' at the second level (i.e. under 'ISRG' and 'EW'), defined by the following expression:
(100 * (Price_high - Price_low)/Price).round()
I have two problems:
a) I cannot create the new column
b) I cannot assign it
Here is the code I used to try to create the column, which fails:
idx = pd.IndexSlice
100 *( toy.loc[:, idx[:, 'Price_high']] - toy.loc[:, idx[:, 'Price_low']].div(toy.loc[:, idx[:, 'Price']])).round()
This line returns only NaNs.
To get a MultiIndex DataFrame as output, the selected DataFrames must all share the same MultiIndex, so use rename:
idx = pd.IndexSlice
Price_high = toy.loc[:, idx[:, 'Price_high']].rename(columns={'Price_high':'new'})
Price_low = toy.loc[:, idx[:, 'Price_low']].rename(columns={'Price_low':'new'})
Price = toy.loc[:, idx[:, 'Price']].rename(columns={'Price':'new'})
df4 = (100 * (Price_high - Price_low)/Price).round()
print (df4)
ISRG EW
new new
2004-12-31 58.0 26.0
2005-01-28 58.0 26.0
2005-03-31 30.0 11.0
2005-04-01 30.0 11.0
2005-04-29 30.0 11.0
2005-06-30 24.0 11.0
2005-07-01 24.0 11.0
2005-07-29 24.0 11.0
2005-09-30 50.0 13.0
2005-10-28 50.0 13.0
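To see why the labels have to match, here is a minimal sketch on a made-up frame (`demo`, its labels and its values are assumptions for illustration, not the `toy` data). Arithmetic between DataFrames aligns on column labels, so selections whose second-level labels differ share no columns and the result is all NaN. Note also that in the failing line the `.div(...)` call binds only to the `Price_low` selection, not to the difference; fixing the parentheses alone would still leave the alignment problem.

```python
import pandas as pd

idx = pd.IndexSlice

# Hypothetical two-row frame (values are made up, not taken from toy).
cols = pd.MultiIndex.from_product([["A", "B"], ["hi", "lo"]])
demo = pd.DataFrame([[3.0, 1.0, 8.0, 5.0], [4.0, 2.0, 9.0, 7.0]], columns=cols)

hi = demo.loc[:, idx[:, "hi"]]  # columns: (A, hi), (B, hi)
lo = demo.loc[:, idx[:, "lo"]]  # columns: (A, lo), (B, lo)

# No column label is shared, so every cell of the difference is NaN
# and the result carries the union of both label sets.
misaligned = hi - lo

# Giving both selections the same second-level label lets them align.
aligned = (hi.rename(columns={"hi": "new"}, level=1)
           - lo.rename(columns={"lo": "new"}, level=1))

print(misaligned)
print(aligned)
```

Passing `level=1` to `rename` restricts the relabelling to the second level; the answer's version without `level` also works here because the labels being renamed only occur on that level.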
Another approach is to use DataFrame.xs to drop the second level, so the arithmetic runs on DataFrames without a MultiIndex:
Price_high = toy.xs('Price_high', axis=1, level=1)
Price_low = toy.xs('Price_low', axis=1, level=1)
Price = toy.xs('Price', axis=1, level=1)
df4 = (100 * (Price_high - Price_low)/Price).round()
print (df4)
ISRG EW
2004-12-31 58.0 26.0
2005-01-28 58.0 26.0
2005-03-31 30.0 11.0
2005-04-01 30.0 11.0
2005-04-29 30.0 11.0
2005-06-30 24.0 11.0
2005-07-01 24.0 11.0
2005-07-29 24.0 11.0
2005-09-30 50.0 13.0
2005-10-28 50.0 13.0
Then, if a MultiIndex is needed, add it back with MultiIndex.from_product:
df4.columns = pd.MultiIndex.from_product([df4.columns, ['new']])
print (df4)
ISRG EW
new new
2004-12-31 58.0 26.0
2005-01-28 58.0 26.0
2005-03-31 30.0 11.0
2005-04-01 30.0 11.0
2005-04-29 30.0 11.0
2005-06-30 24.0 11.0
2005-07-01 24.0 11.0
2005-07-29 24.0 11.0
2005-09-30 50.0 13.0
2005-10-28 50.0 13.0
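For the second part of the question (assigning the result), one possible sketch is to label the new level 'volatility' and concatenate it onto the original frame. The frame `toy_small` below is an assumption for illustration: it holds only two rows copied from the `toy` data and leaves out the all-null ('volatility', '') placeholder column. Using `pd.concat` followed by `sort_index` is one option among several, not necessarily what the asker ended up with.

```python
import pandas as pd

# Two rows copied from the toy data; the remaining rows are omitted for brevity.
cols = pd.MultiIndex.from_product(
    [["ISRG", "EW"], ["Price", "Price_high", "Price_low"]]
)
toy_small = pd.DataFrame(
    [[10.35, 13.34, 7.36, 9.36, 10.56, 8.15],
     [14.15, 16.27, 12.03, 10.47, 11.07, 9.87]],
    index=["2004-12-31", "2005-03-31"],
    columns=cols,
)

# Cross-sections drop the second level, leaving plain ticker columns that align.
price_high = toy_small.xs("Price_high", axis=1, level=1)
price_low = toy_small.xs("Price_low", axis=1, level=1)
price = toy_small.xs("Price", axis=1, level=1)
vol = (100 * (price_high - price_low) / price).round()

# Put the result back under each ticker as a second-level 'volatility' column.
vol.columns = pd.MultiIndex.from_product([vol.columns, ["volatility"]])

# Append the new columns; sorting groups them under their tickers
# (this reorders the tickers alphabetically: EW, then ISRG).
result = pd.concat([toy_small, vol], axis=1).sort_index(axis=1)

print(result)
```

If the original ticker order matters, the `sort_index` call can be dropped, at the cost of the 'volatility' columns appearing at the end rather than next to their tickers.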