Python long format: subtract a selection of rows
All,
I have the following long-format dataframe:
import pandas as pd

df = pd.DataFrame({'date': ["2020-01-01","2020-01-01","2020-01-02","2020-01-02","2020-01-01","2020-01-01","2020-01-02","2020-01-02"], 'asset': ["x", "x","x", "x","y","y","y","y"], 'type': ["price", "spread","price","spread","price", "spread","price","spread"], 'value': ["1.5", "0.01","1.6", "0.01","1","0.08","1.2","0.09"]})
It looks like this:
date asset type value
0 2020-01-01 x price 1.5
1 2020-01-01 x spread 0.01
2 2020-01-02 x price 1.6
3 2020-01-02 x spread 0.01
4 2020-01-01 y price 1
5 2020-01-01 y spread 0.08
6 2020-01-02 y price 1.2
7 2020-01-02 y spread 0.09
I want to subtract the price of y from the price of x while keeping the same data structure. The result should look like this:
date asset type value
0 2020-01-01 x price 1.5
1 2020-01-01 x spread 0.01
2 2020-01-02 x price 1.6
3 2020-01-02 x spread 0.01
4 2020-01-01 y price 1
5 2020-01-01 y spread 0.08
6 2020-01-02 y price 1.2
7 2020-01-02 y spread 0.09
8 2020-01-01 x_min_y pricediff 0.5
9 2020-01-02 x_min_y pricediff 0.4
I would like to create this with the pandas assign() function, but I don't know how.
Thanks in advance!
Assuming the dates don't need to be matched and the dataset is defined as in the example, you can do the following:
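import pandas as pd
# Parentheses are required because & binds more tightly than ==
x_price = df[(df["asset"] == "x") & (df["type"] == "price")].reset_index(drop=True)
y_price = df[(df["asset"] == "y") & (df["type"] == "price")].reset_index(drop=True)
# Prices are paired by position, so x and y must cover the same dates in the same order
df2 = pd.DataFrame(x_price["value"].astype(float) - y_price["value"].astype(float))
df2["date"] = x_price["date"]
df2["type"] = "pricediff"
df2["asset"] = "x_min_y"
pd.concat([df, df2], ignore_index=True)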
Basically, do the calculation first and concatenate the result afterwards.
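The approach above pairs the x and y prices by position. If the two assets might not cover the same dates in the same order, one way to pair them by date instead is an inner merge on the date column. This is a minimal sketch under that assumption; is_price, x_price, y_price, merged, diff and result are illustrative names chosen for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'date': ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02",
                            "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
                   'asset': ["x", "x", "x", "x", "y", "y", "y", "y"],
                   'type': ["price", "spread", "price", "spread",
                            "price", "spread", "price", "spread"],
                   'value': ["1.5", "0.01", "1.6", "0.01", "1", "0.08", "1.2", "0.09"]})

# Price rows for each asset, with value converted from string to float
is_price = df['type'] == 'price'
x_price = df[is_price & (df['asset'] == 'x')][['date', 'value']].astype({'value': float})
y_price = df[is_price & (df['asset'] == 'y')][['date', 'value']].astype({'value': float})

# The inner merge pairs rows by date, so the row order no longer matters
merged = x_price.merge(y_price, on='date', suffixes=('_x', '_y'))
diff = merged[['date']].assign(asset='x_min_y', type='pricediff',
                               value=merged['value_x'] - merged['value_y'])

result = pd.concat([df, diff], ignore_index=True)
print(result)
```

With an inner merge, a date that has a price for only one of the two assets gets no difference row; how='outer' would keep it with a NaN difference instead.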
Use:
import pandas as pd

m = df['type'].eq('price') & df['asset'].isin(['x', 'y'])
# pandas >= 2.0 requires keyword arguments for DataFrame.pivot
d = df[m].pivot(index='date', columns='asset', values='value').astype(float)
d = pd.concat(
    [df, d['x'].sub(d['y']).reset_index(name='value').assign(
        asset='x_min_y', type='pricediff')],
    ignore_index=True)
Details:
Create a boolean mask m to filter the rows where type is price and asset is in x, y, then use DataFrame.pivot to reshape the dataframe:
print(d) # pivoted dataframe
asset x y
date
2020-01-01 1.5 1.0
2020-01-02 1.6 1.2
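To check which rows the mask keeps before pivoting, here is a small self-contained sketch that rebuilds the example frame and prints m next to the pivot. It passes index, columns and values to DataFrame.pivot as keywords, since pandas 2.0 and later no longer accept them positionally.

```python
import pandas as pd

df = pd.DataFrame({'date': ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02",
                            "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
                   'asset': ["x", "x", "x", "x", "y", "y", "y", "y"],
                   'type': ["price", "spread", "price", "spread",
                            "price", "spread", "price", "spread"],
                   'value': ["1.5", "0.01", "1.6", "0.01", "1", "0.08", "1.2", "0.09"]})

# True only for the price rows of asset x or y
m = df['type'].eq('price') & df['asset'].isin(['x', 'y'])
print(m.tolist())  # [True, False, True, False, True, False, True, False]

# One row per date, one column per asset; the string values are parsed as floats
d = df[m].pivot(index='date', columns='asset', values='value').astype(float)
print(d)
```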
Use Series.sub to subtract column y from column x in the pivoted dataframe, assign the columns asset and type, then use pd.concat to concatenate the result with the original dataframe df:
print(d)
date asset type value
0 2020-01-01 x price 1.5
1 2020-01-01 x spread 0.01
2 2020-01-02 x price 1.6
3 2020-01-02 x spread 0.01
4 2020-01-01 y price 1
5 2020-01-01 y spread 0.08
6 2020-01-02 y price 1.2
7 2020-01-02 y spread 0.09
8 2020-01-01 x_min_y pricediff 0.5
9 2020-01-02 x_min_y pricediff 0.4
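Since the question asks specifically about assign(), it may help to look at the new rows on their own before they are concatenated. This sketch reproduces only the sub / reset_index / assign part of the answer; diff and new_rows are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'date': ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02",
                            "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
                   'asset': ["x", "x", "x", "x", "y", "y", "y", "y"],
                   'type': ["price", "spread", "price", "spread",
                            "price", "spread", "price", "spread"],
                   'value': ["1.5", "0.01", "1.6", "0.01", "1", "0.08", "1.2", "0.09"]})

m = df['type'].eq('price') & df['asset'].isin(['x', 'y'])
d = df[m].pivot(index='date', columns='asset', values='value').astype(float)

# x - y for each date, as a Series indexed by date
diff = d['x'].sub(d['y'])

# reset_index turns the date index back into a column and names the values 'value';
# assign then adds the constant asset and type columns
new_rows = diff.reset_index(name='value').assign(asset='x_min_y', type='pricediff')
print(new_rows)
```

pd.concat matches columns by name, so the date, value, asset, type order of new_rows does not affect the final frame. The combined value column does mix the original strings with the new floats; converting df['value'] with astype(float) beforehand would keep it numeric.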