Python: Multiply columns of two DataFrames based on the nearest index
I have two DataFrames:
import pandas as pd
df1 = pd.DataFrame()
df1['time'] = ['2022-01-01 17:03:32', '2022-01-01 17:04:30', '2022-01-01 17:04:32', '2022-01-02 00:12:02',
'2022-01-02 11:23:16', '2022-01-02 18:13:30', '2022-01-02 21:23:52', '2022-01-02 19:43:12']
df1['price'] = [1,2,3,4,5,6,7,8]
df1['ticker'] = ['a','b','a','b','c','c','a','e']
df2 = pd.DataFrame()
df2['time'] = ['2022-01-01 17:03:50', '2022-01-01 17:06:52', '2022-01-01 17:07:02', '2022-01-02 00:17:42',
'2022-01-02 11:18:16', '2022-01-02 18:13:39', '2022-01-02 21:24:12', '2022-01-02 19:43:12']
df2['amount'] = [10,12,13,14,15,16,17,18]
df2['ticker']=['a','b','b','c','d','e','a','c']
df1:
time price ticker
0 2022-01-01 17:03:32 1 a
1 2022-01-01 17:04:30 2 b
2 2022-01-01 17:04:32 3 a
3 2022-01-02 00:12:02 4 b
4 2022-01-02 11:23:16 5 c
5 2022-01-02 18:13:30 6 c
6 2022-01-02 21:23:52 7 a
7 2022-01-02 19:43:12 8 e
df2:
time amount ticker
0 2022-01-01 17:03:50 10 a
1 2022-01-01 17:06:52 12 b
2 2022-01-01 17:07:02 13 b
3 2022-01-02 00:17:42 14 c
4 2022-01-02 11:18:16 15 d
5 2022-01-02 18:13:39 16 e
6 2022-01-02 21:24:12 17 a
7 2022-01-02 19:43:12 18 c
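What I want to do is multiply the 'price' column of df1 by the 'amount' column of df2 based on the nearest 'time', where df1.ticker == df2.ticker, to get something like this: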
df:
time balance ticker
2022-01-01 17:03:50 10 a
2022-01-01 17:06:52 24 b
2022-01-02 21:24:12 119 a
....
where df['balance'] = df1['price'] * df2['amount']
How can I do this in a pythonic way, without multiple for loops/if statements?
Use merge_asof with direction='nearest' to build a new DataFrame, then you can create the new column:
df1['time'] = pd.to_datetime(df1['time'])
df2['time'] = pd.to_datetime(df2['time'])
df = pd.merge_asof(df2.sort_values('time'),
                   df1.sort_values('time'),
                   on='time',
                   by='ticker',
                   direction='nearest')
df['balance'] = df['price'] * df['amount']
print(df)
time amount ticker price balance
0 2022-01-01 17:03:50 10 a 1.0 10.0
1 2022-01-01 17:06:52 12 b 2.0 24.0
2 2022-01-01 17:07:02 13 b 2.0 26.0
3 2022-01-02 00:17:42 14 c 5.0 70.0
4 2022-01-02 11:18:16 15 d NaN NaN
5 2022-01-02 18:13:39 16 e 8.0 128.0
6 2022-01-02 19:43:12 18 c 6.0 108.0
7 2022-01-02 21:24:12 17 a 7.0 119.0
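In the output above, some matches are hours apart (for example ticker c), because direction='nearest' has no upper bound on the gap. A minimal sketch of one way to limit that, assuming a 5-minute cutoff is acceptable: the cutoff value and the price_time and gap column names are illustrative choices, not part of the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-01-01 17:03:32", "2022-01-01 17:04:30", "2022-01-01 17:04:32",
        "2022-01-02 00:12:02", "2022-01-02 11:23:16", "2022-01-02 18:13:30",
        "2022-01-02 21:23:52", "2022-01-02 19:43:12",
    ]),
    "price": [1, 2, 3, 4, 5, 6, 7, 8],
    "ticker": ["a", "b", "a", "b", "c", "c", "a", "e"],
})
df2 = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-01-01 17:03:50", "2022-01-01 17:06:52", "2022-01-01 17:07:02",
        "2022-01-02 00:17:42", "2022-01-02 11:18:16", "2022-01-02 18:13:39",
        "2022-01-02 21:24:12", "2022-01-02 19:43:12",
    ]),
    "amount": [10, 12, 13, 14, 15, 16, 17, 18],
    "ticker": ["a", "b", "b", "c", "d", "e", "a", "c"],
})

# Copy df1's timestamp into an extra column (hypothetical name) so the
# matched time is still visible after the merge.
right = df1.sort_values("time").assign(price_time=lambda d: d["time"])

matched = pd.merge_asof(
    df2.sort_values("time"),
    right,
    on="time",
    by="ticker",
    direction="nearest",
    tolerance=pd.Timedelta("5min"),  # assumed cutoff, adjust as needed
)

# Rows with no df1 row for that ticker within the tolerance have NaN price.
matched = matched.dropna(subset=["price"]).copy()
matched["balance"] = matched["price"] * matched["amount"]
matched["gap"] = (matched["time"] - matched["price_time"]).abs()

result = matched[["time", "balance", "ticker"]].reset_index(drop=True)
print(result)
```

With this cutoff only the rows for tickers a and b remain, since every other ticker's nearest price is more than five minutes away or missing.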
- Use compare to find the rows where df2 and df1 have the same ticker at the same position, then select those rows from df2 with take:
tkcmp = df2.ticker.compare(df1.ticker, keep_shape=True)
# rows that are NaN in compare's output are the ones where the tickers are equal
idx_tk = tkcmp.index[tkcmp.isnull().any(axis=1)].tolist()
df = df2.take(idx_tk)
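- Convert time from string to datetime and, for each row of df, get the index of the nearest time in df1.time (this lookup does not take ticker into account):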
t1 = pd.to_datetime(df1.time)
idx_price = [(t1-i).apply(lambda x: x.total_seconds()).abs().idxmin() for i in pd.to_datetime(df.time)]
- Multiply price by amount:
df['balance'] = df.amount * df1.price.take(idx_price).values
df
time amount ticker balance
0 2022-01-01 17:03:50 10 a 10
1 2022-01-01 17:06:52 12 b 36
6 2022-01-02 21:24:12 17 a 119
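The three steps above can also be written without the per-row list comprehension. This is only a sketch of the same logic using a boolean mask and a NumPy broadcast; like the steps above, it picks the nearest df1 time regardless of ticker, which is why the second row gives 36 rather than the 24 in the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "time": ["2022-01-01 17:03:32", "2022-01-01 17:04:30", "2022-01-01 17:04:32",
             "2022-01-02 00:12:02", "2022-01-02 11:23:16", "2022-01-02 18:13:30",
             "2022-01-02 21:23:52", "2022-01-02 19:43:12"],
    "price": [1, 2, 3, 4, 5, 6, 7, 8],
    "ticker": ["a", "b", "a", "b", "c", "c", "a", "e"],
})
df2 = pd.DataFrame({
    "time": ["2022-01-01 17:03:50", "2022-01-01 17:06:52", "2022-01-01 17:07:02",
             "2022-01-02 00:17:42", "2022-01-02 11:18:16", "2022-01-02 18:13:39",
             "2022-01-02 21:24:12", "2022-01-02 19:43:12"],
    "amount": [10, 12, 13, 14, 15, 16, 17, 18],
    "ticker": ["a", "b", "b", "c", "d", "e", "a", "c"],
})

# Step 1: keep df2 rows whose ticker equals df1's ticker at the same position.
same_pos = df2["ticker"].to_numpy() == df1["ticker"].to_numpy()
df = df2.loc[same_pos].copy()

# Step 2: absolute time gap between every kept row and every df1 row,
# then the position of the smallest gap per kept row.
t1 = pd.to_datetime(df1["time"]).to_numpy()
t = pd.to_datetime(df["time"]).to_numpy()
gaps = np.abs(t[:, None] - t1[None, :])
nearest = gaps.argmin(axis=1)

# Step 3: multiply amount by the price found at the nearest time.
df["balance"] = df["amount"].to_numpy() * df1["price"].to_numpy()[nearest]
print(df)
```

The broadcast builds a full len(df) by len(df1) matrix, so it suits small frames like this one; for large inputs merge_asof is likely the better fit.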