Python: Multiply columns of two DataFrames based on the nearest index

I have two DataFrames:

import pandas as pd

df1 = pd.DataFrame()
df1['time'] = ['2022-01-01 17:03:32', '2022-01-01 17:04:30', '2022-01-01 17:04:32', '2022-01-02 00:12:02',
        '2022-01-02 11:23:16', '2022-01-02 18:13:30', '2022-01-02 21:23:52', '2022-01-02 19:43:12']
df1['price'] = [1,2,3,4,5,6,7,8]
df1['ticker'] = ['a','b','a','b','c','c','a','e']
df2 = pd.DataFrame()
df2['time'] = ['2022-01-01 17:03:50', '2022-01-01 17:06:52', '2022-01-01 17:07:02', '2022-01-02 00:17:42',
        '2022-01-02 11:18:16', '2022-01-02 18:13:39', '2022-01-02 21:24:12', '2022-01-02 19:43:12']
df2['amount'] = [10,12,13,14,15,16,17,18]
df2['ticker']=['a','b','b','c','d','e','a','c']

df1:
    time                     price  ticker
 0  2022-01-01 17:03:32      1      a
 1  2022-01-01 17:04:30      2      b
 2  2022-01-01 17:04:32      3      a
 3  2022-01-02 00:12:02      4      b
 4  2022-01-02 11:23:16      5      c
 5  2022-01-02 18:13:30      6      c
 6  2022-01-02 21:23:52      7      a
 7  2022-01-02 19:43:12      8      e     

df2:              
    time                     amount  ticker
 0  2022-01-01 17:03:50      10      a
 1  2022-01-01 17:06:52      12      b
 2  2022-01-01 17:07:02      13      b
 3  2022-01-02 00:17:42      14      c
 4  2022-01-02 11:18:16      15      d
 5  2022-01-02 18:13:39      16      e
 6  2022-01-02 21:24:12      17      a
 7  2022-01-02 19:43:12      18      c

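What I want to do is multiply the 'price' column of df1 by the 'amount' column of df2, matching rows by the nearest 'time' where df1.ticker == df2.ticker, to get something like this: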

df:
    time                   balance    ticker
    2022-01-01 17:03:50    10         a
    2022-01-01 17:06:52    24         b
    2022-01-02 21:24:12    119        a
    ....

where df['balance'] = df1['price'] * df2['amount']

How can I do this in a Pythonic way, without multiple for loops/if statements?

Use merge_asof with direction='nearest' to build a new DataFrame; you can then create the new column:

df1['time'] = pd.to_datetime(df1['time'])
df2['time'] = pd.to_datetime(df2['time'])

df = pd.merge_asof(df2.sort_values('time'), 
                   df1.sort_values('time'), 
                   on='time', 
                   by='ticker', 
                   direction='nearest')

df['balance'] = df['price']*df['amount']
print (df)
                 time  amount ticker  price  balance
0 2022-01-01 17:03:50      10      a    1.0     10.0
1 2022-01-01 17:06:52      12      b    2.0     24.0
2 2022-01-01 17:07:02      13      b    2.0     26.0
3 2022-01-02 00:17:42      14      c    5.0     70.0
4 2022-01-02 11:18:16      15      d    NaN      NaN
5 2022-01-02 18:13:39      16      e    8.0    128.0
6 2022-01-02 19:43:12      18      c    6.0    108.0
7 2022-01-02 21:24:12      17      a    7.0    119.0
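
To get closer to the layout shown in the question, the merged result can be trimmed afterwards. The following is a minimal sketch, not part of the original answer. It assumes that rows without a usable match should be dropped, and that a 10-minute `tolerance` is an acceptable cutoff for "nearest" (the question does not state a cutoff, so this value is purely illustrative). The names `merged` and `result` are also illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'time': pd.to_datetime([
        '2022-01-01 17:03:32', '2022-01-01 17:04:30', '2022-01-01 17:04:32',
        '2022-01-02 00:12:02', '2022-01-02 11:23:16', '2022-01-02 18:13:30',
        '2022-01-02 21:23:52', '2022-01-02 19:43:12']),
    'price': [1, 2, 3, 4, 5, 6, 7, 8],
    'ticker': ['a', 'b', 'a', 'b', 'c', 'c', 'a', 'e'],
})
df2 = pd.DataFrame({
    'time': pd.to_datetime([
        '2022-01-01 17:03:50', '2022-01-01 17:06:52', '2022-01-01 17:07:02',
        '2022-01-02 00:17:42', '2022-01-02 11:18:16', '2022-01-02 18:13:39',
        '2022-01-02 21:24:12', '2022-01-02 19:43:12']),
    'amount': [10, 12, 13, 14, 15, 16, 17, 18],
    'ticker': ['a', 'b', 'b', 'c', 'd', 'e', 'a', 'c'],
})

# Both frames must be sorted by the 'on' key before calling merge_asof.
merged = pd.merge_asof(
    df2.sort_values('time'),
    df1.sort_values('time'),
    on='time',
    by='ticker',
    direction='nearest',
    tolerance=pd.Timedelta('10min'),  # assumed cutoff, not from the question
)
merged['balance'] = merged['price'] * merged['amount']

# Drop rows with no match (NaN price) and keep only the requested columns.
result = (
    merged.dropna(subset=['balance'])
          .astype({'balance': 'int64'})
          [['time', 'balance', 'ticker']]
          .reset_index(drop=True)
)
print(result)
```

Without `tolerance`, every df2 row whose ticker appears anywhere in df1 gets a match, however far apart the timestamps are; whether that is desirable depends on the data.
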

  1. Use compare to find the rows of df2 whose ticker equals the ticker at the same position in df1, then select those rows by index with take:
tkcmp = df2.ticker.compare(df1.ticker, keep_shape=True)
idx_tk = tkcmp.index[tkcmp.isnull().any(axis=1)].tolist()
df = df2.take(idx_tk)
  2. Convert the time strings in df1 to datetime.
  3. For each row of df, get the index of the nearest time in df1:
t1 = pd.to_datetime(df1.time)
idx_price = [(t1 - i).abs().idxmin() for i in pd.to_datetime(df.time)]

  4. Multiply price by amount:
df['balance'] = df.amount * df1.price.take(idx_price).values

df
                  time  amount ticker  balance
0  2022-01-01 17:03:50      10      a       10
1  2022-01-01 17:06:52      12      b       36
6  2022-01-02 21:24:12      17      a      119
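
Note that the `idx_price` lookup scans all of df1 without filtering by ticker, which is why row 1 yields 36 (price 3, taken from a ticker 'a' row) rather than the 24 shown in the question. As a hedged sketch of a ticker-aware variant that does not rely on merge_asof, one option is to join on ticker first and then keep, for each df2 row, the candidate with the smallest absolute time gap. The names `pairs`, `gap`, `nearest`, `time_df1` and `time_df2` are illustrative and do not come from the original answers.

```python
import pandas as pd

df1 = pd.DataFrame({
    'time': pd.to_datetime([
        '2022-01-01 17:03:32', '2022-01-01 17:04:30', '2022-01-01 17:04:32',
        '2022-01-02 00:12:02', '2022-01-02 11:23:16', '2022-01-02 18:13:30',
        '2022-01-02 21:23:52', '2022-01-02 19:43:12']),
    'price': [1, 2, 3, 4, 5, 6, 7, 8],
    'ticker': ['a', 'b', 'a', 'b', 'c', 'c', 'a', 'e'],
})
df2 = pd.DataFrame({
    'time': pd.to_datetime([
        '2022-01-01 17:03:50', '2022-01-01 17:06:52', '2022-01-01 17:07:02',
        '2022-01-02 00:17:42', '2022-01-02 11:18:16', '2022-01-02 18:13:39',
        '2022-01-02 21:24:12', '2022-01-02 19:43:12']),
    'amount': [10, 12, 13, 14, 15, 16, 17, 18],
    'ticker': ['a', 'b', 'b', 'c', 'd', 'e', 'a', 'c'],
})

# Pair every df2 row with every df1 row that shares its ticker.
# reset_index() keeps the original df2 row label in a column named 'index'.
pairs = df2.reset_index().merge(df1, on='ticker', suffixes=('_df2', '_df1'))

# Absolute time gap in seconds between the two timestamps of each pair.
pairs['gap'] = (pairs['time_df2'] - pairs['time_df1']).abs().dt.total_seconds()

# For each original df2 row, keep the pairing with the smallest gap.
nearest = pairs.loc[pairs.groupby('index')['gap'].idxmin()]
nearest = nearest.assign(balance=nearest['price'] * nearest['amount'])

result = (
    nearest[['time_df2', 'amount', 'ticker', 'balance']]
    .rename(columns={'time_df2': 'time'})
    .sort_values('time')
    .reset_index(drop=True)
)
print(result)
```

Because this is an inner join, the ticker 'd' row of df2 (which has no counterpart in df1) is dropped instead of appearing with NaN. The intermediate `pairs` frame grows with the number of same-ticker combinations, so for large inputs merge_asof is likely the more economical choice.
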