Compare datetime index with a datetime column and change the corresponding value in another column

Consider the following dataframe in CSV format:

time,transaction_hash,index,value,recipient,spending_time,sender
2009-01-09 03:54:39,a0241106ee5a597c9,0,50.0,Paul,2009-01-12 03:30:25,Coinbase
2009-01-12 03:30:25,1338530e9831e9e16,1,40.0,Me,2009-01-12 06:02:13,Unknown
2009-01-12 06:02:13,546c4fb42b41e14be,1,30.0,John,2009-01-12 06:12:16,Unknown
2009-01-12 06:02:14,eaf0775ebca408f7r,1,17.0,Paul,2010-09-22 06:06:02,Unknown
2009-01-12 06:02:15,732a865bf5414eab2,1,15.0,Paul,2010-02-23 14:01:23,Unknown
2009-01-12 06:12:16,591e911da64588073,1,29.0,John,2009-01-12 06:34:22,Unknown
2009-01-12 06:34:22,12b52732a85c191ba,1,28.0,Sara,2009-01-12 20:04:20,Unknown

I want to compare the rows of the index time with the rows of the column spending_time and, where they are equal, replace the Unknown value in the sender column (associated with the index time) by the value of the recipient (corresponding to the spending_time column), as follows:

time,transaction_hash,index,value,recipient,spending_time,sender
2009-01-09 03:54:39,a0241106ee5a597c9,0,50.0,Paul,2009-01-12 03:30:25,Coinbase
2009-01-12 03:30:25,1338530e9831e9e16,1,40.0,Me,2009-01-12 06:02:13,Paul
2009-01-12 06:02:13,546c4fb42b41e14be,1,30.0,John,2009-01-12 06:12:16,Me
2009-01-12 06:02:14,eaf0775ebca408f7r,1,17.0,Paul,2010-09-22 06:06:02,Unknown
2009-01-12 06:02:15,732a865bf5414eab2,1,15.0,Paul,2010-02-23 14:01:23,Unknown
2009-01-12 06:12:16,591e911da64588073,1,29.0,John,2009-01-12 06:34:22,John
2009-01-12 06:34:22,12b52732a85c191ba,1,28.0,Sara,2009-01-12 20:04:20,John

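Merge the dataframe with itself on the two relevant columns (time and spending_time). A left join keeps every original row, and rows whose time matches no spending_time get NaN in the added columns.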

df = df.merge(df[['spending_time', 'recipient']], left_on='time', right_on='spending_time', how='left', suffixes=('', '_y'))

Intermediate result:

                  time   transaction_hash  index  value recipient        spending_time    sender      spending_time_y recipient_y
0  2009-01-09 03:54:39  a0241106ee5a597c9      0   50.0      Paul  2009-01-12 03:30:25  Coinbase                  NaN         NaN
1  2009-01-12 03:30:25  1338530e9831e9e16      1   40.0        Me  2009-01-12 06:02:13   Unknown  2009-01-12 03:30:25        Paul
2  2009-01-12 06:02:13  546c4fb42b41e14be      1   30.0      John  2009-01-12 06:12:16   Unknown  2009-01-12 06:02:13          Me
3  2009-01-12 06:02:14  eaf0775ebca408f7r      1   17.0      Paul  2010-09-22 06:06:02   Unknown                  NaN         NaN
4  2009-01-12 06:02:15  732a865bf5414eab2      1   15.0      Paul  2010-02-23 14:01:23   Unknown                  NaN         NaN
5  2009-01-12 06:12:16  591e911da64588073      1   29.0      John  2009-01-12 06:34:22   Unknown  2009-01-12 06:12:16        John
6  2009-01-12 06:34:22  12b52732a85c191ba      1   28.0      Sara  2009-01-12 20:04:20   Unknown  2009-01-12 06:34:22        John

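The new sender column can now be obtained by combining sender with recipient_y, taking recipient_y where it is present and falling back to the existing sender otherwise: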

df['sender'] = df['recipient_y'].combine_first(df['sender'])
df = df.drop(columns=['spending_time_y', 'recipient_y'])
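Put together, the two steps above can be run end to end. This is a minimal sketch, assuming the CSV is loaded from an in-memory string and that time is read as an ordinary column rather than set as the index; the names CSV, lookup, merged and result are only illustrative.

```python
import io

import pandas as pd

# Sample data copied from the question.
CSV = """time,transaction_hash,index,value,recipient,spending_time,sender
2009-01-09 03:54:39,a0241106ee5a597c9,0,50.0,Paul,2009-01-12 03:30:25,Coinbase
2009-01-12 03:30:25,1338530e9831e9e16,1,40.0,Me,2009-01-12 06:02:13,Unknown
2009-01-12 06:02:13,546c4fb42b41e14be,1,30.0,John,2009-01-12 06:12:16,Unknown
2009-01-12 06:02:14,eaf0775ebca408f7r,1,17.0,Paul,2010-09-22 06:06:02,Unknown
2009-01-12 06:02:15,732a865bf5414eab2,1,15.0,Paul,2010-02-23 14:01:23,Unknown
2009-01-12 06:12:16,591e911da64588073,1,29.0,John,2009-01-12 06:34:22,Unknown
2009-01-12 06:34:22,12b52732a85c191ba,1,28.0,Sara,2009-01-12 20:04:20,Unknown
"""

# Assumption: both timestamp columns are parsed as datetimes and
# time stays a regular column (it is not set as the index).
df = pd.read_csv(io.StringIO(CSV), parse_dates=["time", "spending_time"])

# Right-hand side of the self-merge: only the key and the value to copy.
lookup = df[["spending_time", "recipient"]]

# Left join: each row looks up the row whose spending_time equals its time.
merged = df.merge(
    lookup,
    left_on="time",
    right_on="spending_time",
    how="left",
    suffixes=("", "_y"),
)

# Prefer the matched recipient; keep the old sender where nothing matched.
merged["sender"] = merged["recipient_y"].combine_first(merged["sender"])
result = merged.drop(columns=["spending_time_y", "recipient_y"])

print(result[["time", "recipient", "sender"]])
```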

Result:

                  time   transaction_hash  index  value recipient        spending_time    sender
0  2009-01-09 03:54:39  a0241106ee5a597c9      0   50.0      Paul  2009-01-12 03:30:25  Coinbase
1  2009-01-12 03:30:25  1338530e9831e9e16      1   40.0        Me  2009-01-12 06:02:13      Paul
2  2009-01-12 06:02:13  546c4fb42b41e14be      1   30.0      John  2009-01-12 06:12:16        Me
3  2009-01-12 06:02:14  eaf0775ebca408f7r      1   17.0      Paul  2010-09-22 06:06:02   Unknown
4  2009-01-12 06:02:15  732a865bf5414eab2      1   15.0      Paul  2010-02-23 14:01:23   Unknown
5  2009-01-12 06:12:16  591e911da64588073      1   29.0      John  2009-01-12 06:34:22      John
6  2009-01-12 06:34:22  12b52732a85c191ba      1   28.0      Sara  2009-01-12 20:04:20      John
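One caveat: the self-merge keeps the row count only if each spending_time value occurs at most once on the right-hand side, as it does in the sample data. Otherwise the left join would produce one output row per match. A minimal sketch, using hypothetical toy data (not the data above), of how merge's validate argument can be used to detect that situation:

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical toy frame: the spending_time value 10 appears twice,
# so a row with time == 10 would match two rows.
toy = pd.DataFrame(
    {
        "time": [10, 20],
        "recipient": ["A", "B"],
        "spending_time": [10, 10],
        "sender": ["Unknown", "Unknown"],
    }
)

try:
    # validate="m:1" asks pandas to check that the right-hand keys are unique.
    toy.merge(
        toy[["spending_time", "recipient"]],
        left_on="time",
        right_on="spending_time",
        how="left",
        suffixes=("", "_y"),
        validate="m:1",
    )
    duplicates_found = False
except MergeError:
    duplicates_found = True

print(duplicates_found)
```

If the check fails, which recipient should win is a decision about the data; one option might be to deduplicate the right-hand side on spending_time before merging.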