Compare datetime index with a datetime column and change the corresponding value in another column
Consider the following dataframe in CSV format:
time,transaction_hash,index,value,recipient,spending_time,sender
2009-01-09 03:54:39,a0241106ee5a597c9,0,50.0,Paul,2009-01-12 03:30:25,Coinbase
2009-01-12 03:30:25,1338530e9831e9e16,1,40.0,Me,2009-01-12 06:02:13,Unknown
2009-01-12 06:02:13,546c4fb42b41e14be,1,30.0,John,2009-01-12 06:12:16,Unknown
2009-01-12 06:02:14,eaf0775ebca408f7r,1,17.0,Paul,2010-09-22 06:06:02,Unknown
2009-01-12 06:02:15,732a865bf5414eab2,1,15.0,Paul,2010-02-23 14:01:23,Unknown
2009-01-12 06:12:16,591e911da64588073,1,29.0,John,2009-01-12 06:34:22,Unknown
2009-01-12 06:34:22,12b52732a85c191ba,1,28.0,Sara,2009-01-12 20:04:20,Unknown
I want to compare the values of the time index with the values in the spending_time column. Where they are equal, I want to replace the Unknown value in the sender column (in the row with that time) with the recipient of the row whose spending_time matched, as follows:
time,transaction_hash,index,value,recipient,spending_time,sender
2009-01-09 03:54:39,a0241106ee5a597c9,0,50.0,Paul,2009-01-12 03:30:25,Coinbase
2009-01-12 03:30:25,1338530e9831e9e16,1,40.0,Me,2009-01-12 06:02:13,Paul
2009-01-12 06:02:13,546c4fb42b41e14be,1,30.0,John,2009-01-12 06:12:16,Me
2009-01-12 06:02:14,eaf0775ebca408f7r,1,17.0,Paul,2010-09-22 06:06:02,Unknown
2009-01-12 06:02:15,732a865bf5414eab2,1,15.0,Paul,2010-02-23 14:01:23,Unknown
2009-01-12 06:12:16,591e911da64588073,1,29.0,John,2009-01-12 06:34:22,John
2009-01-12 06:34:22,12b52732a85c191ba,1,28.0,Sara,2009-01-12 20:04:20,John
Merge the dataframe with itself on the two relevant columns (time and spending_time):
df = df.merge(df[['spending_time', 'recipient']], left_on='time', right_on='spending_time', how='left', suffixes=('', '_y'))
Intermediate result:
   time                 transaction_hash   index  value  recipient  spending_time        sender    spending_time_y      recipient_y
0  2009-01-09 03:54:39  a0241106ee5a597c9  0      50.0   Paul       2009-01-12 03:30:25  Coinbase  NaN                  NaN
1  2009-01-12 03:30:25  1338530e9831e9e16  1      40.0   Me         2009-01-12 06:02:13  Unknown   2009-01-12 03:30:25  Paul
2  2009-01-12 06:02:13  546c4fb42b41e14be  1      30.0   John       2009-01-12 06:12:16  Unknown   2009-01-12 06:02:13  Me
3  2009-01-12 06:02:14  eaf0775ebca408f7r  1      17.0   Paul       2010-09-22 06:06:02  Unknown   NaN                  NaN
4  2009-01-12 06:02:15  732a865bf5414eab2  1      15.0   Paul       2010-02-23 14:01:23  Unknown   NaN                  NaN
5  2009-01-12 06:12:16  591e911da64588073  1      29.0   John       2009-01-12 06:34:22  Unknown   2009-01-12 06:12:16  John
6  2009-01-12 06:34:22  12b52732a85c191ba  1      28.0   Sara       2009-01-12 20:04:20  Unknown   2009-01-12 06:34:22  John
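To reproduce the merge step end to end, here is a minimal self-contained sketch. It assumes the question's CSV is read with pd.read_csv and that both time and spending_time are parsed as datetimes. It also assumes time is kept as an ordinary column, which is how the merge call above uses it. The name CSV is only an illustrative variable holding the question's data.

```python
import io

import pandas as pd

# Input data from the question, copied verbatim (CSV is an illustrative name).
CSV = """time,transaction_hash,index,value,recipient,spending_time,sender
2009-01-09 03:54:39,a0241106ee5a597c9,0,50.0,Paul,2009-01-12 03:30:25,Coinbase
2009-01-12 03:30:25,1338530e9831e9e16,1,40.0,Me,2009-01-12 06:02:13,Unknown
2009-01-12 06:02:13,546c4fb42b41e14be,1,30.0,John,2009-01-12 06:12:16,Unknown
2009-01-12 06:02:14,eaf0775ebca408f7r,1,17.0,Paul,2010-09-22 06:06:02,Unknown
2009-01-12 06:02:15,732a865bf5414eab2,1,15.0,Paul,2010-02-23 14:01:23,Unknown
2009-01-12 06:12:16,591e911da64588073,1,29.0,John,2009-01-12 06:34:22,Unknown
2009-01-12 06:34:22,12b52732a85c191ba,1,28.0,Sara,2009-01-12 20:04:20,Unknown
"""

# Parse both timestamp columns so the join compares datetimes, not strings.
# 'time' stays an ordinary column here, as in the merge call above.
df = pd.read_csv(io.StringIO(CSV), parse_dates=["time", "spending_time"])

# Left self-join: look up each row's 'time' in the 'spending_time' column.
# Overlapping right-hand columns get the '_y' suffix; left names are unchanged.
df = df.merge(
    df[["spending_time", "recipient"]],
    left_on="time",
    right_on="spending_time",
    how="left",
    suffixes=("", "_y"),
)

print(df[["time", "sender", "spending_time_y", "recipient_y"]])
```

Because every time value matches at most one spending_time here, the left join keeps exactly one output row per input row, in the original order.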
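The new sender column can now be obtained by combining recipient_y with sender. combine_first takes recipient_y where it is present and falls back to the original sender where it is NaN. The helper columns are then dropped: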
df['sender'] = df['recipient_y'].combine_first(df['sender'])
df = df.drop(columns=['spending_time_y', 'recipient_y'])
Result:
   time                 transaction_hash   index  value  recipient  spending_time        sender
0  2009-01-09 03:54:39  a0241106ee5a597c9  0      50.0   Paul       2009-01-12 03:30:25  Coinbase
1  2009-01-12 03:30:25  1338530e9831e9e16  1      40.0   Me         2009-01-12 06:02:13  Paul
2  2009-01-12 06:02:13  546c4fb42b41e14be  1      30.0   John       2009-01-12 06:12:16  Me
3  2009-01-12 06:02:14  eaf0775ebca408f7r  1      17.0   Paul       2010-09-22 06:06:02  Unknown
4  2009-01-12 06:02:15  732a865bf5414eab2  1      15.0   Paul       2010-02-23 14:01:23  Unknown
5  2009-01-12 06:12:16  591e911da64588073  1      29.0   John       2009-01-12 06:34:22  John
6  2009-01-12 06:34:22  12b52732a85c191ba  1      28.0   Sara       2009-01-12 20:04:20  John
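The question's title suggests time might actually be a DatetimeIndex. It also asks to replace only the Unknown placeholders, whereas combine_first overwrites any sender whenever a match exists. For that case, a lookup-based variant avoids the merge and the temporary columns. This is a sketch under stated assumptions, not the answer's method. It assumes spending_time values are unique, because reindex raises on duplicate labels. The names CSV, lookup, matched and to_fill are illustrative.

```python
import io

import pandas as pd

# Input data from the question (CSV is an illustrative name).
CSV = """time,transaction_hash,index,value,recipient,spending_time,sender
2009-01-09 03:54:39,a0241106ee5a597c9,0,50.0,Paul,2009-01-12 03:30:25,Coinbase
2009-01-12 03:30:25,1338530e9831e9e16,1,40.0,Me,2009-01-12 06:02:13,Unknown
2009-01-12 06:02:13,546c4fb42b41e14be,1,30.0,John,2009-01-12 06:12:16,Unknown
2009-01-12 06:02:14,eaf0775ebca408f7r,1,17.0,Paul,2010-09-22 06:06:02,Unknown
2009-01-12 06:02:15,732a865bf5414eab2,1,15.0,Paul,2010-02-23 14:01:23,Unknown
2009-01-12 06:12:16,591e911da64588073,1,29.0,John,2009-01-12 06:34:22,Unknown
2009-01-12 06:34:22,12b52732a85c191ba,1,28.0,Sara,2009-01-12 20:04:20,Unknown
"""

# Keep 'time' as a DatetimeIndex, as described in the question's title.
df = pd.read_csv(io.StringIO(CSV), parse_dates=["time", "spending_time"]).set_index("time")

# Lookup table: spending_time -> recipient of the row spent at that moment.
# Assumption: spending_time is unique (reindex fails on duplicate labels).
lookup = df.set_index("spending_time")["recipient"]

# Align the lookup on the time index; unmatched timestamps become NaN.
matched = lookup.reindex(df.index)

# Overwrite only senders that are still 'Unknown' and have a match.
to_fill = df["sender"].eq("Unknown") & matched.notna()
df.loc[to_fill, "sender"] = matched[to_fill]

print(df[["recipient", "spending_time", "sender"]])
```

Unlike combine_first, this leaves a known sender such as Coinbase untouched even if its time happened to match a spending_time.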