Pandas 矢量化 - 在另一个 DataFrame 中寻找最近的未来时间
Pandas vectorization - finding nearest future time in another DataFrame
假设我有两个 pandas 时间序列数据帧:
a = pd.DataFrame([1,2,3,4,5,6,7,8,9],index=pd.date_range('2020-01-01', periods=9, freq='10min'))
b = pd.DataFrame(['a','b','c'],index=pd.date_range('2020-01-01', periods=3, freq='30min'))
有没有办法使用 pandas 对以下算法进行向量化?
next_b = []
for row in a.itertuples():
df = b[b.index >= row.Index]
if len(df.index):
next_b.append(df.index[0])
else:
next_b.append(None)
a['next_b'] = next_b
输出:
0 next_b
2020-01-01 00:00:00 1 2020-01-01 00:00:00
2020-01-01 00:10:00 2 2020-01-01 00:30:00
2020-01-01 00:20:00 3 2020-01-01 00:30:00
2020-01-01 00:30:00 4 2020-01-01 00:30:00
2020-01-01 00:40:00 5 2020-01-01 01:00:00
2020-01-01 00:50:00 6 2020-01-01 01:00:00
2020-01-01 01:00:00 7 2020-01-01 01:00:00
2020-01-01 01:10:00 8 NaT
2020-01-01 01:20:00 9 NaT
用merge_asof
,你可以做到
res = pd.merge_asof(a, b.assign(next_b=b.index),
left_index=True, right_index=True,
direction='forward', suffixes=('','_b'))
print(res)
# 0 0_b next_b
# 2020-01-01 00:00:00 1 a 2020-01-01 00:00:00
# 2020-01-01 00:10:00 2 b 2020-01-01 00:30:00
# 2020-01-01 00:20:00 3 b 2020-01-01 00:30:00
# 2020-01-01 00:30:00 4 b 2020-01-01 00:30:00
# 2020-01-01 00:40:00 5 c 2020-01-01 01:00:00
# 2020-01-01 00:50:00 6 c 2020-01-01 01:00:00
# 2020-01-01 01:00:00 7 c 2020-01-01 01:00:00
# 2020-01-01 01:10:00 8 NaN NaT
# 2020-01-01 01:20:00 9 NaN NaT
假设我有两个 pandas 时间序列数据帧:
a = pd.DataFrame([1,2,3,4,5,6,7,8,9],index=pd.date_range('2020-01-01', periods=9, freq='10min'))
b = pd.DataFrame(['a','b','c'],index=pd.date_range('2020-01-01', periods=3, freq='30min'))
有没有办法使用 pandas 对以下算法进行向量化?
next_b = []
for row in a.itertuples():
df = b[b.index >= row.Index]
if len(df.index):
next_b.append(df.index[0])
else:
next_b.append(None)
a['next_b'] = next_b
输出:
0 next_b
2020-01-01 00:00:00 1 2020-01-01 00:00:00
2020-01-01 00:10:00 2 2020-01-01 00:30:00
2020-01-01 00:20:00 3 2020-01-01 00:30:00
2020-01-01 00:30:00 4 2020-01-01 00:30:00
2020-01-01 00:40:00 5 2020-01-01 01:00:00
2020-01-01 00:50:00 6 2020-01-01 01:00:00
2020-01-01 01:00:00 7 2020-01-01 01:00:00
2020-01-01 01:10:00 8 NaT
2020-01-01 01:20:00 9 NaT
用merge_asof
,你可以做到
res = pd.merge_asof(a, b.assign(next_b=b.index),
left_index=True, right_index=True,
direction='forward', suffixes=('','_b'))
print(res)
# 0 0_b next_b
# 2020-01-01 00:00:00 1 a 2020-01-01 00:00:00
# 2020-01-01 00:10:00 2 b 2020-01-01 00:30:00
# 2020-01-01 00:20:00 3 b 2020-01-01 00:30:00
# 2020-01-01 00:30:00 4 b 2020-01-01 00:30:00
# 2020-01-01 00:40:00 5 c 2020-01-01 01:00:00
# 2020-01-01 00:50:00 6 c 2020-01-01 01:00:00
# 2020-01-01 01:00:00 7 c 2020-01-01 01:00:00
# 2020-01-01 01:10:00 8 NaN NaT
# 2020-01-01 01:20:00 9 NaN NaT