Compare each row against the time intervals in a second DataFrame and, on a match, copy values from the columns of the matching row
I have two DataFrames:
- DF1 has a single time column
MsgTime
13:45:33
14:13:25
15:16:43
16:51:19
- DF2 has two time columns plus two columns of data
MsgTime1 | MsgTime2 | Temperature | Humidity
13:40:33 | 13:50:13 | 21          | 45
14:16:43 | 14:26:43 | 22          | 56
16:49:11 | 16:59:02 | 32          | 40
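So, if MsgTime falls between MsgTime1 and MsgTime2 of a row in DF2, copy that row's Temperature and Humidity into DF1 as output. The loop should compare every row to find a match, and leave the values empty if there is none. The desired output looks like this:
MsgTime  | Temperature | Humidity
13:45:33 | 21          | 45
14:13:25 | NaN         | NaN
15:16:43 | NaN         | NaN
16:51:19 | 32          | 40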
I tried a nested for loop like the one below, but it does not seem to work:
    for i, row in DF1.iterrows():
        for j, row2 in DF2.iterrows():
            if (row2['MsgTime1'] <= row['MsgTime']) and (row['MsgTime'] <= row2['MsgTime2']):
                row['Temperature'] = row2['Temperature']
                row['Humidity'] = row2['Humidity']
            else:
                row2 += 1
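A likely reason the loop above has no effect: iterrows() yields a copy of each row, so assigning to row never reaches DF1, and row2 += 1 tries to add 1 to a whole row rather than advance the loop. Below is a minimal sketch of a corrected loop that writes back with DF1.loc. It assumes the times are zero-padded HH:MM:SS strings, which compare correctly as plain strings; the sample frames are rebuilt from the tables in the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (times kept as zero-padded strings).
DF1 = pd.DataFrame({"MsgTime": ["13:45:33", "14:13:25", "15:16:43", "16:51:19"]})
DF2 = pd.DataFrame(
    {
        "MsgTime1": ["13:40:33", "14:16:43", "16:49:11"],
        "MsgTime2": ["13:50:13", "14:26:43", "16:59:02"],
        "Temperature": [21, 22, 32],
        "Humidity": [45, 56, 40],
    }
)

# Start with empty columns so rows without a match stay NaN.
DF1["Temperature"] = np.nan
DF1["Humidity"] = np.nan

for i, t in DF1["MsgTime"].items():
    for _, row2 in DF2.iterrows():
        if row2["MsgTime1"] <= t <= row2["MsgTime2"]:
            # Write to DF1 itself; assigning to a row copy would be lost.
            DF1.loc[i, "Temperature"] = row2["Temperature"]
            DF1.loc[i, "Humidity"] = row2["Humidity"]
            break  # stop at the first interval that contains t

print(DF1)
```

This still compares every pair of rows, so it is only reasonable for small frames.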
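One approach, using pandas.Interval and pandas.to_datetime (here df is DF1 and df2 is DF2):
    import pandas as pd

    # Index df2 by the interval spanned by MsgTime1 and MsgTime2 in each row
    df2.index = df2[["MsgTime1", "MsgTime2"]].apply(lambda x: pd.Interval(*pd.to_datetime(x)), axis=1)
    s = pd.to_datetime(df["MsgTime"])
    # Map each MsgTime to the row whose interval contains it (NaN if none)
    for k in ["Temperature", "Humidity"]:
        df[k] = s.map(df2[k])
    print(df)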
Output:
    MsgTime  Temperature  Humidity
0  13:45:33         21.0      45.0
1  14:13:25          NaN       NaN
2  15:16:43          NaN       NaN
3  16:51:19         32.0      40.0
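Note that pd.Interval is closed on the right only by default, while the question's comparison uses <= on both ends. The following is a minimal sketch of a variant that builds the intervals with closed="both" and looks them up with IntervalIndex.get_indexer. It assumes the intervals in DF2 do not overlap and that the times are HH:MM:SS strings; the explicit format string is my assumption and places every time on the same placeholder date.

```python
import numpy as np
import pandas as pd

FMT = "%H:%M:%S"  # assumed time-only format; all values share one placeholder date

df1 = pd.DataFrame({"MsgTime": ["13:45:33", "14:13:25", "15:16:43", "16:51:19"]})
df2 = pd.DataFrame(
    {
        "MsgTime1": ["13:40:33", "14:16:43", "16:49:11"],
        "MsgTime2": ["13:50:13", "14:26:43", "16:59:02"],
        "Temperature": [21, 22, 32],
        "Humidity": [45, 56, 40],
    }
)

# One interval per row of df2, closed on both ends to match the <= checks.
intervals = pd.IntervalIndex.from_arrays(
    pd.to_datetime(df2["MsgTime1"], format=FMT),
    pd.to_datetime(df2["MsgTime2"], format=FMT),
    closed="both",
)

# Position of the interval containing each MsgTime, or -1 when none does.
times = pd.DatetimeIndex(pd.to_datetime(df1["MsgTime"], format=FMT))
pos = intervals.get_indexer(times)
found = pos >= 0

for col in ["Temperature", "Humidity"]:
    # pos == -1 would pick the last row, so mask those entries with NaN.
    df1[col] = np.where(found, df2[col].to_numpy()[pos], np.nan)

print(df1)
```

On the sample data this fills the first and last rows and leaves the two middle rows as NaN, the same as the output above.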
The package piso (pandas interval set operations) provides a very fast implementation for this.
Setup
    import pandas as pd

    df1 = pd.DataFrame(
        pd.to_datetime(["13:45:33", "14:13:25", "15:16:43", "16:51:19"]),
        columns=["MsgTime"],
    )
    df2 = pd.DataFrame(
        {
            "MsgTime1": pd.to_datetime(["13:40:33", "14:16:43", "16:49:11"]),
            "MsgTime2": pd.to_datetime(["13:50:13", "14:26:43", "16:59:02"]),
            "Temperature": [21, 22, 32],
            "Humidity": [45, 56, 40],
        }
    )
Solution
Create a DataFrame containing the Temperature and Humidity columns, indexed by a pandas.IntervalIndex:
df3 = df2[["Temperature", "Humidity"]].set_index(pd.IntervalIndex.from_arrays(df2["MsgTime1"], df2["MsgTime2"]))
df3
which looks like this:
                                            Temperature  Humidity
(2021-11-03 13:40:33, 2021-11-03 13:50:13]           21        45
(2021-11-03 14:16:43, 2021-11-03 14:26:43]           22        56
(2021-11-03 16:49:11, 2021-11-03 16:59:02]           32        40
Note that because no date component was supplied, today's date is assumed. You could also use pandas.Timedelta instead of pandas.Timestamp; the approach works the same way.
Next, use piso.lookup:
    import piso

    piso.lookup(df3, df1["MsgTime"])
which produces:
                     Temperature  Humidity
2021-11-03 13:45:33         21.0      45.0
2021-11-03 14:13:25          NaN       NaN
2021-11-03 15:16:43          NaN       NaN
2021-11-03 16:51:19         32.0      40.0