Compare every row against the time intervals in a second dataframe and, on a match, copy values from columns of the same row

I have two DataFrames.

DF1:

MsgTime
13:45:33
14:13:25
15:16:43
16:51:19

DF2:

MsgTime1 | MsgTime2 | Temperature | Humidity
13:40:33   13:50:13    21           45
14:16:43   14:26:43    22           56
16:49:11   16:59:02    32           40

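So, if MsgTime falls between MsgTime1 and MsgTime2 of a row in DF2, copy that row's Temperature and Humidity into DF1 as output. The loop should compare every row to find a match and leave the values empty if there is none. The desired output should look like this: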

MsgTime | Temperature | Humidity
13:45:33   21            45
14:13:25   NaN           NaN
15:16:43   NaN           NaN
16:51:19    32           40

I tried doing it with a nested for loop, but it doesn't seem to work:

for i, row in DF1.iterrows():
    for j, row2 in DF2.iterrows():
        if (row2['MsgTime1'] <= row['MsgTime']) and (row['MsgTime'] <=  row2['MsgTime2']):
            row['Temperature'] = row2['Temperature']
            row['Humidity'] = row2['Humidity']
        else:
            row2 += 1

One way, using pandas.Interval and pandas.to_datetime:

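import pandas as pd

# Index df2 by one pd.Interval per row, built from its start and end times
df2.index = df2[["MsgTime1", "MsgTime2"]].apply(lambda x: pd.Interval(*pd.to_datetime(x)), axis=1)

# df is the first frame (DF1); map each MsgTime onto the interval containing it
s = pd.to_datetime(df["MsgTime"])
for k in ["Temperature", "Humidity"]:
    df[k] = s.map(df2[k])  # NaN where no interval contains the time
print(df)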

Output:

    MsgTime  Temperature  Humidity
0  13:45:33         21.0      45.0
1  14:13:25          NaN       NaN
2  15:16:43          NaN       NaN
3  16:51:19         32.0      40.0
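For comparison, the nested loop from the question can also be made to work. The pandas documentation notes that iterrows yields copies of the rows, so assigning to row[...] does not reach DF1, and the for loop already advances on its own, so row2 += 1 is not needed. The following is a minimal sketch rather than a recommended approach (it is slow on large frames). It assumes the times are parsed with an explicit "%H:%M:%S" format and writes results through DF1.loc:

```python
import numpy as np
import pandas as pd

# Parse with an explicit format so every value shares the same dummy date.
fmt = "%H:%M:%S"
DF1 = pd.DataFrame(
    {"MsgTime": pd.to_datetime(["13:45:33", "14:13:25", "15:16:43", "16:51:19"], format=fmt)}
)
DF2 = pd.DataFrame(
    {
        "MsgTime1": pd.to_datetime(["13:40:33", "14:16:43", "16:49:11"], format=fmt),
        "MsgTime2": pd.to_datetime(["13:50:13", "14:26:43", "16:59:02"], format=fmt),
        "Temperature": [21, 22, 32],
        "Humidity": [45, 56, 40],
    }
)

# Start with NaN result columns; rows without a match keep these values.
DF1["Temperature"] = np.nan
DF1["Humidity"] = np.nan

for i, t in DF1["MsgTime"].items():
    for row2 in DF2.itertuples(index=False):
        if row2.MsgTime1 <= t <= row2.MsgTime2:
            # Write through DF1.loc so the change lands in the DataFrame itself.
            DF1.loc[i, "Temperature"] = row2.Temperature
            DF1.loc[i, "Humidity"] = row2.Humidity
            break  # stop at the first matching interval

print(DF1)
```

The interval-based lookups are usually preferable, since this loop does one comparison per pair of rows.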

The piso package (pandas interval set operations) provides a very fast implementation for this.
import pandas as pd
import piso

df1 = pd.DataFrame(
    pd.to_datetime(["13:45:33", "14:13:25", "15:16:43", "16:51:19"]),
    columns=["MsgTime"],
)

df2 = pd.DataFrame(
    {
        "MsgTime1": pd.to_datetime(["13:40:33", "14:16:43", "16:49:11"]),
        "MsgTime2": pd.to_datetime(["13:50:13", "14:26:43", "16:59:02"]),
        "Temperature":[21,22,32],
        "Humidity":[45,56,40],
    }
)

Solution

Create a dataframe containing the Temperature and Humidity columns, indexed by a pandas.IntervalIndex:
df3 = df2[["Temperature", "Humidity"]].set_index(pd.IntervalIndex.from_arrays(df2["MsgTime1"], df2["MsgTime2"]))

df3 looks like this:

                                            Temperature  Humidity
(2021-11-03 13:40:33, 2021-11-03 13:50:13]           21        45
(2021-11-03 14:16:43, 2021-11-03 14:26:43]           22        56
(2021-11-03 16:49:11, 2021-11-03 16:59:02]           32        40

Note that since no date component was provided, today's date is assumed. You could also use pandas.Timedelta instead of pandas.Timestamp; the approach works the same way.

Next, use piso.lookup:

piso.lookup(df3, df1["MsgTime"])

which produces

                     Temperature  Humidity
2021-11-03 13:45:33         21.0      45.0
2021-11-03 14:13:25          NaN       NaN
2021-11-03 15:16:43          NaN       NaN
2021-11-03 16:51:19         32.0      40.0
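If adding a dependency is not an option, a similar lookup can be sketched with plain pandas via IntervalIndex.get_indexer. This is an illustrative alternative, not what piso does internally. It assumes the intervals in df2 do not overlap (get_indexer requires that), uses pandas.Timedelta values so no date is attached, and passes closed="both" so both endpoints count as matches, like the <= comparisons in the question:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"MsgTime": ["13:45:33", "14:13:25", "15:16:43", "16:51:19"]})
df2 = pd.DataFrame(
    {
        "MsgTime1": ["13:40:33", "14:16:43", "16:49:11"],
        "MsgTime2": ["13:50:13", "14:26:43", "16:59:02"],
        "Temperature": [21, 22, 32],
        "Humidity": [45, 56, 40],
    }
)

# Build one interval per row of df2, with both endpoints included.
intervals = pd.IntervalIndex.from_arrays(
    pd.to_timedelta(df2["MsgTime1"].tolist()),
    pd.to_timedelta(df2["MsgTime2"].tolist()),
    closed="both",
)

# Position of the containing interval for each MsgTime, or -1 if there is none.
pos = intervals.get_indexer(pd.to_timedelta(df1["MsgTime"].tolist()))
matched = pos >= 0

for col in ["Temperature", "Humidity"]:
    # Take the value at each position; unmatched rows are masked to NaN.
    df1[col] = np.where(matched, df2[col].to_numpy()[pos], np.nan)

print(df1)
```

With the default closed="right", a MsgTime exactly equal to MsgTime1 would not match, so the choice of closed matters at the interval boundaries.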