How to merge different datasets with datetime index?

I have two datasets (Lots and Measurements). Both have a datetime index, but they differ in length and in columns. The first dataset (Lots) is structured as follows:

Datetime Index Lot Group Lot No Booking Level
2013-08-03 10:00:00 1 261291.0 PROB1H
2013-08-03 12:00:00 1 261228.0 PROB1H

The other one (Measurements) is structured as follows:

Datetime Index MID Passed? Measurement1 Measurement2 Measurement3
2013-08-28 10:00:00 12345 True 46.908 3.89 29.056
2013-08-03 12:00:00 78262 True 89.457 6.88 34.918

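What I want to do is merge the two dataframes on the datetime index and get all columns from both. Where the datetime index matches, the MID, Passed? and Measurement columns should be added to the Lots dataframe. Duplicates (if any) should be kept, and missing values should be kept as NaN. For example: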

Say the datetime 2013-08-28 10:00:00 does not exist in the Lots dataframe but does exist in the Measurements dataframe. The result would be:

Datetime Index Lot Group Lot No Booking Level MID Passed? Measurement1 Measurement2 Measurement3
2013-08-28 10:00:00 NaN NaN NaN 12345 True 46.908 3.89 29.056

If there is a match at datetime 2013-08-03 12:00:00, it would produce:

Datetime Index Lot Group Lot No Booking Level MID Passed? Measurement1 Measurement2 Measurement3
2013-08-03 12:00:00 1 261228.0 PROB1H 78262 True 89.457 6.88 34.918

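The datetime index of the Lots dataframe contains only unique values, but the Measurements dataframe has duplicate entries, so when a match hits duplicated entries I want to get every duplicated row. For example: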

Say the datetime 2021-04-15 22:00:00 exists in both dataframes but occurs more than once in the Measurements dataframe. It would produce the following:

Datetime Index Lot Group Lot No Booking Level MID Passed? Measurement1 Measurement2 Measurement3
2021-04-15 22:00:00 2 311000.0 PROB2H 34903 True 39 67 50
2021-04-15 22:00:00 2 311000.0 PROB2H 34904 True 88 40.90 54.38

I have tried different merges but could not get the result I want. I tried:

test = lots.merge(measurement, how="right", left_index=True, right_index=True)
test2 = lots.merge(measurement, how="outer", left_index=True, right_index=True)

What would you suggest I do? Thanks in advance.

You can use join as well as merge:

# Dataset Lots
>>> dfL
                     Lot Group    Lot No Booking Level
Datetime Index                                        
2013-08-03 10:00:00          1  261291.0        PROB1H
2013-08-03 12:00:00          1  261228.0        PROB1H
2021-04-15 22:00:00          2  311000.0        PROB2H

# Dataset Measurements
>>> dfM
                       MID  Passed?  Measurement1  Measurement2  Measurement3
Datetime Index                                                               
2013-08-28 10:00:00  12345     True        46.908          3.89        29.056
2013-08-03 12:00:00  78262     True        89.457          6.88        34.918
2021-04-15 22:00:00  34903     True        39.000         67.00        50.000
2021-04-15 22:00:00  34904     True        88.000         40.90        54.380

# Join version
>>> dfL.join(dfM, how='outer')
                     Lot Group    Lot No  Booking Level      MID  Passed?  Measurement1  Measurement2  Measurement3
Datetime Index
2013-08-03 10:00:00        1.0  261291.0         PROB1H      NaN      NaN           NaN           NaN           NaN
2013-08-03 12:00:00        1.0  261228.0         PROB1H  78262.0     True        89.457          6.88        34.918
2013-08-28 10:00:00        NaN       NaN            NaN  12345.0     True        46.908          3.89        29.056
2021-04-15 22:00:00        2.0  311000.0         PROB2H  34903.0     True        39.000         67.00        50.000
2021-04-15 22:00:00        2.0  311000.0         PROB2H  34904.0     True        88.000         40.90        54.380

# Merge version
>>> dfL.merge(dfM, how='outer', left_index=True, right_index=True)
                     Lot Group    Lot No  Booking Level      MID  Passed?  Measurement1  Measurement2  Measurement3
Datetime Index
2013-08-03 10:00:00        1.0  261291.0         PROB1H      NaN      NaN           NaN           NaN           NaN
2013-08-03 12:00:00        1.0  261228.0         PROB1H  78262.0     True        89.457          6.88        34.918
2013-08-28 10:00:00        NaN       NaN            NaN  12345.0     True        46.908          3.89        29.056
2021-04-15 22:00:00        2.0  311000.0         PROB2H  34903.0     True        39.000         67.00        50.000
2021-04-15 22:00:00        2.0  311000.0         PROB2H  34904.0     True        88.000         40.90        54.380
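
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The two frames are rebuilt by hand from the sample rows in the question, so the construction code is an assumption and not the asker's actual loading code. Only the join and merge calls come from the answer.

```python
import pandas as pd

# Lots: unique datetime index (rebuilt by hand from the question's sample rows).
dfL = pd.DataFrame(
    {
        "Lot Group": [1, 1, 2],
        "Lot No": [261291.0, 261228.0, 311000.0],
        "Booking Level": ["PROB1H", "PROB1H", "PROB2H"],
    },
    index=pd.DatetimeIndex(
        ["2013-08-03 10:00:00", "2013-08-03 12:00:00", "2021-04-15 22:00:00"],
        name="Datetime Index",
    ),
)

# Measurements: the 2021-04-15 22:00:00 timestamp appears twice.
dfM = pd.DataFrame(
    {
        "MID": [12345, 78262, 34903, 34904],
        "Passed?": [True, True, True, True],
        "Measurement1": [46.908, 89.457, 39.0, 88.0],
        "Measurement2": [3.89, 6.88, 67.0, 40.9],
        "Measurement3": [29.056, 34.918, 50.0, 54.38],
    },
    index=pd.DatetimeIndex(
        [
            "2013-08-28 10:00:00",
            "2013-08-03 12:00:00",
            "2021-04-15 22:00:00",
            "2021-04-15 22:00:00",
        ],
        name="Datetime Index",
    ),
)

# Outer join on the index: keeps timestamps from either frame and
# repeats the Lots row once per matching Measurements row.
joined = dfL.join(dfM, how="outer")

# Same operation spelled with merge.
merged = dfL.merge(dfM, how="outer", left_index=True, right_index=True)

print(joined)
```

An outer join keeps rows whose timestamp appears in only one frame, which is why the Lots-only and Measurements-only timestamps both survive with NaN in the columns from the other side.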

Your merge attempt was close.

test2 = lots.merge(measurements, how='right', on='Datetime Index')

print(test2)

       Datetime Index  Lot Group    Lot No Booking Level    MID  Passed?  Measurement1  Measurement2  Measurement3
0 2013-08-28 10:00:00        NaN       NaN           NaN  12345     True        46.908          3.89        29.056
1 2013-08-03 12:00:00        1.0  261228.0        PROB1H  78262     True        89.457          6.88        34.918

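If you omit on='Datetime Index', this example still works (provided Datetime Index is an ordinary column in both frames, as in the output above), but it is better to keep it to show intent. From the DataFrame.merge documentation: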

If on is None and not merging on indexes then [merge] defaults to the intersection of the columns in both DataFrames.
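
To make the quoted rule concrete, here is a small sketch. It assumes that Datetime Index is a regular column and not the index, and it uses two hypothetical miniature frames built from the sample rows. It compares the explicit and implicit forms, then shows the index-based spelling needed once the key lives in the index.

```python
import pandas as pd

# Hypothetical miniature frames with "Datetime Index" as an ordinary column.
lots = pd.DataFrame(
    {
        "Datetime Index": pd.to_datetime(
            ["2013-08-03 10:00:00", "2013-08-03 12:00:00"]
        ),
        "Lot Group": [1, 1],
        "Lot No": [261291.0, 261228.0],
        "Booking Level": ["PROB1H", "PROB1H"],
    }
)
measurements = pd.DataFrame(
    {
        "Datetime Index": pd.to_datetime(
            ["2013-08-28 10:00:00", "2013-08-03 12:00:00"]
        ),
        "MID": [12345, 78262],
        "Passed?": [True, True],
        "Measurement1": [46.908, 89.457],
    }
)

# Explicit key: states the intent.
explicit = lots.merge(measurements, how="right", on="Datetime Index")

# No key given: merge falls back to the columns the two frames share,
# which here is only "Datetime Index".
implicit = lots.merge(measurements, how="right")

# Once the key is moved into the index, the frames share no columns,
# so the merge has to be told to use the indexes.
by_index = lots.set_index("Datetime Index").merge(
    measurements.set_index("Datetime Index"),
    how="right",
    left_index=True,
    right_index=True,
)

print(explicit)
```

Note that how='right' keeps only the timestamps present in measurements. The Lots-only row at 2013-08-03 10:00:00 is dropped, unlike with how='outer'.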