How to merge my dataframe to compensate for missing data without adding new columns?

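I'm trying to manipulate Excel sheet data to automate a process in Excel (I'm not a developer). I have 2 dataframes:

The first looks like the following (the only difference is that the real one has more columns):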

                      Date   Val1   Val2
    0  2020-09-29 13:22:57   5.34    3.2
    1  2020-09-29 13:23:12    4.5    NaN
    2  2020-09-29 13:23:44    NaN   56.4
    3  2020-09-29 13:24:01     24    0.3

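Note that the index above is in order and every Date field is filled, but the other columns are not necessarily all filled.

The second dataframe has the following characteristics: it has the same number of rows or more, with no additional dates and no duplicate dates, but the extra rows are empty (NaT for Date and NaN for all other columns). Because of other processes, df2's index is also not in order: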

                      Date   Val1   Val2
    0  2020-09-29 13:22:57    NaN    NaN
    5                  NaT    NaN    NaN
    1  2020-09-29 13:23:12    4.5    NaN
    4                  NaT    NaN    NaN
    6                  NaT    NaN    NaN
    2  2020-09-29 13:23:44    NaN    NaN
    3  2020-09-29 13:24:01     24    0.3

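What I basically need is to check for matching dates: if a date in df2 matches a date in df1, fill the whole row for that date in df2 with the exact same values from df1, without changing the position of the empty rows in df2 and without adding columns.

Expected output: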

                      Date   Val1   Val2
    0  2020-09-29 13:22:57   5.34    3.2
    5                  NaT    NaN    NaN
    1  2020-09-29 13:23:12    4.5    NaN
    4                  NaT    NaN    NaN
    6                  NaT    NaN    NaN
    2  2020-09-29 13:23:44    NaN   56.4
    3  2020-09-29 13:24:01     24    0.3

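I have tried multiple approaches, including:

from functools import reduce
import pandas as pd

data_frames = [df, df_2]

# outer-merge every frame in the list on 'Date'
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['Date'], how='outer'),
                   data_frames)
print(df_merged)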

And also:

df_f = pd.merge(df, df_2, on='Date', how='outer').fillna(method='ffill')

I also tried changing how to inner, left, right and so on, but I did not get the result I want; I just get combined columns.
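To illustrate where the combined columns come from, here is a minimal sketch on a cut-down version of the data. The names `left`, `right`, `combined` and `combined_columns` are made up for this example. When both frames carry `Val1` and `Val2` and only `Date` is used as the key, `pd.merge` keeps both copies of each value column and disambiguates them with the default `_x` and `_y` suffixes:

```python
import numpy as np
import pandas as pd

# Cut-down stand-ins for df and df_2 (hypothetical names, same column layout)
left = pd.DataFrame({
    "Date": pd.to_datetime(["2020-09-29 13:22:57", "2020-09-29 13:23:12"]),
    "Val1": [5.34, 4.5],
    "Val2": [3.2, np.nan],
})
right = pd.DataFrame({
    "Date": [pd.Timestamp("2020-09-29 13:22:57"), pd.NaT,
             pd.Timestamp("2020-09-29 13:23:12")],
    "Val1": [np.nan, np.nan, 4.5],
    "Val2": [np.nan, np.nan, np.nan],
})

# Both frames have Val1/Val2, so the merge keeps one copy from each side
combined = pd.merge(left, right, on="Date", how="outer")
combined_columns = list(combined.columns)
print(combined_columns)
```

Changing `how` only changes which rows survive; the duplicated value columns remain as long as both inputs contain them.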

Edit:

df1 = pd.DataFrame({'Date': ['2020-09-29 13:22:57', '2020-09-29 13:23:12', '2020-09-29 13:23:44', '2020-09-29 13:24:01'],
                    'Val1': [5.34, 4.5, np.nan, 24],
                    'Val2': [3.2, np.nan, 56.4, 0.3]})

df2 = pd.DataFrame({'Date': ['2020-09-29 13:22:57', np.nan,  '2020-09-29 13:23:12', np.nan, np.nan, '2020-09-29 13:23:44', '2020-09-29 13:24:01'],
                    'Val1': [5.34, np.nan, 4.5, np.nan, np.nan, np.nan, 24],
                    'Val2': [3.2, np.nan, np.nan, np.nan, np.nan, 56.4, 0.3]},
                   index=[0,5,1,4,6,2,3])

f_f1 = df1.merge(df2["Date"], on="Date", how="right").set_index(df2.index)
print(f_f1)

IIUC, try:

#convert to datetime if needed
df1["Date"] = pd.to_datetime(df1["Date"])
df2["Date"] = pd.to_datetime(df2["Date"])

f_f1 = df1.merge(df2["Date"], on="Date", how="right").set_index(df2.index)

>>> f_f1
                  Date   Val1  Val2
0  2020-09-29 13:22:57   5.34   3.2
5                  NaN    NaN   NaN
1  2020-09-29 13:23:12   4.50   NaN
4                  NaN    NaN   NaN
6                  NaN    NaN   NaN
2  2020-09-29 13:23:44    NaN  56.4
3  2020-09-29 13:24:01  24.00   0.3

Input:

df1 = pd.DataFrame({'Date': ['2020-09-29 13:22:57', '2020-09-29 13:23:12', '2020-09-29 13:23:44', '2020-09-29 13:24:01'],
                    'Val1': [5.34, 4.5, np.nan, 24],
                    'Val2': [3.2, np.nan, 56.4, 0.3]})

df2 = pd.DataFrame({'Date': ['2020-09-29 13:22:57', np.nan,  '2020-09-29 13:23:12', np.nan, np.nan, '2020-09-29 13:23:44', '2020-09-29 13:24:01'],
                    'Val1': [5.34, np.nan, 4.5, np.nan, np.nan, np.nan, 24],
                    'Val2': [3.2, np.nan, np.nan, np.nan, np.nan, 56.4, 0.3]},
                   index=[0,5,1,4,6,2,3])
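
As a variation on the merge above, here is a minimal sketch that starts from df2's Date column and left-merges df1 onto it, so df2's row order is the one that is kept. Some assumptions on my part: df2 is built with the empty values described in the question rather than the values in the Input block, the result name `filled` is made up for this example, and `validate="many_to_one"` is an optional guard I added so that the merge raises if df1 ever contains a duplicated date (which would otherwise add rows).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Date": ["2020-09-29 13:22:57", "2020-09-29 13:23:12",
             "2020-09-29 13:23:44", "2020-09-29 13:24:01"],
    "Val1": [5.34, 4.5, np.nan, 24],
    "Val2": [3.2, np.nan, 56.4, 0.3],
})

# df2 as described in the question: same dates, extra empty rows, unordered index
df2 = pd.DataFrame({
    "Date": ["2020-09-29 13:22:57", np.nan, "2020-09-29 13:23:12", np.nan,
             np.nan, "2020-09-29 13:23:44", "2020-09-29 13:24:01"],
    "Val1": [np.nan, np.nan, 4.5, np.nan, np.nan, np.nan, 24],
    "Val2": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 0.3],
}, index=[0, 5, 1, 4, 6, 2, 3])

# Make both key columns real datetimes so they compare equal
df1["Date"] = pd.to_datetime(df1["Date"])
df2["Date"] = pd.to_datetime(df2["Date"])

# Only df2's Date column goes in, so no Val1_x / Val1_y columns appear;
# a left merge keeps the rows in df2's order
filled = df2[["Date"]].merge(df1, on="Date", how="left", validate="many_to_one")

# merge returns a fresh 0..n-1 index, so put df2's original labels back
filled.index = df2.index
print(filled)
```

Either direction should give the same frame here; the left merge just makes it explicit that df2 drives the row order and df1 only supplies the values.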