How to merge 2 pandas DataFrames with different columns but a possibly shared datetime index, avoiding duplicate index entries?

Question

I have 2 DataFrames with different column names, both indexed by datetime:

df1

            A  B
time            
2011-03-01  1  6
2011-03-02  4  8
2011-03-08  5  2
2011-03-09  6  3

df2

            C  D
time            
2011-03-01  8  7
2011-03-02  9  6
2011-03-07  4  4
2011-03-08  1  2

I would like to merge them to get something like this:

              A    B    C    D
time
2011-03-01    1    6    8    7
2011-03-02    4    8    9    6
2011-03-07  NaN  NaN    4    4
2011-03-08    5    2    1    2
2011-03-09    6    3  NaN  NaN

Instead, using concat:

df = pd.concat([df1, df2], axis=0).sort_index()

I get the following merged DataFrame:

              A    B    C    D
time                          
2011-03-01  1.0  6.0  NaN  NaN
2011-03-01  NaN  NaN  8.0  7.0
2011-03-02  4.0  8.0  NaN  NaN
2011-03-02  NaN  NaN  9.0  6.0
2011-03-07  NaN  NaN  4.0  4.0
2011-03-08  5.0  2.0  NaN  NaN
2011-03-08  NaN  NaN  1.0  2.0
2011-03-09  6.0  3.0  NaN  NaN

It contains unwanted duplicate index entries!

How can I merge the 2 DataFrames correctly?

Answer 1

I don't have access to your DataFrames to test my code, but here is what I came up with:

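# how="outer" keeps dates that appear in only one frame (the default "inner" drops them)
merged = df1.reset_index().merge(df2.reset_index(), on="time", how="outer")
merged = merged.set_index("time").sort_index()  # restore the datetime index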

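Explanation

You need to call reset_index() to turn the index into a regular column. If you would rather not do that, you can use the left_index and right_index parameters of merge instead.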
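As a rough illustration of why the join type matters in the reset_index() approach, the sketch below rebuilds the two frames from the question (the variable names left, right, inner and outer are only illustrative) and compares the default inner merge with an outer merge:

```python
import pandas as pd

# Sample frames rebuilt from the question; the index is named "time".
idx1 = pd.DatetimeIndex(
    ["2011-03-01", "2011-03-02", "2011-03-08", "2011-03-09"], name="time"
)
idx2 = pd.DatetimeIndex(
    ["2011-03-01", "2011-03-02", "2011-03-07", "2011-03-08"], name="time"
)
df1 = pd.DataFrame({"A": [1, 4, 5, 6], "B": [6, 8, 2, 3]}, index=idx1)
df2 = pd.DataFrame({"C": [8, 9, 4, 1], "D": [7, 6, 4, 2]}, index=idx2)

# Turn the "time" index into an ordinary column on both sides.
left = df1.reset_index()
right = df2.reset_index()

# Default how="inner": only dates present in both frames survive.
inner = left.merge(right, on="time")

# how="outer": every date from either frame is kept; gaps become NaN.
outer = left.merge(right, on="time", how="outer").set_index("time").sort_index()

print(len(inner))  # 3 rows: 03-01, 03-02, 03-08
print(len(outer))  # 5 rows: one per distinct date
```

With the question's data, the inner merge silently loses 2011-03-07 and 2011-03-09, so an outer merge is likely what the asker wants.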

Answer 2

You can use pd.merge:

pd.merge(
    df1,               # your first DataFrame
    df2,               # your second DataFrame
    how="outer",       # keep dates that appear in either frame
    left_index=True,   # merge on the index (your datetime)
    right_index=True,  # merge on the index (your datetime)
)
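For a self-contained check, here is a minimal sketch that rebuilds the question's frames and runs the index-based outer merge. The comparison with DataFrame.join is an addition for illustration, not part of the answer above; merged and joined are hypothetical names.

```python
import pandas as pd

# Sample frames rebuilt from the question, indexed by a DatetimeIndex named "time".
idx1 = pd.DatetimeIndex(
    ["2011-03-01", "2011-03-02", "2011-03-08", "2011-03-09"], name="time"
)
idx2 = pd.DatetimeIndex(
    ["2011-03-01", "2011-03-02", "2011-03-07", "2011-03-08"], name="time"
)
df1 = pd.DataFrame({"A": [1, 4, 5, 6], "B": [6, 8, 2, 3]}, index=idx1)
df2 = pd.DataFrame({"C": [8, 9, 4, 1], "D": [7, 6, 4, 2]}, index=idx2)

# Outer merge on the index: one row per distinct date, NaN where a frame has no data.
merged = pd.merge(
    df1, df2, how="outer", left_index=True, right_index=True
).sort_index()

# DataFrame.join aligns on the index by default, so this should be equivalent.
joined = df1.join(df2, how="outer").sort_index()

print(merged)
print(merged.index.is_unique)  # True: each date appears exactly once
```

Note that columns A to D come back as floats, because NaN cannot be stored in a plain integer column.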

Answer 3

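pandas.concat is the way to go, using axis=1.

If you still have problems with axis=1, it means your indexes are not aligned (perhaps they have different types), and you would have the same problem with join or merge.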

import pandas as pd

df1 = pd.DataFrame({'A': [1,4,5,6], 'B': [6,8,2,3]},
                   index=['2011-03-01', '2011-03-02', '2011-03-08', '2011-03-09'])

# Values ordered to match df2 in the question (2011-03-07 -> 4, 4; 2011-03-08 -> 1, 2)
df2 = pd.DataFrame({'C': [8,9,4,1], 'D': [7,6,4,2]},
                   index=['2011-03-01', '2011-03-02', '2011-03-07', '2011-03-08'])

# axis=1 places the frames side by side, aligning rows on the index
pd.concat([df1, df2], axis=1).sort_index()

Output:

              A    B    C    D
2011-03-01  1.0  6.0  8.0  7.0
2011-03-02  4.0  8.0  9.0  6.0
2011-03-07  NaN  NaN  4.0  4.0
2011-03-08  5.0  2.0  1.0  2.0
2011-03-09  6.0  3.0  NaN  NaN
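The remark about misaligned indexes can be sketched as follows. This assumes a hypothetical mismatch where one frame is indexed by date strings and the other by Timestamps; the actual cause in a given dataset may differ. Converting both indexes with pd.to_datetime before concatenating should make equal dates line up:

```python
import pandas as pd

dates1 = ["2011-03-01", "2011-03-02", "2011-03-08", "2011-03-09"]
dates2 = ["2011-03-01", "2011-03-02", "2011-03-07", "2011-03-08"]

# Hypothetical mismatch: df1 is indexed by strings, df2 by Timestamps.
df1 = pd.DataFrame({"A": [1, 4, 5, 6], "B": [6, 8, 2, 3]}, index=dates1)
df2 = pd.DataFrame({"C": [8, 9, 4, 1], "D": [7, 6, 4, 2]},
                   index=pd.to_datetime(dates2))

# Normalize both indexes to datetime so that equal dates are equal labels.
df1.index = pd.to_datetime(df1.index)
df2.index = pd.to_datetime(df2.index)

aligned = pd.concat([df1, df2], axis=1).sort_index()

print(aligned.index.is_unique)  # True
print(len(aligned))             # 5
```

Inspecting df1.index.dtype and df2.index.dtype before merging is a quick way to spot this kind of mismatch.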
