Python (Pandas) How to merge 2 dataframes with different dates in incremental order?

I am trying to merge 2 dataframes so that the rows end up in order of their date index. Is this possible?

Sample code that I need to work with

Link for sg_df: https://query1.finance.yahoo.com/v7/finance/download/%5ESTI?P=^STI?period1=1442102400&period2=1599955200&interval=1mo&events=history

Link for facemask_compliance_df: https://today.yougov.com/topics/international/articles-reports/2020/05/18/international-covid-19-tracker-update-18-may (YouGov COVID-19 behaviour changes tracker: wearing a face mask in public places)

import pandas as pd
from datetime import datetime

# Singapore Index
# Read file
# Format Date
# index date column for easy referencing
sg_df = pd.read_csv("^STI.csv")
conv = lambda x: datetime.strptime(x, "%d/%m/%Y")
sg_df["Date"] = sg_df["Date"].apply(conv)
sg_df.sort_values("Date", inplace = True)
sg_df.set_index("Date", inplace = True)

# Will wear face mask in public
# Read file
# Format Date, Removing time
# index date column for easy referencing
facemask_compliance_df = pd.read_csv("yougov-chart.csv")
convert1 = lambda x: datetime.strptime(x, "%d/%m/%Y %H:%M") 
facemask_compliance_df["DateTime"] = facemask_compliance_df["DateTime"].apply(convert1).dt.date
facemask_compliance_df.sort_values("DateTime", inplace = True)
facemask_compliance_df.set_index("DateTime", inplace = True)

sg_df = sg_df.merge(facemask_compliance_df["Singapore"], left_index = True, right_index = True, how = "outer").sort_index()

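I would like the output table to look something like this.

Let me know if you need more information and I will provide it as soon as I can.

Edit:

This is the problem

Data from yougov-chart

I think it is picking up the dates even when the value is not from Singapore.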

If I remember correctly, in numpy you can use vstack or hstack, depending on how you want to join them together.
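
As a small illustration of the numpy suggestion above, here is a minimal sketch using np.vstack and np.hstack on two made-up arrays (the values are placeholders, not data from the question). Note that both functions stack purely by position, so they do not align rows by date.

```python
import numpy as np

# Two made-up 2x2 arrays, used only for illustration.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# vstack appends b's rows below a's rows.
stacked_rows = np.vstack([a, b])   # shape (4, 2)

# hstack appends b's columns to the right of a's columns.
stacked_cols = np.hstack([a, b])   # shape (2, 4)

print(stacked_rows.shape, stacked_cols.shape)
```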

pandas has something similar, concatenation (see https://pandas.pydata.org/docs/user_guide/merging.html), which I use to combine dataframes.
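
A minimal sketch of that idea, assuming "concatenation" here means pd.concat with axis=1. The frames, the Close column and all values are made up for illustration. With axis=1, pd.concat aligns on the index and keeps the union of the dates by default.

```python
import pandas as pd

# Made-up stand-ins for the two frames in the question.
prices = pd.DataFrame(
    {"Close": [3100.0, 3150.0]},
    index=pd.to_datetime(["2020-02-01", "2020-03-01"]),
)
masks = pd.DataFrame(
    {"Singapore": [27.0, 40.0]},
    index=pd.to_datetime(["2020-02-21", "2020-03-01"]),
)

# axis=1 places the columns side by side, aligned on the date index;
# dates present in only one frame get NaN in the other frame's column.
combined = pd.concat([prices, masks], axis=1).sort_index()
print(combined)
```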

Use:

  1. merge to combine the tables.

1.1. on selects the column(s) to merge on:

Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.

1.2. The outer option:

outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.

  2. sort_values to sort by date.

import pandas as pd

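# The date formats below are the day-first formats used in the question.
df1 = pd.read_csv("^STI.csv")

df1['Date'] = pd.to_datetime(df1['Date'], format="%d/%m/%Y")

df2 = pd.read_csv("yougov-chart.csv")

# Drop the time of day so that both frames use midnight timestamps and can match.
df2['Date'] = pd.to_datetime(df2['DateTime'], format="%d/%m/%Y %H:%M").dt.normalize()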

result = df2.merge(df1, on='Date', how='outer')
result = result.sort_values('Date')

print(result)

Output:

        Date  US_GDP_Thousands  Mask Compliance
6 2016-02-01               NaN             37.0
7 2017-07-01               NaN             73.0
8 2019-10-01               NaN             85.0
0 2020-02-21              50.0             27.0
1 2020-03-18              55.0              NaN
2 2020-03-19              60.0              NaN
3 2020-03-25              65.0              NaN
4 2020-04-03              70.0              NaN
5 2020-05-14              75.0              NaN
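
To see which frame each row of an outer merge came from, the indicator parameter of merge can help. The following is only a sketch on made-up frames (the column name Index_Close and all values are placeholders). It adds a _merge column marking each row as left_only, right_only or both.

```python
import pandas as pd

# Made-up frames that share only one date.
left = pd.DataFrame({
    "Date": pd.to_datetime(["2020-02-21", "2020-03-18"]),
    "Index_Close": [50.0, 55.0],
})
right = pd.DataFrame({
    "Date": pd.to_datetime(["2019-10-01", "2020-02-21"]),
    "Mask Compliance": [85.0, 27.0],
})

# indicator=True adds a "_merge" column describing the origin of each row.
result = (
    left.merge(right, on="Date", how="outer", indicator=True)
        .sort_values("Date")
)
print(result)
```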

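First, use the parse_dates and index_col parameters of read_csv to get a DatetimeIndex in both frames, and in the second one remove the times by flooring to the day (Series.dt.floor for a column, or DatetimeIndex.floor for the index):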

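import pandas as pd

# dayfirst=True because the dates in the question are written day/month/year.
sg_df = pd.read_csv("^STI.csv", 
                    parse_dates=['Date'], 
                    index_col=['Date'],
                    dayfirst=True)

facemask_compliance_df = pd.read_csv("yougov-chart.csv", 
                                     parse_dates=['DateTime'],
                                     index_col=['DateTime'],
                                     dayfirst=True)
# DateTime is now the index, not a column, so floor the index to drop the time of day.
facemask_compliance_df.index = facemask_compliance_df.index.floor('d')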

Then use DataFrame.merge on the index with an outer join, and sort the index with DataFrame.sort_index:

df = sg_df.merge(facemask_compliance_df, 
                 left_index=True, 
                 right_index=True, 
                 how='outer').sort_index()
print (df)
            Mask Compliance  US_GDP_Thousands
Date                                         
2016-02-01             37.0               NaN
2017-07-01             73.0               NaN
2019-10-01             85.0               NaN
2020-02-21             27.0              50.0
2020-03-18              NaN              55.0
2020-03-19              NaN              60.0
2020-03-25              NaN              65.0
2020-04-03              NaN              70.0
2020-05-14              NaN              75.0
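
If the concern from the question's edit is that dates appear in the result even when there is no Singapore value for them, one possible approach is to keep only the Singapore column and drop its missing rows before merging. This is a sketch under that assumption: the frames below are made-up stand-ins, and the Close and UK columns are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the question's frames.
sg_df = pd.DataFrame(
    {"Close": [3100.0, 3150.0]},
    index=pd.DatetimeIndex(["2020-02-01", "2020-03-01"], name="Date"),
)
facemask_compliance_df = pd.DataFrame(
    {"Singapore": [27.0, None, 40.0], "UK": [1.0, 2.0, 3.0]},
    index=pd.DatetimeIndex(
        ["2020-02-21", "2020-02-25", "2020-03-01"], name="DateTime"
    ),
)

# Keep only Singapore and drop dates where it has no value, so dates
# that exist only for other countries do not enter the merge.
singapore_only = facemask_compliance_df[["Singapore"]].dropna()

merged = sg_df.merge(
    singapore_only, left_index=True, right_index=True, how="outer"
).sort_index()
print(merged)
```

Here 2020-02-25 has no Singapore value, so it does not appear in the merged result.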