Python (Pandas) How to merge 2 dataframes with different dates in incremental order?
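I am trying to merge 2 dataframes in order, using their date indexes. Is this possible?
Sample code I need to work with:
Link for facemask_compliance_df: https://today.yougov.com/topics/international/articles-reports/2020/05/18/international-covid-19-tracker-update-18-may (YouGov COVID-19 behaviour changes tracker: wearing a face mask in public places)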
from datetime import datetime
import pandas as pd

# Singapore Index
# Read file
sg_df = pd.read_csv("^STI.csv")
# Format Date
conv = lambda x: datetime.strptime(x, "%d/%m/%Y")
sg_df["Date"] = sg_df["Date"].apply(conv)
sg_df.sort_values("Date", inplace = True)
# index date column for easy referencing
sg_df.set_index("Date", inplace = True)

# Will wear face mask in public
# Read file
facemask_compliance_df = pd.read_csv("yougov-chart.csv")
# Format Date, Removing time
convert1 = lambda x: datetime.strptime(x, "%d/%m/%Y %H:%M")
facemask_compliance_df["DateTime"] = facemask_compliance_df["DateTime"].apply(convert1).dt.date
facemask_compliance_df.sort_values("DateTime", inplace = True)
# index date column for easy referencing
facemask_compliance_df.set_index("DateTime", inplace = True)

# Outer merge on the date indexes, then sort by date
sg_df = sg_df.merge(facemask_compliance_df["Singapore"], left_index = True, right_index = True, how = "outer").sort_index()
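I would like the output table to look something like this.
Let me know if you need more information and I will provide it as soon as I can.
Edit:
This is the problem
Data from yougov-chart
I think it is reading the dates even though they are not from Singapore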
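If I remember correctly, in numpy you can use vstack or hstack, depending on how you want to join the data together.
pandas has something similar, concatenation (https://pandas.pydata.org/docs/user_guide/merging.html), which I use to merge dataframes.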
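As a rough illustration of the concatenation idea from the comment above, here is a minimal sketch with pd.concat. The sample values and the "Close" column name are assumptions made up for the example, not data from the question's files.

```python
import pandas as pd

# Hypothetical sample data standing in for the two CSV files.
sti = pd.DataFrame(
    {"Close": [3100.0, 2500.0]},
    index=pd.to_datetime(["2020-02-21", "2020-03-18"]),
)
masks = pd.DataFrame(
    {"Singapore": [27.0, 85.0]},
    index=pd.to_datetime(["2020-02-21", "2020-04-03"]),
)

# axis=1 places the frames side by side and aligns rows on the index;
# join="outer" (the default) keeps dates that appear in either frame.
combined = pd.concat([sti, masks], axis=1, join="outer").sort_index()
print(combined)
```

Dates present in only one frame get NaN in the other frame's columns, which is the same shape an outer merge on the index would give.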
Use:
1. merge to combine the tables.
1.1. on selects the column to merge on:
Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
1.2. The outer option of how:
outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
2. sort_values to sort by date.
import pandas as pd
df1 = pd.read_csv("^STI.csv")
df1['Date'] = pd.to_datetime(df1.Date)
df2 = pd.read_csv("yougov-chart.csv")
df2['Date'] = pd.to_datetime(df2.DateTime)
result = df2.merge(df1, on='Date', how='outer')
result = result.sort_values('Date')
print(result)
Output:
        Date  US_GDP_Thousands  Mask Compliance
6 2016-02-01               NaN             37.0
7 2017-07-01               NaN             73.0
8 2019-10-01               NaN             85.0
0 2020-02-21              50.0             27.0
1 2020-03-18              55.0              NaN
2 2020-03-19              60.0              NaN
3 2020-03-25              65.0              NaN
4 2020-04-03              70.0              NaN
5 2020-05-14              75.0              NaN
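To make the column-based outer merge easier to try without the CSV files, here is a self-contained sketch. The data is hypothetical, and two things are additions not in the answer: dt.normalize() to drop the time of day before matching, and indicator=True, which adds a _merge column showing where each row came from.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the output above.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2020-02-21", "2020-03-18", "2020-03-19"]),
    "US_GDP_Thousands": [50.0, 55.0, 60.0],
})
df2 = pd.DataFrame({
    # Timestamps with a time component, as a survey export might have.
    "DateTime": pd.to_datetime(["2019-10-01 08:00", "2020-02-21 08:00"]),
    "Mask Compliance": [85.0, 27.0],
})

# Drop the time of day so the keys can match df1's midnight timestamps.
df2["Date"] = df2["DateTime"].dt.normalize()

result = (
    df2.drop(columns="DateTime")
       .merge(df1, on="Date", how="outer", indicator=True)
       .sort_values("Date")
       .reset_index(drop=True)
)
print(result)
```

The _merge column reads left_only, both, or right_only, which can help when checking why a date did or did not line up between the two frames.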
First, use the parameters parse_dates and index_col in read_csv to get a DatetimeIndex in both DataFrames, and in the second one remove the times with floor:
sg_df = pd.read_csv("^STI.csv",
                    parse_dates=['Date'],
                    index_col=['Date'])
facemask_compliance_df = pd.read_csv("yougov-chart.csv",
                                     parse_dates=['DateTime'],
                                     index_col=['DateTime'])
# DateTime is now the index rather than a column, so floor the index itself
facemask_compliance_df.index = facemask_compliance_df.index.floor('D')
Then use DataFrame.merge on the indexes with an outer join, and sort the result with DataFrame.sort_index:
df = sg_df.merge(facemask_compliance_df,
                 left_index=True,
                 right_index=True,
                 how='outer').sort_index()
print(df)
            Mask Compliance  US_GDP_Thousands
Date
2016-02-01             37.0               NaN
2017-07-01             73.0               NaN
2019-10-01             85.0               NaN
2020-02-21             27.0              50.0
2020-03-18              NaN              55.0
2020-03-19              NaN              60.0
2020-03-25              NaN              65.0
2020-04-03              NaN              70.0
2020-05-14              NaN              75.0
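The index-based approach can also be tried end to end with in-memory CSV text. This is only a sketch under assumptions: the CSV contents, the "Close" and "Italy" columns, and the values are invented for illustration, and the date formats are the ones given in the question. It also keeps only the Singapore column and drops dates with no Singapore reading, which may address the concern in the question's edit about dates from other countries.

```python
import io
import pandas as pd

# Hypothetical CSV contents; the real files and their columns may differ.
sti_csv = io.StringIO(
    "Date,Close\n"
    "21/02/2020,3181.03\n"
    "18/03/2020,2425.62\n"
)
yougov_csv = io.StringIO(
    "DateTime,Singapore,Italy\n"
    "21/02/2020 00:00,27,\n"
    "03/04/2020 00:00,,81\n"
    "14/05/2020 00:00,90,85\n"
)

# Parse with explicit day-first formats, as in the question's strptime calls.
sg_df = pd.read_csv(sti_csv)
sg_df["Date"] = pd.to_datetime(sg_df["Date"], format="%d/%m/%Y")
sg_df = sg_df.set_index("Date")

masks = pd.read_csv(yougov_csv)
masks["DateTime"] = pd.to_datetime(
    masks["DateTime"], format="%d/%m/%Y %H:%M"
).dt.floor("D")  # remove the time of day, keeping datetime64 dtype
masks = masks.set_index("DateTime")

# Keep only Singapore and drop dates that have no Singapore value.
singapore = masks["Singapore"].dropna()

merged = sg_df.merge(
    singapore, left_index=True, right_index=True, how="outer"
).sort_index()
print(merged)
```

Using dt.floor rather than dt.date keeps both indexes as datetime64 values, so matching dates compare equal during the merge.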