Perform a merge by date field without creating an auxiliary column in the DataFrame
I have the following DataFrames in Python pandas:
| date | counter |
|-----------------------------|------------------|
| 2022-01-01 10:00:02+00:00 | 34 |
| 2022-01-03 11:03:02+00:00 | 23 |
| 2022-02-01 12:00:05+00:00 | 12 |
| 2022-03-01 21:04:02+00:00 | 7 |

| date | holiday |
|-----------------------------|------------------|
| 2022-01-01 | True |
| 2022-01-02 | False |
| 2022-01-03 | True |
| 2022-02-01 | True |
| 2022-02-02 | True |
| 2022-02-03 | True |
| 2022-03-01 | False |
| 2022-03-02 | True |
| 2022-03-03 | False |

How can I merge the two DataFrames, given that I do not want to create an auxiliary column holding the date? The expected result is:

| date | counter | holiday |
|-----------------------------|------------------|--------------|
| 2022-01-01 10:00:02+00:00 | 34 | True |
| 2022-01-03 11:03:02+00:00 | 23 | True |
| 2022-02-01 12:00:05+00:00 | 12 | True |
| 2022-03-01 21:04:02+00:00 | 7 | False |

Thanks in advance for your help.
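Use Series.map on the datetimes with the time part removed by Series.dt.normalize, so that no auxiliary column is created in the output. Below, df1 is the DataFrame with the counter column and df2 is the one with the holiday column. Because df1['date'] is timezone-aware and df2['date'] is assumed to be timezone-naive, the timezone is also dropped with Series.dt.tz_convert(None) so that the lookup keys match:

    df1['holiday'] = df1['date'].dt.tz_convert(None).dt.normalize().map(df2.set_index('date')['holiday'])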
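
As a self-contained illustration of the map approach, here is a minimal sketch that rebuilds the two tables from the question and fills in the holiday column. The construction of df1 and df2 (including the assumption that the holiday dates are timezone-naive datetime64 values) and the intermediate name key are mine, not part of the original post.

```python
import pandas as pd

# Counter table with timezone-aware (UTC) timestamps, as in the question.
df1 = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-01-01 10:00:02+00:00",
        "2022-01-03 11:03:02+00:00",
        "2022-02-01 12:00:05+00:00",
        "2022-03-01 21:04:02+00:00",
    ], utc=True),
    "counter": [34, 23, 12, 7],
})

# Holiday table keyed by calendar date (assumed to be timezone-naive).
df2 = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-01-01", "2022-01-02", "2022-01-03",
        "2022-02-01", "2022-02-02", "2022-02-03",
        "2022-03-01", "2022-03-02", "2022-03-03",
    ]),
    "holiday": [True, False, True, True, True, True, False, True, False],
})

# Build the lookup key on the fly: drop the timezone, then truncate to midnight.
# `key` is a temporary Series, not a column of df1.
key = df1["date"].dt.tz_convert(None).dt.normalize()
df1["holiday"] = key.map(df2.set_index("date")["holiday"])

print(df1)
```

The date column of df1 keeps its original timezone-aware timestamps, since the normalized values only exist in the temporary Series. Any timestamp whose date is missing from df2 would get NaN in the holiday column.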
Another idea is merge_asof, but to avoid an error you need to remove the timezone with Series.dt.tz_convert first (df1 is the counter DataFrame, df2 the holiday DataFrame, and both must be sorted by date):

    df = pd.merge_asof(df1.assign(date=df1['date'].dt.tz_convert(None)).sort_values('date'),
                       df2, on='date')
    print(df)
                     date  counter  holiday
    0 2022-01-01 10:00:02       34     True
    1 2022-01-03 11:03:02       23     True
    2 2022-02-01 12:00:05       12     True
    3 2022-03-01 21:04:02        7    False
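
The merge_asof variant can be sketched end to end as follows. This is only an illustration under the same assumptions as the question's data (UTC timestamps in the counter table, timezone-naive dates in the holiday table). The names left, right and merged, and the final step that reattaches UTC with tz_localize, are additions of mine to get back to the timezone-aware output the question asks for.

```python
import pandas as pd

# Counter table with timezone-aware (UTC) timestamps.
df1 = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-01-01 10:00:02+00:00",
        "2022-01-03 11:03:02+00:00",
        "2022-02-01 12:00:05+00:00",
        "2022-03-01 21:04:02+00:00",
    ], utc=True),
    "counter": [34, 23, 12, 7],
})

# Holiday table with timezone-naive dates (assumed).
df2 = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-01-01", "2022-01-02", "2022-01-03",
        "2022-02-01", "2022-02-02", "2022-02-03",
        "2022-03-01", "2022-03-02", "2022-03-03",
    ]),
    "holiday": [True, False, True, True, True, True, False, True, False],
})

# merge_asof needs both keys sorted and of the same (timezone-naive) kind.
left = df1.assign(date=df1["date"].dt.tz_convert(None)).sort_values("date")
right = df2.sort_values("date")

# Default direction is "backward": each timestamp is matched to the
# latest holiday date that is not after it.
merged = pd.merge_asof(left, right, on="date")

# Reattach UTC so the timestamps match the original column.
merged["date"] = merged["date"].dt.tz_localize("UTC")

print(merged)
```

One design point to keep in mind: because the match is a backward search rather than an exact one, a timestamp whose calendar date is absent from df2 would silently pick up the value of the closest earlier date. The Series.map approach would leave NaN in that case.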