How to merge two dfs in pandas (based on datetime period), and add rows if duplicates

Question

I have the following two DataFrames:

`diag`

id	encounter_key	start_of_period	end_of_period
1	AAA	2020-06-12	2021-07-07
1	BBB	2021-12-31	2022-01-04

`drug`

id	start_datetime	drug
1	2020-06-16	Mel
1	2020-06-18	Mel
1	2020-06-18	Flu
1	2022-01-01	Mel

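I would like to bring in (merge? join? concatenate?) the columns of `drug` wherever `start_datetime` falls within the `start_of_period` to `end_of_period` range of `diag` (inclusive), so that `diag` ends up with more rows, for example: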

id	encounter_key	start_of_period	end_of_period	drug	start_datetime
1	AAA	2020-06-12	2021-07-07	Mel	2020-06-16
1	AAA	2020-06-12	2021-07-07	Mel	2020-06-18
1	AAA	2020-06-12	2021-07-07	Flu	2020-06-18
1	BBB	2021-12-31	2022-01-04	Mel	2022-01-01

I hope this makes sense, and apologies if I am not using the right terms; I am not sure of them. Thanks in advance.
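For anyone who wants to reproduce this, here is a minimal sketch that builds the two tables above. Parsing the date columns as datetimes is an assumption; the question does not say which dtypes the columns actually have.

```python
import pandas as pd

# Sample data copied from the tables in the question.
# Assumption: the date columns are datetime64, not plain strings.
diag = pd.DataFrame({
    "id": [1, 1],
    "encounter_key": ["AAA", "BBB"],
    "start_of_period": pd.to_datetime(["2020-06-12", "2021-12-31"]),
    "end_of_period": pd.to_datetime(["2021-07-07", "2022-01-04"]),
})

drug = pd.DataFrame({
    "id": [1, 1, 1, 1],
    "start_datetime": pd.to_datetime(
        ["2020-06-16", "2020-06-18", "2020-06-18", "2022-01-01"]
    ),
    "drug": ["Mel", "Mel", "Flu", "Mel"],
})

print(diag.dtypes)
print(drug.dtypes)
```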

Answer 1

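I repeated the first row the required number of times and then put the DataFrame back together with the remaining rows. Then I concatenated the two DataFrames side by side. Perhaps someone will offer a better solution.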

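import numpy as np
import pandas as pd
# Keep the diag rows after the first one (here: the BBB encounter).
out = diag[1:]
# Repeat the first diag row 3 times, once per matching drug row (hard-coded for this sample).
diag = pd.DataFrame(np.repeat(diag.values[:1], 3, axis=0), columns=diag.columns).astype(diag.dtypes)
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
diag = pd.concat([diag, out], ignore_index=True)
# Put the drug columns beside the diag rows (aligned by position), then drop the duplicate id column.
df = pd.concat([diag, drug], axis=1)
df = df.loc[:, ~df.columns.duplicated()]
df = df.reindex(columns=['id', 'encounter_key', 'start_of_period', 'end_of_period', 'drug', 'start_datetime'])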

Output

   id encounter_key start_of_period end_of_period drug start_datetime
0   1           AAA      2020-06-12    2021-07-07  Mel     2020-06-16
1   1           AAA      2020-06-12    2021-07-07  Mel     2020-06-18
2   1           AAA      2020-06-12    2021-07-07  Flu     2020-06-18
3   1           BBB      2021-12-31    2022-01-04  Mel     2022-01-01
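The approach above depends on the row order and on the hard-coded repeat count of 3, so it only fits this exact sample. A more general variant might look like the sketch below. It is not the method used in this answer: it swaps the manual repetition for a merge on `id` followed by a filter on the period. It assumes the date columns are real datetimes and that `id` is the only key shared by the two tables; the names `pairs`, `in_period` and `result` are made up for illustration.

```python
import pandas as pd

# Sample data from the question (dates parsed as datetimes, which is an assumption).
diag = pd.DataFrame({
    "id": [1, 1],
    "encounter_key": ["AAA", "BBB"],
    "start_of_period": pd.to_datetime(["2020-06-12", "2021-12-31"]),
    "end_of_period": pd.to_datetime(["2021-07-07", "2022-01-04"]),
})
drug = pd.DataFrame({
    "id": [1, 1, 1, 1],
    "start_datetime": pd.to_datetime(
        ["2020-06-16", "2020-06-18", "2020-06-18", "2022-01-01"]
    ),
    "drug": ["Mel", "Mel", "Flu", "Mel"],
})

# Pair every diag row with every drug row that has the same id.
pairs = diag.merge(drug, on="id", how="inner")

# Keep only the pairs whose start_datetime lies inside the period,
# with both ends of the period included.
in_period = (
    (pairs["start_datetime"] >= pairs["start_of_period"])
    & (pairs["start_datetime"] <= pairs["end_of_period"])
)

result = pairs.loc[
    in_period,
    ["id", "encounter_key", "start_of_period", "end_of_period", "drug", "start_datetime"],
].reset_index(drop=True)

print(result)
```

Note that the intermediate `pairs` frame holds every diag/drug combination per `id` before filtering, so it can get large when an `id` has many encounters and many drug records. An encounter with no drug inside its period drops out of `result` entirely.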
