Outer join of two data frames, repeated for every customer (Python)

How can I perform an outer join separately for each customer?

I have a dataset like this:

 Customer      Timestamp        Other_Col
     A    2017-05-01 00:01:00     Jun
     A    2017-05-01 00:02:00     Sep
     A    2017-05-01 00:03:00     Jun
     B    2017-05-07 23:58:00     Sep
     B    2017-05-07 23:59:00     Sep

And another one like this:

         Timestamp
     2017-05-01 00:01:00
     2017-05-01 00:02:00
     2017-05-01 00:03:00
     2017-05-07 23:58:00
     2017-05-07 23:59:00

I want my data frame to contain every timestamp for every customer:

 Customer      Timestamp        Other_Col
     A    2017-05-01 00:01:00     Jun
     A    2017-05-01 00:02:00     Sep
     A    2017-05-01 00:03:00     Jun
     A    2017-05-07 23:58:00     NaN
     A    2017-05-07 23:59:00     NaN
     B    2017-05-01 00:01:00     NaN
     B    2017-05-01 00:02:00     NaN
     B    2017-05-01 00:03:00     NaN
     B    2017-05-07 23:58:00     Sep
     B    2017-05-07 23:59:00     Sep

How can I do this? A plain merge with how='outer' does not solve it, because I cannot make the join depend on the customer.

To achieve this, build a "base" table containing every customer/timestamp pair and left-join your data onto it:

import pandas as pd
df1 = pd.read_csv('df1.txt',sep=';')
df1
Customer    Timestamp   Other_Col
0   A   2017-05-01 00:01:00 Jun
1   A   2017-05-01 00:02:00 Sep
2   A   2017-05-01 00:03:00 Jun
3   B   2017-05-07 23:58:00 Sep
4   B   2017-05-07 23:59:00 Sep

df2 = pd.read_csv('df2.txt',sep=';')
df2
Timestamp
0   2017-05-01 00:01:00
1   2017-05-01 00:02:00
2   2017-05-01 00:03:00
3   2017-05-07 23:58:00
4   2017-05-07 23:59:00


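# Build the "base" table: each customer paired with every timestamp in df2.
# The repeat counts are hard-coded for this example (2 customers x 5 timestamps).
base = pd.DataFrame()
base['Customer']  = ['A']*5 + ['B']*5
base['Timestamp'] = list(df2['Timestamp'])*2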


# The left join keeps every row of base; pairs missing from df1 get NaN in Other_Col.
pd.merge(base,df1,how='left',on=['Customer','Timestamp'])
Customer    Timestamp   Other_Col
0   A   2017-05-01 00:01:00 Jun
1   A   2017-05-01 00:02:00 Sep
2   A   2017-05-01 00:03:00 Jun
3   A   2017-05-07 23:58:00 NaN
4   A   2017-05-07 23:59:00 NaN
5   B   2017-05-01 00:01:00 NaN
6   B   2017-05-01 00:02:00 NaN
7   B   2017-05-01 00:03:00 NaN
8   B   2017-05-07 23:58:00 Sep
9   B   2017-05-07 23:59:00 Sep
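
The base table above hard-codes two customers and five timestamps. Below is a minimal sketch of the same left-join idea that derives the customer/timestamp combinations from the data instead. It assumes the frames are built inline rather than read from df1.txt and df2.txt, and that Timestamp has the same dtype in both frames (plain strings here). Using pd.MultiIndex.from_product to build the combinations is one possible choice, not something the answer above prescribes.

```python
import pandas as pd

# Sample data rebuilt inline (assumption: mirrors the contents of df1.txt / df2.txt).
timestamps = [
    "2017-05-01 00:01:00",
    "2017-05-01 00:02:00",
    "2017-05-01 00:03:00",
    "2017-05-07 23:58:00",
    "2017-05-07 23:59:00",
]
df1 = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B"],
    "Timestamp": timestamps,
    "Other_Col": ["Jun", "Sep", "Jun", "Sep", "Sep"],
})
df2 = pd.DataFrame({"Timestamp": timestamps})

# Base table: the Cartesian product of all customers and all timestamps.
base = pd.MultiIndex.from_product(
    [df1["Customer"].unique(), df2["Timestamp"].unique()],
    names=["Customer", "Timestamp"],
).to_frame(index=False)

# Left join: every (Customer, Timestamp) pair survives; unmatched pairs get NaN.
result = base.merge(df1, how="left", on=["Customer", "Timestamp"])
print(result)
```

Because the customers and timestamps come from unique(), the same code should keep working when either list grows, with no repeat counts to update.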