Outer join of two data frames that repeats for all customers (python)
How can I perform an outer join for each customer?
I have a dataset like this:
Customer Timestamp Other_Col
A 2017-05-01 00:01:00 Jun
A 2017-05-01 00:02:00 Sep
A 2017-05-01 00:03:00 Jun
B 2017-05-07 23:58:00 Sep
B 2017-05-07 23:59:00 Sep
And another one like this:
Timestamp
2017-05-01 00:01:00
2017-05-01 00:02:00
2017-05-01 00:03:00
2017-05-07 23:58:00
2017-05-07 23:59:00
I want to get all the timestamps for every customer in my data frame:
Customer Timestamp Other_Col
A 2017-05-01 00:01:00 Jun
A 2017-05-01 00:02:00 Sep
A 2017-05-01 00:03:00 Jun
A 2017-05-07 23:58:00 NaN
A 2017-05-07 23:59:00 NaN
B 2017-05-01 00:01:00 NaN
B 2017-05-01 00:02:00 NaN
B 2017-05-01 00:03:00 NaN
B 2017-05-07 23:58:00 Sep
B 2017-05-07 23:59:00 Sep
How can I do this? A merge with how='outer' doesn't do the trick, and I can't find a way to make it depend on the customer.
You should do a left join onto a "base" table to achieve this:
import pandas as pd
df1 = pd.read_csv('df1.txt',sep=';')
df1
Customer Timestamp Other_Col
0 A 2017-05-01 00:01:00 Jun
1 A 2017-05-01 00:02:00 Sep
2 A 2017-05-01 00:03:00 Jun
3 B 2017-05-07 23:58:00 Sep
4 B 2017-05-07 23:59:00 Sep
df2 = pd.read_csv('df2.txt',sep=';')
df2
Timestamp
0 2017-05-01 00:01:00
1 2017-05-01 00:02:00
2 2017-05-01 00:03:00
3 2017-05-07 23:58:00
4 2017-05-07 23:59:00
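# Build the "base" table: every timestamp from df2, repeated once per customer
base = pd.DataFrame()
base['Customer'] = ['A']*5 + ['B']*5
base['Timestamp'] = list(df2['Timestamp'])*2
# The left join keeps every row of base; Other_Col is NaN where df1 has no match
pd.merge(base,df1,how='left',on=['Customer','Timestamp'])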
Customer Timestamp Other_Col
0 A 2017-05-01 00:01:00 Jun
1 A 2017-05-01 00:02:00 Sep
2 A 2017-05-01 00:03:00 Jun
3 A 2017-05-07 23:58:00 NaN
4 A 2017-05-07 23:59:00 NaN
5 B 2017-05-01 00:01:00 NaN
6 B 2017-05-01 00:02:00 NaN
7 B 2017-05-01 00:03:00 NaN
8 B 2017-05-07 23:58:00 Sep
9 B 2017-05-07 23:59:00 Sep
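The base table above hardcodes the customer names and the row counts, so it only fits this exact sample. A minimal sketch of one way to generalize it, assuming the timestamps are kept as plain strings and building the frames inline instead of reading df1.txt and df2.txt: take the Cartesian product of the unique customers and the timestamps with pd.MultiIndex.from_product, then do the same left join. The names customers and result are introduced here only for illustration.

```python
import pandas as pd

# Sample data from the question, built inline (assumption: timestamps stay as strings).
timestamps = [
    "2017-05-01 00:01:00",
    "2017-05-01 00:02:00",
    "2017-05-01 00:03:00",
    "2017-05-07 23:58:00",
    "2017-05-07 23:59:00",
]
df1 = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B"],
    "Timestamp": timestamps,
    "Other_Col": ["Jun", "Sep", "Jun", "Sep", "Sep"],
})
df2 = pd.DataFrame({"Timestamp": timestamps})

# Every (Customer, Timestamp) combination, without hardcoding names or counts.
customers = df1["Customer"].unique()
base = pd.MultiIndex.from_product(
    [customers, df2["Timestamp"]],
    names=["Customer", "Timestamp"],
).to_frame(index=False)

# Left join: all base rows are kept, Other_Col is NaN where df1 has no match.
result = base.merge(df1, how="left", on=["Customer", "Timestamp"])
print(result)
```

If the real Timestamp columns are parsed as datetimes, both frames need the same dtype for the join keys to match.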