使用 pandas 数据帧映射到来自运输(节点)点的运输容量请求
Map to a Trans capacity request from transit(Nodal) points using pandas dataframes
输入带有 TerminalID、TName、XY 坐标、PeopleID 的数据框
import pandas as pd
data = {
'TerminalID': ['5','21','21','2','21','2','5','22','22','22','2','32','41','41','42','50','50'],
'TName': ['AD','AMBO','AMBO','PS','AMBO','PS','AD','AM','AM','AM','PS','BO','BA','BA','BB','AZ','AZ'],
'xy': ['1.12731,1.153756','0.12731,0.153757','0.12731,0.153757','1.989385,1.201941','0.12731,0.153757','1.989385,1.201941','1.12731,1.153756','2.12731,1.153756','2.12731,1.153756','2.12731,1.153756','1.989385,1.201941','1.989385,1.201941','2.989385,1.201941','2.989385,1.201941','2.989385,3.201941','3.989385,3.201941','3.989385,3.201941'],
'Pcode': [ 'None','Z014','Z015','Z016','Z017','Z018','None','Z020','Z021','Z022','Z023','Z024','Z025','Z026','Z027','Z028','Z029']
}
df = pd.DataFrame.from_dict(data)
输出[55]:
DF1的输出
TerminalID TName xy Pcode
0 5 AD 1.12731,1.153756 None
1 21 AMBO 0.12731,0.153757 Z014
2 21 AMBO 0.12731,0.153757 Z015
3 2 PS 1.989385,1.201941 Z016
4 21 AMBO 0.12731,0.153757 Z017
5 2 PS 1.989385,1.201941 Z018
6 5 AD 1.12731,1.153756 None
7 22 AM 2.12731,1.153756 Z020
8 22 AM 2.12731,1.153756 Z021
9 22 AM 2.12731,1.153756 Z022
10 2 PS 1.989385,1.201941 Z023
11 32 BO 1.989385,1.201941 Z024
12 41 BA 2.989385,1.201941 Z025
13 41 BA 2.989385,1.201941 Z026
14 42 BB 2.989385,3.201941 Z027
15 50 AZ 3.989385,3.201941 Z028
16 50 AZ 3.989385,3.201941 Z029
DF2,
T_cap 是终端 ID 的容量要求,T_load 是负载详细信息,Tcap 是 运行 计数增量,T_load 是实际请求在航站楼,
开头和结尾的0是解决方案的填充
data2= {
'BusID': ['18','18','18','18','18','18','18','18','18'],
'Tcap': ['0','2','3','6','7','8','10','12','12'],
'T_Load': ['0','2','1','2','2','1','2','2','0'],
'TerminalID': [ '5','21','33','2','32','42','41','50','5'],
'TName':['AD','AMBO','AM','PS','BO','BB','BA','AZ','AD']
}
df2 = pd.DataFrame.from_dict(data2)
输出[59]:
BusID Tcap T_Load TerminalID TName
0 18 0 0 5 AD
1 18 2 2 21 AMBO
2 18 3 1 33 AM
3 18 6 2 2 PS
4 18 7 2 32 BO
5 18 8 1 42 BB
6 18 10 2 41 BA
7 18 12 2 50 AZ
8 18 12 0 5 AD
数据帧#请求的最终输出
输出基于 T_Load 约束。
data3 = {
'BusID': ['18','18','18','18','18','18','18','18','18'],
'Tcap': ['0','2','3','6','7','8','10','12','12'],
'T_Load': ['0','2','1','3','1','1','2','2','0'],
'TerminalID': [ '5','21','33','2','32','42','41','50','5'],
'TName':['AD','AMBO','AM','PS','BO','BB','BA','AZ','AD'],
'Pcode':['None','Z013,Z019','Z020','Z016,Z018,Z023','Z024','Z027','Z025,Z026','Z028,Z029','None']
}
df3 = pd.DataFrame.from_dict(data3)
输出[61]:
BusID Tcap T_Load TerminalID TName Pcode
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z013,Z019
2 18 3 1 33 AM Z020
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
谢谢你
我的解决方案按 TerminalID
和 TName
聚合连接并通过聚合列表分配给另一个 DataFrame,最后按列表理解中的位置过滤值 join
:
s = df.groupby(['TerminalID','TName'])['Pcode'].agg(list).rename('P_list')
df = df2.join(s, on=['TerminalID','TName'])
df['P_list'] = [','.join(x[:int(y)]) if int(y) != 0 else None
for x, y in zip(df['P_list'], df['T_Load'])]
print (df)
BusID Tcap T_Load TerminalID TName P_list
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z014,Z015
2 18 3 1 22 AM Z020
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
您可以 map
每个 TName 的聚合字符串:
df2['Plist'] = df2['TName'].map(df.groupby('TName')['Pcode'].agg(','.join))
或者,如果您想将多个字符串 None 替换为单个字符串:
df2['Plist'] = df2['TName'].map(df.groupby('TName')['Pcode']
.agg(lambda x: ','.join(e for e in x if e != 'None'))
.replace('', 'None')
)
输出:
BusID Tcap T_Load TerminalID TName Plist
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z014,Z015,Z017
2 18 3 1 22 AM Z020,Z021,Z022
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
更新:限制输出:
然后您可以 trim 带有正则表达式的列,我们可以使用 groupby
从每个组内的矢量化字符串操作中获益(如果组很少但行很多,这最有趣):
df2['P_list'] = (df2.groupby('T_Load')['P_list']
.apply(lambda c: c.str.extract(rf'((?:[^,]+,?){{,{str(c.name)}}})',
expand=False)
.str.strip(',')
)
.replace('', 'None')
)
输出:
BusID Tcap T_Load TerminalID TName P_list
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z014,Z015
2 18 3 1 22 AM Z020
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
输入带有 TerminalID、TName、XY 坐标、PeopleID 的数据框
import pandas as pd
data = {
'TerminalID': ['5','21','21','2','21','2','5','22','22','22','2','32','41','41','42','50','50'],
'TName': ['AD','AMBO','AMBO','PS','AMBO','PS','AD','AM','AM','AM','PS','BO','BA','BA','BB','AZ','AZ'],
'xy': ['1.12731,1.153756','0.12731,0.153757','0.12731,0.153757','1.989385,1.201941','0.12731,0.153757','1.989385,1.201941','1.12731,1.153756','2.12731,1.153756','2.12731,1.153756','2.12731,1.153756','1.989385,1.201941','1.989385,1.201941','2.989385,1.201941','2.989385,1.201941','2.989385,3.201941','3.989385,3.201941','3.989385,3.201941'],
'Pcode': [ 'None','Z014','Z015','Z016','Z017','Z018','None','Z020','Z021','Z022','Z023','Z024','Z025','Z026','Z027','Z028','Z029']
}
df = pd.DataFrame.from_dict(data)
输出[55]:
DF1的输出
TerminalID TName xy Pcode
0 5 AD 1.12731,1.153756 None
1 21 AMBO 0.12731,0.153757 Z014
2 21 AMBO 0.12731,0.153757 Z015
3 2 PS 1.989385,1.201941 Z016
4 21 AMBO 0.12731,0.153757 Z017
5 2 PS 1.989385,1.201941 Z018
6 5 AD 1.12731,1.153756 None
7 22 AM 2.12731,1.153756 Z020
8 22 AM 2.12731,1.153756 Z021
9 22 AM 2.12731,1.153756 Z022
10 2 PS 1.989385,1.201941 Z023
11 32 BO 1.989385,1.201941 Z024
12 41 BA 2.989385,1.201941 Z025
13 41 BA 2.989385,1.201941 Z026
14 42 BB 2.989385,3.201941 Z027
15 50 AZ 3.989385,3.201941 Z028
16 50 AZ 3.989385,3.201941 Z029
DF2,
T_cap 是终端 ID 的容量要求,T_load 是负载详细信息,Tcap 是 运行 计数增量,T_load 是实际请求在航站楼, 开头和结尾的0是解决方案的填充
data2= {
'BusID': ['18','18','18','18','18','18','18','18','18'],
'Tcap': ['0','2','3','6','7','8','10','12','12'],
'T_Load': ['0','2','1','2','2','1','2','2','0'],
'TerminalID': [ '5','21','33','2','32','42','41','50','5'],
'TName':['AD','AMBO','AM','PS','BO','BB','BA','AZ','AD']
}
df2 = pd.DataFrame.from_dict(data2)
输出[59]:
BusID Tcap T_Load TerminalID TName
0 18 0 0 5 AD
1 18 2 2 21 AMBO
2 18 3 1 33 AM
3 18 6 2 2 PS
4 18 7 2 32 BO
5 18 8 1 42 BB
6 18 10 2 41 BA
7 18 12 2 50 AZ
8 18 12 0 5 AD
数据帧#请求的最终输出
输出基于 T_Load 约束。
data3 = {
'BusID': ['18','18','18','18','18','18','18','18','18'],
'Tcap': ['0','2','3','6','7','8','10','12','12'],
'T_Load': ['0','2','1','3','1','1','2','2','0'],
'TerminalID': [ '5','21','33','2','32','42','41','50','5'],
'TName':['AD','AMBO','AM','PS','BO','BB','BA','AZ','AD'],
'Pcode':['None','Z013,Z019','Z020','Z016,Z018,Z023','Z024','Z027','Z025,Z026','Z028,Z029','None']
}
df3 = pd.DataFrame.from_dict(data3)
输出[61]:
BusID Tcap T_Load TerminalID TName Pcode
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z013,Z019
2 18 3 1 33 AM Z020
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
谢谢你
我的解决方案按 TerminalID
和 TName
聚合连接并通过聚合列表分配给另一个 DataFrame,最后按列表理解中的位置过滤值 join
:
s = df.groupby(['TerminalID','TName'])['Pcode'].agg(list).rename('P_list')
df = df2.join(s, on=['TerminalID','TName'])
df['P_list'] = [','.join(x[:int(y)]) if int(y) != 0 else None
for x, y in zip(df['P_list'], df['T_Load'])]
print (df)
BusID Tcap T_Load TerminalID TName P_list
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z014,Z015
2 18 3 1 22 AM Z020
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
您可以 map
每个 TName 的聚合字符串:
df2['Plist'] = df2['TName'].map(df.groupby('TName')['Pcode'].agg(','.join))
或者,如果您想将多个字符串 None 替换为单个字符串:
df2['Plist'] = df2['TName'].map(df.groupby('TName')['Pcode']
.agg(lambda x: ','.join(e for e in x if e != 'None'))
.replace('', 'None')
)
输出:
BusID Tcap T_Load TerminalID TName Plist
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z014,Z015,Z017
2 18 3 1 22 AM Z020,Z021,Z022
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
更新:限制输出:
然后您可以 trim 带有正则表达式的列,我们可以使用 groupby
从每个组内的矢量化字符串操作中获益(如果组很少但行很多,这最有趣):
df2['P_list'] = (df2.groupby('T_Load')['P_list']
.apply(lambda c: c.str.extract(rf'((?:[^,]+,?){{,{str(c.name)}}})',
expand=False)
.str.strip(',')
)
.replace('', 'None')
)
输出:
BusID Tcap T_Load TerminalID TName P_list
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z014,Z015
2 18 3 1 22 AM Z020
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None