Interleave 2 DataFrames on certain columns
I have 2 DataFrames.
df1:
StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort,LinkType,Speed
DD1,Switch1,P1,AD1,Switch2,P2,MTP,1000
DD2,Switch2,P3,AD2,Switch3,P2,MTP,1000
DD3,Switch3,P5,AD3,Switch4,P6,MTP,1000
df2:
StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort
AB11,RU15,P1,AJ11,RU25,P2
AB12,RU18,P2,AB11,RU35,P2
AB13,RU19,P3,AB11,RU40,P4
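I want to interleave these two DataFrames. I have tried a few options but can't seem to get it to work. The code below comes close, but it does not line the values up under the proper columns: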
import pandas as pd
from toolz import interleave
df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1)
The expected output looks like this:
StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort,LinkType,Speed
DD1,Switch1,P1,AD1,Switch2,P2,MTP,1000
AB11,RU15,P1,AJ11,RU25,P2,nan,nan
DD2,Switch2,P3,AD2,Switch3,P2,MTP,1000
AB12,RU18,P2,AB11,RU35,P2,nan,nan
DD3,Switch3,P5,AD3,Switch4,P6,MTP,1000
AB13,RU19,P3,AB11,RU40,P4,nan,nan
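I think this should be simple, but I can't find the right syntax. Can anyone offer any ideas?
Thanks in advance for your help!
If the column names are the same and the only difference is some extra columns in one of the DataFrames, you can pass that DataFrame's column names: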
df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1.columns)
print (df3)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType \
0 DD1 Switch1 P1 AD1 Switch2 P2 MTP
1 AB11 RU15 P1 AJ11 RU25 P2 None
2 DD2 Switch2 P3 AD2 Switch3 P2 MTP
3 AB12 RU18 P2 AB11 RU35 P2 None
4 DD3 Switch3 P5 AD3 Switch4 P6 MTP
5 AB13 RU19 P3 AB11 RU40 P4 None
Speed
0 1000.0
1 NaN
2 1000.0
3 NaN
4 1000.0
5 NaN
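One caveat worth illustrating: the shorter rows are padded at the end, so values are matched to columns by position rather than by name. The approach above works here only because the extra columns (LinkType, Speed) come last in df1. A minimal sketch with hypothetical toy frames (the names left, right, a, x and b are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frames: the extra column "x" sits in the middle of `left`.
left = pd.DataFrame({"a": [1], "x": [9], "b": [2]})
right = pd.DataFrame({"a": [3], "b": [4]})

# Rows of unequal length; the two-element row is padded at the end.
rows = left.values.tolist() + right.values.tolist()
out = pd.DataFrame(rows, columns=left.columns)
print(out)
```

In this sketch the value 4 from right's column b ends up under x, which is why aligning the columns by name first is the safer choice when the column layouts differ.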
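A more general solution, which works for any column names, is to call DataFrame.align first so that the columns of both DataFrames line up correctly:
df1, df2 = df1.align(df2)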
print (df1)
EndDevice EndLocation EndPort LinkType Speed StartDevice StartLocation \
0 Switch2 AD1 P2 MTP 1000 Switch1 DD1
1 Switch3 AD2 P2 MTP 1000 Switch2 DD2
2 Switch4 AD3 P6 MTP 1000 Switch3 DD3
StartPort
0 P1
1 P3
2 P5
print (df2)
EndDevice EndLocation EndPort LinkType Speed StartDevice StartLocation \
0 RU25 AJ11 P2 NaN NaN RU15 AB11
1 RU35 AB11 P2 NaN NaN RU18 AB12
2 RU40 AB11 P4 NaN NaN RU19 AB13
StartPort
0 P1
1 P2
2 P3
df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1.columns)
print (df3)
EndDevice EndLocation EndPort LinkType Speed StartDevice StartLocation \
0 Switch2 AD1 P2 MTP 1000.0 Switch1 DD1
1 RU25 AJ11 P2 NaN NaN RU15 AB11
2 Switch3 AD2 P2 MTP 1000.0 Switch2 DD2
3 RU35 AB11 P2 NaN NaN RU18 AB12
4 Switch4 AD3 P6 MTP 1000.0 Switch3 DD3
5 RU40 AB11 P4 NaN NaN RU19 AB13
StartPort
0 P1
1 P1
2 P3
3 P2
4 P5
5 P3
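The align-then-interleave idea can be sketched end to end as follows. This is only an illustration built from the sample data in the question. To keep it free of third-party dependencies it uses itertools as a stand-in for toolz.interleave, which is equivalent only when both frames have the same number of rows. The names a1, a2 and rows are introduced here for the sketch.

```python
from io import StringIO
from itertools import chain

import pandas as pd

df1 = pd.read_csv(StringIO("""StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort,LinkType,Speed
DD1,Switch1,P1,AD1,Switch2,P2,MTP,1000
DD2,Switch2,P3,AD2,Switch3,P2,MTP,1000
DD3,Switch3,P5,AD3,Switch4,P6,MTP,1000
"""))
df2 = pd.read_csv(StringIO("""StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort
AB11,RU15,P1,AJ11,RU25,P2
AB12,RU18,P2,AB11,RU35,P2
AB13,RU19,P3,AB11,RU40,P4
"""))

# Give both frames the same labels; columns missing from df2 are filled with NaN.
a1, a2 = df1.align(df2)

# Stand-in for toolz.interleave (assumes equal row counts): take rows in turn.
rows = list(chain.from_iterable(zip(a1.values.tolist(), a2.values.tolist())))
df3 = pd.DataFrame(rows, columns=a1.columns)
print(df3)
```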
Another idea, using Index.union and DataFrame.reindex:
cols = df1.columns.union(df2.columns, sort=False)
df1 = df1.reindex(cols, axis=1)
df2 = df2.reindex(cols, axis=1)
print (df1)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType \
0 DD1 Switch1 P1 AD1 Switch2 P2 MTP
1 DD2 Switch2 P3 AD2 Switch3 P2 MTP
2 DD3 Switch3 P5 AD3 Switch4 P6 MTP
Speed
0 1000
1 1000
2 1000
print (df2)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType \
0 AB11 RU15 P1 AJ11 RU25 P2 NaN
1 AB12 RU18 P2 AB11 RU35 P2 NaN
2 AB13 RU19 P3 AB11 RU40 P4 NaN
Speed
0 NaN
1 NaN
2 NaN
df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=cols)
print (df3)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType \
0 DD1 Switch1 P1 AD1 Switch2 P2 MTP
1 AB11 RU15 P1 AJ11 RU25 P2 NaN
2 DD2 Switch2 P3 AD2 Switch3 P2 MTP
3 AB12 RU18 P2 AB11 RU35 P2 NaN
4 DD3 Switch3 P5 AD3 Switch4 P6 MTP
5 AB13 RU19 P3 AB11 RU40 P4 NaN
Speed
0 1000.0
1 NaN
2 1000.0
3 NaN
4 1000.0
5 NaN
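For comparison, a pandas-only variant is sketched below. It is a different technique from the ones in this answer, not a restatement of them: pd.concat matches columns by name, and a stable sort on the index then interleaves the rows. It assumes both frames carry the default 0..n-1 index, as in the question's sample data.

```python
from io import StringIO

import pandas as pd

df1 = pd.read_csv(StringIO("""StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort,LinkType,Speed
DD1,Switch1,P1,AD1,Switch2,P2,MTP,1000
DD2,Switch2,P3,AD2,Switch3,P2,MTP,1000
DD3,Switch3,P5,AD3,Switch4,P6,MTP,1000
"""))
df2 = pd.read_csv(StringIO("""StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort
AB11,RU15,P1,AJ11,RU25,P2
AB12,RU18,P2,AB11,RU35,P2
AB13,RU19,P3,AB11,RU40,P4
"""))

# Stack the frames (columns matched by name), then use a stable sort on the
# index so that row i of df1 is followed directly by row i of df2.
df3 = (
    pd.concat([df1, df2])
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(df3)
```

The stable sort matters here: it keeps the df1 row ahead of the df2 row whenever the two share an index label.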