在某些列上交错 2 个数据帧

Interleave 2 Dataframes on certain columns

我有 2 个数据帧

df1:

StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort,LinkType,Speed 
DD1,Switch1,P1,AD1,Switch2,P2,MTP,1000
DD2,Switch2,P3,AD2,Switch3,P2,MTP,1000
DD3,Switch3,P5,AD3,Switch4,P6,MTP,1000

df2:

StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort
AB11,RU15,P1,AJ11,RU25,P2
AB12,RU18,P2,AB11,RU35,P2
AB13,RU19,P3,AB11,RU40,P4

我想交错这两个数据帧,我已经尝试了几个选项,但似乎无法让它工作。我接近以下代码的功能,但它没有加入适当的列

import pandas as pd
from toolz import interleave

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1)

预期输出看起来像

StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort,LinkType,Speed 
DD1,Switch1,P1,AD1,Switch2,P2,MTP,1000
AB11,RU15,P1,AJ11,RU25,P2,nan,nan
DD2,Switch2,P3,AD2,Switch3,P2,MTP,1000
AB12,RU18,P2,AB11,RU35,P2,nan,nan
DD3,Switch3,P5,AD3,Switch4,P6,MTP,1000
AB13,RU19,P3,AB11,RU40,P4,nan,nan

我认为它应该很简单,但我找不到合适的语法。任何人都可以提供任何想法吗?

在此先感谢您的帮助!

如果列名相同,唯一的区别是可以使用其中一个 DataFrame 中的一些新列名:

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1.columns)
print (df3)
  StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType  \
0           DD1     Switch1        P1         AD1   Switch2      P2      MTP   
1          AB11        RU15        P1        AJ11      RU25      P2     None   
2           DD2     Switch2        P3         AD2   Switch3      P2      MTP   
3          AB12        RU18        P2        AB11      RU35      P2     None   
4           DD3     Switch3        P5         AD3   Switch4      P6      MTP   
5          AB13        RU19        P3        AB11      RU40      P4     None   

    Speed  
0  1000.0  
1     NaN  
2  1000.0  
3     NaN  
4  1000.0  
5     NaN  

适用于任何列名称的更通用的解决方案是在之前使用 DataFrame.align 以防止每个 DataFrame 正确对齐列:

print (df1)
  EndDevice EndLocation EndPort LinkType  Speed StartDevice StartLocation  \
0   Switch2         AD1      P2      MTP   1000     Switch1           DD1   
1   Switch3         AD2      P2      MTP   1000     Switch2           DD2   
2   Switch4         AD3      P6      MTP   1000     Switch3           DD3   

  StartPort  
0        P1  
1        P3  
2        P5  

print (df2)
  EndDevice EndLocation EndPort  LinkType  Speed StartDevice StartLocation  \
0      RU25        AJ11      P2       NaN    NaN        RU15          AB11   
1      RU35        AB11      P2       NaN    NaN        RU18          AB12   
2      RU40        AB11      P4       NaN    NaN        RU19          AB13   

  StartPort  
0        P1  
1        P2  
2        P3  

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1.columns)
print (df3)
  EndDevice EndLocation EndPort LinkType   Speed StartDevice StartLocation  \
0   Switch2         AD1      P2      MTP  1000.0     Switch1           DD1   
1      RU25        AJ11      P2      NaN     NaN        RU15          AB11   
2   Switch3         AD2      P2      MTP  1000.0     Switch2           DD2   
3      RU35        AB11      P2      NaN     NaN        RU18          AB12   
4   Switch4         AD3      P6      MTP  1000.0     Switch3           DD3   
5      RU40        AB11      P4      NaN     NaN        RU19          AB13   

  StartPort  
0        P1  
1        P1  
2        P3  
3        P2  
4        P5  
5        P3  

Index.union and DataFrame.reindex的另一个想法:

cols = df1.columns.union(df2.columns, sort=False)

df1 = df1.reindex(cols, axis=1)
df2 = df2.reindex(cols, axis=1)
print (df1)
  StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType  \
0           DD1     Switch1        P1         AD1   Switch2      P2      MTP   
1           DD2     Switch2        P3         AD2   Switch3      P2      MTP   
2           DD3     Switch3        P5         AD3   Switch4      P6      MTP   

   Speed  
0   1000  
1   1000  
2   1000  

print (df2)
  StartLocation StartDevice StartPort EndLocation EndDevice EndPort  LinkType  \
0          AB11        RU15        P1        AJ11      RU25      P2       NaN   
1          AB12        RU18        P2        AB11      RU35      P2       NaN   
2          AB13        RU19        P3        AB11      RU40      P4       NaN   

   Speed  
0    NaN  
1    NaN  
2    NaN  

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=cols)
print (df3)
  StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType  \
0           DD1     Switch1        P1         AD1   Switch2      P2      MTP   
1          AB11        RU15        P1        AJ11      RU25      P2      NaN   
2           DD2     Switch2        P3         AD2   Switch3      P2      MTP   
3          AB12        RU18        P2        AB11      RU35      P2      NaN   
4           DD3     Switch3        P5         AD3   Switch4      P6      MTP   
5          AB13        RU19        P3        AB11      RU40      P4      NaN   

    Speed  
0  1000.0  
1     NaN  
2  1000.0  
3     NaN  
4  1000.0  
5     NaN