Form a list of pairwise metrics (distances) using a matrix lookup from a Python pandas dataframe

I have a distance matrix stored as a dataframe:

import pandas as pd

data_map = {
'startNode':["0","0","0","0","0","455","455","455","455","455","10","10","10","10","10","30","30","30","30","30","2","2","2","2","2"],
'EndNode':["0","455","30","10","2","0","455","30","10","2","0","455","30","10","2","0","455","10","2","30","0","455","30","10","2"],
'Dmeters':["0","19481","94","90","10","19481","0","750","75","20","90","75","1013","0","200","94","750","1013","50","0","10","20","50","200","0"]
}

df_map_mat = pd.DataFrame.from_dict(data_map)

Input dataframe:

df_map_mat
    Out[141]: 
       startNode EndNode Dmeters
    0          0       0       0
    1          0     455   19481
    2          0      30      94
    3          0      10      90
    4          0       2      10
    5        455       0   19481
    6        455     455       0
    7        455      30     750
    8        455      10      75
    9        455       2      20
    10        10       0      90
    11        10     455      75
    12        10      30    1013
    13        10      10       0
    14        10       2     200
    15        30       0      94
    16        30     455     750
    17        30      10    1013
    18        30       2      50
    19        30      30       0
    20         2       0      10
    21         2     455      20
    22         2      30      50
    23         2      10     200
    24         2       2       0

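I need to query the df_map_mat dataframe and populate a list column as shown below.

The list column is formed by looking up each Nid in df_map_mat. For example, start node 0 to end node 0 has a distance of 0; likewise 10 -> 0 is 90, and 30 -> 455 is 750 meters.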
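To make the lookup concrete, here is a minimal sketch that maps (start, end) pairs to distances with a plain dictionary. The name `dist_lookup` is hypothetical, and only a few rows of the distance map are reproduced, so treat it as an illustration rather than the full data.

```python
import pandas as pd

# A few rows of the distance map (values kept as strings, as in the question)
sample = pd.DataFrame({
    "startNode": ["0", "10", "30", "2"],
    "EndNode":   ["0", "0", "455", "30"],
    "Dmeters":   ["0", "90", "750", "50"],
})

# Hypothetical helper: map each (start, end) pair to its distance in meters
dist_lookup = {
    (int(s), int(e)): int(d)
    for s, e, d in zip(sample["startNode"], sample["EndNode"], sample["Dmeters"])
}

print(dist_lookup[(0, 0)])     # 0
print(dist_lookup[(10, 0)])    # 90
print(dist_lookup[(30, 455)])  # 750
```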

df_dist_mat = {
'Nid':["0","10","2","30","455"],
'NName':["Q-CH","ANGC","AmOR","ANAGER","RPURAM"],
'D_list':[ "[0,90,10,94,19481]","[90,0,200,1013,75]","[10,200,0,50,20]","[94,1013,50,0,750]","[19481,75,20,750,0]"]
}

df_dist_mat = pd.DataFrame.from_dict(df_dist_mat)

Expected dataframe:

df_dist_mat
Out[142]: 
   Nid   NName               D_list
0    0    Q-CH   [0,90,10,94,19481]
1   10    ANGC   [90,0,200,1013,75]
2    2    AmOR     [10,200,0,50,20]
3   30  ANAGER   [94,1013,50,0,750]
4  455  RPURAM  [19481,75,20,750,0]


I have hard-coded the node id column in two np arrays. This may not be an efficient solution, but it is one that gives the answer.

import numpy as np

# Node ids, in the order the result should follow
x = np.array([0, 10, 2, 30, 455])
y = np.array([0, 10, 2, 30, 455])

# The columns of df_map_mat hold strings, so cast them before comparing with ints
df_int = df_map_mat.astype(int)


def calc_dist(x, y):
    d_list = []
    for i in x:
        d_inner_list = []
        for j in y:
            # Select the single row for the pair (i, j)
            match = df_int[(df_int["startNode"] == i) & (df_int["EndNode"] == j)]
            dist = int(match["Dmeters"].iloc[0])
            d_inner_list.append(dist)

        d_list.append(d_inner_list)
    print(d_list)


calc_dist(x, y)

Result:

calc_dist(x,y)
[[0, 90, 10, 94, 19481], [90, 0, 200, 1013, 75], [10, 200, 0, 50, 20], [94, 1013, 50, 0, 750], [19481, 75, 20, 750, 0]]
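One possible way to avoid scanning the whole dataframe for every pair is to index the distances by (startNode, EndNode) once and then look each pair up directly. This is a sketch under the assumption that every pair occurs exactly once in the map; `dist` and `node_ids` are names introduced here for illustration.

```python
import pandas as pd

data_map = {
    'startNode': ["0","0","0","0","0","455","455","455","455","455","10","10","10","10","10","30","30","30","30","30","2","2","2","2","2"],
    'EndNode': ["0","455","30","10","2","0","455","30","10","2","0","455","30","10","2","0","455","10","2","30","0","455","30","10","2"],
    'Dmeters': ["0","19481","94","90","10","19481","0","750","75","20","90","75","1013","0","200","94","750","1013","50","0","10","20","50","200","0"]
}
# Cast the string columns to integers so lookups can use integer node ids
df_map_mat = pd.DataFrame(data_map).astype(int)

# Build a Series indexed by (startNode, EndNode); sorting keeps lookups efficient
dist = df_map_mat.set_index(["startNode", "EndNode"])["Dmeters"].sort_index()

node_ids = [0, 10, 2, 30, 455]  # assumed output order, taken from the question
d_list = [[int(dist.loc[(i, j)]) for j in node_ids] for i in node_ids]
print(d_list)
```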

You can use DataFrame.pivot with DataFrame.reindex:

arr = np.array([0,10,2,30,455])

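# Cast all three columns to int so the resulting lists hold numbers, not strings;
# pivot takes keyword arguments (positional arguments are rejected in pandas 2.x)
df = (df_map_mat.astype({'startNode':int, 'EndNode':int, 'Dmeters':int})
                .pivot(index='startNode', columns='EndNode', values='Dmeters')
                .reindex(index=arr, columns=arr))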
print (df)
EndNode      0     10   2     30     455
startNode                               
0              0    90   10    94  19481
10            90     0  200  1013     75
2             10   200    0    50     20
30            94  1013   50     0    750
455        19481    75   20   750      0    

For a list, use:

out = df.to_numpy().tolist()
print (out)
[[0, 90, 10, 94, 19481], [90, 0, 200, 1013, 75],
 [10, 200, 0, 50, 20], [94, 1013, 50, 0, 750],
 [19481, 75, 20, 750, 0]]
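To get from the nested list to the expected dataframe, the rows can be attached as a column next to the node ids and names. The following is a minimal sketch, assuming the node order and names from the question; `nodes`, `order` and `wide` are illustrative names, and `D_list` holds real Python lists here rather than the string form shown in the question.

```python
import pandas as pd

data_map = {
    'startNode': ["0","0","0","0","0","455","455","455","455","455","10","10","10","10","10","30","30","30","30","30","2","2","2","2","2"],
    'EndNode': ["0","455","30","10","2","0","455","30","10","2","0","455","30","10","2","0","455","10","2","30","0","455","30","10","2"],
    'Dmeters': ["0","19481","94","90","10","19481","0","750","75","20","90","75","1013","0","200","94","750","1013","50","0","10","20","50","200","0"]
}
df_map_mat = pd.DataFrame(data_map).astype(int)

# Node ids and names in the order of the expected output
nodes = pd.DataFrame({
    "Nid": [0, 10, 2, 30, 455],
    "NName": ["Q-CH", "ANGC", "AmOR", "ANAGER", "RPURAM"],
})

order = nodes["Nid"].tolist()
wide = (df_map_mat
        .pivot(index="startNode", columns="EndNode", values="Dmeters")
        .reindex(index=order, columns=order))

# One list of distances per node, following the same node order
nodes["D_list"] = wide.to_numpy().tolist()
print(nodes)
```

Because `reindex` uses the same `order` for rows and columns, position k in every list refers to the k-th `Nid`.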