How to convert a dataframe into a list of 3-tuples

I want to create a directed graph with the networkx library in Python.

I have a pandas dataframe that looks like this:

                                 Head Mounted Display  Marker  Smartphone
    2D data extrusion                               3       0           1   
    AgiSoft PhotoScan 3D design                     1       2           2   
    AuGeo Esri AR template                          1       1           2   
    BIM                                             1       1           0   
    Blender 3D design                               0       2           4   
    Bluetooth localization                          1       1           0   
    CityEngine                                      3       1           2   
    GIS data processing                             3       1           2   
    GNSS localization                               1       2           4   
    Google ARCore                                   0       1           5   
    Google SketchUp 3D design                       1       2           0   
    Image Stitching                                 1       1           4   
    Java Development Kit                            0       1           0   
    SLAM                                            1       2           2   
    Unity 3D                                        8      12          10   
    Unreal Engine                                   1       1           0   
    Vuforia                                         2       7           3

As input for the networkx.DiGraph.add_weighted_edges_from function, I need to format it as a list of 3-tuples like this:


('Head Mounted Display', '2D data extrusion', 3),
('Head Mounted Display', 'AgiSoft PhotoScan 3D design', 1),
('Head Mounted Display', 'AuGeo Esri AR template', 1),
etc...

In addition, tuples with a weight of 0, for example:

('Marker', '2D data extrusion', 0)

need to be removed from the list.

Does anyone know how to do this?

Thanks in advance!

You can use the code below:

lstOfTuples = []
for i in range(df.shape[0]):
    for j in range(df.shape[1]):
        index = df.index[i]
        col = df.columns[j]
        value = float(df.loc[index, col])
        if value > 0:
            lstOfTuples.append((col, index, value))
lstOfTuples

Then create a directed graph like this (note nx.DiGraph, since nx.Graph would be undirected):

G = nx.DiGraph()
G.add_weighted_edges_from(ebunch_to_add=lstOfTuples)
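
As a self-contained check, the loop above can be run end to end on a small frame. This is only a sketch: the `df` below is rebuilt by hand from the first three rows of the question's table, and the weights stay floats because the loop converts them with `float`.

```python
import networkx as nx
import pandas as pd

# First three rows of the question's table, rebuilt by hand for illustration
df = pd.DataFrame(
    {
        "Head Mounted Display": [3, 1, 1],
        "Marker": [0, 2, 1],
        "Smartphone": [1, 2, 2],
    },
    index=[
        "2D data extrusion",
        "AgiSoft PhotoScan 3D design",
        "AuGeo Esri AR template",
    ],
)

lstOfTuples = []
for index in df.index:
    for col in df.columns:
        value = float(df.loc[index, col])
        if value > 0:  # skip zero-weight edges
            lstOfTuples.append((col, index, value))

# Edges point from the column header to the row label
G = nx.DiGraph()
G.add_weighted_edges_from(lstOfTuples)
```

The zero entry ('Marker', '2D data extrusion') never reaches the graph, so eight of the nine cells become edges.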

You can create the required list of tuples as follows:

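def createTuples(df, onColumn=0):
    # onColumn is the position of the column to build tuples for
    sze = df.shape[0]
    colName = df.columns[onColumn]
    rslt = []
    for r in range(sze):
        # purely positional access; avoids label/position ambiguity
        weight = df.iloc[r, onColumn]
        if weight > 0:
            rslt.append((colName, df.index[r], weight))
    return rslt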

This approach lets you choose which column header goes in the first position of each tuple.
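
Since the function handles one column per call, covering the whole table means calling it once per column position. A minimal sketch, assuming a small hand-built `df` (first three rows of the question's table) and a positional-indexing variant of `createTuples`:

```python
import pandas as pd

def createTuples(df, onColumn=0):
    """Return (column name, row label, weight) for non-zero cells of one column."""
    colName = df.columns[onColumn]
    rslt = []
    for r in range(df.shape[0]):
        weight = df.iloc[r, onColumn]  # positional access: row r, column onColumn
        if weight > 0:
            rslt.append((colName, df.index[r], weight))
    return rslt

# Illustrative data only
df = pd.DataFrame(
    {
        "Head Mounted Display": [3, 1, 1],
        "Marker": [0, 2, 1],
        "Smartphone": [1, 2, 2],
    },
    index=[
        "2D data extrusion",
        "AgiSoft PhotoScan 3D design",
        "AuGeo Esri AR template",
    ],
)

# Collect the tuples of every column into one edge list
edges = []
for c in range(df.shape[1]):
    edges.extend(createTuples(df, onColumn=c))
```

The resulting `edges` list can be passed directly to `add_weighted_edges_from` on a `nx.DiGraph`.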

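Use df.columns[0] to get 'Head Mounted Display' and df.index[i] to get the row name. Note that df stands for the name of your own dataframe.

Then build the tuples with a conditional generator expression: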

tuple((df.columns[0], df.index[i], df.iloc[i, 0]) for i in range(len(df)) if df.iloc[i, 0] != 0)
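
The same idea extends to every column with a nested comprehension. The sketch below assumes a small hand-built `df` with unique row and column labels, and compares with `!=` because an identity test against the literal 0 does not check numeric equality for NumPy integers. The `int(...)` cast is an optional extra to get plain Python integers.

```python
import pandas as pd

# Illustrative data only: first three rows of the question's table
df = pd.DataFrame(
    {
        "Head Mounted Display": [3, 1, 1],
        "Marker": [0, 2, 1],
        "Smartphone": [1, 2, 2],
    },
    index=[
        "2D data extrusion",
        "AgiSoft PhotoScan 3D design",
        "AuGeo Esri AR template",
    ],
)

# One tuple per non-zero cell, ordered column by column
edges = [
    (col, row, int(df.at[row, col]))
    for col in df.columns
    for row in df.index
    if df.at[row, col] != 0
]
```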

Using .melt helps to get the data into the shape you are interested in. Here is a reproducible example:

import networkx as nx
import pandas as pd

# create a dummy dataframe with a similar structure
df = pd.DataFrame(zip(range(6), range(5, -1, -1)))
df.columns = list("ab")
df.index = list("qwerty")

# flatten the dataframe for easier processing
flat = df.melt(ignore_index=False).reset_index()

# ignore 0
mask = flat["value"] > 0
flat = flat.loc[mask]

# create a directed graph
G = nx.DiGraph()

# fill-in with edges
for start, end, weight in flat.values:
    G.add_edge(start, end, weight=weight)

print(G.nodes())  # ['w', 'a', 'e', 'r', 't', 'y', 'q', 'b']
print(
    G.edges()
)  # [('w', 'a'), ('w', 'b'), ('e', 'a'), ('e', 'b'), ('r', 'a'), ('r', 'b'), ('t', 'a'), ('t', 'b'), ('y', 'a'), ('q', 'b')]
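
If the list of 3-tuples itself is needed, with the column label first as in the question, the melted frame can be reordered and converted with `itertuples`. This is a sketch on a smaller dummy frame; the column names `index`, `variable` and `value` are the defaults that `reset_index` and `melt` produce when the index and columns are unnamed.

```python
import networkx as nx
import pandas as pd

# Small dummy frame with the same structure as above
df = pd.DataFrame({"a": [0, 1, 2], "b": [2, 1, 0]}, index=list("qwe"))

# Long format: one row per (row label, column label, value)
flat = df.melt(ignore_index=False).reset_index()
flat = flat.loc[flat["value"] > 0]

# Reorder to (column label, row label, weight) and turn rows into plain tuples
edges = list(
    flat[["variable", "index", "value"]].itertuples(index=False, name=None)
)

G = nx.DiGraph()
G.add_weighted_edges_from(edges)
```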

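Similar to @SultanOrazbayev's answer, you can melt the dataframe, but you can then pass the melted dataframe directly to the nx.from_pandas_edgelist function, without creating a list of tuples.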

import networkx as nx
import pandas as pd

# Sample df
df = pd.DataFrame({'Head Mounted Display':[3,1,1,1,0],'Marker':[0,2,1,1,2],'Smartphone':[1,2,2,0,4]})
# melt the dataframe and filter out the rows with weight of zero
df_long_temp = df.reset_index().melt(id_vars='index',var_name='to',value_name='weight')
df_long = df_long_temp[df_long_temp['weight'] != 0]

# create the graph with edge weights
g = nx.from_pandas_edgelist(df_long,source='index',target='to',
                        edge_attr='weight',create_using=nx.DiGraph)

# drawing the graph
pos = nx.spring_layout(g)
nx.draw_networkx(g,pos=pos)
weight_dict = {(u,v):'w={}'.format(w) for u,v,w in g.edges(data='weight')}
nx.draw_networkx_edge_labels(g,pos=pos,edge_labels=weight_dict)
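
The sample above has no row labels and directs edges from row to column. To match the question's tuples, which run from the column header to the row label, the source and target can be swapped. A sketch under assumptions: the axis name `technology` and the column name `device` are hypothetical labels chosen for illustration, and the data is the first three rows of the question's table.

```python
import networkx as nx
import pandas as pd

# Illustrative data only
df = pd.DataFrame(
    {
        "Head Mounted Display": [3, 1, 1],
        "Marker": [0, 2, 1],
        "Smartphone": [1, 2, 2],
    },
    index=[
        "2D data extrusion",
        "AgiSoft PhotoScan 3D design",
        "AuGeo Esri AR template",
    ],
)

# Name the index so it survives reset_index as a regular column
df_long = (
    df.rename_axis("technology")  # hypothetical name for the row labels
    .reset_index()
    .melt(id_vars="technology", var_name="device", value_name="weight")
)
df_long = df_long[df_long["weight"] != 0]  # drop zero-weight edges

# Edges run from the column header (device) to the row label (technology)
g = nx.from_pandas_edgelist(
    df_long,
    source="device",
    target="technology",
    edge_attr="weight",
    create_using=nx.DiGraph,
)
```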