Homophily in a social network using Python
I am trying to determine the chance homophily, and then the homophily, of a dataset that has nodes as keys and colors as values.
Example:
Node  Target  Colors
A     N       1
N     A       0
A     D       1
D     A       1
C     X       1
X     C       0
S     D       0
D     S       1
B             0
R     N       2
N     R       2
The colors are associated with the Node column and range from 0 to 2 (int).
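Since some nodes appear in more than one row, it may be worth checking that each node is always listed with the same color before building a node-to-color dict. A minimal sketch in plain Python, with the rows copied from the table above; the helper names colors_per_node and conflicting_nodes are made up for illustration:

```python
from collections import defaultdict

# Rows copied from the example table: (node, target, color).
rows = [
    ("A", "N", 1), ("N", "A", 0), ("A", "D", 1), ("D", "A", 1),
    ("C", "X", 1), ("X", "C", 0), ("S", "D", 0), ("D", "S", 1),
    ("B", "", 0), ("R", "N", 2), ("N", "R", 2),
]


def colors_per_node(rows):
    """Collect every color that appears for each node (hypothetical helper)."""
    seen = defaultdict(set)
    for node, _target, color in rows:
        seen[node].add(color)
    return dict(seen)


def conflicting_nodes(rows):
    """Return only the nodes that are listed with more than one color."""
    return {node: colors
            for node, colors in colors_per_node(rows).items()
            if len(colors) > 1}


print(conflicting_nodes(rows))
```

With the rows above, only N comes back, because it is listed with color 0 in one row and color 2 in another. A dict keyed by node can keep just one of those values.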
The steps for computing the chance homophily on a characteristic z (Color, in my case) are shown below:
c_list=df[['Node','Colors']].set_index('Node').T.to_dict('list')
print("\nChance of same color:", round(chance_homophily(c_list),2))
where chance_homophily is defined as follows:
from collections import Counter

# The function below takes a dictionary that maps each node to its
# characteristic (here a one-element list holding the color).
# It returns the chance homophily for that characteristic: the sum,
# over all classes, of the squared share of nodes in that class.
def chance_homophily(dataset):
    # Count how many nodes carry each characteristic value
    freq_dict = Counter(tuple(x) for x in dataset.values())
    c_list = list(freq_dict.values())
    total = sum(c_list)
    chance = 0
    for class_count in c_list:
        chance += (class_count / total) ** 2
    return chance
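As a quick sanity check of the formula, here is a small sketch that computes the same quantity directly from a node-to-color dict. The function name chance_homophily_from_colors is hypothetical, and the mapping is an assumption: it uses one color per node, taking the color from each node's last row in the example table.

```python
from collections import Counter


def chance_homophily_from_colors(node_colors):
    """Probability that two nodes drawn independently share a color."""
    counts = Counter(node_colors.values())
    total = sum(counts.values())
    # Sum of squared class proportions; an empty mapping gives 0
    return sum((count / total) ** 2 for count in counts.values())


# Assumed mapping: one color per node (last row of each node in the table).
node_colors = {"A": 1, "N": 2, "D": 1, "C": 1, "X": 0, "S": 0, "B": 0, "R": 2}

print(round(chance_homophily_from_colors(node_colors), 2))
```

Here the three colors cover 3, 3 and 2 of the 8 nodes, so the result is (3/8)^2 + (3/8)^2 + (2/8)^2 = 22/64, roughly 0.34.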
Homophily is then computed as follows:
def homophily(G, chars, IDs):
    """
    Given a network G, a dict of characteristics chars for node IDs,
    and dict of node IDs for each node in the network,
    find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if IDs[n1] in chars and IDs[n2] in chars:
            if G.has_edge(n1, n2):  # always true for pairs taken from G.edges()
                num_ties += 1
                if chars[IDs[n1]] == chars[IDs[n2]]:
                    num_same_ties += 1
    return num_same_ties / num_ties
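G should be built from my dataset above (so taking both the Node and Target columns into account).
I am not entirely familiar with this network property, but I think I am missing something in my implementation (for example, does it correctly count the relationships between nodes in the network?). In another example found on the web (with a different dataset), the characteristic is also a color (although it is a string there, whereas I have a numeric variable). I do not know whether they take the relationships between nodes into account to determine it, perhaps using an adjacency matrix. This part is not implemented in my code, where I am using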
G = nx.from_pandas_edgelist(df, source='Node', target='Target')
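Your code works fine. The only thing you are missing is the IDs dict, which maps the names of the nodes to the node names in the graph G. By creating the graph from a pandas edge list, you are already naming the nodes as they are in the data.
This makes the "IDs" dict unnecessary. Check out the example below, once without the IDs dict and once with a trivial IDs dict so that the original function can be used: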
import networkx as nx
import pandas as pd
from collections import Counter

df = pd.DataFrame({"Node": ["A","N","A","D","C","X","S","D","B","R","N"],
                   "Target": ["N","A","D","A","X","C","D","S","","N","R"],
                   "Colors": [1,0,1,1,1,0,0,1,0,2,2]})

# Map each node to its color; a dict holds one entry per node,
# so a node listed in several rows keeps only one color
c_list = df[['Node','Colors']].set_index('Node').T.to_dict('list')
# The empty Target in the row for B becomes a node named "" in G
G = nx.from_pandas_edgelist(df, source='Node', target='Target')

def homophily_without_ids(G, chars):
    """
    Given a network G and a dict of characteristics chars keyed by node name,
    find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if n1 in chars and n2 in chars:
            if G.has_edge(n1, n2):  # always true for pairs taken from G.edges()
                num_ties += 1
                if chars[n1] == chars[n2]:
                    num_same_ties += 1
    return num_same_ties / num_ties

print(homophily_without_ids(G, c_list))

# create node ids map - trivial in this case
nodes_ids = {i: i for i in G.nodes()}

def homophily(G, chars, IDs):
    """
    Given a network G, a dict of characteristics chars for node IDs,
    and dict of node IDs for each node in the network,
    find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if IDs[n1] in chars and IDs[n2] in chars:
            if G.has_edge(n1, n2):  # always true for pairs taken from G.edges()
                num_ties += 1
                if chars[IDs[n1]] == chars[IDs[n2]]:
                    num_same_ties += 1
    return num_same_ties / num_ties

print(homophily(G, c_list, nodes_ids))
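For comparison, the same count can be sketched without networkx on a plain list of ties, which makes it easy to check by hand. The function name edge_homophily, the edge list, and the node-to-color mapping are assumptions for illustration: each undirected tie from the example table is listed once, the row for B is left out because it has no target, and each node uses the color from its last row.

```python
def edge_homophily(edges, node_colors):
    """
    Fraction of ties whose two endpoints share a color.
    Ties with an endpoint missing from node_colors are skipped.
    Returns NaN when no tie can be evaluated.
    """
    same = 0
    total = 0
    for n1, n2 in edges:
        if n1 in node_colors and n2 in node_colors:
            total += 1
            if node_colors[n1] == node_colors[n2]:
                same += 1
    return same / total if total else float("nan")


# Assumed inputs: undirected ties from the example table, each listed once.
edges = [("A", "N"), ("A", "D"), ("C", "X"), ("S", "D"), ("R", "N")]
# Assumed mapping: one color per node (last row of each node in the table).
node_colors = {"A": 1, "N": 2, "D": 1, "C": 1, "X": 0, "S": 0, "B": 0, "R": 2}

print(edge_homophily(edges, node_colors))
```

Under these assumptions two of the five ties (A-D and R-N) join nodes of the same color, giving 0.4. Any iterable of node pairs works as edges, so G.edges() from a networkx graph could be passed in as well. Comparing this value with the chance homophily indicates whether same-color ties occur more often than random mixing would suggest.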