Homophily in a social network using Python
Node  Target  Colors
A     N       1
N     A       0
A     D       1
D     A       1
C     X       1
X     C       0
S     D       0
D     S       1
B             0
R     N       2
N     R       2
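Colors are associated with the Node column and range from 0 to 2 (int).
The steps for computing the chance homophily on a characteristic z (in my case, Color) are shown below: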
from collections import Counter

# The function below takes a dict mapping each node to its characteristic
# (color), counts how often each value occurs, and computes the chance
# homophily for that characteristic: the sum of squared class frequencies.
def chance_homophily(dataset):
    freq_dict = Counter(dataset.values())
    class_counts = list(freq_dict.values())
    total = sum(class_counts)
    chance = 0
    for class_count in class_counts:
        chance += (class_count / total) ** 2
    return chance

# c_list: dict {node: color} built from the data above
print("\nChance of same color:", round(chance_homophily(c_list), 2))
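A minimal sketch of the chance-homophily step on the table above, assuming each node's color is read from the Node and Colors columns. Note that node N appears with two different colors (0 and 2); a plain dict built from the rows keeps the last one, so the sketch also reports such conflicts. The frequency computation is the same sum of squared class frequencies as chance_homophily, written with Counter directly on the color values.

```python
from collections import Counter

# (node, color) pairs from the Node and Colors columns of the table above
rows = [("A", 1), ("N", 0), ("A", 1), ("D", 1), ("C", 1), ("X", 0),
        ("S", 0), ("D", 1), ("B", 0), ("R", 2), ("N", 2)]

# Collect every color seen for each node to spot inconsistent rows
seen = {}
for node, color in rows:
    seen.setdefault(node, set()).add(color)
conflicts = {node: sorted(colors) for node, colors in seen.items() if len(colors) > 1}

# dict() keeps the last color for a repeated node (here N -> 2)
c_list = dict(rows)

def chance_homophily(chars):
    """Probability that two nodes drawn at random (with replacement)
    share the same characteristic: sum of squared class frequencies."""
    counts = Counter(chars.values())
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

print("Conflicting colors:", conflicts)                              # {'N': [0, 2]}
print("Chance of same color:", round(chance_homophily(c_list), 2))  # 0.34
```

With N taking color 2, the eight nodes split 3/3/2 across colors 1, 0 and 2, giving (3/8)² + (3/8)² + (2/8)² = 22/64. If N should have color 0 instead, the data needs to be fixed before this number is meaningful.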
def homophily(G, chars, IDs):
    """
    Given a network G, a dict of characteristics chars for node IDs,
    and dict of node IDs for each node in the network,
    find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if IDs[n1] in chars and IDs[n2] in chars:
            # always True here: (n1, n2) comes from G.edges()
            if G.has_edge(n1, n2):
                num_ties += 1
                if chars[IDs[n1]] == chars[IDs[n2]]:
                    num_same_ties += 1
    return (num_same_ties / num_ties)
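For reference, the two quantities being compared can be written as below. These are the standard definitions matching the two functions above; the worked numbers assume colors A, C, D = 1; S, X, B = 0; N, R = 2 (keeping the last color listed for N) and the five undirected ties obtained from the Node/Target pairs. An observed value above the chance value suggests same-color ties are more common than random mixing would predict.

```latex
% observed homophily: fraction of ties whose endpoints share characteristic z
H = \frac{\left|\{(u,v) \in E : z_u = z_v\}\right|}{|E|}
% chance homophily: sum over classes k of the squared class frequency p_k
H_{\text{chance}} = \sum_k p_k^2
% worked numbers for the table above
H = \frac{2}{5} = 0.4, \qquad
H_{\text{chance}} = \left(\tfrac{3}{8}\right)^2 + \left(\tfrac{3}{8}\right)^2 + \left(\tfrac{2}{8}\right)^2 = \tfrac{22}{64} \approx 0.34
```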
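G should be built from my dataset above (so taking both the Node and Target columns into account).
I'm not entirely familiar with this network property, but I think I'm missing something in my implementation (for example, does it correctly count the ties between nodes in the network?). Another example I found online (with a different dataset):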
G = nx.from_pandas_edgelist(df, source='Node', target='Target')
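To check how the Node/Target pairs turn into ties, here is a sketch that builds the graph from the table. Two assumptions: row B has no Target in the table, so it is added as an isolated node with G.add_node rather than as an edge; and nx.Graph is undirected, so reciprocal rows such as A→N and N→A collapse into a single tie. Passing create_using=nx.DiGraph keeps both directions if direction matters.

```python
import networkx as nx
import pandas as pd

# Node/Target pairs from the table above; row B (no Target) is left out here
df = pd.DataFrame({
    "Node":   ["A", "N", "A", "D", "C", "X", "S", "D", "R", "N"],
    "Target": ["N", "A", "D", "A", "X", "C", "D", "S", "N", "R"],
})

# Undirected graph: A->N and N->A become the same edge
G = nx.from_pandas_edgelist(df, source="Node", target="Target")
G.add_node("B")  # assumption: B kept as an isolated node

print(G.number_of_nodes(), G.number_of_edges())    # 8 5
print(sorted(tuple(sorted(e)) for e in G.edges()))

# Directed alternative keeps both directions of each reciprocal pair
D = nx.from_pandas_edgelist(df, source="Node", target="Target",
                            create_using=nx.DiGraph)
print(D.number_of_edges())  # 10
```

So the undirected homophily is computed over 5 ties, not 10 rows; an isolated node like B contributes to the chance term but never to the tie counts.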
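Your code works fine. The only thing you are missing is the IDs dict, which maps the names of the nodes to the names of the nodes in graph G. By creating the graph from a pandas edge list, you are already naming the nodes as they appear in the data.
This makes the IDs dict unnecessary. See the examples below: once with a simple characteristics dict, and once with the original function and an IDs dict: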
import networkx as nx
import pandas as pd
from collections import Counter
df = pd.DataFrame({"Node":["A","N","A","D","C","X","S","D","B","R","N"],
G = nx.from_pandas_edgelist(df, source='Node', target='Target')
def homophily_without_ids(G, chars):
    """
    Given a network G and a dict of characteristics chars keyed by
    node name, find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if n1 in chars and n2 in chars:
            if G.has_edge(n1, n2):
                num_ties += 1
                if chars[n1] == chars[n2]:
                    num_same_ties += 1
    return (num_same_ties / num_ties)

# c_list: dict {node: color}, as in the question
print(homophily_without_ids(G, c_list))
# create node IDs map (trivial in this case)
nodes_ids = {i: i for i in G.nodes()}

def homophily(G, chars, IDs):
    """
    Given a network G, a dict of characteristics chars for node IDs,
    and dict of node IDs for each node in the network,
    find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if IDs[n1] in chars and IDs[n2] in chars:
            if G.has_edge(n1, n2):
                num_ties += 1
                if chars[IDs[n1]] == chars[IDs[n2]]:
                    num_same_ties += 1
    return (num_same_ties / num_ties)

print(homophily(G, c_list, nodes_ids))
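Putting the pieces together, a self-contained sketch with the table's data, under the same assumptions as before (B kept as an isolated node with color 0, N taking its last listed color, 2). The second half illustrates when an IDs dict is actually needed: after nx.convert_node_labels_to_integers the graph's nodes are integers while chars is still keyed by name, and its label_attribute option stores the original name on each node so the mapping can be rebuilt. The redundant G.has_edge check is dropped because every pair yielded by G.edges() is already an edge.

```python
import networkx as nx
import pandas as pd

# Table rows with a Target (B has none in the table and is handled below)
df = pd.DataFrame({
    "Node":   ["A", "N", "A", "D", "C", "X", "S", "D", "R", "N"],
    "Target": ["N", "A", "D", "A", "X", "C", "D", "S", "N", "R"],
    "Colors": [1, 0, 1, 1, 1, 0, 0, 1, 2, 2],
})

# Node -> color; later rows overwrite earlier ones, so N -> 2
chars = {node: int(color) for node, color in zip(df["Node"], df["Colors"])}
chars["B"] = 0  # assumption: B's color from the table, no ties

G = nx.from_pandas_edgelist(df, source="Node", target="Target")
G.add_node("B")

def homophily(G, chars, IDs):
    """Fraction of ties whose endpoints share the same characteristic."""
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if IDs[n1] in chars and IDs[n2] in chars:
            num_ties += 1
            if chars[IDs[n1]] == chars[IDs[n2]]:
                num_same_ties += 1
    return num_same_ties / num_ties

# Trivial mapping: graph nodes already are the names used in chars
print(homophily(G, chars, {n: n for n in G.nodes()}))  # 0.4

# Non-trivial mapping: nodes relabeled 0..n-1, name kept as a node attribute
H = nx.convert_node_labels_to_integers(G, label_attribute="name")
IDs = {n: data["name"] for n, data in H.nodes(data=True)}
print(homophily(H, chars, IDs))  # 0.4
```

Of the five ties, only A–D (1, 1) and N–R (2, 2) are same-color, so the observed homophily is 2/5 = 0.4, compared with a chance value of about 0.34 for the same color assignment.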