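Plot a directed graph in Python?
I'm trying to make a directed graph or a Sankey diagram (either would work) for customer state migration. The data looks like the following; count is the number of users who migrated from the current state to the next state.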
**current_state next_state count**
New Profile Initiated 37715
Profile Initiated End 36411
JobRecommended End 6202
New End 6171
ProfileCreated JobRecommended 5799
Profile Initiated ProfileCreated 4360
New NotOpted 3751
NotOpted Profile Initiated 2817
JobRecommended InterestedInJob 2542
IntentDetected ProfileCreated 2334
ProfileCreated IntentDetected 1839
InterestedInJob Applied 1671
JobRecommended NotInterestedInJob 1477
NotInterestedInJob ProfileCreated 1408
IntentDetected End 1325
NotOpted End 1009
InterestedInJob ProfileCreated 975
Applied IntentDetected 912
NotInterestedInJob IntentDetected 720
Applied ProfileCreated 701
InterestedInJob End 673
I wrote code that builds a Sankey diagram, but the plot is not very readable. I'm looking for a readable directed graph. Here is my code:
import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv('input.csv')

# Map every distinct state name to an integer node index, as go.Sankey expects.
x = list(set(df.current_state.values) | set(df.next_state))
di = {name: i for i, name in enumerate(x)}

df['source'] = df['current_state'].apply(lambda y: di[y])
df['target'] = df['next_state'].apply(lambda y: di[y])

fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.5),
        label=x,
        color="blue"
    ),
    link=dict(
        source=df.source,
        target=df.target,
        value=df['count']
    ))])

fig.update_layout(title_text="Sankey Diagram", font_size=10, autosize=False,
                  width=1000,
                  height=1000,
                  margin=go.layout.Margin(l=50, r=50, b=100, t=100, pad=4))
fig.show()
For directed graphs, graphviz would be my tool of choice rather than Python.
The following script, txt2dot.py, converts your data into an input file for graphviz:
text = '''New Profile Initiated 37715
Profile Initiated End 36411
JobRecommended End 6202
New End 6171
ProfileCreated JobRecommended 5799
Profile Initiated ProfileCreated 4360
New NotOpted 3751
NotOpted Profile Initiated 2817
JobRecommended InterestedInJob 2542
IntentDetected ProfileCreated 2334
ProfileCreated IntentDetected 1839
InterestedInJob Applied 1671
JobRecommended NotInterestedInJob 1477
NotInterestedInJob ProfileCreated 1408
IntentDetected End 1325
NotOpted End 1009
InterestedInJob ProfileCreated 975
Applied IntentDetected 912
NotInterestedInJob IntentDetected 720
Applied ProfileCreated 701
InterestedInJob End 673'''
# Remove ambiguity and make suitable for graphviz.
# Assumes the fields above are separated by single spaces. Join the two-word
# state first, then rename 'New', so the first row becomes
# 'NewProfile ProfileInitiated 37715'.
text = text.replace('Profile Initiated', 'ProfileInitiated')
text = text.replace('New ', 'NewProfile ')
# Create edges and nodes for graphviz.
edges = [ln.split() for ln in text.splitlines()]
# Largest counts first.
edges = sorted(edges, key=lambda x: -int(x[2]))
nodes = sorted(set(i[0] for i in edges) | set(i[1] for i in edges))
print('digraph foo {')
for n in nodes:
    print(f'    {n};')
print()
for item in edges:
    print('    ', item[0], ' -> ', item[1], ' [label="', item[2], '"];', sep='')
print('}')
Running python3 txt2dot.py > foo.dot results in:
digraph foo {
Applied;
End;
IntentDetected;
InterestedInJob;
JobRecommended;
NewProfile;
NotInterestedInJob;
NotOpted;
ProfileCreated;
ProfileInitiated;
NewProfile -> ProfileInitiated [label="37715"];
ProfileInitiated -> End [label="36411"];
JobRecommended -> End [label="6202"];
NewProfile -> End [label="6171"];
ProfileCreated -> JobRecommended [label="5799"];
ProfileInitiated -> ProfileCreated [label="4360"];
NewProfile -> NotOpted [label="3751"];
NotOpted -> ProfileInitiated [label="2817"];
JobRecommended -> InterestedInJob [label="2542"];
IntentDetected -> ProfileCreated [label="2334"];
ProfileCreated -> IntentDetected [label="1839"];
InterestedInJob -> Applied [label="1671"];
JobRecommended -> NotInterestedInJob [label="1477"];
NotInterestedInJob -> ProfileCreated [label="1408"];
IntentDetected -> End [label="1325"];
NotOpted -> End [label="1009"];
InterestedInJob -> ProfileCreated [label="975"];
Applied -> IntentDetected [label="912"];
NotInterestedInJob -> IntentDetected [label="720"];
Applied -> ProfileCreated [label="701"];
InterestedInJob -> End [label="673"];
}
Running dot -o foo.png -Tpng foo.dot renders the graph to foo.png.
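If every edge looks the same in the rendered graph, one option is to scale edge thickness with the count. The sketch below is a variant of the idea above, not part of the original script: the helper name edges_to_dot and the 1 to 8 width range are made up for illustration, and it relies on Graphviz's penwidth edge attribute and the rankdir graph attribute.

```python
def edges_to_dot(edges, min_width=1.0, max_width=8.0, name="foo"):
    """Return DOT source for (source, target, count) triples.

    Edge thickness (Graphviz `penwidth`) grows linearly with the count.
    """
    counts = [int(c) for _, _, c in edges]
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    lines = [f"digraph {name} {{", "    rankdir=LR;"]
    for src, dst, count in edges:
        width = min_width + (max_width - min_width) * (int(count) - lo) / span
        lines.append(f'    {src} -> {dst} [label="{count}", penwidth={width:.2f}];')
    lines.append("}")
    return "\n".join(lines)


# A few rows from the question, already cleaned of spaces.
sample = [
    ("NewProfile", "ProfileInitiated", 37715),
    ("ProfileInitiated", "End", 36411),
    ("InterestedInJob", "End", 673),
]
dot_source = edges_to_dot(sample)
print(dot_source)
```

The printed text can be saved to a .dot file and passed to dot exactly like foo.dot above.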
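This will create a basic Sankey diagram, assuming you:
- saved your data in a file named state_migration.csv
- replaced the spaces in the labels (state names) with dash/underscore/nothing
- replaced the spaces between columns with commas
- have plotly, numpy and matplotlib installed
Points 2 and 3 can easily be done with any non-prehistoric text editor, or even with Python itself if there is a lot of data. I strongly suggest you avoid using spaces in unquoted values.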
import plotly.graph_objects as go
import numpy as np
import matplotlib.colors

if __name__ == '__main__':
    # Read the CSV, skipping the header row; every field is lower-cased.
    with open('state_migration.csv', 'r') as finput:
        info = [line.strip().lower().split(',')
                for line in finput.readlines()[1:]]
    info_t = [*map(list, zip(*info))]  # info transposed: [sources, targets, counts]
    # this exists to map the data to plotly's node indexing format
    index = {n: i for i, n in enumerate(set(info_t[0] + info_t[1]))}
    fig = go.Figure(data=[go.Sankey(
        node=dict(
            pad=15,
            thickness=20,
            line=dict(color="black", width=0.5),
            label=list(index.keys()),
            # one random named colour per node, without repeats
            color=np.random.choice(list(matplotlib.colors.cnames.values()),
                                   size=len(index), replace=False)
        ),
        link=dict(
            source=[index[s] for s in info_t[0]],
            target=[index[t] for t in info_t[1]],
            value=[int(v) for v in info_t[2]]
        ))])
    fig.update_layout(title_text="State Migration", font_size=12)
    fig.show()
You can drag the nodes around. If you want to predefine their positions or check other parameters, see the Plotly documentation.
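As a rough illustration of predefining positions: Plotly's Sankey trace accepts normalized x and y lists on node, plus an arrangement setting. The sketch below builds such a trace as a plain dict. The helper name layered_positions and the column assigned to each state are assumptions made up for this example; they do not come from the data.

```python
def layered_positions(stage_of, labels):
    """Return (x, y) lists in label order, with values kept inside (0, 1).

    stage_of maps each label to a column number (0 = leftmost).
    Nodes sharing a column are spread evenly from top to bottom.
    """
    last_stage = max(stage_of.values()) or 1
    by_stage = {}
    for label in labels:
        by_stage.setdefault(stage_of[label], []).append(label)
    x, y = [], []
    for label in labels:
        stage = stage_of[label]
        siblings = by_stage[stage]
        # Clamp away from exactly 0 and 1 so nodes stay inside the plot area.
        x.append(min(max(stage / last_stage, 0.001), 0.999))
        y.append((siblings.index(label) + 1) / (len(siblings) + 1))
    return x, y


labels = ["new", "notopted", "profileinitiated", "end"]
stage_of = {"new": 0, "notopted": 1, "profileinitiated": 1, "end": 2}  # hypothetical columns
x, y = layered_positions(stage_of, labels)

sankey_trace = dict(
    type="sankey",
    arrangement="fixed",  # nodes stay where x/y put them
    node=dict(label=labels, x=x, y=y, pad=15, thickness=20),
    link=dict(source=[0, 0, 1, 2], target=[1, 2, 2, 3],
              value=[3751, 37715, 2817, 36411]),
)
# To display it (requires plotly, imported as go): go.Figure(data=[sankey_trace]).show()
```

Other arrangement values, such as "snap" or "freeform", leave the nodes draggable, so the predefined positions only act as a starting point.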
The data I used is a cleaned-up version of your input:
currentstate,next_state,count
new,initiated,37715
profileinitiated,end,36411
jobrecommended,end,6202
new,end,6171
profilecreated,jobrecommended,5799
profileinitiated,profilecreated,4360
new,notopted,3751
notopted,profileinitiated,2817
jobrecommended,interestedinjob,2542
intentdetected,profilecreated,2334
profilecreated,intentdetected,1839
interestedinjob,applied,1671
jobrecommended,notinterestedinjob,1477
notinterestedinjob,profilecreated,1408
intentdetected,end,1325
notopted,end,1009
interestedinjob,profilecreated,975
applied,intentdetected,912
notinterestedinjob,intentdetected,720
applied,profilecreated,701
interestedinjob,end,673
I changed "New Profile" to the existing state "New", because the graph looked weird otherwise. Feel free to adjust it as you need.
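If there is a lot of data, the cleanup (no spaces inside state names, commas between columns) could be scripted roughly like this. It is only a sketch: the function name to_csv_rows and the multiword tuple are hypothetical, and it assumes "Profile Initiated" is the only state name that contains a space. It reads the ambiguous first row as New followed by Profile Initiated, which is one possible choice and not the one used in the data above.

```python
import csv
import io


def to_csv_rows(raw_text, multiword=("Profile Initiated",)):
    """Turn whitespace-separated 'state state count' lines into row lists."""
    rows = []
    for line in raw_text.strip().splitlines():
        # Glue known multi-word state names together before splitting.
        for name in multiword:
            line = line.replace(name, name.replace(" ", ""))
        current_state, next_state, count = line.split()
        rows.append([current_state, next_state, int(count)])
    return rows


raw = """New Profile Initiated 37715
Profile Initiated End 36411
NotOpted Profile Initiated 2817
InterestedInJob End 673"""

rows = to_csv_rows(raw)

# Written to an in-memory buffer here; to write a real file, use
# open('state_migration.csv', 'w', newline='') instead of io.StringIO().
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["current_state", "next_state", "count"])
writer.writerows(rows)
print(buffer.getvalue())
```

A line that does not split into exactly three fields raises a ValueError, which is a cheap way to spot state names that still contain spaces.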
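The libraries I used are absolutely not required for what you want; I'm simply more familiar with them. For the directed graph, Roland Smith has you covered. It can also be done with Plotly; see their gallery.
- Alternatives to Plotly, in order of preference: matplotlib, seaborn, ggplot, raw dot/graphviz
- matplotlib was only used here to provide a list of predefined hex colors
- numpy was only used to pick random values from a list without replacement (in this case, colors)
Tested on Python 3.8.1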
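Since numpy is only used here for sampling without replacement, the standard library's random.sample can do the same job. A minimal sketch, with a short made-up palette standing in for matplotlib.colors.cnames:

```python
import random

# Hypothetical palette; any list of colour strings at least as long as the node list works.
palette = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]
node_labels = ["new", "profileinitiated", "end", "notopted"]

# random.sample picks k distinct items, i.e. sampling without replacement.
node_colors = random.sample(palette, k=len(node_labels))
print(dict(zip(node_labels, node_colors)))
```

The resulting list can be passed as the node color in place of the np.random.choice call.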
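It looks like condekind has the answer covered, but... since you are using pandas, these previous answers should help with the practical side of organizing the data and producing the diagram:
and alishobeiri has a number of useful examples and code you can use: https://plot.ly/~alishobeiri/1591/plotly-sankey-diagrams/#/
Together with the plot.ly documentation, they answer the specific question of node placement.
If the Sankey diagram is messy, remember that you can also try a vertical rather than a horizontal orientation.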
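For reference, switching to a vertical layout is a one-attribute change: Plotly's Sankey trace has an orientation setting that takes "h" or "v". The sketch below builds the trace as a plain dict from a few rows of the question's data; the helper name sankey_trace is hypothetical.

```python
def sankey_trace(rows, orientation="v"):
    """Build a Sankey trace dict from (source, target, count) rows."""
    labels = []
    for src, dst, _ in rows:
        for name in (src, dst):
            if name not in labels:
                labels.append(name)  # first-seen order keeps indices stable
    index = {name: i for i, name in enumerate(labels)}
    return dict(
        type="sankey",
        orientation=orientation,
        node=dict(label=labels, pad=15, thickness=20),
        link=dict(
            source=[index[s] for s, _, _ in rows],
            target=[index[t] for _, t, _ in rows],
            value=[count for _, _, count in rows],
        ),
    )


rows = [
    ("JobRecommended", "InterestedInJob", 2542),
    ("InterestedInJob", "Applied", 1671),
    ("JobRecommended", "End", 6202),
]
trace = sankey_trace(rows)
# To display it (requires plotly, imported as go): go.Figure(data=[trace]).show()
```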