Relative data visualization in pandas
I have some data as follows:
+---------+-------+---------+----------------+
| Machine | Event | Outcome | Duration Total |
+---------+-------+---------+----------------+
| a | 1 | FAIL | 1127 |
| a | 2 | FAIL | 56099 |
| a | 2 | PASS | 15213 |
| a | 3 | FAIL | 13891 |
| a | 3 | PASS | 13934 |
| a | 4 | FAIL | 6844 |
| a | 5 | FAIL | 6449 |
| b | 1 | FAIL | 21331 |
| b | 2 | FAIL | 30362 |
| b | 3 | FAIL | 12194 |
| b | 3 | PASS | 7390 |
| b | 4 | FAIL | 35472 |
| b | 4 | PASS | 7731 |
| b | 5 | FAIL | 7654 |
| c | 1 | FAIL | 16833 |
| c | 1 | PASS | 21337 |
| c | 2 | FAIL | 440 |
| c | 2 | PASS | 14320 |
| c | 3 | FAIL | 5281 |
+---------+-------+---------+----------------+
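I am trying to draw a categorical scatter plot of the total duration for each event and each machine, or any other visualization that lets me compare them relative to one another.
What would be a good choice, and how do I do it?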
import matplotlib.pyplot as plt
import seaborn as sns
# The duration column in the table is named 'Duration Total', not 'Duration'
sns.catplot(x='Event', y='Duration Total', hue='Machine', col='Outcome', data=df)
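Try this. It draws two scatter plots: the x-axis is the event, the y-axis is the duration, and the color of each dot is based on the machine. There are two panels side by side, one for FAIL and one for PASS. "df" is your data frame. You can remove col='Outcome'
to show both FAIL and PASS on the same chart.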
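For anyone who wants to reproduce the plot, here is a minimal sketch that builds "df" from the table in the question and makes the same catplot call. The variable names rows and g are just for illustration, and the column is assumed to be named 'Duration Total' exactly as in the table header.

```python
import pandas as pd
import seaborn as sns

# Rows copied from the table in the question:
# (Machine, Event, Outcome, Duration Total)
rows = [
    ("a", 1, "FAIL", 1127), ("a", 2, "FAIL", 56099), ("a", 2, "PASS", 15213),
    ("a", 3, "FAIL", 13891), ("a", 3, "PASS", 13934), ("a", 4, "FAIL", 6844),
    ("a", 5, "FAIL", 6449),
    ("b", 1, "FAIL", 21331), ("b", 2, "FAIL", 30362), ("b", 3, "FAIL", 12194),
    ("b", 3, "PASS", 7390), ("b", 4, "FAIL", 35472), ("b", 4, "PASS", 7731),
    ("b", 5, "FAIL", 7654),
    ("c", 1, "FAIL", 16833), ("c", 1, "PASS", 21337), ("c", 2, "FAIL", 440),
    ("c", 2, "PASS", 14320), ("c", 3, "FAIL", 5281),
]
df = pd.DataFrame(rows, columns=["Machine", "Event", "Outcome", "Duration Total"])

# One panel per Outcome value; dots are colored by Machine
g = sns.catplot(x="Event", y="Duration Total", hue="Machine", col="Outcome", data=df)
g.set_axis_labels("Event", "Duration Total")
```

Call plt.show() from matplotlib.pyplot afterwards (or g.savefig with a file name of your choice) to see the figure.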
Edit:
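fig, ax = plt.subplots(figsize=(10, 10))
# PASS rows: default round markers, colored by machine
g = sns.scatterplot(x='Event', y='Duration Total', hue='Machine', data=df[df['Outcome'] == 'PASS'], ax=ax)
# FAIL rows: same hue mapping, drawn with 'x' markers
g = sns.scatterplot(x='Event', y='Duration Total', hue='Machine', data=df[df['Outcome'] == 'FAIL'], ax=ax,
                    style='Machine', markers=['x', 'x', 'x'])
handles, labels = ax.get_legend_handles_labels()
# These labels assume the legend holds a title entry plus a, b, c for each call;
# other seaborn versions may return a different number of handles, so check len(handles)
ax.legend(handles, ['Machine - Pass', 'a', 'b', 'c', 'Machine - Fail', 'a', 'b', 'c'])
plt.show()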
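If the hardcoded legend labels do not line up with the handles on your seaborn version, one alternative is to build the legend by hand. The sketch below is not the approach above: it swaps in plain matplotlib ax.scatter calls and Line2D proxy handles, and only uses seaborn for the color palette. The small df here is a stand-in with six rows taken from the question's table, and the names machines, palette and outcome_markers are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.lines import Line2D

# Stand-in frame: six rows from the question's table, same column names
df = pd.DataFrame({
    "Machine": ["a", "a", "b", "b", "c", "c"],
    "Event": [2, 2, 3, 3, 1, 1],
    "Outcome": ["FAIL", "PASS", "FAIL", "PASS", "FAIL", "PASS"],
    "Duration Total": [56099, 15213, 12194, 7390, 16833, 21337],
})

machines = sorted(df["Machine"].unique())
# One fixed color per machine, shared by the PASS and FAIL points
palette = dict(zip(machines, sns.color_palette(n_colors=len(machines))))
# Marker shape encodes the outcome
outcome_markers = {"PASS": "o", "FAIL": "x"}

fig, ax = plt.subplots(figsize=(10, 10))
for outcome, marker in outcome_markers.items():
    subset = df[df["Outcome"] == outcome]
    ax.scatter(
        subset["Event"],
        subset["Duration Total"],
        c=subset["Machine"].map(palette).tolist(),
        marker=marker,
    )

# Proxy handles: one legend entry per (machine, outcome) pair
handles = [
    Line2D([], [], linestyle="", marker=marker, color=palette[m],
           label=f"{m} - {outcome}")
    for outcome, marker in outcome_markers.items()
    for m in machines
]
ax.legend(handles=handles, title="Machine - Outcome")
ax.set_xlabel("Event")
ax.set_ylabel("Duration Total")
```

Because the legend entries are created explicitly, their number and order do not depend on what the plotting calls register on the axes.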