How to create a scatter plot for two data classes with pyplot?
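I have two sets of data, x and y, where y holds integer class labels. I need to plot these data points using matplotlib.pyplot.scatter. I also need to plot the first class, y == 0, in one color and the second class, y == 1, in a different color.
I have looked at the documentation for the scatter function, but I don't understand how to do all of this in one plot.
Sample data: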
2.897534798034255,0.872359037956732,1
1.234850239781278,-0.293047584301112,1
0.238575209753427,0.129572680572429,0
-0.109757648021958,0.484048547480385,1
1.109735783200013,-0.002785328902198,0
1.572803975652908,0.098547849368397,0
x and y are defined as:
x = data[:, [0, 1]]
y = data[:, -1].astype(int)
The size of x is 2000 and the size of y is 1000.
My attempt:
pl.scatter(x, y==0, s=3, c='r')
pl.scatter(x, y==1, s=3, c='b')
pl.show()
You can do it like this:
import numpy as np
import matplotlib.pyplot as plt
data = np.array([[2.897534798034255,0.872359037956732,1],
[1.234850239781278,-0.293047584301112,1],
[0.238575209753427,0.129572680572429,0],
[-0.109757648021958,0.484048547480385,1],
[1.109735783200013,-0.002785328902198,0],
[1.572803975652908,0.098547849368397,0]])
x = data[:, [0, 1]]
y = data[:, -1].astype(int)
plt.scatter(x[:,0][y==0], x[:,1][y==0], s=3, c='r')
plt.scatter(x[:,0][y==1], x[:,1][y==1], s=3, c='b')
plt.show()
Although this is probably more readable:
x1 = data[:, 0]
x2 = data[:, 1]
y = data[:, -1].astype(int)
plt.scatter(x1[y==0], x2[y==0], s=3, c='r')
plt.scatter(x1[y==1], x2[y==1], s=3, c='b')
Output: (image not included)
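If there are more than two classes, the same boolean-mask idea can be wrapped in a loop. This is only a sketch: the helper name scatter_by_class and the class-to-color dictionary are hypothetical, not part of matplotlib, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

data = np.array([[2.897534798034255, 0.872359037956732, 1],
                 [1.234850239781278, -0.293047584301112, 1],
                 [0.238575209753427, 0.129572680572429, 0],
                 [-0.109757648021958, 0.484048547480385, 1],
                 [1.109735783200013, -0.002785328902198, 0],
                 [1.572803975652908, 0.098547849368397, 0]])


def scatter_by_class(ax, points, labels, colors):
    """Hypothetical helper: one scatter call per distinct label."""
    collections = []
    for cls in np.unique(labels):
        mask = labels == cls  # True for the rows that belong to this class
        coll = ax.scatter(points[mask, 0], points[mask, 1], s=3,
                          color=colors[int(cls)], label=f"y == {int(cls)}")
        collections.append(coll)
    return collections


fig, ax = plt.subplots()
collections = scatter_by_class(ax, data[:, :2], data[:, -1].astype(int),
                               {0: "r", 1: "b"})
ax.legend()
fig.canvas.draw()  # render off-screen; use plt.show() with an interactive backend
```

To see the figure, drop the matplotlib.use("Agg") line and call plt.show() at the end.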
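pyplot.scatter() accepts a list of colors, so:
c = ['r' if yy == 0 else 'b' for yy in y]
# x holds both coordinates (one per column), so plot column 0 against column 1
plt.scatter(x[:, 0], x[:, 1], c=c)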
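A variation on the color-list idea, offered as a sketch rather than the only way to do it: scatter can also take the integer labels themselves as c and map them to colors through a colormap. The example assumes x holds the two coordinate columns as defined in the question, pins vmin and vmax so that 0 and 1 always map to the same colors, and uses the Agg backend only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

data = np.array([[2.897534798034255, 0.872359037956732, 1],
                 [1.234850239781278, -0.293047584301112, 1],
                 [0.238575209753427, 0.129572680572429, 0],
                 [-0.109757648021958, 0.484048547480385, 1],
                 [1.109735783200013, -0.002785328902198, 0],
                 [1.572803975652908, 0.098547849368397, 0]])
x = data[:, [0, 1]]
y = data[:, -1].astype(int)

# Two-entry colormap: label 0 maps to red, label 1 maps to blue
cmap = ListedColormap(["r", "b"])

fig, ax = plt.subplots()
# vmin/vmax fix the label-to-color mapping even if one class is missing
sc = ax.scatter(x[:, 0], x[:, 1], c=y, cmap=cmap, vmin=0, vmax=1, s=3)
fig.canvas.draw()  # render off-screen; use plt.show() with an interactive backend
```

This keeps everything in a single scatter call, at the cost of not getting one legend entry per class for free.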
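In your code, y == 0 produces a mask that contains only True and False values, not the y values to plot. If x and y are numpy arrays, you can do this:
# x holds both coordinates, so index its rows with the mask and pick each column
mask = (y == 0)
plt.scatter(x[mask, 0], x[mask, 1], c='r')
mask = (y == 1)
plt.scatter(x[mask, 0], x[mask, 1], c='b')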
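To make the mask behavior concrete, here is a small sketch using the six sample rows from the question (so the sizes are 12 and 6 rather than 2000 and 1000). It also shows why passing x and y == 0 straight to scatter cannot work: x has twice as many elements as y, and scatter needs both arguments to have the same size.

```python
import numpy as np

data = np.array([[2.897534798034255, 0.872359037956732, 1],
                 [1.234850239781278, -0.293047584301112, 1],
                 [0.238575209753427, 0.129572680572429, 0],
                 [-0.109757648021958, 0.484048547480385, 1],
                 [1.109735783200013, -0.002785328902198, 0],
                 [1.572803975652908, 0.098547849368397, 0]])
x = data[:, [0, 1]]
y = data[:, -1].astype(int)

mask = (y == 0)
print(mask)            # [False False  True False  True  True]
print(x.size, y.size)  # 12 6 (two coordinates per row versus one label per row)

# Indexing with the mask keeps only the rows whose label is 0
class0 = x[mask]
print(class0.shape)    # (3, 2)
```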
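I'm not sure why you extract x and y first and only filter afterwards. Given that you have a lot of data and only a few classes, plt.plot with markers should also be faster than plt.scatter: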
import numpy as np
import matplotlib.pyplot as plt
data = np.asarray([[2.897534798034255,0.872359037956732,1],
[1.234850239781278,-0.293047584301112,1],
[0.238575209753427,0.129572680572429,0],
[-0.109757648021958,0.484048547480385,1],
[1.109735783200013,-0.002785328902198,0],
[1.572803975652908,0.098547849368397,0]])
colors = ["blue", "red", "green"]
labels = ["A", "B", "C"]
for i, c, l in zip(np.unique(data[:, 2]), colors, labels):
    plt.plot(data[data[:, 2]==i][:, 0], data[data[:, 2]==i][:, 1],
             marker="o", markersize=7, ls="None", color=c,
             label=f"The letter {l} represents category {int(i)}")
plt.legend()
plt.show()
Sample output: (image not included)