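Matplotlib scatter color by categorical factors
I have a basic scatter plot where x and y are floats. But I want to change the color of the markers based on a third, categorical variable. The categorical variable is in string form, and this seems to cause a problem.
Using the iris dataset, here is the code I thought I would use: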
#Scatter of Petal
x=df['Petal Length']
y=df['Petal Width']
z=df['Species']
plt.scatter(x, y, c=z, s=15, cmap='hot')
plt.xlabel('Petal Width')
plt.ylabel('Petal Length')
plt.title('Petal Width vs Length')
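But I get an error: could not convert string to float: iris-setosa
Do I have to convert the categorical variable to a numeric one before running this, or is there something I can do with the data in its current format?
Thanks
Update: the full traceback is: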
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-47-d67ee3bffc3b> in <module>()
3 y=df['Petal Width']
4 z=df['Species']
----> 5 plt.scatter(x, y, c=z, s=15, cmap='hot')
6 plt.xlabel('Petal Width')
7 plt.ylabel('Petal Length')
/Users/mpgartland1/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.pyc in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, hold, **kwargs)
3198 ret = ax.scatter(x, y, s=s, c=c, marker=marker, cmap=cmap, norm=norm,
3199 vmin=vmin, vmax=vmax, alpha=alpha,
-> 3200 linewidths=linewidths, verts=verts, **kwargs)
3201 draw_if_interactive()
3202 finally:
/Users/mpgartland1/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, **kwargs)
3605
3606 if c_is_stringy:
-> 3607 colors = mcolors.colorConverter.to_rgba_array(c, alpha)
3608 else:
3609 # The inherent ambiguity is resolved in favor of color
/Users/mpgartland1/anaconda/lib/python2.7/site-packages/matplotlib/colors.pyc in to_rgba_array(self, c, alpha)
420 result = np.zeros((nc, 4), dtype=np.float)
421 for i, cc in enumerate(c):
--> 422 result[i] = self.to_rgba(cc, alpha)
423 return result
424
/Users/mpgartland1/anaconda/lib/python2.7/site-packages/matplotlib/colors.pyc in to_rgba(self, arg, alpha)
374 except (TypeError, ValueError) as exc:
375 raise ValueError(
--> 376 'to_rgba: Invalid rgba arg "%s"\n%s' % (str(arg), exc))
377
378 def to_rgba_array(self, c, alpha=None):
ValueError: to_rgba: Invalid rgba arg "Iris-setosa"
to_rgb: Invalid rgb arg "Iris-setosa"
could not convert string to float: iris-setosa
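As your traceback tells you, you cannot pass arbitrary strings to the color argument. You can pass either colors, or an array of values that matplotlib will itself map to colors.
See:
http://matplotlib.org/api/pyplot_api.html?highlight=plot#matplotlib.pyplot.plot
There may be a more elegant way, but one way to do it is the following (I used this dataset: https://raw.githubusercontent.com/pydata/pandas/master/pandas/tests/data/iris.csv):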
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
from pandas import read_csv

df = read_csv('iris.csv')

# Scatter of petal length against petal width
x = df['PetalLength']
y = df['PetalWidth']

# Get the unique species names
uniq = list(set(df['Name']))

# Set up the colormap so that each species index maps to its own color
hot = plt.get_cmap('hot')
cNorm = colors.Normalize(vmin=0, vmax=len(uniq))
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=hot)

# Plot each species separately so that each gets its own color and legend entry
for i in range(len(uniq)):
    indx = df['Name'] == uniq[i]
    plt.scatter(x[indx], y[indx], s=15, color=scalarMap.to_rgba(i), label=uniq[i])

plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.title('Petal Width vs Length')
plt.legend(loc='upper left')
plt.show()
This produces a scatter plot in which each species has its own color.
Edit: explicitly added labels for the legend.
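Since the color argument accepts an array of valid color names, another option is to translate each category string into a color before plotting. The following is a minimal sketch of that idea, not part of the answer above: the small DataFrame is an illustrative stand-in for iris.csv, and the palette dictionary is a hypothetical choice of colors.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Tiny stand-in for the iris data (illustrative values, not the real dataset)
df = pd.DataFrame({
    "PetalLength": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "PetalWidth": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Name": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
             "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# Hypothetical palette: any valid matplotlib color names would work here
palette = {
    "Iris-setosa": "tab:blue",
    "Iris-versicolor": "tab:orange",
    "Iris-virginica": "tab:green",
}

# Translate each category string into a color string, one per point
point_colors = df["Name"].map(palette).tolist()

fig, ax = plt.subplots()
ax.scatter(df["PetalLength"], df["PetalWidth"], c=point_colors, s=15)
ax.set(xlabel="Petal Length", ylabel="Petal Width")
```

This draws all points in a single scatter call, so it does not produce per-species legend entries on its own; the loop in the answer above is still the simpler route when a legend is needed.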
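Building on @jonnybazookatone's answer, here is my approach. I use groupby to build a small DataFrame that serves as a lookup between Name and name_index. Then I group again and loop over the groups: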
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
from pandas import read_csv

df = read_csv('iris.csv')

# Map each Name to an integer: the row index of this small lookup frame
pos = df.loc[:, ["Name"]].groupby("Name").count().reset_index()

# Create a new column in the dataframe which contains the numeric value
tag_to_index = lambda x: pos.loc[pos.Name == x.Name].index[0]
df.loc[:, "name_index"] = df.loc[:, ["Name"]].apply(tag_to_index, axis=1)

# Set the color map to match the number of species
hot = plt.get_cmap('hot')
cNorm = colors.Normalize(vmin=0, vmax=len(pos))
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=hot)

# Plot each species group with its own color and legend label
for (name, group) in df.groupby("name_index"):
    plt.scatter(group.PetalWidth, group.PetalLength, s=15, label=pos.iloc[name].get("Name"), color=scalarMap.to_rgba(name))

plt.xlabel('Petal Width')
plt.ylabel('Petal Length')
plt.title('Petal Width vs Length')
plt.legend()
plt.show()
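If the integer lookup column is not needed for anything else, the same grouping idea can be sketched more compactly by enumerating the groups directly. This is a simplified variant under stated assumptions, not the code above: the small DataFrame stands in for iris.csv, and calling the colormap with i / n is just one way to spread the groups across it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Tiny stand-in for the iris data (illustrative values, not the real dataset)
df = pd.DataFrame({
    "PetalLength": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "PetalWidth": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Name": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
             "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

cmap = plt.get_cmap("hot")
groups = df.groupby("Name")
n = groups.ngroups

fig, ax = plt.subplots()
# groupby yields (key, sub-frame) pairs; enumerate supplies the integer index
for i, (name, group) in enumerate(groups):
    # A colormap called with a float in [0, 1] returns an RGBA tuple
    ax.scatter(group["PetalWidth"], group["PetalLength"],
               s=15, color=cmap(i / n), label=name)

ax.set(xlabel="Petal Width", ylabel="Petal Length")
ax.legend()
```

Dividing by n rather than n - 1 keeps the last group away from the very top of the "hot" colormap, which is white and hard to see on a white background.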
Altair should make this a breeze.
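import altair as alt
from vega_datasets import data  # assumes the separate vega_datasets package is installed

# datasets.load_dataset('iris') was the Altair 1.x API; vega_datasets is assumed here instead
df = data.iris()
alt.Chart(df).mark_point().encode(x='petalLength', y='sepalLength', color='species')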
The simplest way is to pass an array of integer category levels to the color parameter (c) of plt.scatter().
import pandas as pd
import matplotlib.pyplot as plt
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
plt.scatter(iris['petal_length'], iris['petal_width'], c=pd.factorize(iris['species'])[0])
plt.gca().set(xlabel='Petal Length', ylabel='Petal Width', title='Petal Width vs Length')
This creates a plot without a legend, using the default "viridis" colormap.
To choose your own colormap and add a legend, the simplest approach is:
import matplotlib.patches
levels, categories = pd.factorize(iris['species'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
plt.scatter(iris['petal_length'], iris['petal_width'], c=colors)
plt.gca().set(xlabel='Petal Length', ylabel='Petal Width', title='Petal Width vs Length')
plt.legend(handles=handles, title='Species')
Here I chose the "tab10" discrete (aka qualitative) colormap.
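As an alternative to building Patch handles by hand, the object returned by scatter can generate legend handles itself through legend_elements(). This is a different technique from the one above, sketched under two assumptions: a matplotlib version that provides legend_elements() (3.1 or later), and a small illustrative DataFrame in place of the downloaded CSV.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Tiny stand-in for the iris data (illustrative values, not the real dataset)
iris = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# levels holds one integer per row; categories holds the matching names
levels, categories = pd.factorize(iris["species"])

fig, ax = plt.subplots()
points = ax.scatter(iris["petal_length"], iris["petal_width"], c=levels)

# legend_elements() returns one handle per distinct value in c (here 0, 1, 2),
# in ascending order, which matches the order of categories
handles, _ = points.legend_elements()
ax.legend(handles, list(categories), title="Species")
```

The handles take their colors from whatever colormap the scatter call used, so the legend stays in sync if a cmap argument is added later.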
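Extra credit:
In the first plot, the default colors are chosen by passing min-max scaled values from the array of category level integers, pd.factorize(iris['species'])[0], to the __call__ method of the plt.cm.viridis colormap object.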
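That claim can be checked with a short sketch: scale the integer levels to [0, 1] by hand, call the colormap on them, and compare the result with the colors scatter assigns on its own. The species list and coordinates below are illustrative placeholders, and the check assumes the default colormap has not been changed from "viridis".

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import Normalize

# Illustrative category column (placeholder data)
species = pd.Series(["setosa", "setosa", "versicolor", "virginica"])
levels, categories = pd.factorize(species)

# Min-max scale the integer levels to [0, 1], then call the colormap on them
scaled = Normalize(vmin=levels.min(), vmax=levels.max())(levels)
manual_colors = plt.cm.viridis(scaled)

# Let scatter choose the colors itself, then read them back for comparison
fig, ax = plt.subplots()
points = ax.scatter([1, 2, 3, 4], [1, 2, 3, 4], c=levels)
auto_colors = points.to_rgba(levels)

same = np.allclose(manual_colors, auto_colors)
```

Here same should evaluate to True, since both paths normalize the levels to the same range and look them up in the same colormap.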