df.plot.scatter: c and cmap

I have a dataframe (note: the data is dummy data and does not represent what is shown in the plots):

    Index     BGC frequency - Count     Proportion of total BGCs both captured and not captured by antiSMASH - %
  species_a            1                                       2
  species_b            3                                       4
     ...              ...                                     ...

I want to draw a scatter plot of BGC frequency - Count against Proportion of total BGCs both captured and not captured by antiSMASH - %, colouring the points by the categorical Index, with a legend.

import matplotlib.pyplot as plt
from matplotlib import colors
import pandas as pd

colorlist = list(colors.ColorConverter.colors.keys())
captured_df.plot.scatter(x='BGC frequency - Count',
                         y='Proportion of total BGCs both captured and not captured by antiSMASH - %',
                         c=colorlist,
                         title='BGCs with an antiSMASH region')

This gets me close:

But I can't find a way to get a legend. Ideally, I want something like what is shown here, line 69:

But when I try:

df.plot.scatter(x='BGC frequency - Count', y='Proportion of total BGCs both captured and not captured by antiSMASH - %', c=df.index, cmap="viridis", s=50)

I get:

ValueError: 'c' argument must be a mpl color, a sequence of mpl colors or a sequence of numbers, not Index(...list of index species names...)

I'm not sure why this is. I thought cmap converted the c data to a list of the correct data type? The link above explicitly deals with categorical data:

If a categorical column is passed to c, then a discrete colorbar will be produced

Note also that I don't want a numeric colourbar; this is not much use:

Thanks for reading :D

The trick is to convert the "type" column to a categorical (in your case, the Index column).

For example:

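import matplotlib.pyplot as plt
import pandas as pd

d = pd.DataFrame([["a", 1, 3], ["b", 3, 3], ["b", 2, 3], ["a", 5, 2]], columns=['type', 'x', 'y'])
# A categorical dtype is what makes pandas draw a discrete colorbar
d['type'] = pd.Categorical(d['type'])
d.plot.scatter(x='x', y='y', c='type', cmap='inferno')
plt.show()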

This should work.
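In the question the species names live in the index rather than in a column, so they need to be copied into a categorical column before they can be passed to c by name. A minimal sketch, assuming pandas 1.3.0 or later; the column name `species` and the dummy values are made up for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd

x_col = "BGC frequency - Count"
y_col = "Proportion of total BGCs both captured and not captured by antiSMASH - %"

# Dummy data shaped like the frame in the question (values are invented)
df = pd.DataFrame(
    {x_col: [1, 3, 2], y_col: [2, 4, 5]},
    index=["species_a", "species_b", "species_c"],
)

# Copy the index into a categorical column; "species" is a hypothetical name
df["species"] = pd.Categorical(df.index)

# Passing the column name (not df.index) lets pandas map categories to colours
ax = df.plot.scatter(
    x=x_col,
    y=y_col,
    c="species",
    cmap="viridis",
    s=50,
    title="BGCs with an antiSMASH region",
)
```

Call plt.show() afterwards to display the figure in a script. The result is a discrete colorbar labelled with the species names, not a matplotlib legend.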

It is also worth mentioning that this feature was added in pandas 1.3.0 (July 2, 2021)!

Make sure you are using a suitable version.
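A quick way to check the installed version, as a rough sketch; the helper name `supports_categorical_c` is hypothetical and it only compares the major and minor parts of the version string:

```python
import pandas as pd


def supports_categorical_c(version: str) -> bool:
    """Return True if the pandas version string is 1.3 or newer."""
    # Only the first two dot-separated parts are compared
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 3)


print(pd.__version__, supports_categorical_c(pd.__version__))
```

If this prints False, upgrading pandas should make the categorical c argument available.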