Highlighting specific data points in a parallel coordinates plot

I'm looking for help with highlighting/coloring specific data points on a parallel coordinates plot. I can't seem to find an approach that works.

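Essentially, I want to plot all the data as shown below, then take the data points at, for example, indices [0, 1, 2] and give them a separate color so they stand out (and, if possible, make their lines thicker as well). Any suggestions?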

from sklearn import datasets
from yellowbrick.features import ParallelCoordinates

iris = datasets.load_iris()
X = iris.data[:, :]
y = iris.target

features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
classes = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
title = "Plot over Iris Data"

# Instantiate the visualizer
visualizer = ParallelCoordinates(
    classes=classes, features=features, fast=False, alpha=.40, title=title)

# Fit the visualizer and display it
visualizer.fit_transform(X, y)
visualizer.finalize()  # creates title, legend, etc.

visualizer.ax.tick_params(labelsize=22)  # change size of tick labels
visualizer.ax.title.set_fontsize(30)  # change size of title

for text in visualizer.ax.legend_.texts:  # change size of legend texts
    text.set_fontsize(20)

visualizer.fig.tight_layout()  # fit all texts nicely into the surrounding figure
visualizer.fig.show()

Currently, ParallelCoordinates.draw() iterates over the data points in order. As a result, the Line2D children of visualizer.ax follow the order of the data, so you can do this:

from sklearn import datasets
from yellowbrick.features import ParallelCoordinates

# New code ----------------------
import matplotlib.pyplot as plt
special_lines = [0, 1, 2]
# Put any property you want here.
special_properties = {'linestyle': '--', 'color': 'k', 
                      'linewidth': 5, 'zorder': float('inf'), 
                      'alpha': 1}
# End of new code ---------------

iris = datasets.load_iris()
X = iris.data[:, :]
y = iris.target

features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
classes = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
title = "Plot over Iris Data"

# Instantiate the visualizer
visualizer = ParallelCoordinates(
    classes=classes, features=features, fast=False, alpha=.40, title=title)

# Fit the visualizer and display it
visualizer.fit_transform(X, y)

# New code ----------------------
for line in [visualizer.ax.get_lines()[i] for i in special_lines]:
    plt.setp(line, **special_properties)
# End of new code ---------------

visualizer.finalize()  # creates title, legend, etc.

visualizer.ax.tick_params(labelsize=22)  # change size of tick labels
visualizer.ax.title.set_fontsize(30)  # change size of title

for text in visualizer.ax.legend_.texts:  # change size of legend texts
    text.set_fontsize(20)

visualizer.fig.tight_layout()  # fit all texts nicely into the surrounding figure
visualizer.fig.show()

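Result: the lines for data points 0, 1 and 2 are drawn as thick, black, dashed lines on top of the rest.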
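
The mechanism this relies on can be seen in isolation with plain matplotlib. The following is a minimal sketch, not yellowbrick's own code: it assumes one `ax.plot` call per data row (which is what the answer says `draw()` effectively does with `fast=False`), and the toy `rows` values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Toy rows standing in for the data set (illustrative values only).
rows = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [6.3, 3.3, 6.0, 2.5]]
special_lines = [0, 2]  # indices of the rows to highlight

fig, ax = plt.subplots()
for row in rows:
    # One Line2D per row, added in the order of the data.
    ax.plot(range(len(row)), row, color="tab:blue", alpha=0.4)

# get_lines() returns the Line2D objects in the order they were added,
# so index i refers to the line drawn for rows[i].
lines = ax.get_lines()
for i in special_lines:
    plt.setp(lines[i], color="k", linewidth=5, zorder=10, alpha=1)
```

Because the lookup is purely positional, it only stays correct as long as nothing else adds lines to the axes before the data lines.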

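Note that the fact that lines are added in order is not documented; it is just how the method happens to be implemented. So the maintainers might (though I wouldn't expect it) change this behavior in a future update. A safer approach is to check manually whether a line's data matches the transformed data used by the visualizer. Note that, in general, we need to use the transformed data, because ParallelCoordinates also implements normalizers. That doesn't apply in your case, but in general we should do this: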

# Perform AFTER visualizer.fit_transform(X, y).
# Assumes plt, special_lines and special_properties are defined as above.
import numpy as np

transformed_data = list(visualizer.transform(X[special_lines, :]))
for line in visualizer.ax.get_lines():
    for arr in transformed_data:
        # Compare each highlighted row with the line's y-values.
        if np.array_equal(arr, line.get_data()[1]):
            plt.setp(line, **special_properties)
            break
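
Exact equality on floats can be brittle once a normalizer is involved, so a tolerance-based comparison may be more forgiving. The sketch below is not part of yellowbrick or matplotlib: `rows_match` and `find_matching_lines` are hypothetical helpers that work on plain sequences, and the tolerances are arbitrary defaults chosen for illustration.

```python
import math


def rows_match(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if both sequences have the same length and close values."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


def find_matching_lines(lines_ydata, targets, **tol):
    """Return the indices of entries in lines_ydata that match any target row."""
    return [
        i for i, ydata in enumerate(lines_ydata)
        if any(rows_match(ydata, target, **tol) for target in targets)
    ]


# Toy data standing in for the y-values of each plotted line.
lines_ydata = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [6.3, 3.3, 6.0, 2.5]]
# The first target differs from line 0 only by rounding-sized noise.
targets = [[5.1, 3.5, 1.4, 0.2 + 1e-15], [6.3, 3.3, 6.0, 2.5]]

print(find_matching_lines(lines_ydata, targets))  # prints [0, 2]
```

With the visualizer above, one could pass `[line.get_data()[1] for line in visualizer.ax.get_lines()]` as `lines_ydata` and `transformed_data` as `targets`, then restyle the lines at the returned indices. Keep in mind that any data-based match, exact or approximate, highlights every line whose values coincide with a selected row, including duplicates.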