Find the highest value of y for each x value and connect the points with a line

Question

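I'm exploring the best way to do this.

I have a scatter plot of y versus x, where x is per-capita income.

After plotting all the values as a scatter plot, I would like to find the highest value of y for each value of x (that is, at each income level) and then connect those points with a line.

How can I do this in Python?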

Answer 1

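You have two parallel lists: x and y. You want to group them by x and take the maximum y in each group. First, sort the two lists together: zip them into (x, y) tuples and sort the result: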

xy = sorted(zip(x, y))

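Now group the sorted list by the first element ("x"). groupby only merges consecutive items with the same key, which is why the list has to be sorted first. The result is an iterator of pairs, where the first element is x and the second is an iterator over all the points with that x. Each point is itself a tuple, and every tuple in a group has the same first element x. Note that the iterator can be consumed only once: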

from itertools import groupby
grouped = groupby(xy, lambda item: item[0])
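To illustrate why the sort matters, here is a small sketch with made-up points (the list pts is an assumption for illustration only). It compares the group keys that groupby produces on unsorted and on sorted input:

```python
from itertools import groupby

# Hypothetical sample points, deliberately not sorted by x
pts = [(1, 3), (2, 4), (1, 9)]

# Without sorting, x = 1 appears as two separate groups
unsorted_keys = [k for k, _ in groupby(pts, key=lambda item: item[0])]

# After sorting, each x forms exactly one group
sorted_keys = [k for k, _ in groupby(sorted(pts), key=lambda item: item[0])]

print(unsorted_keys)  # [1, 2, 1]
print(sorted_keys)    # [1, 2]
```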

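Finally, take the x and the maximum y of each group. Because every tuple in a group has the same x, max(points) returns the tuple with the largest y: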

envelope = [(xp, max(points)[1]) for xp, points in grouped]

envelope is a list of (x, y) tuples that traces the upper envelope of the scatter plot. You can unzip it further into xs and ys:

x1, y1 = zip(*envelope)

Putting it all together:

x1, y1 = zip(*[(xp, max(points)[1]) 
               for xp, points 
               in groupby(sorted(zip(x, y)), lambda item: item[0])])
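For anyone who wants to try these steps end to end, here is a minimal sketch. The function name upper_envelope, the guard for empty input, and the sample data (borrowed from the pandas example in Answer 2) are assumptions added for illustration, not part of the original answer:

```python
from itertools import groupby
from operator import itemgetter

# Illustrative data (assumed, not from the question)
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
y = [3, 7, 9, 4, 1, 2, 8, 6, 4, 4, 3, 1]

def upper_envelope(x, y):
    """Return (xs, ys): the maximum y for each distinct x, ordered by x."""
    pairs = sorted(zip(x, y))  # sort so that equal x values are adjacent
    envelope = [(xp, max(p[1] for p in points))
                for xp, points in groupby(pairs, key=itemgetter(0))]
    if not envelope:           # zip(*[]) cannot be unpacked into two names
        return [], []
    xs, ys = zip(*envelope)
    return list(xs), list(ys)

x1, y1 = upper_envelope(x, y)
print(x1)  # [1, 2, 3, 4]
print(y1)  # [9, 4, 8, 4]
```

The resulting x1 and y1 can then be drawn over the scatter plot, for example with matplotlib's plt.scatter(x, y) followed by plt.plot(x1, y1).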

Answer 2

You can use pandas, because it has a convenient groupby method and works well with matplotlib:

import pandas as pd

# example data
df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
                   'y': [3, 7, 9, 4, 1, 2, 8, 6, 4, 4, 3, 1]})

# standard scatter plot
ax = df.plot.scatter('x', 'y')

# find max. y value for each x
groupmax = df.groupby('x').max()

# connect max. values with lines
groupmax.plot(ax=ax, legend=False);
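To make it easier to see what the grouped result holds, here is a small sketch that pulls the per-x maxima out as plain lists. Selecting the 'y' column before calling max() and the names xs and ys are illustrative choices, not part of the original answer:

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
                   'y': [3, 7, 9, 4, 1, 2, 8, 6, 4, 4, 3, 1]})

# Series of maximum y values, indexed by the distinct x values (sorted by default)
groupmax = df.groupby('x')['y'].max()

xs = groupmax.index.tolist()
ys = groupmax.tolist()

print(xs)  # [1, 2, 3, 4]
print(ys)  # [9, 4, 8, 4]
```

These lists could also be passed straight to a matplotlib call such as ax.plot(xs, ys) if you prefer to draw the line yourself.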
