How do I compare any two of the top X values in a pandas DataFrame column by their index (position)?

Question

Using Python, I need to know whether, for example, the first of the top X values in a given column of a pandas DataFrame is bigger than the third one.

I have already done this in plain Python. Here is the complete code, to better illustrate what I want to do:

# given array or list: (supposing this is a pandas DataFrame column)
mylst=[23, 18, 15, 14, 19, 28, 37, 29, 99]

print(mylst)

# get top 5 values:
print(sorted(mylst, reverse=True))
print(sorted(mylst, reverse=True)[:5])
top5 = sorted(mylst, reverse=True)[:5]

# all top 5 values in the initial order as they are found in mylst:
mytops = [x for x in mylst if x in top5]

print(mytops)

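mytops contains only the top 5 values from mylst. They are not sorted, so they keep their original order/indexes: printing mytops outputs [23, 28, 37, 29, 99], in the order in which they appear in the original mylst. So you know that 23 is the first value in the mytops list, 28 the second, 37 the third, and so on.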
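As a sketch of the same plain-Python idea that tracks positions instead of values (the variable name top_pos and the earliest-position tie-breaking rule are illustrative assumptions, not part of the original code): sorting the indexes by value and keeping the first five always yields exactly five elements, even when mylst contains repeated values, whereas the "x in top5" filter keeps every copy of a tied value.

```python
mylst = [23, 18, 15, 14, 19, 28, 37, 29, 99]

# Hypothetical variant: rank positions by value (largest first). sorted() is
# stable even with reverse=True, so ties keep their earliest position first.
ranked = sorted(range(len(mylst)), key=lambda i: mylst[i], reverse=True)

# Keep the 5 best positions, then re-sort them to restore the original order.
top_pos = sorted(ranked[:5])
mytops = [mylst[i] for i in top_pos]

print(top_pos)  # [0, 5, 6, 7, 8]
print(mytops)   # [23, 28, 37, 29, 99]
```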

Now, as I said at the very beginning:

I need to know whether, e.g., the first value is bigger than the third value or not

In native Python, for this example, that just means comparing 23 and 37:

if mytops[0] > mytops[2]:
    pass  # do something...
else:
    pass  # do something else...

In pandas, this would be something like:

df['new column that will contain the comparison results'] = np.where(condition,'value if true','value if false')

which in this case would be:

df['first_vs_third'] = np.where(mytops[0]>mytops[2],1,0) #supposing that mytops[0]>mytops[2] works.

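My questions are:

Assuming I have a huge DataFrame (mylst in my example is a pandas column with unique values), how can I do this in pandas in a fast and efficient way?

How can I parameterize mytops[0] > mytops[2], and the rest of the native Python code?

How can I get the top X values from a column without resorting to native Python code like in the example above?

What if the values in mylst are not unique?

What would the code look like in that case?

Thanks in advance!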

Answer 1

The pandas equivalent of your code is:

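import pandas as pd

mylst = [23, 18, 15, 14, 19, 28, 37, 29, 99]
s = pd.Series(mylst)

# top 5 values, kept in their original order (the equivalent of mytops)
s2 = s[s.isin(s.nlargest(5))]

# equivalent of mytops[0] > mytops[2]: compare by position with .iloc
if s2.iloc[0] > s2.iloc[2]:
    print('do something')
else:
    print('do something else')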
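If the column can contain duplicates, s.isin(s.nlargest(5)) may keep more than five rows, because every copy of a tied value passes the membership test. A minimal sketch, assuming a default RangeIndex (or any increasing index) so that sorting by index restores the original order: take exactly X rows with nlargest(..., keep='first') and put them back in order with sort_index(). The helper name top_x_in_order is hypothetical.

```python
import pandas as pd


def top_x_in_order(s: pd.Series, x: int) -> pd.Series:
    """Hypothetical helper: exactly x of the largest values of s, in original order.

    Assumes the index of s is increasing (e.g. the default RangeIndex), so
    sort_index() restores the original order. keep="first" breaks ties in
    favour of the earliest occurrence.
    """
    return s.nlargest(x, keep="first").sort_index()


# Unique values: same result as s[s.isin(s.nlargest(5))]
s = pd.Series([23, 18, 15, 14, 19, 28, 37, 29, 99])
print(top_x_in_order(s, 5).to_list())       # [23, 28, 37, 29, 99]

# With duplicates, the isin() filter returns more than 5 values
d = pd.Series([23, 18, 23, 14, 19, 28, 37, 23, 99])
print(d[d.isin(d.nlargest(5))].to_list())   # [23, 23, 28, 37, 23, 99]
print(top_x_in_order(d, 5).to_list())       # [23, 23, 28, 37, 99]

# Positional comparison on the result, as in mytops[0] > mytops[2]
top = top_x_in_order(s, 5)
print(top.iloc[0] > top.iloc[2])            # False (23 > 37 is False)
```

Use keep='last' instead if ties should favour the latest occurrence.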

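For a vectorized comparison of position 2 vs 0, 3 vs 1, and so on, you can use the following (True means the earlier value is bigger):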

s2.diff(2).iloc[2:].lt(0).to_list()
# [False, False, False]
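The diff(2) trick generalizes to any gap k. A sketch, assuming s2 is built as above: s2.shift(k) > s2 is True at position i when the value k positions earlier is bigger, i.e. the vectorized form of mytops[i - k] > mytops[i]. The function name earlier_is_bigger is hypothetical. Comparisons against the leading NaN values produced by shift evaluate to False, and .iloc[k:] drops those rows, matching the output of the diff version.

```python
import pandas as pd


def earlier_is_bigger(top: pd.Series, k: int) -> pd.Series:
    """Hypothetical helper: for each position i >= k, is top[i - k] > top[i]?

    shift(k) moves values down by k positions (by position, not by label),
    so row i holds top[i - k]; the first k rows have no partner and are dropped.
    """
    return (top.shift(k) > top).iloc[k:]


s = pd.Series([23, 18, 15, 14, 19, 28, 37, 29, 99])
s2 = s[s.isin(s.nlargest(5))]              # values [23, 28, 37, 29, 99]

print(earlier_is_bigger(s2, 2).to_list())  # [False, False, False]
print(earlier_is_bigger(s2, 1).to_list())  # [False, False, True, False]
```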

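To compare all combinations at once (cell [i, j] is True when the value in column j is greater than the value in row i):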

a = s2.to_numpy()
a>a[:,None]

array([[False,  True,  True,  True,  True],
       [False, False,  True,  True,  True],
       [False, False, False, False,  True],
       [False, False,  True, False,  True],
       [False, False, False, False, False]])

# or
pd.DataFrame(a>a[:,None], index=s2, columns=s2)

       23     28     37     29     99
23  False   True   True   True   True
28  False  False   True   True   True
37  False  False  False  False   True
29  False  False   True  False   True
99  False  False  False  False  False
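To read a single comparison out of that matrix, note that gt[i, j] is a[j] > a[i], so the question's mytops[0] > mytops[2] is gt[2, 0]. A small sketch that also wraps the whole procedure for arbitrary X, i and j; the helper compare_positions is hypothetical and assumes unique values, like the isin() approach it reuses.

```python
import pandas as pd


def compare_positions(s: pd.Series, x: int, i: int, j: int) -> bool:
    """Hypothetical helper: among the top-x values of s (kept in original
    order), is the value at position i greater than the value at position j?"""
    top = s[s.isin(s.nlargest(x))].to_numpy()
    return bool(top[i] > top[j])


s = pd.Series([23, 18, 15, 14, 19, 28, 37, 29, 99])
a = s[s.isin(s.nlargest(5))].to_numpy()   # [23, 28, 37, 29, 99]
gt = a > a[:, None]                        # gt[i, j] == (a[j] > a[i])

print(gt[2, 0])                       # False: 23 > 37 is False
print(compare_positions(s, 5, 0, 2))  # False, the same comparison
print(compare_positions(s, 5, 2, 3))  # True: 37 > 29
```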
