How to select all rows which contain values in selected columns greater than a threshold?

Question

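I am trying to do the same thing as in a related question, but I have a string-type column that I need to keep in the dataframe so that I can identify which rows are which. (I suppose I could do this by index, but I am hoping to save a step.) Is there a way to exclude that column from the comparison when using .any(), but still keep it in the resulting dataframe? Thanks!

Here is the code that works on all columns: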

df[(df > threshold).any(axis=1)]

Here is the hardcoded version I am using right now:

df[(df[list_of__selected_columns] > 3).any(axis=1)]

This seems a little clunky to me, so I am wondering whether there is a better way.

Answer 1

You can use .select_dtypes to pick all columns of a given kind, for example the numeric columns:

df[df.select_dtypes(include='number').gt(threshold).any(axis=1)]
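As a minimal sketch of why this keeps the string column: the mask is computed on the numeric columns only, but it is then used to index the full dataframe. The sample data and the column names ("name", "x", "y") are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: "name" is the string column used to identify rows.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "x": [1, 5, 2, 0],
    "y": [2, 1, 7, 3],
})
threshold = 3

# Build the boolean mask from the numeric columns only...
mask = df.select_dtypes(include="number").gt(threshold).any(axis=1)

# ...then index the full frame with it, so "name" stays in the result.
result = df[mask]
print(result)
```

Only the rows where "x" or "y" exceeds the threshold are kept, and all three columns are still present in result.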

Or, for a contiguous block of columns, use iloc:

df[df.iloc[:,3:6].gt(threshold).any(axis=1)]

If you want to select an arbitrary list of columns, you are best off hardcoding the list.
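The positional and hardcoded-list variants can be sketched side by side as follows. The data, the column names, and the slice 1:3 are assumptions chosen for this example, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data; "name" identifies the rows.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "x": [1, 5, 2, 0],
    "y": [2, 1, 7, 3],
    "z": [9, 0, 0, 0],
})
threshold = 3

# Contiguous block by position: columns 1 and 2 ("x" and "y").
# The end of an iloc slice is exclusive.
by_position = df[df.iloc[:, 1:3].gt(threshold).any(axis=1)]

# Arbitrary, non-contiguous columns: an explicit list of names.
selected_columns = ["x", "z"]
by_name = df[df[selected_columns].gt(threshold).any(axis=1)]

print(by_position)
print(by_name)
```

In both cases the comparison only looks at the chosen columns, while the result keeps every column of the original dataframe, including "name".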
