Select rows if columns meet condition

Question

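I have a DataFrame with 75 columns.

How can I select rows based on a condition applied to a specific range of columns? If I wanted to do this across all columns, I could use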

df[(df.values > 1.5).any(1)]

But suppose I only want to do this on columns 3:45.

Answer 1

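Use ix to slice the columns by ordinal position (note that ix was deprecated in pandas 0.20 and removed in 1.0; on current versions, iloc accepts the same positional slices):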

In [31]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,10), columns=list('abcdefghij'))
df

Out[31]:
          a         b         c         d         e         f         g  \
0 -0.362353  0.302614 -1.007816 -0.360570  0.317197  1.131796  0.351454   
1  1.008945  0.831101 -0.438534 -0.653173  0.234772 -1.179667  0.172774   
2  0.900610  0.409017 -0.257744  0.167611  1.041648 -0.054558 -0.056346   
3  0.335052  0.195865  0.085661  0.090096  2.098490  0.074971  0.083902   
4 -0.023429 -1.046709  0.607154  2.219594  0.381031 -2.047858 -0.725303   

          h         i         j  
0  0.533436 -0.374395  0.633296  
1  2.018426 -0.406507 -0.834638  
2 -0.079477  0.506729  1.372538  
3 -0.791867  0.220786 -1.275269  
4 -0.584407  0.008437 -0.046714

So to slice the 4th and 5th columns:

In [32]:
df.ix[:, 3:5]

Out[32]:
          d         e
0 -0.360570  0.317197
1 -0.653173  0.234772
2  0.167611  1.041648
3  0.090096  2.098490
4  2.219594  0.381031

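So in your case

df[(df.ix[:, 2:45].values > 1.5).any(1)]

should work.

Indexing is 0-based, and a slice includes its start but excludes its end. Here the 3rd column (position 2) is included, and we slice up to the 46th column (position 45), which is not part of the slice.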
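On current pandas versions, where positional slicing is done with iloc, the same filter might look like the sketch below. The 75-column frame of random numbers is a hypothetical stand-in for the asker's data, and the names subset, mask, and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical 75-column frame standing in for the asker's data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 75)))

# Positions 2..44 are the 3rd through 45th columns; position 45 is excluded.
subset = df.iloc[:, 2:45]

# True for each row where at least one of the selected columns exceeds 1.5.
mask = (subset.to_numpy() > 1.5).any(axis=1)

# Boolean indexing keeps all 75 columns but only the matching rows.
result = df[mask]
```

The slice yields 43 columns, which is a quick way to confirm that the half-open range covers exactly the 3rd through 45th columns.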

Answer 2

Another solution uses iloc; values can be omitted:

# select on the 3rd through 45th columns
print (df[((df.iloc[:, 2:45]) > 1.5).any(axis=1)])

Sample:

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(3, size=(5,10)), columns=list('abcdefghij'))
print (df)
   a  b  c  d  e  f  g  h  i  j
0  1  0  0  1  1  0  0  1  0  1
1  0  2  1  2  0  2  1  2  0  0
2  2  0  1  2  2  0  1  1  2  0
3  2  1  1  1  1  2  1  1  0  0
4  1  0  0  1  2  1  0  2  2  1

print (df[((df.iloc[:, 2:5]) > 1.5).any(axis=1)])
   a  b  c  d  e  f  g  h  i  j
1  0  2  1  2  0  2  1  2  0  0
2  2  0  1  2  2  0  1  1  2  0
4  1  0  0  1  2  1  0  2  2  1
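To illustrate why values can be omitted, here is a small check, reusing the sample frame, that the mask built from the DataFrame comparison and the one built from the underlying NumPy array select the same rows. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(3, size=(5, 10)), columns=list("abcdefghij"))

# Mask from the DataFrame comparison: an index-aligned boolean Series.
mask_frame = (df.iloc[:, 2:5] > 1.5).any(axis=1)

# Mask from the raw NumPy array: a plain boolean ndarray.
mask_array = (df.iloc[:, 2:5].to_numpy() > 1.5).any(axis=1)

# Both masks should pick out identical rows of the original frame.
same_rows = df[mask_frame].equals(df[mask_array])
print(same_rows)
```

The Series version carries the row index along with it, which is convenient if the mask is later combined with other index-aligned conditions.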
