Filter DataFrame to rows with 2+ True elements
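For example, using
df[(df > 1).any(axis=1)]
I can get the rows that have any element greater than 1, but what should I do if I want the rows that have at least 2 elements greater than 1? Thanks.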
Try this:
df[(df > 1).sum(axis=1).gt(1)]
Demo:
import string
import numpy as np
import pandas as pd
In [118]: df = pd.DataFrame(np.random.rand(10,10)*1.2, columns=list(string.ascii_letters[:10]))
In [119]: df
Out[119]:
a b c d e f g h i j
0 0.934290 0.426050 0.165846 1.114521 1.101023 0.924071 0.241893 0.890354 1.168406 0.506547
1 0.576869 1.091996 0.272124 0.834070 0.229545 0.585501 1.114688 0.957817 1.151957 0.761277
2 0.016659 1.138262 0.481773 0.186753 0.176585 0.497437 0.321805 0.664140 0.738851 0.177179
3 0.192605 0.395377 0.950169 0.678960 0.525349 0.050877 0.181615 0.105080 0.385672 0.401810
4 1.184054 1.097378 0.197706 0.453395 0.258631 1.088337 0.139201 0.217262 0.369734 1.054716
5 0.246081 0.234748 0.879371 0.198397 0.288288 0.534848 0.561080 0.732490 0.156947 0.662194
6 0.660215 0.221513 0.224576 0.049425 0.339101 0.441393 1.122385 0.057968 1.094025 1.130691
7 0.022977 0.681718 0.314200 0.622263 0.692124 0.803743 0.783381 0.715494 0.434911 0.247724
8 0.815742 0.419933 0.019704 0.764557 0.074530 0.990639 0.801125 0.403838 0.680618 1.043551
9 1.061915 0.229453 0.446562 0.324415 0.121421 0.270542 0.884124 0.926168 0.282650 0.267467
In [120]: df[(df > 1).sum(axis=1).gt(1)]
Out[120]:
a b c d e f g h i j
0 0.934290 0.426050 0.165846 1.114521 1.101023 0.924071 0.241893 0.890354 1.168406 0.506547
1 0.576869 1.091996 0.272124 0.834070 0.229545 0.585501 1.114688 0.957817 1.151957 0.761277
4 1.184054 1.097378 0.197706 0.453395 0.258631 1.088337 0.139201 0.217262 0.369734 1.054716
6 0.660215 0.221513 0.224576 0.049425 0.339101 0.441393 1.122385 0.057968 1.094025 1.130691
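Since the demo above uses random data, here is a small reproducible sketch of the same idea. It works because True counts as 1 when summed, so summing the boolean mask along each row gives the number of elements greater than 1 in that row. The sample values and the helper name rows_with_at_least are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Hypothetical sample data, chosen so the result is easy to check by eye.
df = pd.DataFrame(
    {
        "a": [0.5, 1.2, 1.1, 0.3],
        "b": [1.5, 0.4, 1.3, 0.2],
        "c": [0.1, 1.9, 1.4, 0.8],
    }
)


def rows_with_at_least(frame, threshold, min_count):
    """Keep rows where at least `min_count` values exceed `threshold`."""
    mask = frame > threshold      # boolean DataFrame, same shape as frame
    counts = mask.sum(axis=1)     # True counts as 1: number of hits per row
    return frame[counts.ge(min_count)]


# "At least 2 elements greater than 1" is the same test as .sum(axis=1).gt(1).
result = rows_with_at_least(df, threshold=1, min_count=2)
print(result)
```

Writing the condition as .ge(min_count) rather than .gt(1) makes it easy to change "at least 2" to "at least 3" without an off-by-one mistake.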