Delete rows from DF with partial string matching

Question

From the following df, I want to delete every row whose code does not start with CBT_21 or CBT_TY1:

               code        date type  strike  settlement
0    CBT_06_F2016_S  2015-01-02    P   240.0        2.45
1    CBT_06_F2016_S  2015-01-02    P   360.0       48.60
2    CBT_21_F2016_S  2015-01-02    P   210.0        0.80
3    CBT_TY1_F2016_S 2015-01-02    P   320.0       23.20
4    CBT_06_F2016_S  2015-01-02    C   430.0        3.70

Desired output:

               code        date type  strike  settlement
0    CBT_21_F2016_S  2015-01-02    P   210.0        0.80
1    CBT_TY1_F2016_S 2015-01-02    P   320.0       23.20

What is the most efficient, Pythonic way to do this? I have a very large file to process.

Answer 1

You can use boolean indexing with str.startswith and chain the conditions with | (bitwise OR), or use str.contains with ^ to match only the start of the string:

m = df['code'].str.startswith('CBT_21') | df['code'].str.startswith('CBT_TY1')
df = df[m]
print(df)
              code        date type  strike  settlement
2   CBT_21_F2016_S  2015-01-02    P   210.0         0.8
3  CBT_TY1_F2016_S  2015-01-02    P   320.0        23.2
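
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same startswith idea to a list of prefixes. The names prefixes, mask and result are only illustrative, not from the original answer. The filtered rows keep their original index labels (2 and 3), so reset_index(drop=True) is added to get the 0, 1 numbering shown in the question's desired output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "code": ["CBT_06_F2016_S", "CBT_06_F2016_S", "CBT_21_F2016_S",
             "CBT_TY1_F2016_S", "CBT_06_F2016_S"],
    "date": ["2015-01-02"] * 5,
    "type": ["P", "P", "P", "P", "C"],
    "strike": [240.0, 360.0, 210.0, 320.0, 430.0],
    "settlement": [2.45, 48.60, 0.80, 23.20, 3.70],
})

# Illustrative name: the prefixes we want to keep.
prefixes = ["CBT_21", "CBT_TY1"]

# Start with an all-False mask and OR in one startswith test per prefix.
mask = pd.Series(False, index=df.index)
for prefix in prefixes:
    mask |= df["code"].str.startswith(prefix)

# Keep matching rows and renumber the index from 0.
result = df[mask].reset_index(drop=True)
print(result)
```

Building the mask in a loop behaves the same as chaining the conditions with | by hand, and it is easier to extend if the list of prefixes grows.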

Another solution:

m = df['code'].str.contains('^CBT_21|^CBT_TY1')
df = df[m]
print(df)
              code        date type  strike  settlement
2   CBT_21_F2016_S  2015-01-02    P   210.0         0.8
3  CBT_TY1_F2016_S  2015-01-02    P   320.0        23.2
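
Two small variations on the regex approach may be worth knowing; this is a sketch on a made-up Series (the codes values, including the None and the X_CBT_21 entry, are assumptions added to show edge cases). A non-capturing group lets a single ^ cover both alternatives, str.match only matches at the start of the string so no ^ is needed, and na=False makes missing values count as non-matches instead of propagating NaN into the mask.

```python
import pandas as pd

# Hypothetical sample values, including a missing value and a code
# that contains a wanted prefix but does not start with it.
codes = pd.Series(["CBT_06_F2016_S", "CBT_21_F2016_S",
                   "CBT_TY1_F2016_S", None, "X_CBT_21"])

# One anchor for both alternatives via a non-capturing group.
m_contains = codes.str.contains(r"^(?:CBT_21|CBT_TY1)", na=False)

# str.match anchors at the start of each string, so ^ is not needed.
m_match = codes.str.match(r"CBT_21|CBT_TY1", na=False)

print(m_contains.tolist())
print(m_match.tolist())
```

Both masks select only the second and third entries here; the missing value and X_CBT_21 are excluded.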

pandas

python-3.6