Pandas: Delete rows in a DataFrame if specific columns don't contain specific text

I have a DataFrame df:

   id column_int column_int column_A column_B column_C column_D
0   1        int        int      ABC      ABC     Keep       na
1   2        int        int      ABC      ABC      ABC      ABC
2   3        int        int      ABC     Save       na       na
3   4        int        int      ABC     Keep       na       na
4   5        int        int      ABC      ABC      ABC      ABC
.
.

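Here the column_int columns contain integers, and column_A through column_D contain text values. I only want to keep the rows that have Keep or Save as a value in one of those text columns.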

Before:

   id column_int column_int column_A column_B column_C column_D
0   1        int        int      ABC      ABC     Keep       na
1   2        int        int      ABC      ABC      ABC      ABC
2   3        int        int      ABC     Save       na       na
3   4        int        int      ABC     Keep       na       na
4   5        int        int      ABC      ABC      ABC      ABC

After:

   id column_int column_int column_A column_B column_C column_D
0   1        int        int      ABC      ABC     Keep       na
2   3        int        int      ABC     Save       na       na
3   4        int        int      ABC     Keep       na       na

I tried the following:

for column in df:
    if type(column) == object:
        df = df[df[column].str.contains('Save')] | df[df[column].str.contains('Keep')]
    else:
        pass

It would probably be easier and cleaner without a for loop:

dfA = df.loc[(df.column_A == 'Save') or (df.column_A == 'Keep')]
dfB = df.loc[(df.column_B == 'Save') or (df.column_B == 'Keep')]
dfC = df.loc[(df.column_C == 'Save') or (df.column_C == 'Keep')]
dfD = df.loc[(df.column_D == 'Save') or (df.column_D == 'Keep')]

and then concatenate the DataFrames together:

df = pd.concat([dfA, dfB, dfC, dfD])
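
A note on the per-column attempt above: Python's `or` asks for a single truth value from a whole Series, so those four lines raise a ValueError as written. Element-wise logic needs `|` or `isin`. Below is a minimal sketch of how that approach might look once corrected. The sample data is made up and uses only two text columns, and the `drop_duplicates()` call is an addition of mine, on the assumption that a row matching in more than one column should appear only once after `pd.concat`.

```python
import pandas as pd

# Hypothetical sample data, reduced to two text columns for brevity.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "column_A": ["Keep", "ABC", "ABC"],
    "column_B": ["Save", "ABC", "Keep"],
})

# `or` cannot combine two boolean Series, so this raises ValueError.
try:
    df.loc[(df.column_A == "Save") or (df.column_A == "Keep")]
    or_failed = False
except ValueError:
    or_failed = True

# Element-wise alternatives: isin() or the | operator.
dfA = df.loc[df.column_A.isin(["Save", "Keep"])]
dfB = df.loc[(df.column_B == "Save") | (df.column_B == "Keep")]

# Row 0 matches in both columns, so remove the repeat after concatenating.
result = pd.concat([dfA, dfB]).drop_duplicates().sort_index()
print(result)
```

This still builds one intermediate DataFrame per column, so it scales poorly as the number of columns grows.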

You can use .apply() on the selected columns and, for each column, check for Save or Keep with str.contains. Then use .any() on axis=1 (a row-wise operation) to check whether the row contains any such string.

Finally, filter with .loc, as follows:

cols = ['column_A',  'column_B',  'column_C',  'column_D']

df.loc[df[cols].apply(lambda x: x.str.contains(r'Save|Keep')).any(axis=1)]

Result:

   id column_int column_int.1 column_A column_B column_C column_D
0   1        int          int      ABC      ABC     Keep       na
2   3        int          int      ABC     Save       na       na
3   4        int          int      ABC     Keep       na       na
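
To reproduce this end to end, here is a minimal self-contained sketch. The integer values are made-up placeholders, and the second integer column is named column_int.1 as in the output above. Passing `na=False` is my own addition so that real missing values (NaN) count as non-matches. The `isin` variant is an alternative I am suggesting for the case where the cell must equal Save or Keep exactly, rather than merely contain it.

```python
import pandas as pd

# Sample data modeled on the question; integer values are placeholders.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "column_int": [10, 20, 30, 40, 50],
    "column_int.1": [11, 21, 31, 41, 51],
    "column_A": ["ABC"] * 5,
    "column_B": ["ABC", "ABC", "Save", "Keep", "ABC"],
    "column_C": ["Keep", "ABC", "na", "na", "ABC"],
    "column_D": ["na", "ABC", "na", "na", "ABC"],
})

cols = ["column_A", "column_B", "column_C", "column_D"]

# Substring match: True where a cell contains "Save" or "Keep".
mask = df[cols].apply(lambda s: s.str.contains(r"Save|Keep", na=False)).any(axis=1)
filtered = df.loc[mask]

# Exact match alternative: True where a cell equals "Save" or "Keep".
exact = df.loc[df[cols].isin(["Save", "Keep"]).any(axis=1)]

print(filtered)
```

With this sample data both variants keep the rows with index 0, 2 and 3. They would differ on a value such as "Keeper", which str.contains matches and isin does not.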