Pandas: Delete rows in a DataFrame if specific columns don't contain specific text
I have a DataFrame df:
   id  column_int  column_int  column_A  column_B  column_C  column_D
0   1         int         int       ABC       ABC      Keep        na
1   2         int         int       ABC       ABC       ABC       ABC
2   3         int         int       ABC      Save        na        na
3   4         int         int       ABC      Keep        na        na
4   5         int         int       ABC       ABC       ABC       ABC
.
.
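Here the column_int columns contain integers, and column_A through column_D contain text values. I only want to keep rows where one of those text columns has Keep or Save as its value.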
Before:
   id  column_int  column_int  column_A  column_B  column_C  column_D
0   1         int         int       ABC       ABC      Keep        na
1   2         int         int       ABC       ABC       ABC       ABC
2   3         int         int       ABC      Save        na        na
3   4         int         int       ABC      Keep        na        na
4   5         int         int       ABC       ABC       ABC       ABC
After:
   id  column_int  column_int  column_A  column_B  column_C  column_D
0   1         int         int       ABC       ABC      Keep        na
2   3         int         int       ABC      Save        na        na
3   4         int         int       ABC      Keep        na        na
I tried the following:
for column in df:
    if type(column) == object:
        df = df[df[column].str.contains('Save')] | df[df[column].str.contains('Keep')]
    else:
        pass
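A likely reason this loop never filters anything: iterating over a DataFrame yields the column labels, which are plain strings, so `type(column) == object` is False for every column. The `|` in the loop body also combines two already-filtered DataFrames rather than two boolean masks. The sketch below is one way to keep the loop while fixing both points; the sample values and the names `label_checks`, `text_cols`, `mask`, and `result` are assumptions made for illustration.

```python
import pandas as pd

# Sample data modelled on the question; the integer values are placeholders.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "column_int": [10, 20, 30, 40, 50],
    "column_A": ["ABC", "ABC", "ABC", "ABC", "ABC"],
    "column_B": ["ABC", "ABC", "Save", "Keep", "ABC"],
    "column_C": ["Keep", "ABC", "na", "na", "ABC"],
    "column_D": ["na", "ABC", "na", "na", "ABC"],
})

# Iterating over a DataFrame yields column labels (strings), so this
# comparison is False for every column and the original branch never runs.
label_checks = [type(column) == object for column in df]

# Pick the text columns by inspecting each column's dtype instead.
text_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

# Accumulate one boolean mask over the text columns, then filter once.
mask = pd.Series(False, index=df.index)
for column in text_cols:
    mask |= df[column].str.contains("Save|Keep", na=False)

result = df[mask]
```

Filtering once at the end, instead of reassigning `df` inside the loop, keeps a row that matches in a later column from being dropped by an earlier column's filter.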
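It might be easier and clearer without the for loop. Note that the two conditions have to be combined with | (element-wise OR); Python's or raises a ValueError when applied to a Series.
dfA = df.loc[(df.column_A == 'Save') | (df.column_A == 'Keep')]
dfB = df.loc[(df.column_B == 'Save') | (df.column_B == 'Keep')]
dfC = df.loc[(df.column_C == 'Save') | (df.column_C == 'Keep')]
dfD = df.loc[(df.column_D == 'Save') | (df.column_D == 'Keep')]
Then concatenate the DataFrames:
df = pd.concat([dfA, dfB, dfC, dfD])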
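One caveat with filtering per column and concatenating: a row that matches in more than one column is selected more than once, and the original row order is lost. As a rough alternative for exact matches, a single mask built with `isin` avoids both issues. The data below is hypothetical and chosen so that one row matches in two columns.

```python
import pandas as pd

# Hypothetical data: the row with id 2 matches in both text columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "column_A": ["ABC", "Save", "ABC"],
    "column_B": ["Keep", "Keep", "ABC"],
})
cols = ["column_A", "column_B"]

# Per-column filter plus concat: the row with id 2 appears twice.
parts = [df.loc[(df[c] == "Save") | (df[c] == "Keep")] for c in cols]
concatenated = pd.concat(parts)

# Single mask: True where any selected column equals Save or Keep exactly.
mask = df[cols].isin(["Save", "Keep"]).any(axis=1)
filtered = df.loc[mask]
```

Here `concatenated` holds three rows (ids 2, 1, 2), while `filtered` holds each matching row once, in its original order.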
You can use .apply() on the selected columns and, for each column, check for Save or Keep with str.contains. Then use .any() with axis=1 (a row-wise operation) to check whether the row contains such a string. Finally, filter with .loc as follows:
cols = ['column_A', 'column_B', 'column_C', 'column_D']
df.loc[df[cols].apply(lambda x: x.str.contains(r'Save|Keep')).any(axis=1)]
Result:
   id  column_int  column_int.1  column_A  column_B  column_C  column_D
0   1         int           int       ABC       ABC      Keep        na
2   3         int           int       ABC      Save        na        na
3   4         int           int       ABC      Keep        na        na
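For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It assumes the na entries are real missing values (None) rather than the literal string "na"; in that case str.contains returns a missing value for those cells, and passing na=False treats them as non-matches. The sample values are made up for illustration.

```python
import pandas as pd

# Sample data modelled on the question, with None standing in for "na".
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "column_A": ["ABC"] * 5,
    "column_B": ["ABC", "ABC", "Save", "Keep", "ABC"],
    "column_C": ["Keep", "ABC", None, None, "ABC"],
    "column_D": [None, "ABC", None, None, "ABC"],
})
cols = ["column_A", "column_B", "column_C", "column_D"]

# Column by column, flag cells containing Save or Keep; missing cells count
# as non-matches. Then keep rows where at least one cell is flagged.
mask = df[cols].apply(lambda s: s.str.contains(r"Save|Keep", na=False)).any(axis=1)
result = df.loc[mask]
```

Note that str.contains matches substrings, so a value such as Keeper would also be kept. If only exact values should match, `s.str.fullmatch(r"Save|Keep", na=False)` or `df[cols].isin(["Save", "Keep"])` may be a better fit.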