Filtering two columns of a dataframe with filter
I have a dataframe of the following kind:
import pandas as pd

df = pd.DataFrame(
    {
        # Each cell in "Name" is a list of strings
        "Name": [
            [
                " Verbundmörtel ",
                " Compound Mortar ",
                " Malta per stucchi e per incollaggio ",
            ],
            [" StoLevell In Absolute ", " StoLevell In Absolute "],
            [
                " Anhydrit-FlieÃ\x9festrich ",
                " Anhydrite Flowing Screed ",
                " Massetto a base di anidrite ",
            ],
        ],
        # Each cell in "NAME_FILE" is a plain string
        "NAME_FILE": [
            "AdhesiveCoveringPlaster_2",
            "AdhesiveMortarLevellInForAEVERO_720",
            "AnhydriteFlowingScreed_20",
        ],
    }
)
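I am trying to filter both columns on certain keywords. The idea is to end up with a dataframe that still has both columns, but only the values that satisfy the condition.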
select_materials = {
    "Plaster": list(
        filter(
            lambda x: "Hist".casefold() in x.casefold()
            or "Plaster".casefold() in x.casefold()
            or "Gips".casefold() in x.casefold(),
            df["NAME_FILE"],  # the sample frame has no "FILE" column
        )
    )
}
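For convenience I have included only a few rows of the full file, but I hope my objective is clear. The result should be a dataframe in which the filter is applied to both columns.
If a filter word is present in either column of a row, the whole row should be kept.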
Instead of using filter, I suggest a more idiomatic approach.
Suppose you want to filter on the word "Mortar":
# Simply define two filtering masks, since one column contains lists of strings,
# whereas the other one simply contains strings
mask1 = df["Name"].apply(lambda x: "Mortar".casefold() in "".join(x).casefold())
mask2 = df["NAME_FILE"].apply(lambda x: "Mortar".casefold() in x.casefold())
# If a filtered word is present in either column of the same row,
# then the whole row should be kept
print(df.loc[mask1 | mask2, :])
Name NAME_FILE
0 [ Verbundmörtel , Compound Mortar , Malta p... AdhesiveCoveringPlaster_2
                              ^^^^^^
1 [ StoLevell In Absolute , StoLevell In Absolu... AdhesiveMortarLevellInForAEVERO_720
                                                           ^^^^^^
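The same two-mask idea can be extended to the several keywords from the question ("Hist", "Plaster", "Gips"). The following is only a sketch under assumptions: the helper names `contains_any` and `keyword_mask` are hypothetical, and the frame is a shortened stand-in for the one in the question.

```python
import pandas as pd

# Shortened stand-in for the question's frame (values trimmed for illustration).
df = pd.DataFrame(
    {
        "Name": [
            [" Compound Mortar ", " Malta per stucchi "],
            [" StoLevell In Absolute "],
            [" Anhydrite Flowing Screed "],
        ],
        "NAME_FILE": [
            "AdhesiveCoveringPlaster_2",
            "AdhesiveMortarLevellInForAEVERO_720",
            "AnhydriteFlowingScreed_20",
        ],
    }
)


def contains_any(text, keywords):
    """Return True if any keyword occurs in text, ignoring case."""
    folded = text.casefold()
    return any(keyword.casefold() in folded for keyword in keywords)


def keyword_mask(frame, keywords):
    """Boolean mask that is True where either column mentions a keyword."""
    # "Name" holds lists of strings, so join each list before searching.
    in_name = frame["Name"].apply(lambda items: contains_any(" ".join(items), keywords))
    # "NAME_FILE" holds plain strings and can be searched directly.
    in_file = frame["NAME_FILE"].apply(lambda value: contains_any(value, keywords))
    return in_name | in_file


# Keep every row where at least one column matches at least one keyword.
plaster_rows = df.loc[keyword_mask(df, ["Hist", "Plaster", "Gips"])]
print(plaster_rows)
```

Building the mask in one function keeps the per-column difference (list versus string) in a single place, so a dictionary such as `select_materials` could map each material to `df.loc[keyword_mask(df, its_keywords)]`.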