Pandas Python - Extract rows based on multiple conditions
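I need to extract rows based on 3 conditions:
- Column col1 should contain all the words in the list list_words.
- The first row should end with the word Story.
- The next row should end with ac.
I managed to get it working with the help of this question, but the problem is that I need to extract every row that ends with Story together with the rows after it that end with ac.
This is my current code: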
import pandas as pd
df = pd.DataFrame({'col1': ['Draft SW Quality Assurance Plan Story', 'alex ac', 'anny ac', 'antoine ac','aze epic', 'bella ac', 'Complete SW Quality Assurance Plan Story', 'celine ac','wqas epic', 'karmen ac', 'kameilia ac', 'Update SW Quality Assurance Plan Story', 'joseph ac','Update SW Quality Assurance Plan ac', 'joseph ac'],
'col2': ['aa', 'bb', 'cc', 'dd','ee', 'ff', 'gg', 'hh', 'ii', 'jj', 'kk', 'll', 'mm', 'nn', 'oo']})
print(df)
list_words="SW Quality Plan Story"
set_words = set(list_words.split())
df["Suffix"] = df.col1.apply(lambda x: x.split()[-1])
# Condition 1: every word in set_words must appear in col1 (set_words minus the words in col1 is empty)
df["condition_1"] = df.col1.apply(lambda x: not bool(set_words - set(x.split())))
# Condition 2: the last word should be 'Story'
df["condition_2"] = df.col1.str.endswith("Story")
# Condition 3: the last word in the next row should be 'ac'. See `shift(-1)`
df["condition_3"] = df.col1.str.endswith("ac").shift(-1, fill_value=False)
# Condition 4: the last word in the current row should be 'ac'
df["condition_4"] = df.col1.str.endswith("ac")
# When all three conditions are met: new column 'conditions'
df["conditions"] = df.condition_1 & df.condition_2 & df.condition_3
df["conditions&"] = df.conditions | df.conditions.shift(1)
print(df[['condition_1', 'condition_2','condition_3' ,'condition_4']])
df.to_excel('cond.xlsx', sheet_name='Sheet1', index=True)
df["TrueFalse"] = df.conditions | df.conditions.shift(1, fill_value=False)
df1=df[["col1", "col2", "TrueFalse", "Suffix"]][df.TrueFalse]
print(df1)
This is my output:
0 Draft SW Quality Assurance Plan Story aa True Story
1 alex ac bb True ac
6 Complete SW Quality Assurance Plan Story gg True Story
7 celine ac hh True ac
11 Update SW Quality Assurance Plan Story ll True Story
12 joseph ac mm True ac
This is the desired output:
0 Draft SW Quality Assurance Plan Story aa True Story
1 alex ac bb True ac
2 anny ac cc True ac
3 antoine ac dd True ac
6 Complete SW Quality Assurance Plan Story gg True Story
7 celine ac hh True ac
11 Update SW Quality Assurance Plan Story ll True Story
12 joseph ac mm True ac
13 Update SW Quality Assurance Plan ac nn True ac
14 joseph ac oo True ac
I need to extract all the rows ending with ac that follow a row ending with Story (including rows 2 and 3), not just the first one.
Is that doable?
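Maybe you can do it by creating a column that is True where both conditions are met: the row ends with Story and contains all the words. Create another column that is True where the row ends with ac. Use groupby on the cumsum of the first column, then within each group take any across the two columns 'gr' and 'ac', followed by cummin. This means that in each group, once a row fails the condition, the rest of the group is False even if a later row ends with ac. The groupby produces a mask that is True for the rows you want to keep, so use loc with this mask: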
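import numpy as np

# True on rows that end with 'Story' and contain every word in set_words
df['gr'] = (df['col1'].str.endswith('Story')
            & df['col1'].apply(lambda x: not bool(set_words - set(x.split()))))
# True on rows that end with 'ac'
df['ac'] = df['col1'].str.endswith('ac')
# Each True in 'gr' starts a new group; cummin turns the rest of a group False
# after the first row that is neither a group start nor an 'ac' row
df_f = df.loc[df.groupby(df['gr'].cumsum())
                .apply(lambda x: np.any(x[['gr', 'ac']], axis=1).cummin())
                .to_numpy(), ['col1', 'col2']]
print(df_f)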
col1 col2
0 Draft SW Quality Assurance Plan Story aa
1 alex ac bb
2 anny ac cc
3 antoine ac dd
6 Complete SW Quality Assurance Plan Story gg
7 celine ac hh
11 Update SW Quality Assurance Plan Story ll
12 joseph ac mm
13 Update SW Quality Assurance Plan ac nn
14 joseph ac oo
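For reference, the same cumsum / cummin idea can be sketched without groupby.apply. This is only a sketch under a few assumptions that are not in the answer above: the helper name story_blocks_mask is hypothetical, it compares the last word of each row instead of using endswith (so a word such as "zodiac" would not count as ac), and it drops any rows that come before the first qualifying Story row.

```python
import pandas as pd


def story_blocks_mask(col: pd.Series, required_words: set) -> pd.Series:
    """Mask of qualifying 'Story' rows plus the unbroken run of 'ac' rows after each.

    Hypothetical helper, not part of the original answer.
    """
    last_word = col.str.split().str[-1]
    has_all_words = col.apply(lambda s: required_words.issubset(s.split()))

    # A row starts a block when it ends with 'Story' and contains every required word
    starts_block = last_word.eq("Story") & has_all_words
    # Rows that may be kept: block starts and rows whose last word is 'ac'
    keep_candidate = starts_block | last_word.eq("ac")

    # Running count of block starts gives each row the id of its block
    block_id = starts_block.cumsum()

    # cummin on 0/1 integers: after the first 0 in a block, the rest stays 0
    unbroken = keep_candidate.astype(int).groupby(block_id).cummin().astype(bool)

    # block_id == 0 marks rows before the first block start (assumption: drop them)
    return unbroken & block_id.gt(0)


df = pd.DataFrame({
    "col1": ["Draft SW Quality Assurance Plan Story", "alex ac", "anny ac",
             "antoine ac", "aze epic", "bella ac",
             "Complete SW Quality Assurance Plan Story", "celine ac",
             "wqas epic", "karmen ac", "kameilia ac",
             "Update SW Quality Assurance Plan Story", "joseph ac",
             "Update SW Quality Assurance Plan ac", "joseph ac"],
    "col2": ["aa", "bb", "cc", "dd", "ee", "ff", "gg", "hh", "ii", "jj",
             "kk", "ll", "mm", "nn", "oo"],
})

mask = story_blocks_mask(df["col1"], set("SW Quality Plan Story".split()))
print(df.loc[mask, ["col1", "col2"]])
```

On the sample data this keeps rows 0, 1, 2, 3, 6, 7, 11, 12, 13 and 14, the same rows as the desired output. Casting to integers before cummin is a cautious choice so the result does not depend on how a given pandas version handles cummin on boolean groups.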