Pandas Python - 根据多个条件提取行

Pandas Python -Extract rows based on multiple conditions

我需要根据 3 个条件提取行:

  1. col1 应包含列表 list_words.

    中的所有单词
  2. 第一行应以单词 Story

    结尾
  3. 下一行应以 ac

    结尾

我已经设法让它在这个问题 的帮助下工作,但问题是我需要提取以 Story 结尾的每一行以及该行之后的行以 ac 结尾。 这是我当前的代码:

import pandas as pd

df = pd.DataFrame({'col1': ['Draft SW Quality Assurance Plan Story', 'alex ac', 'anny ac', 'antoine ac','aze epic', 'bella ac', 'Complete SW Quality Assurance Plan Story', 'celine ac','wqas epic', 'karmen ac', 'kameilia ac', 'Update SW Quality Assurance Plan Story', 'joseph ac','Update SW Quality Assurance Plan ac', 'joseph ac'],
                   'col2': ['aa', 'bb', 'cc', 'dd','ee', 'ff', 'gg', 'hh', 'ii', 'jj', 'kk', 'll', 'mm', 'nn', 'oo']}) 
print(df)

list_words="SW Quality Plan Story"
set_words = set(list_words.split())

df["Suffix"] = df.col1.apply(lambda x: x.split()[-1]) 


# Condition 1: all words in col1 minus all words in set_words must be empty
df["condition_1"] = df.col1.apply(lambda x: not bool(set_words - set(x.split())))

# Condition 2: the last word should be 'Story'
df["condition_2"] = df.col1.str.endswith("Story") 

# Condition 3: the last word in the next row should be ac. See `shift(-1)`
df["condition_3"] = df.col1.str.endswith("ac").shift(-1) 

# Condition 3: the last word in the next row should be ac. See `shift(-1)`
df["condition_4"] = df.col1.str.endswith("ac")

# When all three conditions meet: new column 'conditions'
df["conditions"] = df.condition_1 & df.condition_2 & df.condition_3

df["conditions&"] = df.conditions | df.conditions.shift(1)

print(df[['condition_1', 'condition_2','condition_3' ,'condition_4']])

df.to_excel('cond.xlsx', 'Sheet1', index=True) 

df["TrueFalse"] = df.conditions | df.conditions.shift(1)                                                                                         

df1=df[["col1", "col2", "TrueFalse", "Suffix"]][df.TrueFalse]
print(df1)

这是我的输出:

0      Draft SW Quality Assurance Plan Story   aa       True  Story
1                                    alex ac   bb       True     ac
6   Complete SW Quality Assurance Plan Story   gg       True  Story
7                                  celine ac   hh       True     ac
11    Update SW Quality Assurance Plan Story   ll       True  Story
12                                 joseph ac   mm       True     ac

这是期望的输出:

0      Draft SW Quality Assurance Plan Story   aa       True  Story
1                                    alex ac   bb       True     ac
2                                    anny ac   cc       True     ac
3                                 antoine ac   dd       True     ac
6   Complete SW Quality Assurance Plan Story   gg       True  Story
7                                  celine ac   hh       True     ac
11    Update SW Quality Assurance Plan Story   ll       True  Story
12                                 joseph ac   mm       True     ac
13       Update SW Quality Assurance Plan ac   nn       True     ac
14                                 joseph ac   oo       True     ac

我需要在以 Story 结尾的行之后提取所有以 ac 结尾的行(包括第 2 行和第 3 行),而不仅仅是第一行。 可行吗?

也许你可以通过创建一个满足两个条件的栏目来做到这一点endswith故事和所有的话。创建 endswith ac 的另一列。在创建的第一列的 cumsum 上使用 groupby,然后在 'gr' 和 'ac' 以及 cummin 两列上执行 any,这意味着每个组,一旦它满足 False 条件,即使行以 ac 结尾,该组的其余部分也将为 False。 groupby 将为您要保留的行创建一个带有 True 的掩码,因此请将 loc 与此掩码一起使用:

df['gr'] = (df['col1'].str.endswith('Story')
            &df['col1'].apply(lambda x: not bool(set_words - set(x.split()))))
df['ac'] = df['col1'].str.endswith('ac')

df_f = df.loc[df.groupby(df['gr'].cumsum())
                .apply(lambda x: np.any(x[['gr', 'ac']], axis=1).cummin())
                .to_numpy(), ['col1', 'col2']]
print (df_f)
                                        col1 col2
0      Draft SW Quality Assurance Plan Story   aa
1                                    alex ac   bb
2                                    anny ac   cc
3                                 antoine ac   dd
6   Complete SW Quality Assurance Plan Story   gg
7                                  celine ac   hh
11    Update SW Quality Assurance Plan Story   ll
12                                 joseph ac   mm
13       Update SW Quality Assurance Plan ac   nn
14                                 joseph ac   oo