Extract rows based on conditions in pandas (Python)
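I need to extract rows when certain conditions are met:
- Column col1 should contain all the words in list_words.
- The last word should be Story.
- The last word of the next row should be ac.
This is my current code: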
import pandas as pd
df = pd.DataFrame({'col1': ['Draft SW Quality Assurance Story', 'alex ac', 'anny ac', 'antoine ac','aze epic', 'bella ac', 'Complete SW Quality Assurance Plan Story', 'celine ac','wqas epic', 'karmen ac', 'kameilia ac', 'Update SW Quality Assurance Plan Story', 'joseph ac','Update SW Quality Assurance Plan ac', 'joseph ac'],
'col2': ['aa', 'bb', 'cc', 'dd','ee', 'ff', 'gg', 'hh', 'ii', 'jj', 'kk', 'll', 'mm', 'nn', 'oo']})
print(df)
list_words="SW Quality Plan Story"
set_words = set(list_words.split())
#check if list_words is in the cell
df['TrueFalse']=pd.concat([df.col1.str.contains(word,regex=False) for word in list_words.split()],axis=1).sum(1) > 1
print('\n',df)
#extract last word
df["Suffix"] = df["col1"].str.split().str[-1]
print('\n',df)
df['ok']=''
for i in range (len(df)-1):
    if ((df["Suffix"].iloc[i]=='Story') & (df["TrueFalse"].iloc[i]=='True') & (df["Suffix"].iloc[i+1]=='ac')):
        df['ok'].iloc[i+1]=df["Suffix"].iloc[i+1]
print('\n',df)
Output:
                                        col1  col2  TrueFalse  Suffix  ok
0           Draft SW Quality Assurance Story    aa       True   Story
1                                    alex ac    bb      False      ac
2                                    anny ac    cc      False      ac
3                                 antoine ac    dd      False      ac
4                                   aze epic    ee      False    epic
5                                   bella ac    ff      False      ac
6   Complete SW Quality Assurance Plan Story    gg       True   Story
7                                  celine ac    hh      False      ac
8                                  wqas epic    ii      False    epic
9                                  karmen ac    jj      False      ac
10                               kameilia ac    kk      False      ac
11    Update SW Quality Assurance Plan Story    ll       True   Story
12                                 joseph ac    mm      False      ac
13       Update SW Quality Assurance Plan ac    nn       True      ac
14                                 joseph ac    oo      False      ac
Row 13 should be set to False.
Desired output:
                                       col1  col2  TrueFalse  Suffix
1  Complete SW Quality Assurance Plan Story    gg       True   Story
2                                 celine ac    hh       True      ac
3    Update SW Quality Assurance Plan Story    ll       True   Story
4                                 joseph ac    mm       True      ac
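Before the fixes, it may help to see why the posted code gives that output. As far as I can tell there are two causes: `.sum(1) > 1` is already true when any two of the words match (row 13 has three of the four), and `df["TrueFalse"].iloc[i]=='True'` compares a boolean with the string 'True', which is never equal, so `ok` stays empty. A minimal sketch on two of the rows (the names `hits`, `at_least_two` and `all_words` are mine, for illustration only):

```python
import pandas as pd

words = "SW Quality Plan Story".split()
col1 = pd.Series(["Update SW Quality Assurance Plan Story",
                  "Update SW Quality Assurance Plan ac"])

# One boolean column per word: does the cell contain it?
hits = pd.concat([col1.str.contains(w, regex=False) for w in words], axis=1)

at_least_two = hits.sum(axis=1) > 1   # what the question's code tests
all_words = hits.all(axis=1)          # what the first condition asks for

print(at_least_two.tolist())  # [True, True]
print(all_words.tolist())     # [True, False]

# A boolean never equals the string 'True'
print(True == 'True')         # False
```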
Here are all your conditions and their intersection:
# Condition 1: every word in set_words appears in col1
# (set_words minus the words of the cell must be empty)
df["condition_1"] = df.col1.apply(lambda x: not (set_words - set(x.split())))
# Condition 2: the last word should be 'Story'
df["condition_2"] = df.col1.str.endswith("Story")
# Condition 3: the last word in the next row should be 'ac'. See `shift(-1)`;
# fill_value=False keeps the column boolean for the last row
df["condition_3"] = df.col1.str.endswith("ac").shift(-1, fill_value=False)
# When all three conditions are met: new column 'conditions'
df["conditions"] = df.condition_1 & df.condition_2 & df.condition_3
# Back to your notation:
# TrueFalse: rows that fulfill all three conditions along with their next rows
df["TrueFalse"] = df.conditions | df.conditions.shift(1, fill_value=False)
df["Suffix"] = df.col1.str.split().str[-1]
Now your desired output:
>>> print(df[["col1", "col2", "TrueFalse", "Suffix"]][df.TrueFalse])
                                        col1  col2  TrueFalse  Suffix
6   Complete SW Quality Assurance Plan Story    gg       True   Story
7                                  celine ac    hh       True      ac
11    Update SW Quality Assurance Plan Story    ll       True   Story
12                                 joseph ac    mm       True      ac
For reference, the whole dataframe:
>>> print(df[["col1", "col2", "TrueFalse", "Suffix"]])
                                        col1  col2  TrueFalse  Suffix
0           Draft SW Quality Assurance Story    aa      False   Story
1                                    alex ac    bb      False      ac
2                                    anny ac    cc      False      ac
3                                 antoine ac    dd      False      ac
4                                   aze epic    ee      False    epic
5                                   bella ac    ff      False      ac
6   Complete SW Quality Assurance Plan Story    gg       True   Story
7                                  celine ac    hh       True      ac
8                                  wqas epic    ii      False    epic
9                                  karmen ac    jj      False      ac
10                               kameilia ac    kk      False      ac
11    Update SW Quality Assurance Plan Story    ll       True   Story
12                                 joseph ac    mm       True      ac
13       Update SW Quality Assurance Plan ac    nn      False      ac
14                                 joseph ac    oo      False      ac
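If you need this more than once, the three conditions could be wrapped in a small helper. This is only a sketch under my own assumptions: the function name `flag_story_ac_pairs` and its parameters are hypothetical, and it compares the last word exactly instead of using `endswith`.

```python
import pandas as pd

def flag_story_ac_pairs(df, words, col="col1"):
    """Mark each row that has all `words`, ends in 'Story' and is followed
    by a row ending in 'ac', together with that following row."""
    tokens = df[col].str.split()          # list of words per cell
    last = tokens.str[-1]                 # last word per cell
    has_all = tokens.apply(lambda t: set(words) <= set(t))
    # NaN produced by shift(-1) on the last row compares as False in .eq()
    start = has_all & last.eq("Story") & last.shift(-1).eq("ac")
    return start | start.shift(1, fill_value=False)

df = pd.DataFrame({"col1": [
    "Draft SW Quality Assurance Story", "alex ac", "anny ac", "antoine ac",
    "aze epic", "bella ac", "Complete SW Quality Assurance Plan Story",
    "celine ac", "wqas epic", "karmen ac", "kameilia ac",
    "Update SW Quality Assurance Plan Story", "joseph ac",
    "Update SW Quality Assurance Plan ac", "joseph ac"]})

mask = flag_story_ac_pairs(df, "SW Quality Plan Story".split())
print(df[mask].index.tolist())  # [6, 7, 11, 12]
```

`df[mask]` then gives the four rows of the desired output.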
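Here is one way you can do this.
Use pd.concat with .all to check whether all the words are present.
Check whether the same column ends with Story.
Check whether the next row (shift(-1)) ends with ac.
Edit: After reading some of the comments, it seems you also want the following row that ends with ac to be True.
I added extra code at the end to cover this condition.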
import pandas as pd
df = pd.DataFrame({'col1': ['Draft SW Quality Assurance Story', 'alex ac', 'anny ac', 'antoine ac','aze epic', 'bella ac', 'Complete SW Quality Assurance Plan Story', 'celine ac','wqas epic', 'karmen ac', 'kameilia ac', 'Update SW Quality Assurance Plan Story', 'joseph ac','Update SW Quality Assurance Plan ac', 'joseph ac'],
'col2': ['aa', 'bb', 'cc', 'dd','ee', 'ff', 'gg', 'hh', 'ii', 'jj', 'kk', 'll', 'mm', 'nn', 'oo']})
print(df)
list_words="SW Quality Plan Story"
set_words = set(list_words.split())
# Rows where col1 contains every word, ends with 'Story',
# and the next row ends with 'ac' (na=False covers the last row, which has no next row)
df['TrueFalse'] = (pd.concat([df['col1'].str.contains(word, regex=False) for word in set_words], axis=1).all(axis=1)
                   & df['col1'].str.endswith('Story')
                   & df['col1'].shift(-1).str.endswith('ac', na=False))
# Also flag the 'ac' row that directly follows a matching row
df['AC_COL'] = df['TrueFalse'].shift(1, fill_value=False)
df['Final_TrueFalse'] = df['TrueFalse'] | df['AC_COL']
print(df[['col1','col2','Final_TrueFalse']])
                                        col1  col2  Final_TrueFalse
0           Draft SW Quality Assurance Story    aa            False
1                                    alex ac    bb            False
2                                    anny ac    cc            False
3                                 antoine ac    dd            False
4                                   aze epic    ee            False
5                                   bella ac    ff            False
6   Complete SW Quality Assurance Plan Story    gg             True
7                                  celine ac    hh             True
8                                  wqas epic    ii            False
9                                  karmen ac    jj            False
10                               kameilia ac    kk            False
11    Update SW Quality Assurance Plan Story    ll             True
12                                 joseph ac    mm            False
13       Update SW Quality Assurance Plan ac    nn            False
14                                 joseph ac    oo            False
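One caveat with `str.contains`: it matches substrings, so a cell containing "Planning" would also count as containing "Plan". If you need whole words, comparing sets of split words is one option. A small sketch on made-up rows (the sample strings and the names `substring` and `whole_word` are mine, not from the question):

```python
import pandas as pd

s = pd.Series(["SW Quality Planning Story", "SW Quality Plan Story"])
words = {"SW", "Quality", "Plan", "Story"}

# Substring test: 'Plan' is found inside 'Planning'
substring = pd.concat([s.str.contains(w, regex=False) for w in words], axis=1).all(axis=1)

# Whole-word test: every required word must be one of the cell's words
whole_word = s.str.split().apply(lambda tokens: words <= set(tokens))

print(substring.tolist())   # [True, True]
print(whole_word.tolist())  # [False, True]
```

For the data in the question both tests give the same result, so this only matters if your real column has such near matches.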