In Python, I am comparing dataframe columns containing strings to decide whether a row should pass or fail. How can I stop data from passing when it should fail?
I have 20+ test cases that check a CSV for data anomalies caused by data entry. This test case (#15) compares the salutation and addressee with the marital status.
# Test case 15
# Compares MrtlStat to PrimAddText and PrimSalText
df = data[data['MrtlStat'].str.contains("Widow|Divorced|Single")]
df = df[df['PrimAddText'].str.contains("AND|&", na=False)]
data_15 = df[df['PrimSalText'].str.contains("AND|&", na=False)]
# Adds row to list of failed data
ids = data_15.index.tolist()
# Keep track of data that failed test case 15
for i in ids:
    data.at[i, 'Test Case Failed'] += ', 15'
If MrtlStat contains Widow, Divorced, or Single while PrimAddText or PrimSalText contains AND or &, the row should fail the test. Currently the test only works when both PrimSalText and PrimAddText contain AND or &.
Table showing data that passes but should fail:
PrimAddText | PrimSalText | MrtlStat |
---|---|---|
Mrs. Judith Elfrank | Mr. & Mrs. Elfrank & Michael | Widowed |
Mr. & Mrs.Karl Magnusen | Mr. Magnusen | Widowed |
Table showing data that fails as expected:
PrimAddText | PrimSalText | MrtlStat |
---|---|---|
Mr. & Mrs. Elfrank | Mr. & Mrs. Elfrank & Michael | Widowed |
How can I adjust the test so that it also works when only one column (PrimSalText or PrimAddText) contains AND or &?
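You should not filter the data sequentially: each successive filter only keeps rows that already passed the previous one, which is equivalent to an AND. Instead, combine the conditions into a single boolean condition (using & and |). One good way to do this is numpy.where: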
import pandas as pd
import numpy as np
# construct sample data
data = pd.DataFrame({
    'PrimAddText': ['Mrs. Judith Elfrank', 'Mr. & Mrs.Karl Magnusen', 'Mr. & Mrs. Elfrank'],
    'PrimSalText': ['Mr. & Mrs. Elfrank & Michael', 'Mr. Magnusen', 'Mr. & Mrs. Elfrank & Michael'],
    'MrtlStat': ['Widowed', 'Widowed', 'Widowed']
})
# Case 15 - create condition: False means the row failed, True means it passed
data['Status_case15'] = np.where(
    data['MrtlStat'].str.contains("Widow|Divorced|Single")
    & (data['PrimAddText'].str.contains("AND|&", na=False)
       | data['PrimSalText'].str.contains("AND|&", na=False)),
    False, True)
# filter failing rows
data.loc[~data['Status_case15']]
# count passing rows
sum(data['Status_case15'])
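To tie this back to the 'Test Case Failed' column from the question, the same combined mask can probably be used with .loc in place of the loop. This is only a minimal sketch under a few assumptions that are not in the original post: the column is initialised to empty strings, a fourth 'Married' row is added purely for illustration, and na=False is also passed for MrtlStat so that missing values do not break the mask.

```python
import pandas as pd

# Sample rows from the question plus one hypothetical 'Married' row
data = pd.DataFrame({
    'PrimAddText': ['Mrs. Judith Elfrank', 'Mr. & Mrs.Karl Magnusen',
                    'Mr. & Mrs. Elfrank', 'Mr. & Mrs. Smith'],
    'PrimSalText': ['Mr. & Mrs. Elfrank & Michael', 'Mr. Magnusen',
                    'Mr. & Mrs. Elfrank & Michael', 'Mr. & Mrs. Smith'],
    'MrtlStat': ['Widowed', 'Widowed', 'Widowed', 'Married'],
})
# Assumption: the tracking column starts out as empty strings
data['Test Case Failed'] = ''

# Marital status that should not come with a joint salutation/addressee
single_status = data['MrtlStat'].str.contains("Widow|Divorced|Single", na=False)
# Joint wording in EITHER text column (OR, not AND)
joint_text = (data['PrimAddText'].str.contains("AND|&", na=False)
              | data['PrimSalText'].str.contains("AND|&", na=False))
failed_15 = single_status & joint_text

# Tag every failing row once, without an explicit loop
data.loc[failed_15, 'Test Case Failed'] = (
    data.loc[failed_15, 'Test Case Failed'] + ', 15'
)
print(data[['MrtlStat', 'Test Case Failed']])
```

Because the mask is computed once per row, a row that matches in both text columns is still tagged only once.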
You have an AND condition between the second and third filters. You can split them apart, capture the result of each one separately, and then merge the two sets of row indexes at the end (as a union, so that a row matching in both columns is only tagged once):
# Test case 15
# Compares MrtlStat to PrimAddText and PrimSalText
df = data[data['MrtlStat'].str.contains("Widow|Divorced|Single")]
data_15_A = df[df['PrimAddText'].str.contains("AND|&", na=False)]
data_15_B = df[df['PrimSalText'].str.contains("AND|&", na=False)]
# Rows that failed in either column; union() avoids listing a row twice
ids = data_15_A.index.union(data_15_B.index).tolist()
# Keep track of data that failed test case 15
for i in ids:
    data.at[i, 'Test Case Failed'] += ', 15'
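One thing to watch when merging two index lists: a row that matches in both columns shows up in both results, so plain list concatenation would tag it twice. A minimal sketch on the sample rows from the question, comparing concatenation with Index.union (the variable names idx_a, idx_b, concatenated and deduplicated are illustrative only):

```python
import pandas as pd

data = pd.DataFrame({
    'PrimAddText': ['Mrs. Judith Elfrank', 'Mr. & Mrs.Karl Magnusen', 'Mr. & Mrs. Elfrank'],
    'PrimSalText': ['Mr. & Mrs. Elfrank & Michael', 'Mr. Magnusen', 'Mr. & Mrs. Elfrank & Michael'],
    'MrtlStat': ['Widowed', 'Widowed', 'Widowed'],
})

df = data[data['MrtlStat'].str.contains("Widow|Divorced|Single", na=False)]
idx_a = df[df['PrimAddText'].str.contains("AND|&", na=False)].index
idx_b = df[df['PrimSalText'].str.contains("AND|&", na=False)].index

# Row 2 matches in both columns, so it appears twice here
concatenated = idx_a.tolist() + idx_b.tolist()
# union() keeps each failing row exactly once
deduplicated = idx_a.union(idx_b).tolist()

print(concatenated)
print(deduplicated)
```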