How to rewrite this code into an apply-lambda expression?
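My dataframe (df) has some NaN entries in the new column 's_score', which I can handle by using func(x).
That is, running document_path_similarity() produces NaN for some rows, and this prevents most_similar_docs() from running unless I go through func(x) first.
D1 and D2 are columns of df holding string data.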
df
   Quality  D1                                  D2
0  1        Ms Stewart, the chief executive...  Ms Stewart, 61, its chief executive
1  1        After more than two years' det...   After more than two years in
def most_similar_docs():
    def func(x):
        try:
            return document_path_similarity(x['D1'], x['D2'])
        except:
            return np.nan

    df['s_score'] = df.apply(func, axis=1)
Is there a way to rewrite this code as a one-liner?
My attempts below result in either "ValueError: max() arg is an empty sequence" or a SyntaxError.
df['s_scores'] = df.apply(lambda x: document_path_similarity(x.D1, x.D2),axis=1)
paraphrases['s_scores'] = paraphrases.apply(lambda x: document_path_similarity(x.D1, x.D2),axis=1 if np.isnan(x))
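I don't think there is anything wrong with your pandas code. I did find that similarity_score() was failing because it tries to take the max of an empty list. I forced the list to be non-empty by appending a zero score. This is my first time looking at this library, so please don't assume my patch is a good one.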
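As an aside, the same guard could probably be written with the `default` argument of the built-in `max()`, which avoids building the extra `+[0]` list. This is only a sketch: the helper name `best_score` and the sample scores are made up for illustration and are not part of the library.

```python
def best_score(scores):
    """Return the largest non-None score, or 0 when there is none."""
    # max() raises ValueError on an empty iterable unless `default` is given.
    return max((s for s in scores if s is not None), default=0)


print(best_score([0.25, None, 0.5]))  # 0.5
print(best_score([None, None]))       # 0
print(best_score([]))                 # 0
```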
import io

import pandas as pd

# Columns are separated by two or more spaces, matching sep=r"\s\s+".
df = pd.read_csv(io.StringIO("""   Quality  D1  D2
0  1  Ms Stewart, the chief executive...  Ms Stewart, 61, its chief executive
1  1  After more than two years' det...  After more than two years in """), sep=r"\s\s+", engine="python")


def similarity_score(s1, s2):
    list1 = []
    for a in s1:
        # patch +[0] at end so never finding max of empty list
        list1.append(max([i.path_similarity(a) for i in s2 if i.path_similarity(a) is not None] + [0]))
    output = sum(list1) / len(list1)
    return output


# document_path_similarity() is the function from the question; it calls similarity_score().
df = df.assign(
    s_scores=lambda x: x.apply(lambda r: document_path_similarity(r.D1, r.D2), axis=1)
)
print(df.to_string(index=False))
Output
Quality D1 D2 s_scores
1 Ms Stewart, the chief executive... Ms Stewart, 61, its chief executive 0.838889
1 After more than two years' det... After more than two years in 0.912500
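On the one-liner itself: a lambda cannot contain a try/except statement, so if you want to keep the NaN fallback from func(x), one option is to move the exception handling into a small reusable wrapper and keep the apply call on a single line. The sketch below is an assumption about how that could look. `nan_on_error` and `toy_similarity` are hypothetical names, and `toy_similarity` is only a stand-in that fails on empty input the way max() of an empty list does; it is not document_path_similarity().

```python
import functools

import numpy as np
import pandas as pd


def nan_on_error(func):
    """Wrap func so that any exception it raises is turned into NaN."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            return np.nan
    return wrapper


# Hypothetical stand-in for document_path_similarity(): when the two strings
# share no words, max() receives an empty sequence and raises ValueError.
def toy_similarity(d1, d2):
    shared = set(d1.split()) & set(d2.split())
    return max(len(w) for w in shared) / 10


df = pd.DataFrame({"D1": ["more than two years", "x"],
                   "D2": ["two years in", "y"]})

safe_similarity = nan_on_error(toy_similarity)
# The apply call itself is now a one-liner; failing rows become NaN.
df["s_score"] = df.apply(lambda r: safe_similarity(r.D1, r.D2), axis=1)
print(df)
```

With the real function you would wrap document_path_similarity instead of the toy one. Note that catching every Exception can hide genuine bugs, so narrowing it to ValueError may be the safer choice.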