python 函数的默认参数并不总是有效

Question

我正在阅读 Programming Collective Intelligence 并以比书中所写的更 pythonic 的方式编写一些代码，只是为了学习。

第一章是关于推荐系统的。在下一个字典的基础上，提出了一些相似性度量。

critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane':
3.5,
        'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
        'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
        'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
        'You, Me and Dupree': 3.5},
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
        'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
        'The Night Listener': 4.5, 'Superman Returns': 4.0,
        'You, Me and Dupree': 2.5},
    'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
        'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
        'You, Me and Dupree': 2.0},
    'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
        'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

鉴于 unique_pairs 是包含不同可能的人对的元组列表，

unique_pairs = list(itertools.combinations(people, 2))

unique_pairs
[('Michael Phillips', 'Mick LaSalle'),
 ('Michael Phillips', 'Lisa Rose'),
 ('Michael Phillips', 'Toby'),
 ('Michael Phillips', 'Jack Matthews'),
 ('Michael Phillips', 'Gene Seymour'),
 ('Michael Phillips', 'Claudia Puig'),
 ('Mick LaSalle', 'Lisa Rose'),
 ('Mick LaSalle', 'Toby'),
 ('Mick LaSalle', 'Jack Matthews'),
 ('Mick LaSalle', 'Gene Seymour'),
 ('Mick LaSalle', 'Claudia Puig'),
 ('Lisa Rose', 'Toby'),
 ('Lisa Rose', 'Jack Matthews'),
 ('Lisa Rose', 'Gene Seymour'),
 ('Lisa Rose', 'Claudia Puig'),
 ('Toby', 'Jack Matthews'),
 ('Toby', 'Gene Seymour'),
 ('Toby', 'Claudia Puig'),
 ('Jack Matthews', 'Gene Seymour'),
 ('Jack Matthews', 'Claudia Puig'),
 ('Gene Seymour', 'Claudia Puig')]

我试图通过向函数的结果添加一个 p 值来改进书中建议的 Pearson Correlation 相似度函数，仅当函数的参数 p_value 为真时才输出。函数是这样定义的：

def sim_pearson(prefs, p1, p2, p_value=False):
    """Returns the pearson correlation coefficient and the p-value (optional)
    of the ratings of the movies that both p1 and p2 have rated"""

    # Creates a list with the movies that both p1 and p2 have rated
    movies = [movie for movie in prefs[p1] if movie in prefs[p2]]

    # List of the scores that both p1 and p2 have given to the movies in common
    scores_p1 = [prefs[p1][movie] for movie in movies]
    scores_p2 = [prefs[p2][movie] for movie in movies]

    corr, p_value = scipy.stats.pearsonr(scores_p1, scores_p2)

    if p_value:
        return (corr, p_value)
    else:
        return corr

我的问题是该函数没有按预期工作，因为当 p 值为 True 时，它并不总是 returns (correlation coefficient, p-value) 的元组，并且当 p_value 为 True 和为 False 时，它产生相同的结果。为什么会发生这种情况，我该如何解决？

这是一个列表，其中包含将函数应用于每个可能的人对的结果，看看我说的是什么。结果与 p_value=True 和 p_value=False 相同，我将只粘贴前一种情况。

pearson_results = [(pair[0][:5], 
                    pair[1][:5], 
                    sim_pearson(critics, pair[0], pair[1], p_value=True)) 
                    for pair in unique_pairs]

pearson_results
[('Micha', 'Mick ', (-0.2581988897471611, 0.74180111025283857)),
 ('Micha', 'Lisa ', (0.40451991747794525, 0.59548008252205464)),
 ('Micha', 'Toby', -1.0),
 ('Micha', 'Jack ', (0.13483997249264842, 0.8651600275073511)),
 ('Micha', 'Gene ', (0.20459830184114206, 0.79540169815885797)),
 ('Micha', 'Claud', 1.0),
 ('Mick ', 'Lisa ', (0.59408852578600457, 0.21370636293028805)),
 ('Mick ', 'Toby', (0.92447345164190498, 0.24901011701138964)),
 ('Mick ', 'Jack ', (0.21128856368212914, 0.73299431171284912)),
 ('Mick ', 'Gene ', (0.41176470588235292, 0.41726032973743138)),
 ('Mick ', 'Claud', (0.56694670951384085, 0.3189317919127756)),
 ('Lisa ', 'Toby', (0.99124070716193036, 0.084323216321943714)),
 ('Lisa ', 'Jack ', (0.74701788083399601, 0.14681146067336839)),
 ('Lisa ', 'Gene ', (0.39605901719066977, 0.43697492654267506)),
 ('Lisa ', 'Claud', (0.56694670951384085, 0.3189317919127756)),
 ('Toby', 'Jack ', (0.66284898035987017, 0.53869426797895403)),
 ('Toby', 'Gene ', (0.38124642583151169, 0.75098988298861025)),
 ('Toby', 'Claud', (0.89340514744156441, 0.29661883133160016)),
 ('Jack ', 'Gene ', (0.96379568187563314, 0.0082243534847899202)),
 ('Jack ', 'Claud', (0.028571428571428571, 0.9714285714285712)),
 ('Gene ', 'Claud', (0.31497039417435602, 0.60570041941160946))]

Answer 1

将函数的底部更改为：

corr, p_value2 = scipy.stats.pearsonr(scores_p1, scores_p2)

if p_value:
    return (corr, p_value2)
else:
    return corr

python 函数的默认参数并不总是有效

Default parameter on python function not always working

python

data-analysis

pearson

python-3.x