shapes not aligned error when performing Singular Value Decomposition using scipy.sparse.linalg

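I am trying to use Singular Value Decomposition (SVD) to predict missing values in a sparse matrix. Chapter 4 of the Datacamp course "Building Recommendation Engines in Python" gives a great example of this with movie ratings. As a first step, I have been trying to replicate that Datacamp example on my local PC in a Jupyter Notebook. However, when I try to multiply U_sigma by the Vt matrix returned by the "svds" function, I get this error: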

    ValueError: shapes (671,) and (6,9161) not aligned: 671 (dim 0) != 6 (dim 0)

I am using this dataset: https://www.kaggle.com/rounakbanik/the-movies-dataset/version/7?select=ratings_small.csv

This is the code I am trying to run:

    import pandas as pd
    
    filename = 'ratings_small.csv'
    df = pd.read_csv(filename)
    
    df.head()
    user_ratings_df = df.pivot(index='userId', columns='movieId', values='rating')
    
    # Get the average rating for each user 
    avg_ratings = user_ratings_df.mean(axis=1)
    
    # Center each user's ratings around 0
    user_ratings_centered = user_ratings_df.sub(avg_ratings, axis=1)
    
    # Fill in all missing values with 0s
    user_ratings_centered.fillna(0, inplace=True)
    # Print the mean of each row (one value per user)
    print(user_ratings_centered.mean(axis=1))
    
    ######################
    # Import the required libraries 
    from scipy.sparse.linalg import svds
    import numpy as np
    
    # Decompose the matrix
    U, sigma, Vt = svds(user_ratings_centered)
    
    ## Now that you have your three factor matrices, you can multiply them back together to get complete ratings data 
    # without missing values. In this exercise, you will use numpy's dot product function to multiply U and sigma first, 
    # then the result by Vt. You will then be able add the average ratings for each row to find your final ratings.
    
    # Dot product of U and sigma
    U_sigma = np.dot(U, sigma)
    
    # Dot product of result and Vt
    U_sigma_Vt = np.dot(U_sigma, Vt)

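One line of code is missing. "svds" returns sigma as a 1-D array of singular values, so np.dot(U, sigma) is a matrix-vector product with shape (671,), which cannot then be multiplied by Vt. After decomposing the matrix with "svds", we need this line: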

    # Convert sigma into a diagonal matrix
    sigma = np.diag(sigma)
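To see why the diagonal matrix matters, here is a minimal sketch on a small random matrix. The 8 x 10 `ratings` array, the seed, and the variable names are assumptions standing in for the real centered ratings data; only `svds`, `np.diag` and `np.dot` come from the code above.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Hypothetical stand-in for the centered ratings matrix (8 users x 10 movies).
rng = np.random.default_rng(0)
ratings = rng.normal(size=(8, 10))

# k=6 matches the svds default; it must be smaller than min(ratings.shape).
U, sigma, Vt = svds(ratings, k=6)
print(U.shape, sigma.shape, Vt.shape)

# sigma is 1-D, so this is a matrix-vector product that collapses to one
# value per row; this is the source of the "shapes not aligned" error.
collapsed = np.dot(U, sigma)
print(collapsed.shape)

# Turning sigma into a diagonal matrix keeps the result two-dimensional.
sigma_diag = np.diag(sigma)
approx = np.dot(np.dot(U, sigma_diag), Vt)
print(approx.shape)

# Equivalent form that scales the columns of U by broadcasting instead of
# building the diagonal matrix explicitly.
approx_broadcast = (U * sigma) @ Vt
```

With the real data the same shapes apply, with 671 rows in place of 8. The broadcasting form is only an alternative; it should give the same result as the `np.diag` version.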