"shapes not aligned" error when performing Singular Value Decomposition using scipy.sparse.linalg
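I am trying to use Singular Value Decomposition (SVD) to predict missing values in a sparse matrix. Chapter 4 of the Datacamp course "Building Recommendation Engines in Python" has a great example of this using movie ratings. As a first step, I have been trying to replicate that Datacamp example on my local PC in a Jupyter Notebook. However, when I try to multiply U_sigma and Vt, which are computed from the output of the "svds" function, I get this error: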
ValueError: shapes (671,) and (6,9161) not aligned: 671 (dim 0) != 6 (dim 0)
I am using this dataset: https://www.kaggle.com/rounakbanik/the-movies-dataset/version/7?select=ratings_small.csv
Here is the code I am trying to run:
import pandas as pd
filename = 'ratings_small.csv'
df = pd.read_csv(filename)
df.head()
user_ratings_df = df.pivot(index='userId', columns='movieId', values='rating')
# Get the average rating for each user
avg_ratings = user_ratings_df.mean(axis=1)
# Center each user's ratings around 0
user_ratings_centered = user_ratings_df.sub(avg_ratings, axis=1)
# Fill in all missing values with 0s
user_ratings_centered.fillna(0, inplace=True)
# Print the mean of each column
print(user_ratings_centered.mean(axis=1))
######################
# Import the required libraries
from scipy.sparse.linalg import svds
import numpy as np
# Decompose the matrix
U, sigma, Vt = svds(user_ratings_centered)
## Now that you have your three factor matrices, you can multiply them back together to get complete ratings data
# without missing values. In this exercise, you will use numpy's dot product function to multiply U and sigma first,
# then the result by Vt. You will then be able to add the average ratings for each row to find your final ratings.
# Dot product of U and sigma
U_sigma = np.dot(U, sigma)
# Dot product of result and Vt
U_sigma_Vt = np.dot(U_sigma, Vt)
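One line of code is missing. "svds" returns sigma as a 1-D array of singular values, so np.dot(U, sigma) collapses to a vector of shape (671,), which cannot then be multiplied by Vt. After running "svds" to decompose the matrix, we need this line: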
# Convert sigma into a diagonal matrix
sigma = np.diag(sigma)
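To see why that one line matters, here is a minimal sketch of the shapes involved. It uses a small random matrix as a stand-in for the centered ratings (the 20 x 30 size, the seed, and the names ratings, collapsed, sigma_diag, approx and approx_broadcast are assumptions for illustration, not part of the original code), and it passes k=6 explicitly, which is the default that "svds" uses.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Hypothetical stand-in for the centered ratings matrix:
# 20 "users" x 30 "movies" of random values.
rng = np.random.default_rng(0)
ratings = rng.standard_normal((20, 30))

# svds returns the singular values as a 1-D array of length k.
U, sigma, Vt = svds(ratings, k=6)
print(U.shape, sigma.shape, Vt.shape)   # (20, 6) (6,) (6, 30)

# Dotting a 2-D array with a 1-D array collapses the last axis,
# so the result is a vector that can no longer be multiplied by Vt.
collapsed = np.dot(U, sigma)
print(collapsed.shape)                  # (20,)

# Converting sigma into a k x k diagonal matrix keeps the product 2-D.
sigma_diag = np.diag(sigma)
approx = np.dot(np.dot(U, sigma_diag), Vt)
print(approx.shape)                     # (20, 30)

# Same result without building the diagonal matrix:
# broadcasting scales each column of U by its singular value.
approx_broadcast = (U * sigma) @ Vt
```

With the real data the same thing happens at a larger scale: U has 671 rows, so np.dot(U, sigma) has shape (671,), which matches the first shape in the error message. Separately, and only as a side observation, the call user_ratings_df.sub(avg_ratings, axis=1) aligns the per-user means against the movie columns rather than the user rows; axis=0 is probably what was intended there, so that step may be worth checking as well.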