Matrix factorization methods in recommendation systems
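I have been reading about matrix factorization methods in recommendation systems and came across this very good tutorial: http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/
Everything made sense, but this paragraph caught my attention: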
A question might have come to your mind by now: if we find two matrices P and Q such that P × Q approximates R, isn’t that our predictions of all the unseen ratings will all be zeros? In fact, we are not really trying to come up with P and Q such that we can reproduce R exactly. Instead, we will only try to minimise the errors of the observed user-item pairs. In other words, if we let T be a set of tuples, each of which is in the form of (u_i, d_j, r_ij), such that T contains all the observed user-item pairs together with the associated ratings, we are only trying to minimise every e_ij for (u_i, d_j, r_ij) in T. (In other words, T is our set of training data.) As for the rest of the unknowns, we will be able to determine their values once the associations between the users, items and features have been learnt.
Could anybody help me understand this? Do the latent factors help us learn the behavior of each user and item?
Thanks
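Latent factors are two sets of values (one set for the users and one for the items) that describe the user and the item. Essentially, what you are doing is finding a numerical representation of your items and users.
Imagine you have a movie rating system with 3 user factors and 3 movie (item) factors. The user factors could be how much you like comedy, drama or action movies, and the movie factors how much a movie is a comedy, a drama or an action movie. From these attributes you can estimate ratings for other user-movie pairs. The model finds these abstract factors for you; in practice they are learned from the data and need not correspond to named genres.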
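To make the example above concrete, here is a small sketch with hand-picked factor values. The numbers, the user and movie names, and the helper `predict_rating` are all made up for illustration; a real model would learn the values rather than have them assigned. The estimated rating is simply the dot product of a user's factor vector and a movie's factor vector.

```python
# Hypothetical factor values, chosen by hand for illustration only.
# Factor order in every vector: (comedy, drama, action).
user_factors = {
    "alice": [0.9, 0.1, 0.3],   # mostly likes comedy
    "bob": [0.1, 0.2, 0.95],    # mostly likes action
}
movie_factors = {
    "funny_movie": [4.5, 0.5, 0.5],   # mostly a comedy
    "action_movie": [0.5, 0.5, 4.5],  # mostly an action movie
}


def predict_rating(user, movie):
    """Estimate a rating as the dot product of the two factor vectors."""
    return sum(p * q for p, q in zip(user_factors[user], movie_factors[movie]))


for user in user_factors:
    for movie in movie_factors:
        print(user, movie, round(predict_rating(user, movie), 2))
```

A user whose preferences line up with a movie's characteristics gets a high estimate, and a mismatch gets a low one, even for a pair that was never rated.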
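This means you can only find sensible representations for users and items that you have ratings for.
So when you train your model, you use the known ratings to estimate these representations. From them you can then try to predict the unknown ratings for users and items.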
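A minimal sketch of that training idea, assuming plain stochastic gradient descent with a small L2 penalty (similar in spirit to the linked tutorial, but not a copy of its code). The toy ratings, the names `factorize`, `predict` and `rmse`, and the hyperparameter values are assumptions for illustration. Only observed (user, item, rating) tuples enter the loss; unknown pairs are simply absent from the training set instead of being treated as zeros.

```python
import random


def factorize(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q from observed ratings only.

    ratings: list of (user_index, item_index, rating) tuples.
    """
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            # Error on this one observed pair.
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 regularization.
                P[u][f] += lr * (2 * err * qi - reg * pu)
                Q[i][f] += lr * (2 * err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    """Root mean squared error over the given (u, i, r) tuples."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


# Toy training set T: 5 users, 4 items, 13 observed ratings.
observed = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 3, 4),
    (4, 1, 1), (4, 2, 5), (4, 3, 4),
]

P, Q = factorize(observed, n_users=5, n_items=4)
print("training RMSE:", round(rmse(P, Q, observed), 3))
# (user 1, item 1) was never rated, yet the learned factors give an estimate.
print("estimate for unseen pair (1, 1):", round(predict(P, Q, 1, 1), 2))
```

Because user 1 and item 1 each appear in other observed ratings, both have learned factor vectors, so their unseen combination can be scored. A user or item with no ratings at all would keep its random initial factors, which is the limitation described above.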