User-by-item matrix in pandas

I am working on a recommender system. I followed this to create a user-by-item matrix, but I get the error IndexError: index 8928358160 is out of bounds for axis 0 with size 5

Here is a sample of the dataset.

import pandas as pd
import numpy as np

df = pd.read_csv('APRIL.csv')
df = df.drop(['BASKETID'], axis=1)
df = df.head(10)
df
Out[89]:
     MEMBERID       SKU  QTY
0  8928358161  37101163    2
1  8928358161  36618858    1
2  8928358161  40855129    1
3  8933444371  35010078    1
4  8932505053  36335949    1
5  8932505053  92100668    1
6  8932505053  36529730    2
7  8921161362  61814893    1
8  8915688100  34732853    1
9  8915688100  35122457    1


n_users = df.MEMBERID.unique().shape[0]
n_items = df.SKU.unique().shape[0]
print(str(n_users) + ' users')
print(str(n_items) + ' items')
5 users
10 items

ratings = np.zeros((n_users, n_items))
for row in df.itertuples():
    ratings[row[1]-1, row[2]-1] = row[3]
ratings
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-92-0a393963bf4c> in <module>()
      1 ratings = np.zeros((n_users, n_items))
      2 for row in df.itertuples():
----> 3     ratings[row[1]-1, row[2]-1] = row[3]
      4 ratings

IndexError: index 8928358160 is out of bounds for axis 0 with size 5

I still don't understand where index 8928358160 comes from.

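Why not convert the values to strings? Although they are integers, the computer may treat them as scientific-notation values, which turns them into floats.

Try this:

Convert cust_id and item_number from float to string: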

mergedfinal['cust_id'] = mergedfinal['cust_id'].astype(str)
mergedfinal['item_number'] = mergedfinal['item_number'].astype(str)
mergedfinal['SKU'] = mergedfinal['SKU'].astype(str)

mergedfinal is my DataFrame.
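
For what it's worth, the index in the traceback is just row[1]-1, i.e. the first MEMBERID minus one (8928358161 - 1), so the raw IDs are being used as array positions. String IDs cannot be used as NumPy positions either, so one possible approach (a different technique from the string conversion above, not something from the original post) is to map each ID to a position 0..n-1 with pd.factorize and fill the matrix from those positions. This is only a minimal sketch: the DataFrame is rebuilt by hand from the sample rows in the question, and the names user_idx, item_idx, user_ids and item_ids are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows copied by hand from the question (stand-in for APRIL.csv).
df = pd.DataFrame({
    "MEMBERID": [8928358161, 8928358161, 8928358161, 8933444371, 8932505053,
                 8932505053, 8932505053, 8921161362, 8915688100, 8915688100],
    "SKU": [37101163, 36618858, 40855129, 35010078, 36335949,
            92100668, 36529730, 61814893, 34732853, 35122457],
    "QTY": [2, 1, 1, 1, 1, 1, 2, 1, 1, 1],
})

# factorize assigns each distinct ID a code 0..n-1 in order of first appearance
# and also returns the distinct IDs so codes can be mapped back later.
user_idx, user_ids = pd.factorize(df["MEMBERID"])
item_idx, item_ids = pd.factorize(df["SKU"])

# Dense users x items matrix, indexed by the codes instead of the raw IDs.
ratings = np.zeros((len(user_ids), len(item_ids)))

# np.add.at sums quantities if the same (user, item) pair occurs more than once.
np.add.at(ratings, (user_idx, item_idx), df["QTY"].to_numpy())

print(ratings.shape)
```

Row i of ratings then belongs to user_ids[i] and column j to item_ids[j], so the original IDs are still recoverable.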