User-by-item matrix in pandas
I am working on a recommender system. I followed this to create a user-by-item matrix, but I get an error: IndexError: index 8928358160 is out of bounds for axis 0 with size 5
Here is a sample of the data set.
import pandas as pd
import numpy as np
df = pd.read_csv('APRIL.csv')
df = df.drop(['BASKETID'],1)
df = df.head(10)
df
Out[89]:
MEMBERID SKU QTY
0 8928358161 37101163 2
1 8928358161 36618858 1
2 8928358161 40855129 1
3 8933444371 35010078 1
4 8932505053 36335949 1
5 8932505053 92100668 1
6 8932505053 36529730 2
7 8921161362 61814893 1
8 8915688100 34732853 1
9 8915688100 35122457 1
n_users = df.MEMBERID.unique().shape[0]
n_items = df.SKU.unique().shape[0]
print str(n_users) + ' users'
print str(n_items) + ' items'
5 users
10 items
ratings = np.zeros((n_users, n_items))
for row in df.itertuples():
    ratings[row[1]-1, row[2]-1] = row[3]
ratings
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-92-0a393963bf4c> in <module>()
1 ratings = np.zeros((n_users, n_items))
2 for row in df.itertuples():
----> 3 ratings[row[1]-1, row[2]-1] = row[3]
4 ratings
IndexError: index 8928358160 is out of bounds for axis 0 with size 5
I still don't understand where index 8928358160 comes from.
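Why not convert the values to strings?
Even though they are integers, the computer may read them in scientific notation and so store them as floats.
Try this:
Convert cust_id and item_number from float to string: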
mergedfinal['cust_id'] = mergedfinal['cust_id'].astype(str)
mergedfinal['item_number'] = mergedfinal['item_number'].astype(str)
mergedfinal['SKU'] = mergedfinal['SKU'].astype(str)
mergedfinal is the name of my DataFrame.
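For what it's worth, the index in the traceback is just the first MEMBERID minus one: in df.itertuples(), row[0] is the DataFrame index and row[1] is MEMBERID, so row[1]-1 is 8928358161 - 1 = 8928358160, and that value is used directly as a row position in an array that only has 5 rows. Here is a minimal sketch of one way around it, assuming the goal is a dense users-by-items array of QTY: map each ID to a position 0..n-1 with pd.factorize and index with those positions instead. The DataFrame below is rebuilt by hand from the ten rows shown in the question, and the names user_idx, item_idx, user_ids and item_ids are mine, not from the original post.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's output (BASKETID already dropped).
df = pd.DataFrame({
    "MEMBERID": [8928358161, 8928358161, 8928358161, 8933444371, 8932505053,
                 8932505053, 8932505053, 8921161362, 8915688100, 8915688100],
    "SKU": [37101163, 36618858, 40855129, 35010078, 36335949,
            92100668, 36529730, 61814893, 34732853, 35122457],
    "QTY": [2, 1, 1, 1, 1, 1, 2, 1, 1, 1],
})

# factorize assigns each distinct ID a code 0..n-1 in order of first appearance
# and also returns the distinct IDs, so codes can be mapped back to IDs later.
user_idx, user_ids = pd.factorize(df["MEMBERID"])
item_idx, item_ids = pd.factorize(df["SKU"])

ratings = np.zeros((len(user_ids), len(item_ids)))

# np.add.at accumulates in place, so if the same (user, item) pair appears
# more than once its quantities are summed rather than overwritten.
np.add.at(ratings, (user_idx, item_idx), df["QTY"].to_numpy())

print(ratings.shape)
```

With this layout, user_ids[i] and item_ids[j] give back the original MEMBERID and SKU for row i and column j of ratings.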