SVD++ vectorization with numpy or tensorflow
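I want to implement SVD++ with numpy or tensorflow.
( https://pdfs.semanticscholar.org/8451/c2812a1476d3e13f2a509139322cc0adb1a2.pdf )
(page 4, equation 4)
I want to implement that equation without any for loops.
However, the sum of y_j over the index set R(u) makes this difficult.
So my question is...
I want to implement the following term (q_v multiplied by the sum of y_j) without any for loops.
1. Is this possible in numpy without for loops?
2. Is this possible in tensorflow without for loops?
My implementation is as follows, but I would like to remove the remaining for loops.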
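For reference, the term described above can be written out as follows. This is a reconstruction from the description and the code in this post, not a copy of equation 4 in the linked paper, so the index names are assumptions.

```latex
% Assumed notation: R(u) is the set of items that user u has interacted with;
% q_v and y_j are columns of the latent_dim x num_items arrays q and y.
\hat{r}_{uv} = q_v^{\top} \sum_{j \in R(u)} y_j
```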
import numpy as np

num_users = 3
num_items = 5
latent_dim = 2
p = 0.1

r = np.random.binomial(1, 1 - p, (num_users, num_items))  # binary user-item matrix
r_hat = np.zeros([num_users, num_items])
q = np.random.randn(latent_dim, num_items)
y = np.random.randn(latent_dim, num_items)

## First Try
for user in range(num_users):
    for item in range(num_items):
        q_j = q[:, item]
        user_item_list = [i for i, e in enumerate(r[user, :]) if e != 0]  # R_u
        sum_y_j = np.zeros(latent_dim)  # to make sum of y_i (stays a vector if R_u is empty)
        for user_item in user_item_list:
            sum_y_j = sum_y_j + y[:, user_item]
        sum_y_j = np.asarray(sum_y_j)
        r_hat[user, item] = np.dot(np.transpose(q_j), sum_y_j)
print(r_hat)
print("=" * 100)

## Second Try
for user in range(num_users):
    for item in range(num_items):
        q_j = q[:, item]
        user_item_list = [i for i, e in enumerate(r[user, :]) if e != 0]  # R_u
        sum_y_j = np.sum(y[:, user_item_list], axis=1)  # to make sum of y_i
        r_hat[user, item] = np.dot(np.transpose(q_j), sum_y_j)
print(r_hat)
print("=" * 100)

## Third Try
for user in range(num_users):
    user_item_list = [i for i, e in enumerate(r[user, :]) if e != 0]  # R_u
    sum_y_j = np.sum(y[:, user_item_list], axis=1)  # to make sum of y_i
    r_hat[user, :] = np.dot(np.transpose(q), sum_y_j)
print(r_hat)
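As a small aside, the list comprehension that builds R_u in each attempt could probably be replaced by np.flatnonzero, which returns the indices of the nonzero entries directly. A minimal sketch, assuming the same shapes as above; the seeded generator and the names rng, items and r_hat_idx are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
num_users, num_items, latent_dim = 3, 5, 2
r = rng.binomial(1, 0.9, (num_users, num_items))
q = rng.standard_normal((latent_dim, num_items))
y = rng.standard_normal((latent_dim, num_items))

r_hat_idx = np.zeros((num_users, num_items))
for user in range(num_users):
    items = np.flatnonzero(r[user])   # R_u as an index array
    sum_y = y[:, items].sum(axis=1)   # shape (latent_dim,); all zeros if R_u is empty
    r_hat_idx[user] = q.T.dot(sum_y)  # one row of r_hat per user
```

This still loops over users, so it only tidies the index lookup and does not remove the loop.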
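Try this.
sum_y = []
for user in range(num_users):
    # Repeat the user's 0/1 row so it masks every latent dimension of y
    mask = np.repeat(r[user, :][None, :], latent_dim, axis=0)
    sum_y.append(np.sum(np.multiply(y, mask), axis=1))
sum_y = np.asarray(sum_y)
r_hat = (np.dot(q.T, sum_y.T)).T
print(r_hat)
This removes the enumerate loop, and the dot product is done in a single line. I don't think it can be reduced any further.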
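The remaining loop over users might also be removable with broadcasting: multiplying r, viewed as (num_users, 1, num_items), by y, viewed as (1, latent_dim, num_items), applies every user's mask at once. A sketch under the same assumptions as the code above (binary r); the seeded data and the name sum_y_bc are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded only to make the sketch repeatable
num_users, num_items, latent_dim = 3, 5, 2
r = rng.binomial(1, 0.9, (num_users, num_items))
q = rng.standard_normal((latent_dim, num_items))
y = rng.standard_normal((latent_dim, num_items))

# Loop version from the answer, kept for comparison.
sum_y = np.asarray([
    np.sum(y * np.repeat(r[user][None, :], latent_dim, axis=0), axis=1)
    for user in range(num_users)
])

# Broadcast version: (num_users, 1, num_items) * (1, latent_dim, num_items),
# then sum over the item axis.
sum_y_bc = (r[:, None, :] * y[None, :, :]).sum(axis=2)  # (num_users, latent_dim)

r_hat = sum_y_bc.dot(q)  # (num_users, num_items)
```

Note that this builds an intermediate array of size num_users x latent_dim x num_items, so it may use a lot of memory for large inputs.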
Simply use two matrix multiplications with np.dot to get the final output -
r_hat = r.dot(y.T).dot(q)
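Why this works can be seen by writing the membership test "j in R(u)" as the entry r_{uj}, which is valid here because r only contains 0 and 1. The derivation below is a sketch in assumed notation: Y and Q stand for the arrays y and q, n for num_items, and r for the user-item array.

```latex
% Assumes r_{uj} = 1 if j \in R(u) and 0 otherwise.
\hat{r}_{uv}
  = q_v^{\top} \sum_{j \in R(u)} y_j
  = \sum_{j=1}^{n} r_{uj} \, \bigl( y_j^{\top} q_v \bigr)
  = \sum_{j=1}^{n} r_{uj} \, \bigl( Y^{\top} Q \bigr)_{jv}
  = \bigl( r \, Y^{\top} Q \bigr)_{uv}
```

The last expression is exactly r.dot(y.T).dot(q). The same two products could presumably be written with tf.matmul in tensorflow, although that is not tested here.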
Sample run to verify the results -
OP's sample setup:
In [68]: import numpy as np
    ...:
    ...: num_users = 3
    ...: num_items = 5
    ...: latent_dim = 2
    ...: p = 0.1
    ...:
    ...: r = np.random.binomial(1, 1 - p,(num_users, num_items))
    ...: r_hat = np.zeros([num_users,num_items])
    ...:
    ...: q = np.random.randn(latent_dim,num_items)
    ...: y = np.random.randn(latent_dim,num_items)
    ...:
In [69]: ## Second Try from OP
    ...: for user in range(num_users):
    ...:     for item in range(num_items):
    ...:         q_j = q[:,item]
    ...:         user_item_list = [i for i, e in enumerate(r[user,:]) if e != 0] # R_u
    ...:         sum_y_j = np.sum(y[:,user_item_list],axis=1) # to make sum of y_i
    ...:         r_hat[user,item] = np.dot(np.transpose(q_j),sum_y_j)
    ...:
Let's print out the result from OP's solution -
In [70]: r_hat
Out[70]:
array([[  4.06866107e+00,   2.91099460e+00,  -6.50447668e+00,
          7.44275731e-03,  -2.14857566e+00],
       [  4.06866107e+00,   2.91099460e+00,  -6.50447668e+00,
          7.44275731e-03,  -2.14857566e+00],
       [  5.57369599e+00,   3.76169533e+00,  -8.47503476e+00,
          1.48615948e-01,  -2.82792374e+00]])
Now, using my proposed solution -
In [71]: r.dot(y.T).dot(q)
Out[71]:
array([[  4.06866107e+00,   2.91099460e+00,  -6.50447668e+00,
          7.44275731e-03,  -2.14857566e+00],
       [  4.06866107e+00,   2.91099460e+00,  -6.50447668e+00,
          7.44275731e-03,  -2.14857566e+00],
       [  5.57369599e+00,   3.76169533e+00,  -8.47503476e+00,
          1.48615948e-01,  -2.82792374e+00]])
The values match, so the check appears to succeed!
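Comparing printed arrays by eye can miss small differences, so a programmatic check with np.allclose may be more reliable. The sketch below also covers a case that is not in the original post: if r held actual rating values instead of 0/1 flags, the matrix product would weight each y_j by the rating, so the sketch converts r to an indicator with (r != 0) first. The seeded data and the names loop_version, ratings and indicator are assumptions for illustration.

```python
import numpy as np

def loop_version(r, q, y):
    """Reference implementation modeled on the question's third attempt."""
    r_hat = np.zeros(r.shape)
    for user in range(r.shape[0]):
        items = np.flatnonzero(r[user])  # R_u: indices of nonzero entries
        r_hat[user, :] = np.dot(q.T, np.sum(y[:, items], axis=1))
    return r_hat

rng = np.random.default_rng(42)  # seeded only to make the sketch repeatable
num_users, num_items, latent_dim = 3, 5, 2
q = rng.standard_normal((latent_dim, num_items))
y = rng.standard_normal((latent_dim, num_items))

# Binary r, as in the post: the two matrix products should match the loops.
r = rng.binomial(1, 0.9, (num_users, num_items))
binary_ok = np.allclose(loop_version(r, q, y), r.dot(y.T).dot(q))

# Hypothetical rating values 0..5 (0 meaning "not rated"): build the indicator first.
ratings = rng.integers(0, 6, (num_users, num_items))
indicator = (ratings != 0).astype(float)
ratings_ok = np.allclose(loop_version(ratings, q, y), indicator.dot(y.T).dot(q))

print(binary_ok, ratings_ok)
```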