Why do items have to be made sequential, starting at zero, before embedding?

Question

I learned collaborative filtering from this blog post, Deep Learning With Keras: Recommender Systems.

The tutorial is good and the code runs fine. Here is my code.

One thing confuses me. The author says:

The user/movie fields are currently non-sequential integers representing some unique ID for that entity. We need them to be sequential starting at zero to use for modeling (you'll see why later).

user_enc = LabelEncoder()
ratings['user'] = user_enc.fit_transform(ratings['userId'].values)
n_users = ratings['user'].nunique()

However, he never seems to explain the reason, and I can't work out why this is necessary. Can someone explain it to me?

Answer 1

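Embedding assumes that its input indices are sequential.

The first argument to Embedding is the input dimension, i.e. the number of rows in its lookup table. Embedding assumes that the largest value in the input is input dimension - 1 (indices start at 0), so an input value at or above the input dimension has no row of its own and gets no embedding. Depending on the backend, TensorFlow may raise an error for such a value instead of silently skipping it.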

https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding?hl=ja

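For example, the following code will only produce embeddings for the input [4,3] and will skip the input [7, 8], because the input dimension is 5.

I think it is clearer to explain with TensorFlow: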

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

model = Sequential()
# input_dim=5: only indices 0..4 have a row in the embedding table
model.add(Embedding(5, 1, input_length=2))
# the second row, [7, 8], is outside that range
input_array = np.array([[4,3], [7,8]])
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)

If you increase the input dimension to 9, you will get embeddings for both inputs.
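One way to picture this is to treat the embedding layer as a plain lookup table with one row per valid index. The NumPy sketch below is only an illustration of that idea, not how Keras is implemented; `make_table` and `lookup` are hypothetical helpers, and the explicit range check stands in for whatever the real backend does with an out-of-range index.

```python
import numpy as np

def make_table(input_dim, output_dim=1, seed=0):
    """Hypothetical stand-in for Embedding(input_dim, output_dim): one row per index."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(input_dim, output_dim))

def lookup(table, ids):
    """Return the table rows for ids; raise if any id is outside 0..input_dim-1."""
    ids = np.asarray(ids)
    input_dim = table.shape[0]
    if ids.min() < 0 or ids.max() >= input_dim:
        raise IndexError(f"ids must lie in [0, {input_dim - 1}], got max {ids.max()}")
    return table[ids]

input_array = np.array([[4, 3], [7, 8]])

# With 5 rows, only indices 0..4 exist, so [4, 3] works and [7, 8] does not.
small = make_table(5)
first_row = lookup(small, input_array[0])
try:
    lookup(small, input_array)
    out_of_range = False
except IndexError:
    out_of_range = True

# With 9 rows, indices 0..8 exist, so both inputs can be looked up.
big = make_table(9)
both_rows = lookup(big, input_array)

print(first_row.shape, out_of_range, both_rows.shape)
```

The table needs a row for every integer from 0 up to the largest index, whether or not that index ever appears in the data.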

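You could instead set the input dimension to the maximum ID in the original dataset + 1, but that is inefficient: the embedding table then gets a row for every integer up to the maximum, including IDs that never occur. In that respect it is similar to one-hot encoding; making the IDs sequential saves a lot of memory.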
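As a rough illustration of the memory argument, the sketch below re-encodes a handful of made-up user IDs to contiguous codes and compares the number of embedding rows needed in each case. It uses `np.unique(..., return_inverse=True)`, which should give the same codes as `LabelEncoder` for integer IDs; the IDs and the embedding size of 50 are assumptions chosen for the example.

```python
import numpy as np

# Made-up raw IDs, standing in for ratings['userId'].values
user_ids = np.array([15, 15, 1042, 7, 99000, 1042])

# Sorted unique IDs, plus each entry's position in that sorted list:
# a contiguous code in 0..n_users-1.
unique_ids, user_codes = np.unique(user_ids, return_inverse=True)

n_users = len(unique_ids)                        # rows needed after re-encoding
rows_without_encoding = int(user_ids.max()) + 1  # rows needed if raw IDs index the table

embedding_dim = 50  # illustrative embedding size
print("codes:", user_codes)
print("weights with encoding:   ", n_users * embedding_dim)
print("weights without encoding:", rows_without_encoding * embedding_dim)

# The original ID can still be recovered from its code.
recovered = unique_ids[user_codes]
```

Here four distinct users need only four rows after encoding, while indexing by raw ID would allocate 99001 rows, almost all of them never trained.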

embedding

collaborative-filtering

neural-network

python-3.x

tensorflow