AWS Personalize: filtering all items already interacted with does not seem to persist

We are using AWS Personalize to get a personalized ranking of the various items in our feed for a specific user.

We are also using a filter that looks like this:

EXCLUDE ItemID WHERE Interactions.event_type IN ("*")

This filter is taken from the AWS blog, which states:

To remove all items that a user has previously interacted with, use the following filter expression:

EXCLUDE itemId WHERE INTERACTIONS.event_type in ("*")

I have been experimenting in the console at https://console.aws.amazon.com/personalize/home?region=us-east-1#arn:aws:personalize:us-east-1::dataset-group$/campaigns/campaignDetail/

I entered userId=5253ffbb-f5e3-4e71-9a33-91ee65365c7d and a bunch of item IDs:

5829, 5480, 2275, 6706, 5438, 6444, 6444, 7461, 7599, 4384, 6747, 7499, 6491, 5453, 7605, 5985, 6663, 7174, 1094, 6474, 7357, 7220, 8370, 7445, 5721, 991, 5592, 9283, 7547, 8676, 8872, 8092, 9401, 8645, 2090, 7684, 3788, 5849, 6524, 8480, 7299, 5752, 8007, 9100, 7422, 8640, 7917, 9254, 10050, 9851, 1744, 4227, 6388, 9490, 6481, 5744, 6486, 9040, 4048, 8170, 9623, 7966, 8560, 5336, 3885, 4441, 10442, 6842, 4898, 567, 4214, 125, 9556, 10039, 5494, 9447, 10051, 8302, 9482, 6649, 9133, 4828, 8288, 62, 9680, 4792, 10785, 9727, 10777, 11366, 10252, 9728, 2450, 10463, 9578, 4246, 10154, 10793, 10299, 6733, 10597, vy7erddv, 9247, 9816, 8385, 9589, 10845, 10368, 11427, 11405, 10475, 11273, 11392, 11335, 5871, 10465, 10927, 9371, 9894, 10773, 10747, 11274, 11349, 10831, 9882, vaxq362m, m3g32ayv, 5wqa8r4v, km7kl7kv, 3wno92pm, 3m483l5v, pv9rallv, lmr4dn8v

Now I record interactions between this user and some of the items, then reload the recommendations in the console...

This seems to work as expected: once the user has interacted with an item, that item is filtered out of the list.

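But to my surprise, the items do not stay filtered indefinitely. If I keep recording interactions between this user and other items, recommendations reloaded later may include items the user interacted with earlier. And if enough time passes (say, a day), all of the items seem to come back for this user!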
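One way to pin down exactly when previously seen items reappear is to compare each ranking response against the item IDs that events were recorded for. The sketch below assumes the GetPersonalizedRanking response shape (a `personalizedRanking` list whose entries carry an `itemId`). The helper names `leaked_items` and `fetch_ranking`, and the ARNs passed in, are hypothetical and only for illustration. The boto3 call runs only if `fetch_ranking` is invoked with real credentials.

```python
from typing import Iterable, List


def leaked_items(ranking_response: dict, interacted_ids: Iterable[str]) -> List[str]:
    """Return item IDs in the response that the user already interacted with."""
    seen = set(interacted_ids)
    ranked = ranking_response.get("personalizedRanking", [])
    return [entry["itemId"] for entry in ranked if entry["itemId"] in seen]


def fetch_ranking(campaign_arn: str, filter_arn: str, user_id: str,
                  item_ids: List[str]) -> dict:
    """Call GetPersonalizedRanking with the filter applied.

    Needs boto3 and AWS credentials; the ARNs are supplied by the caller.
    """
    import boto3  # imported lazily so leaked_items works without AWS access

    runtime = boto3.client("personalize-runtime")
    return runtime.get_personalized_ranking(
        campaignArn=campaign_arn,
        filterArn=filter_arn,
        userId=user_id,
        inputList=item_ids,
    )


# Offline check against a hand-written response in the same shape.
sample_response = {"personalizedRanking": [{"itemId": "5829"}, {"itemId": "vaxq362m"}]}
print(leaked_items(sample_response, ["vaxq362m"]))
```

Running this check after every recorded event would show at which point in the user's history an older item stops being excluded.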

I have absolutely no idea why this happens.

Interactions are tracked as follows:

POST https://personalize-events.us-east-1.amazonaws.com/events
{
   "eventList": [ 
      { 
         "eventType": "list_view",
         "ITEM_ID": "vaxq362m",
         "properties": "{\"itemType\": \"artwork\", \"itemId\": \"vaxq362m\"}",
         "sentAt": {{$timestamp}}
      }
   ],
   "sessionId": "xxx1234",
   "trackingId": "<OUR_TRACKING_ID>",
   "userId": "5253ffbb-f5e3-4e71-9a33-91ee65365c7d"
}
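For comparison, the same event could be sent from Python with boto3's `personalize-events` client. This is a minimal sketch, not the poster's code: `build_put_events_request` and `send_event` are hypothetical helper names, and the region is taken from the URL above. Note that boto3's `put_events` expects the item under the `itemId` key and `sentAt` as a `datetime`, which differs slightly from the raw request as posted.

```python
import json
from datetime import datetime, timezone


def build_put_events_request(tracking_id: str, user_id: str, session_id: str,
                             item_id: str, event_type: str = "list_view") -> dict:
    """Build the keyword arguments for the personalize-events put_events call."""
    return {
        "trackingId": tracking_id,
        "userId": user_id,
        "sessionId": session_id,
        "eventList": [
            {
                "eventType": event_type,
                "itemId": item_id,
                # properties must be a JSON-encoded string, not a dict
                "properties": json.dumps({"itemType": "artwork", "itemId": item_id}),
                "sentAt": datetime.now(timezone.utc),
            }
        ],
    }


def send_event(request: dict) -> dict:
    """Send the event (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder can be used offline

    client = boto3.client("personalize-events", region_name="us-east-1")
    return client.put_events(**request)


request = build_put_events_request(
    "<OUR_TRACKING_ID>", "5253ffbb-f5e3-4e71-9a33-91ee65365c7d", "xxx1234", "vaxq362m"
)
print(request["eventList"][0]["eventType"], request["eventList"][0]["itemId"])
```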

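This seems to work, because:

  1. The response status is 200
  2. If I export the interactions dataset, the interactions show up in the CSV
  3. The items are indeed removed from the returned recommendations for a short time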

Filtering on the interactions dataset does not take the user's full history into account. From the docs:

Amazon Personalize considers up to 200 historical interactions for a user, and up to 100 streamed interactions you record for the user with the PutEvents operation. Additionally, the number of historical interactions Amazon Personalize considers for a user depends on the max_user_history_length_percentile and min_user_history_length_percentile hyperparameters you defined before training.

For example, if you used .99 for the max_user_history_length_percentile, and 99% of your users have at most 4 interactions, Amazon Personalize will only filter based on the user's most recent 4 historical interactions. If a user has less than the number historical interactions at the min_user_history_length_percentile, Amazon Personalize doesn't consider the user's interactions when filtering.

To filter based on up to 200 historical interactions for a user, set the max_user_history_length_percentile to 1.0 and retrain the model.
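To make the quoted example concrete, here is a simplified model of the history cap. It is an illustration under stated assumptions, not Amazon Personalize's actual implementation: it uses a nearest-rank percentile over per-user history lengths, ignores the separate limit on streamed events, and the names `history_cap` and `items_considered` are hypothetical.

```python
import math
from typing import List, Sequence

HARD_CAP = 200  # upper bound on historical interactions quoted in the docs


def history_cap(history_lengths: Sequence[int], max_percentile: float) -> int:
    """Nearest-rank percentile of per-user history lengths, capped at HARD_CAP."""
    if not history_lengths:
        raise ValueError("history_lengths must not be empty")
    ordered = sorted(history_lengths)
    # round() guards against floating-point noise before taking the ceiling
    rank = max(1, math.ceil(round(max_percentile * len(ordered), 9)))
    return min(ordered[rank - 1], HARD_CAP)


def items_considered(user_history: List[str], cap: int) -> List[str]:
    """Most recent `cap` interactions; `user_history` is ordered oldest first."""
    return user_history[-cap:] if cap > 0 else []


# 99 users with 4 interactions each, one user with 50 (mirrors the docs' example).
lengths = [4] * 99 + [50]
history = ["a", "b", "c", "d", "e"]  # one user's items, oldest first

cap_99 = history_cap(lengths, 0.99)
cap_100 = history_cap(lengths, 1.0)
print(cap_99, items_considered(history, cap_99))    # item "a" is no longer filtered
print(cap_100, items_considered(history, cap_100))  # the whole history is considered
```

Under this model, each new interaction pushes the oldest one out of the window once the cap is reached, which matches the symptom of older items reappearing in the recommendations.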