GraphLab: don't recommend items already owned by the user
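What should I specify when creating a GraphLab recommender model so that items a user already owns are not recommended to them again? Can this be done directly by setting a parameter, or do I need to write a recommender system from scratch? The data looks like this: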
| user_id | item_id | othercolumns |
|:-----------|------------:|:------------:|
| 1 | 21 | This |
| 2 | 22 | column |
| 1 | 23 | will |
| 3 | 24 | hold |
| 2 | 25 | other |
| 1 | 26 | values |
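Since items 21, 23 and 26 are already owned by user 1, they should not be recommended to that user.
This behavior is controlled by the `exclude_known` parameter of the `recommender.recommend` method (doc).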
exclude_known : bool, optional
By default, all user-item interactions previously seen in the training
data, or in any new data provided using new_observation_data, are
excluded from the recommendations. Passing in exclude_known = False
overrides this behavior.
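To make the effect of the flag concrete, here is a minimal plain-Python sketch of the filtering idea. It is not GraphLab's implementation: the function `recommend_excluding_known`, the `scores` dictionary and the tie-breaking by item id are assumptions made for illustration. The scores for user 1 are rounded from the `exclude_known=False` output in the example of this answer.

```python
from collections import defaultdict


def recommend_excluding_known(scores, observations, exclude_known=True):
    """Rank items per user, optionally dropping items the user already has.

    scores: {user_id: {item_id: score}} (hypothetical precomputed scores)
    observations: iterable of (user_id, item_id) pairs seen in training
    """
    # Collect the set of known items for each user.
    known = defaultdict(set)
    for user, item in observations:
        known[user].add(item)

    result = {}
    for user, item_scores in scores.items():
        # Highest score first; ties broken by item id (an assumption).
        ranked = sorted(item_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        if exclude_known:
            ranked = [(i, s) for i, s in ranked if i not in known[user]]
        result[user] = [item for item, _ in ranked]
    return result


observations = [(1, 21), (2, 22), (1, 23), (3, 24), (2, 25), (1, 26)]
# Rounded scores for user 1 only.
scores = {1: {21: 0.667, 23: 0.667, 26: 0.667, 22: 0.0, 24: 0.0, 25: 0.0}}

filtered = recommend_excluding_known(scores, observations)
unfiltered = recommend_excluding_known(scores, observations, exclude_known=False)
```

With the default, `filtered[1]` contains only items 22, 24 and 25; with `exclude_known=False`, the owned items 21, 23 and 26 come back at the top of the list.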
Example
>>> import graphlab as gl
>>> sf = gl.SFrame({'user_id':[1,2,1,3,2,1], 'item_id':[21,22,23,24,25,26]})
>>> print sf
+---------+---------+
| item_id | user_id |
+---------+---------+
| 21 | 1 |
| 22 | 2 |
| 23 | 1 |
| 24 | 3 |
| 25 | 2 |
| 26 | 1 |
+---------+---------+
[6 rows x 2 columns]
>>> rec_model = gl.recommender.create(sf)
>>> # we recommend items not owned by user
>>> rec_wo_own_item = rec_model.recommend(sf['user_id'].unique())
>>> rec_wo_own_item.sort('user_id').print_rows(100)
+---------+---------+----------------+------+
| user_id | item_id | score | rank |
+---------+---------+----------------+------+
| 1 | 22 | 0.0 | 1 |
| 1 | 24 | 0.0 | 2 |
| 1 | 25 | 0.0 | 3 |
| 2 | 21 | 0.0 | 1 |
| 2 | 23 | 0.0 | 2 |
| 2 | 24 | 0.0 | 3 |
| 2 | 26 | 0.0 | 4 |
| 3 | 21 | 0.333333333333 | 1 |
| 3 | 23 | 0.333333333333 | 2 |
| 3 | 26 | 0.333333333333 | 3 |
| 3 | 22 | 0.166666666667 | 4 |
| 3 | 25 | 0.166666666667 | 5 |
+---------+---------+----------------+------+
[12 rows x 4 columns]
>>> # we recommend items owned by user
>>> rec_w_own_item = rec_model.recommend(sf['user_id'].unique(), exclude_known=False)
>>> rec_w_own_item.sort('user_id').print_rows(100)
+---------+---------+----------------+------+
| user_id | item_id | score | rank |
+---------+---------+----------------+------+
| 1 | 21 | 0.666666666667 | 1 |
| 1 | 23 | 0.666666666667 | 2 |
| 1 | 26 | 0.666666666667 | 3 |
| 1 | 22 | 0.0 | 4 |
| 1 | 24 | 0.0 | 5 |
| 1 | 25 | 0.0 | 6 |
| 2 | 26 | 0.0 | 6 |
| 2 | 24 | 0.0 | 5 |
| 2 | 23 | 0.0 | 4 |
| 2 | 21 | 0.0 | 3 |
| 2 | 25 | 0.5 | 2 |
| 2 | 22 | 0.5 | 1 |
| 3 | 24 | 0.0 | 6 |
| 3 | 25 | 0.166666666667 | 5 |
| 3 | 22 | 0.166666666667 | 4 |
| 3 | 26 | 0.333333333333 | 3 |
| 3 | 23 | 0.333333333333 | 2 |
| 3 | 21 | 0.333333333333 | 1 |
+---------+---------+----------------+------+
[18 rows x 4 columns]
>>> # we add recommended items not owned by user to the original SFrame
>>> rec = rec_wo_own_item.groupby('user_id', {'reco':gl.aggregate.CONCAT('item_id')})
>>> sf = sf.join(rec, 'user_id', 'left')
>>> print sf
+---------+---------+----------------------+
| item_id | user_id | reco |
+---------+---------+----------------------+
| 21 | 1 | [24, 25, 22] |
| 22 | 2 | [24, 26, 23, 21] |
| 23 | 1 | [24, 25, 22] |
| 24 | 3 | [21, 23, 26, 25, 22] |
| 25 | 2 | [24, 26, 23, 21] |
| 26 | 1 | [24, 25, 22] |
+---------+---------+----------------------+
[6 rows x 3 columns]
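As a sanity check on the output above, the following sketch verifies in plain Python that the items missing from the default recommendations are exactly the items each user owns. The item lists are copied by hand from the two printed tables; the variable names are illustrative and nothing here calls GraphLab.

```python
# (user_id, item_id) pairs from the training SFrame.
observations = [(1, 21), (2, 22), (1, 23), (3, 24), (2, 25), (1, 26)]

# Item ids per user, copied from the exclude_known=False table.
with_known = {
    1: [21, 23, 26, 22, 24, 25],
    2: [22, 25, 21, 23, 24, 26],
    3: [21, 23, 26, 22, 25, 24],
}
# Item ids per user, copied from the default (exclude_known=True) table.
without_known = {
    1: [22, 24, 25],
    2: [21, 23, 24, 26],
    3: [21, 23, 26, 22, 25],
}

# Build the set of owned items for each user.
owned = {}
for user, item in observations:
    owned.setdefault(user, set()).add(item)

# For each user, the items that disappear when known items are excluded.
removed = {u: set(with_known[u]) - set(without_known[u]) for u in owned}

# Users for whom the removed items differ from the owned items.
mismatches = {u: removed[u] for u in owned if removed[u] != owned[u]}
print(mismatches)
```

An empty `mismatches` dictionary means the default call dropped each user's owned items and nothing else.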