SQL: counting matching likes and dislikes for a recommendation system (user-based collaborative filtering)

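The idea is that users leave likes and dislikes on different items. For a selected user (USER_ID = 1), I need a list of the other users who gave the same ratings (likes and dislikes) on the same items, so that I can determine how similar they are.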

RATING Column:
1 = like,
0 = dislike

Full table:

+---------+---------+--------+--------------------------------------------------+
| USER_ID | ITEM_ID | RATING |                      -EXAMPLE-                   |
+---------+---------+--------+--------------------------------------------------+
|       1 |       1 |      1 |-+
|       1 |       2 |      1 | |
|       1 |       3 |      1 | +-[1,1,1,0,0] user_1 vector of ratings
|       1 |       4 |      0 | |  |     | | 
|       1 |       5 |      0 |-+  |     | |     
|       3 |       1 |      1 |----+     + + total_match with user_1 = 3 [1,0,0]
|       3 |       2 |      0 |          | |        
|       3 |       3 |      0 |          | |       
|       3 |       4 |      0 |----------+ |
|       3 |       5 |      0 |------------+
|       4 |       1 |      1 |
|       4 |       2 |      1 |
|       4 |       3 |      1 |
|       4 |       4 |      0 |
|       4 |       5 |      0 |
+---------+---------+--------+

Match calculation (user 3 compared with user 1):

user_3 likes_match with user_1 = 1
user_3 dislikes_match with user_1 = 2
total_match = likes_match + dislikes_match = 3
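A quick way to sanity-check these numbers outside the database is to compare the two rating vectors item by item. The sketch below is a minimal Python illustration, assuming each user's ratings are listed in the same item order (items 1 to 5, as in the example); the helper name count_matches is hypothetical and not part of the question.

```python
def count_matches(target, other):
    """Count items where both users gave the same rating.

    Assumes (hypothetically) that both lists are aligned by item,
    i.e. index i holds the rating for item i + 1. Returns
    (likes_match, dislikes_match, total_match).
    """
    likes = sum(1 for a, b in zip(target, other) if a == b == 1)
    dislikes = sum(1 for a, b in zip(target, other) if a == b == 0)
    return likes, dislikes, likes + dislikes


# Rating vectors taken from the table in the question (1 = like, 0 = dislike).
user_1 = [1, 1, 1, 0, 0]
user_3 = [1, 0, 0, 0, 0]
user_4 = [1, 1, 1, 0, 0]

print(count_matches(user_1, user_3))  # (1, 2, 3)
print(count_matches(user_1, user_4))  # (3, 2, 5)
```

The SQL answers below do the same thing, but the "alignment by item" is done with a join on item_id instead of list positions.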

How do I write an SQL query that returns the following result:

+---------+-------------+----------------+-------------+
| user_id | likes_match | dislikes_match | total_match |
+---------+-------------+----------------+-------------+
|       3 |           1 |              2 |           3 |
|       4 |           3 |              2 |           5 |
+---------+-------------+----------------+-------------+

Any ideas?

You will probably need several nested subqueries to get the result you want. See the query below:

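-- "have" is the ratings table from the question (user_id, item_id, rating)
select res1.user_id,
       sum(res1.likes_match1) as likes_match,
       sum(res1.dislikes_match1) as dislikes_match,
       sum(res1.likes_match1) + sum(res1.dislikes_match1) as total_match
  from (
        -- Step 2: count shared likes and shared dislikes per user
        select res.user_id,
               case when res.rating = 1 then count(res.rating) else 0 end as likes_match1,
               case when res.rating = 0 then count(res.rating) else 0 end as dislikes_match1
          from (
                -- Step 1: every item on which another user gave the same rating as user 1;
                -- rating is 1 for a shared like and 0 for a shared dislike
                select b.user_id as user_id,
                       case when a.rating = 1 and b.rating = 1 then 1 else 0 end as rating
                  from have a
                 inner join have b
                    on a.item_id = b.item_id
                   and a.user_id = 1
                   and b.user_id <> 1
                   and a.rating = b.rating
               ) as res
         group by res.user_id, res.rating
       ) as res1
 -- Step 3: collapse the "like" row and the "dislike" row into one row per user
 group by res1.user_id
;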
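To check the query above against the sample data from the question, it can be run in an in-memory SQLite database through Python's standard sqlite3 module. This is a minimal test harness, not part of the original answer: the table is created under the name have to match the query, and an ORDER BY res1.user_id is added (an assumption of this sketch) only so the row order is deterministic.

```python
import sqlite3

# The first answer's query; ORDER BY added here only for a stable row order.
QUERY = """
SELECT res1.user_id,
       SUM(res1.likes_match1) AS likes_match,
       SUM(res1.dislikes_match1) AS dislikes_match,
       SUM(res1.likes_match1) + SUM(res1.dislikes_match1) AS total_match
FROM (
    SELECT res.user_id,
           CASE WHEN res.rating = 1 THEN COUNT(res.rating) ELSE 0 END AS likes_match1,
           CASE WHEN res.rating = 0 THEN COUNT(res.rating) ELSE 0 END AS dislikes_match1
    FROM (
        SELECT b.user_id AS user_id,
               CASE WHEN a.rating = 1 AND b.rating = 1 THEN 1 ELSE 0 END AS rating
        FROM have a
        INNER JOIN have b
           ON a.item_id = b.item_id
          AND a.user_id = 1
          AND b.user_id <> 1
          AND a.rating = b.rating
    ) AS res
    GROUP BY res.user_id, res.rating
) AS res1
GROUP BY res1.user_id
ORDER BY res1.user_id
"""

# (user_id, item_id, rating) rows copied from the question's table.
ROWS = [
    (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 0), (1, 5, 0),
    (3, 1, 1), (3, 2, 0), (3, 3, 0), (3, 4, 0), (3, 5, 0),
    (4, 1, 1), (4, 2, 1), (4, 3, 1), (4, 4, 0), (4, 5, 0),
]


def run():
    """Load the sample rows into a throwaway database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE have(user_id INTEGER, item_id INTEGER, rating INTEGER)")
    conn.executemany("INSERT INTO have VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(run())  # [(3, 1, 2, 3), (4, 3, 2, 5)]
```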

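(This uses SQLite, but it shouldn't take much to make it work on other databases; the WITHOUT ROWID clause below is SQLite-specific.)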

Given the following table:

CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER
                   , PRIMARY KEY(user_id, item_id)) WITHOUT ROWID;
INSERT INTO ratings VALUES(1,1,1);
INSERT INTO ratings VALUES(1,2,1);
INSERT INTO ratings VALUES(1,3,1);
INSERT INTO ratings VALUES(1,4,0);
INSERT INTO ratings VALUES(1,5,0);
INSERT INTO ratings VALUES(3,1,1);
INSERT INTO ratings VALUES(3,2,0);
INSERT INTO ratings VALUES(3,3,0);
INSERT INTO ratings VALUES(3,4,0);
INSERT INTO ratings VALUES(3,5,0);
INSERT INTO ratings VALUES(4,1,1);
INSERT INTO ratings VALUES(4,2,1);
INSERT INTO ratings VALUES(4,3,1);
INSERT INTO ratings VALUES(4,4,0);
INSERT INTO ratings VALUES(4,5,0);

This query:

SELECT r1.user_id AS user_id
     , sum(r1.rating) AS likes_match
     , sum(CASE r1.rating WHEN 0 THEN 1 ELSE 0 END) AS dislikes_match
     , count(*) AS total_match
FROM ratings AS r1
JOIN ratings AS r2 ON r2.user_id = 1
                  AND r1.item_id = r2.item_id
                  AND r1.rating = r2.rating
WHERE r1.user_id <> 1
GROUP BY r1.user_id
ORDER BY r1.user_id;

produces:

user_id     likes_match  dislikes_match  total_match
----------  -----------  --------------  -----------
3           1            2               3          
4           3            2               5
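Since the goal is to find users similar to any selected user, not only user 1, the hard-coded id can be turned into a bound parameter. The sketch below is an illustration, not the answerer's code: it reuses the query above through Python's sqlite3 module, replaces both occurrences of 1 with a named parameter :target, and (as a hypothetical design choice) sorts by total_match descending so the most similar users come first. The function names build_db and similar_users are made up for this example.

```python
import sqlite3

# Same query as above; ":target" replaces the hard-coded user id 1.
SIMILAR_USERS = """
SELECT r1.user_id AS user_id,
       SUM(r1.rating) AS likes_match,
       SUM(CASE r1.rating WHEN 0 THEN 1 ELSE 0 END) AS dislikes_match,
       COUNT(*) AS total_match
FROM ratings AS r1
JOIN ratings AS r2 ON r2.user_id = :target
                  AND r1.item_id = r2.item_id
                  AND r1.rating = r2.rating
WHERE r1.user_id <> :target
GROUP BY r1.user_id
ORDER BY total_match DESC, r1.user_id
"""


def build_db():
    """Create the answer's ratings table in memory and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER,"
        " PRIMARY KEY(user_id, item_id)) WITHOUT ROWID"
    )
    rows = [
        (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 0), (1, 5, 0),
        (3, 1, 1), (3, 2, 0), (3, 3, 0), (3, 4, 0), (3, 5, 0),
        (4, 1, 1), (4, 2, 1), (4, 3, 1), (4, 4, 0), (4, 5, 0),
    ]
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
    return conn


def similar_users(conn, target):
    """Return (user_id, likes_match, dislikes_match, total_match) rows for target."""
    return conn.execute(SIMILAR_USERS, {"target": target}).fetchall()


conn = build_db()
print(similar_users(conn, 1))  # [(4, 3, 2, 5), (3, 1, 2, 3)]
print(similar_users(conn, 3))  # [(1, 1, 2, 3), (4, 1, 2, 3)]
```

Binding the id as a parameter also avoids building the SQL string by hand for each user.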