SQL: counting likes and dislikes for a recommendation system (user-based collaborative filtering)
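The idea is that users leave likes and dislikes on different items. For a selected user (USER_ID = 1), I need to get a list of the users who gave the same ratings (likes and dislikes) to the same items, in order to determine how similar they are.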
RATING Column:
1 = like,
0 = dislike
Full table:
+---------+---------+--------+
| USER_ID | ITEM_ID | RATING |
+---------+---------+--------+
|       1 |       1 |      1 |
|       1 |       2 |      1 |
|       1 |       3 |      1 |
|       1 |       4 |      0 |
|       1 |       5 |      0 |
|       3 |       1 |      1 |
|       3 |       2 |      0 |
|       3 |       3 |      0 |
|       3 |       4 |      0 |
|       3 |       5 |      0 |
|       4 |       1 |      1 |
|       4 |       2 |      1 |
|       4 |       3 |      1 |
|       4 |       4 |      0 |
|       4 |       5 |      0 |
+---------+---------+--------+
Example: user_1's vector of ratings is [1,1,1,0,0]. user_3 gives the same rating as user_1 on items 1, 4 and 5 (ratings [1,0,0]), so total_match with user_1 = 3.
Match calculation:
user_3 likes_match with user_1 = 1
user_3 dislikes_match with user_1 = 2
total_match = likes_match + dislikes_match = 3
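To make the counting rule concrete before turning it into SQL, here is a minimal Python sketch of the same calculation. The dictionaries and the `match_counts` helper are illustrative names, not part of the original question; the ratings are copied from the sample table.

```python
# Ratings keyed by item_id, copied from the sample table.
user_1 = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0}
user_3 = {1: 1, 2: 0, 3: 0, 4: 0, 5: 0}
user_4 = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0}


def match_counts(target, other):
    """Return (likes_match, dislikes_match, total_match) for two users."""
    likes = dislikes = 0
    # Only items rated by both users can count as a match.
    for item_id in target.keys() & other.keys():
        if target[item_id] == other[item_id]:
            if target[item_id] == 1:
                likes += 1
            else:
                dislikes += 1
    return likes, dislikes, likes + dislikes


print(match_counts(user_1, user_3))  # (1, 2, 3)
print(match_counts(user_1, user_4))  # (3, 2, 5)
```

An item counts only when both users rated it and gave it the same rating, which is the condition the SQL join has to express.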
How can I write a SQL query that returns the following result?
+---------+-------------+----------------+-------------+
| user_id | likes_match | dislikes_match | total_match |
+---------+-------------+----------------+-------------+
| 3 | 1 | 2 | 3 |
| 4 | 3 | 2 | 5 |
+---------+-------------+----------------+-------------+
Any ideas?
You will probably need several subqueries to get the result you want. See the code below:
-- `have` is the ratings table (user_id, item_id, rating)
select res1.user_id,
       sum(res1.likes_match1) as likes_match,
       sum(res1.dislikes_match1) as dislikes_match,
       sum(res1.likes_match1) + sum(res1.dislikes_match1) as total_match
from (
    select res.user_id,
           case
               when res.rating = 1 then count(res.rating)
               else 0
           end as likes_match1,
           case
               when res.rating = 0 then count(res.rating)
               else 0
           end as dislikes_match1
    from (
        -- one row per item where user b gave the same rating as user 1;
        -- rating is 1 for a shared like and 0 for a shared dislike
        select b.user_id as user_id,
               case
                   when a.rating = 1 and b.rating = 1 then 1
                   else 0
               end as rating
        from have a
        inner join have b
            on a.item_id = b.item_id
            and a.user_id = 1
            and b.user_id <> 1
            and a.rating = b.rating
    ) as res
    group by res.user_id, res.rating
) as res1
group by res1.user_id
;
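One way to check the nested query against the sample data is to run it in an in-memory SQLite database from Python. This is only a sketch: the answer does not say which database it targets, so using SQLite here is an assumption, as is the `have` table definition (inferred from the column names the query uses).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema for the `have` table referenced by the query.
conn.execute("CREATE TABLE have (user_id INTEGER, item_id INTEGER, rating INTEGER)")
sample_rows = [
    (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 0), (1, 5, 0),
    (3, 1, 1), (3, 2, 0), (3, 3, 0), (3, 4, 0), (3, 5, 0),
    (4, 1, 1), (4, 2, 1), (4, 3, 1), (4, 4, 0), (4, 5, 0),
]
conn.executemany("INSERT INTO have VALUES (?, ?, ?)", sample_rows)

NESTED_QUERY = """
select res1.user_id,
       sum(res1.likes_match1) as likes_match,
       sum(res1.dislikes_match1) as dislikes_match,
       sum(res1.likes_match1) + sum(res1.dislikes_match1) as total_match
from (
    select res.user_id,
           case when res.rating = 1 then count(res.rating) else 0 end as likes_match1,
           case when res.rating = 0 then count(res.rating) else 0 end as dislikes_match1
    from (
        select b.user_id as user_id,
               case when a.rating = 1 and b.rating = 1 then 1 else 0 end as rating
        from have a
        inner join have b
            on a.item_id = b.item_id
            and a.user_id = 1
            and b.user_id <> 1
            and a.rating = b.rating
    ) as res
    group by res.user_id, res.rating
) as res1
group by res1.user_id
"""

# The query has no ORDER BY, so sort in Python for a stable comparison.
nested_result = sorted(conn.execute(NESTED_QUERY).fetchall())
print(nested_result)  # [(3, 1, 2, 3), (4, 3, 2, 5)]
```

The innermost subquery already keeps only matching rows, so the two outer levels do nothing more than split the count into likes and dislikes.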
(This uses SQLite, but it should not take much work to run on other databases.)
Given the following table:
CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER
, PRIMARY KEY(user_id, item_id)) WITHOUT ROWID;
INSERT INTO ratings VALUES(1,1,1);
INSERT INTO ratings VALUES(1,2,1);
INSERT INTO ratings VALUES(1,3,1);
INSERT INTO ratings VALUES(1,4,0);
INSERT INTO ratings VALUES(1,5,0);
INSERT INTO ratings VALUES(3,1,1);
INSERT INTO ratings VALUES(3,2,0);
INSERT INTO ratings VALUES(3,3,0);
INSERT INTO ratings VALUES(3,4,0);
INSERT INTO ratings VALUES(3,5,0);
INSERT INTO ratings VALUES(4,1,1);
INSERT INTO ratings VALUES(4,2,1);
INSERT INTO ratings VALUES(4,3,1);
INSERT INTO ratings VALUES(4,4,0);
INSERT INTO ratings VALUES(4,5,0);
This query:
SELECT r1.user_id AS user_id
, sum(r1.rating) AS likes_match
, sum(CASE r1.rating WHEN 0 THEN 1 ELSE 0 END) AS dislikes_match
, count(*) AS total_match
FROM ratings AS r1
JOIN ratings AS r2 ON r2.user_id = 1
AND r1.item_id = r2.item_id
AND r1.rating = r2.rating
WHERE r1.user_id <> 1
GROUP BY r1.user_id
ORDER BY r1.user_id;
produces:
user_id likes_match dislikes_match total_match
---------- ----------- -------------- -----------
3 1 2 3
4 3 2 5
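If the selected user should not be hard-coded, the same query can take the user id as a bound parameter. The sketch below uses Python's standard `sqlite3` module; the `matches_for` helper and the `SAMPLE_ROWS` name are hypothetical, and the table definition omits `WITHOUT ROWID` only to keep the example independent of the SQLite version.

```python
import sqlite3

SAMPLE_ROWS = [
    (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 0), (1, 5, 0),
    (3, 1, 1), (3, 2, 0), (3, 3, 0), (3, 4, 0), (3, 5, 0),
    (4, 1, 1), (4, 2, 1), (4, 3, 1), (4, 4, 0), (4, 5, 0),
]

# Same query as in the answer, with the selected user as a "?" placeholder.
MATCH_QUERY = """
SELECT r1.user_id AS user_id
     , sum(r1.rating) AS likes_match
     , sum(CASE r1.rating WHEN 0 THEN 1 ELSE 0 END) AS dislikes_match
     , count(*) AS total_match
FROM ratings AS r1
JOIN ratings AS r2 ON r2.user_id = ?
                  AND r1.item_id = r2.item_id
                  AND r1.rating = r2.rating
WHERE r1.user_id <> ?
GROUP BY r1.user_id
ORDER BY r1.user_id
"""


def matches_for(conn, target_user_id):
    """Return (user_id, likes_match, dislikes_match, total_match) rows."""
    return conn.execute(MATCH_QUERY, (target_user_id, target_user_id)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER,"
    " PRIMARY KEY(user_id, item_id))"
)
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", SAMPLE_ROWS)

print(matches_for(conn, 1))  # [(3, 1, 2, 3), (4, 3, 2, 5)]
```

Because the join keeps only rows where both ratings agree, a user with no matching ratings at all does not appear in the output rather than showing up with zeros.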