Calculate percentages with PostgreSQL join queries
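I am trying to calculate percentages by joining data from 3 tables, to get the percentage of positive_count, negative_count and neutral_count for each user's tweets. I have succeeded in getting the positive, negative and neutral counts, but I have not managed to get the same values as percentages instead of counts. Here is the query that gets the counts: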
SELECT
t1.u_id, count(*) as total_tweets_count,
(
SELECT count(*) from t1,t2,t3
WHERE
t1.u_id='18839785' AND
t1.u_id=t2.u_id AND
t2.ts_id=t3.ts_id AND
t3.sentiment='Positive'
) as pos_count ,
(
SELECT count(*) from t1,t2,t3
WHERE
t1.u_id='18839785' AND
t1.u_id=t2.u_id AND
t2.ts_id=t3.ts_id AND
t3.sentiment='Negative'
) as neg_count ,
(
SELECT count(*) from t1,t2,t3
WHERE
t1.u_id='18839785' AND
t1.u_id=t2.u_id AND
t2.ts_id=t3.ts_id AND
t3.sentiment='Neutral'
) as neu_count
FROM t1,t2,t3
WHERE
t1.u_id='18839785' AND
t1.u_id=t2.u_id AND
t2.ts_id=t3.ts_id
GROUP BY t1.u_id;
**OUTPUT:**
u_id | total_tweets_count | pos_count | neg_count | neu_count
-----------------+--------------------+-----------+-----------+-------
18839785| 88 | 38 | 25 | 25
(1 row)
Now I want the same values as percentages instead of counts. I wrote the query in the following way, but it failed.
SELECT
total_tweets_count,pos_count,
round((pos_count * 100.0) / total_tweets_count, 2) AS pos_per,neg_count,
round((neg_count * 100.0) / total_tweets_count, 2) AS neg_per,
neu_count, round((neu_count * 100.0) / total_tweets_count, 2) AS neu_per
FROM (
SELECT
count(*) as total_tweets_count,
count(
a.u_id='18839785' AND
a.u_id=b.u_id AND
b.ts_id=c.ts_id AND
c.sentiment='Positive'
) AS pos_count,
count(
a.u_id='18839785' AND
a.u_id=b.u_id AND
b.ts_id=c.ts_id AND
c.sentiment='Negative'
) AS neg_count,
count(
a.u_id='18839785' AND
a.u_id=b.u_id AND
b.ts_id=c.ts_id AND
c.sentiment='Neutral') AS neu_count
FROM t1,t2, t3
WHERE
a.u_id='18839785' AND
a.u_id=b.u_id AND
b.ts_id=c.ts_id
GROUP BY a.u_id
) sub;
Can anyone help me get the percentages for each user's data, as shown below?
u_id | total_tweets_count | pos_count | neg_count | neu_count
------------------+--------------------+-----------+-----------+-----
18839785| 88 | 43.18 | 28.4 | 28.4
(1 row)
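I'm not entirely sure what you are looking for.
For starters, you can simplify your query by using conditional aggregation instead of three scalar subqueries (which, by the way, do not need to repeat the where condition on a.u_id).
You state that you want to "count for all users", so you need to remove the WHERE clause in the main query. The simplification also gets rid of the repeated WHERE condition.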
select u_id,
total_tweets_count,
pos_count,
round((pos_count * 100.0) / total_tweets_count, 2) AS pos_per,
neg_count,
round((neg_count * 100.0) / total_tweets_count, 2) AS neg_per,
neu_count,
round((neu_count * 100.0) / total_tweets_count, 2) AS neu_per
from (
SELECT
t1.u_id,
count(*) as total_tweets_count,
count(case when t3.sentiment='Positive' then 1 end) as pos_count,
count(case when t3.sentiment='Negative' then 1 end) as neg_count,
count(case when t3.sentiment='Neutral' then 1 end) as neu_count
FROM t1
JOIN t2 ON t1.u_id=t2.u_id
JOIN t3 ON t2.ts_id=t3.ts_id
-- no WHERE condition on the u_id here
GROUP BY t1.u_id
) t
Note that I replaced the outdated, ancient and fragile implicit joins in the WHERE clause with "modern" explicit JOIN operators.
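The CASE expressions above return NULL whenever the sentiment does not match, and that is what makes count() skip those rows. This also hints at why the second query in the question cannot work: apart from using the aliases a, b and c that are never defined in its FROM clause, it passes a boolean expression to count(), and count(expr) counts every row where expr is not NULL, including rows where it is false. A minimal sketch of the difference, using a made-up inline VALUES list (the name v and the four sample rows are assumptions for illustration only):

```sql
-- Hypothetical sample data: four sentiments in an inline VALUES list.
SELECT count(*)                                            AS total,
       -- the comparison yields true or false, never NULL, so every row is counted
       count(sentiment = 'Positive')                       AS count_of_boolean,
       -- CASE without ELSE yields NULL for non-matching rows, so they are skipped
       count(CASE WHEN sentiment = 'Positive' THEN 1 END)  AS count_of_case
FROM (VALUES ('Positive'), ('Negative'), ('Neutral'), ('Positive')) AS v (sentiment);
-- total = 4, count_of_boolean = 4, count_of_case = 2
```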
With a more recent Postgres version (9.4 or later), the expression count(case when t3.sentiment='Positive' then 1 end) as pos_count
can also be rewritten as:
count(*) filter (where t3.sentiment='Positive') as pos_count
which is more readable (and, I think, easier to understand).
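To try the FILTER variant end to end without touching real tables, here is a rough sketch that feeds the query from CTEs. The CTE contents are invented sample data and the column types (text for u_id, integer for ts_id) are assumptions; only the table and column names come from the question. It needs Postgres 9.4 or later because of FILTER.

```sql
-- Hypothetical sample data; the CTEs shadow the real t1, t2 and t3 for this statement only.
WITH t1 (u_id) AS (
    VALUES ('18839785'), ('42')
), t2 (u_id, ts_id) AS (
    VALUES ('18839785', 1), ('18839785', 2), ('18839785', 3), ('18839785', 4), ('42', 5)
), t3 (ts_id, sentiment) AS (
    VALUES (1, 'Positive'), (2, 'Positive'), (3, 'Negative'), (4, 'Neutral'), (5, 'Neutral')
)
SELECT u_id,
       total_tweets_count,
       -- multiplying by 100.0 switches to numeric arithmetic and avoids integer division
       round(pos_count * 100.0 / total_tweets_count, 2) AS pos_per,
       round(neg_count * 100.0 / total_tweets_count, 2) AS neg_per,
       round(neu_count * 100.0 / total_tweets_count, 2) AS neu_per
FROM (
    SELECT t1.u_id,
           count(*)                                          AS total_tweets_count,
           count(*) FILTER (WHERE t3.sentiment = 'Positive') AS pos_count,
           count(*) FILTER (WHERE t3.sentiment = 'Negative') AS neg_count,
           count(*) FILTER (WHERE t3.sentiment = 'Neutral')  AS neu_count
    FROM t1
    JOIN t2 ON t2.u_id = t1.u_id
    JOIN t3 ON t3.ts_id = t2.ts_id
    GROUP BY t1.u_id
) t
ORDER BY u_id;
-- u_id     | total_tweets_count | pos_per | neg_per | neu_per
-- 18839785 |                  4 |   50.00 |   25.00 |   25.00
-- 42       |                  1 |    0.00 |    0.00 |  100.00
```

Because the inner joins only produce groups that contain at least one tweet, total_tweets_count is never zero here, so the division needs no extra guard.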
In your query, you can avoid repeating the global WHERE condition on u_id by using a correlated subquery, e.g.:
(
SELECT count(*)
FROM t1 inner_t1 --<< use different aliases than in the outer query
JOIN t2 inner_t2 ON inner_t2.u_id = inner_t1.u_id
JOIN t3 inner_t3 ON inner_t3.ts_id = inner_t2.ts_id
-- referencing the outer t1 removes the need to repeat the hardcoded ID
WHERE inner_t1.u_id = t1.u_id
AND inner_t3.sentiment = 'Positive'
) as pos_count
Repeating the table t1 is not necessary either, so the above can be rewritten as:
(
SELECT count(*)
FROM t2 inner_t2
JOIN t3 inner_t3 ON inner_t3.ts_id = inner_t2.ts_id
WHERE inner_t2.u_id = t1.u_id --<< this references the outer t1 table
AND inner_t3.sentiment = 'Positive'
) as pos_count
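But the version using conditional aggregation will still be a lot faster than using three scalar subqueries (even if you remove the t1 table from them).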
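If you want to check the speed difference on your own installation, one way is to compare the two variants with EXPLAIN (ANALYZE). The following is only a sketch under assumptions: the temp tables, their column types and the generated data (100 users with 100 tweets each, no indexes) are hypothetical, and the actual timings will depend on your real data, indexes and Postgres version.

```sql
BEGIN;

-- Hypothetical test data; the temp tables shadow any real t1, t2, t3 until ROLLBACK.
CREATE TEMP TABLE t1 AS
SELECT g::text AS u_id
FROM generate_series(1, 100) AS g;

CREATE TEMP TABLE t2 AS
SELECT ((g - 1) / 100 + 1)::text AS u_id,  -- integer division: 100 tweets per user
       g AS ts_id
FROM generate_series(1, 10000) AS g;

CREATE TEMP TABLE t3 AS
SELECT g AS ts_id,
       (ARRAY['Positive', 'Negative', 'Neutral'])[g % 3 + 1] AS sentiment
FROM generate_series(1, 10000) AS g;

ANALYZE t1;
ANALYZE t2;
ANALYZE t3;

-- Variant 1: conditional aggregation, a single pass over the join
EXPLAIN (ANALYZE)
SELECT t1.u_id,
       count(*) FILTER (WHERE t3.sentiment = 'Positive') AS pos_count,
       count(*) FILTER (WHERE t3.sentiment = 'Negative') AS neg_count,
       count(*) FILTER (WHERE t3.sentiment = 'Neutral')  AS neu_count
FROM t1
JOIN t2 ON t2.u_id = t1.u_id
JOIN t3 ON t3.ts_id = t2.ts_id
GROUP BY t1.u_id;

-- Variant 2: one correlated scalar subquery per sentiment, evaluated for every row of t1
EXPLAIN (ANALYZE)
SELECT t1.u_id,
       (SELECT count(*)
        FROM t2 inner_t2
        JOIN t3 inner_t3 ON inner_t3.ts_id = inner_t2.ts_id
        WHERE inner_t2.u_id = t1.u_id
          AND inner_t3.sentiment = 'Positive') AS pos_count,
       (SELECT count(*)
        FROM t2 inner_t2
        JOIN t3 inner_t3 ON inner_t3.ts_id = inner_t2.ts_id
        WHERE inner_t2.u_id = t1.u_id
          AND inner_t3.sentiment = 'Negative') AS neg_count,
       (SELECT count(*)
        FROM t2 inner_t2
        JOIN t3 inner_t3 ON inner_t3.ts_id = inner_t2.ts_id
        WHERE inner_t2.u_id = t1.u_id
          AND inner_t3.sentiment = 'Neutral') AS neu_count
FROM t1;

ROLLBACK;  -- drops the temp tables again
```

Compare the "Execution Time" lines of the two plans; in the second plan the SubPlan nodes report how many times each subquery was executed in their "loops" value.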