Multiple joins on the same table multiply counts
When I run multiple joins on the same table, only the first join seems to come through.
For example, I get results like this:
ID, NAME, 200, 200
ID, NAME, 150, 150
ID, NAME, 100, 100
The ticket counts and the time-entry counts should clearly be different.
select
    contact.aid aid,
    (contact.data ->> 'FirstName') || ' ' || (contact.data ->> 'LastName') username,
    count(ticket) tickets,
    count(time) entries
from caches contact
inner join caches ticket
    on ticket.name = 'Ticket' and (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
inner join caches time
    on time.name = 'TimeEntry' and (time.data ->> 'TicketID')::numeric = ticket.aid
where
    contact.name='Contact'
group by
    contact.aid,
    username
order by
    tickets desc
;
I should be getting results like this:
ID, NAME, 200, 421
ID, NAME, 150, 312
ID, NAME, 100, 152
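To see why both counts come out equal, here is a minimal reproduction run through Python's built-in sqlite3 module so it is self-contained. It is a sketch under assumptions: SQLite has no jsonb, so the JSON keys (FirstName, LastName, CreatorResourceID, TicketID) are flattened into hypothetical plain columns (first_name, last_name, creator_id, ticket_id), and the sample rows are made up.

```python
import sqlite3

# Hypothetical stand-in for "caches": jsonb keys flattened into plain columns.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE caches (
    aid INTEGER, name TEXT, first_name TEXT, last_name TEXT,
    creator_id INTEGER,  -- Ticket -> Contact (CreatorResourceID)
    ticket_id  INTEGER   -- TimeEntry -> Ticket (TicketID)
)""")
con.executemany("INSERT INTO caches VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 'Contact', 'Ann', 'Lee', None, None),
    (2, 'Contact', 'Bob', 'Ray', None, None),
    (10, 'Ticket', None, None, 1, None),
    (11, 'Ticket', None, None, 1, None),
    (12, 'Ticket', None, None, 2, None),
    (100, 'TimeEntry', None, None, None, 10),
    (101, 'TimeEntry', None, None, None, 10),
    (102, 'TimeEntry', None, None, None, 10),
    (103, 'TimeEntry', None, None, None, 11),
    (104, 'TimeEntry', None, None, None, 12),
])

# Same shape as the query in the question: two chained inner joins, then count.
naive = con.execute("""
    SELECT contact.aid, count(ticket.aid) AS tickets, count(te.aid) AS entries
    FROM caches contact
    JOIN caches ticket ON ticket.name = 'Ticket' AND ticket.creator_id = contact.aid
    JOIN caches te     ON te.name = 'TimeEntry' AND te.ticket_id = ticket.aid
    WHERE contact.name = 'Contact'
    GROUP BY contact.aid
    ORDER BY contact.aid
""").fetchall()

# Each ticket row is repeated once per matching time entry, so both
# counts end up counting the same joined rows.
print(naive)  # [(1, 4, 4), (2, 1, 1)]
```

Ann (aid 1) really has 2 tickets and 4 time entries, but both columns report 4: the ticket count is inflated to the number of joined rows.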
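I'm guessing that you are joining along two different dimensions and therefore getting incorrect results.
If so, you can use count(distinct). This is a guess, but perhaps: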
count(distinct ticket) as tickets,
count(distinct time) as entries
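A quick check of the count(distinct) idea with sqlite3 and made-up rows, again with the jsonb keys flattened into hypothetical columns. SQLite cannot count whole-row references such as count(distinct ticket), so the sketch counts distinct aid values instead, which is equivalent only under the assumption that aid identifies a record.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE caches (aid INTEGER, name TEXT,
                                   creator_id INTEGER, ticket_id INTEGER)""")
con.executemany("INSERT INTO caches VALUES (?, ?, ?, ?)", [
    (1, 'Contact', None, None),
    (2, 'Contact', None, None),
    (3, 'Contact', None, None),  # has no tickets
    (10, 'Ticket', 1, None), (11, 'Ticket', 1, None), (12, 'Ticket', 2, None),
    (100, 'TimeEntry', None, 10), (101, 'TimeEntry', None, 10),
    (102, 'TimeEntry', None, 10), (103, 'TimeEntry', None, 11),
    (104, 'TimeEntry', None, 12),
])

# Joins unchanged; DISTINCT collapses the duplicated ticket rows again.
rows = con.execute("""
    SELECT contact.aid,
           count(DISTINCT ticket.aid) AS tickets,
           count(DISTINCT te.aid)     AS entries
    FROM caches contact
    JOIN caches ticket ON ticket.name = 'Ticket' AND ticket.creator_id = contact.aid
    JOIN caches te     ON te.name = 'TimeEntry' AND te.ticket_id = ticket.aid
    WHERE contact.name = 'Contact'
    GROUP BY contact.aid
    ORDER BY contact.aid
""").fetchall()
print(rows)  # [(1, 2, 4), (2, 1, 1)]
```

The counts are right now, but contact 3 (no tickets) is still missing because of the inner joins, and DISTINCT has to de-duplicate the fanned-out rows after they have been produced.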
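Can you check whether this works for you? This is the approach I use when a single table holds different types of related records.
If you get an error on the first case column, change the numeric casts to whatever type aid has (probably int or bigint).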
select case name
         when 'Contact'   then aid
         when 'Ticket'    then (data->>'CreatorResourceID')::numeric
         when 'TimeEntry' then (data->>'TicketID')::numeric
       end as aid,
       max(
         case
           when name = 'Contact'
           then concat(data->>'FirstName', ' ', data->>'LastName')
           else null
         end
       ) as username,
       count(*) filter (where name = 'Ticket')    as tickets,
       count(*) filter (where name = 'TimeEntry') as entries
from caches  -- the table is "caches"; "contact" was only an alias in the question
where name in ('Contact', 'Ticket', 'TimeEntry')
group by 1   -- a bare "aid" would resolve to the input column caches.aid, not the case expression
order by tickets desc;
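The single-pass idea of the query above (one scan of caches, one group per contact, conditional counts) can be sketched in SQLite as follows. Assumptions: the jsonb keys are flattened into hypothetical columns, the rows are made up, and sum(CASE ... THEN 1 ELSE 0 END) stands in for count(*) FILTER (WHERE ...). Time entries are left out because, per the question's join, a time entry references a ticket rather than a contact, so it cannot be keyed to a contact from its own row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE caches (aid INTEGER, name TEXT, first_name TEXT,
                                   last_name TEXT, creator_id INTEGER)""")
con.executemany("INSERT INTO caches VALUES (?, ?, ?, ?, ?)", [
    (1, 'Contact', 'Ann', 'Lee', None),
    (2, 'Contact', 'Bob', 'Ray', None),
    (3, 'Contact', 'Cy', 'Doe', None),
    (10, 'Ticket', None, None, 1),
    (11, 'Ticket', None, None, 1),
    (12, 'Ticket', None, None, 2),
])

rows = con.execute("""
    SELECT CASE name WHEN 'Contact' THEN aid
                     WHEN 'Ticket'  THEN creator_id END AS owner,
           -- only the Contact row contributes a name; max() ignores the NULLs
           max(CASE WHEN name = 'Contact'
                    THEN first_name || ' ' || last_name END) AS username,
           -- portable spelling of count(*) FILTER (WHERE name = 'Ticket')
           sum(CASE WHEN name = 'Ticket' THEN 1 ELSE 0 END) AS tickets
    FROM caches
    WHERE name IN ('Contact', 'Ticket')
    GROUP BY 1                      -- the CASE expression, not caches.aid
    ORDER BY tickets DESC, owner
""").fetchall()
print(rows)  # [(1, 'Ann Lee', 2), (2, 'Bob Ray', 1), (3, 'Cy Doe', 0)]
```

Contact 3 still appears, with 0 tickets, because its own Contact row keeps the group alive.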
The main problem is the same as here:
- Two SQL LEFT JOINS produce incorrect result
Nesting the values in a jsonb column makes your case more obscure, but it's the same problem.
Aggregate first, join later:
SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , sum(ticket.tickets) AS tickets
     , sum(ticket.entries) AS entries
FROM   caches AS contact
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , sum(entry.entries)::int AS entries
   FROM   caches AS ticket
   CROSS  JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry'
      AND    (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry  -- was: "time"
   WHERE  ticket.name = 'Ticket'
   AND    (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid  -- numeric?
   ) AS ticket
WHERE  contact.name = 'Contact'
GROUP  BY contact.aid, username
ORDER  BY tickets DESC;  -- output column; ticket.tickets is neither grouped nor aggregated
Assuming aid, or at least (aid, username), is unique in the underlying table, we don't need the outer aggregation at all:
SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , ticket.tickets
     , ticket.entries
FROM   caches AS contact
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , sum(entry.entries)::int AS entries
   FROM   caches AS ticket
   CROSS  JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry'
      AND    (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry  -- was: "time"
   WHERE  ticket.name = 'Ticket'
   AND    (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid  -- numeric?
   ) AS ticket
WHERE  contact.name = 'Contact'
ORDER  BY ticket.tickets DESC;
Not only does this avoid the main error of multiplied counts, it typically makes the query faster, too.
Related:
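SQLite has no LATERAL, so this sketch expresses the same "aggregate first, join later" idea with grouped derived tables and LEFT JOIN: time entries are counted per ticket, tickets and their entry counts are summed per creator, and only then is the result joined to the contact rows. The flattened columns (creator_id, ticket_id, first_name, last_name) and the rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE caches (aid INTEGER, name TEXT, first_name TEXT,
    last_name TEXT, creator_id INTEGER, ticket_id INTEGER)""")
con.executemany("INSERT INTO caches VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 'Contact', 'Ann', 'Lee', None, None),
    (2, 'Contact', 'Bob', 'Ray', None, None),
    (3, 'Contact', 'Cy', 'Doe', None, None),  # no tickets
    (10, 'Ticket', None, None, 1, None),
    (11, 'Ticket', None, None, 1, None),
    (12, 'Ticket', None, None, 2, None),
    (100, 'TimeEntry', None, None, None, 10),
    (101, 'TimeEntry', None, None, None, 10),
    (102, 'TimeEntry', None, None, None, 10),
    (103, 'TimeEntry', None, None, None, 11),
    (104, 'TimeEntry', None, None, None, 12),
])

rows = con.execute("""
    SELECT c.aid,
           c.first_name || ' ' || c.last_name AS username,
           coalesce(t.tickets, 0) AS tickets,
           coalesce(t.entries, 0) AS entries
    FROM caches c
    LEFT JOIN (
        -- step 2: one row per creator, summing the per-ticket entry counts
        SELECT tk.creator_id,
               count(*) AS tickets,
               sum(coalesce(e.entries, 0)) AS entries
        FROM caches tk
        LEFT JOIN (
            -- step 1: one row per ticket with its number of time entries
            SELECT ticket_id, count(*) AS entries
            FROM caches
            WHERE name = 'TimeEntry'
            GROUP BY ticket_id
        ) e ON e.ticket_id = tk.aid
        WHERE tk.name = 'Ticket'
        GROUP BY tk.creator_id
    ) t ON t.creator_id = c.aid
    WHERE c.name = 'Contact'
    ORDER BY tickets DESC, c.aid
""").fetchall()
print(rows)  # [(1, 'Ann Lee', 2, 4), (2, 'Bob Ray', 1, 1), (3, 'Cy Doe', 0, 0)]
```

Each join now matches at most one row per key, so nothing is multiplied, and contact 3 survives with zeros thanks to the LEFT JOIN.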
- Multiple array_agg() calls in a single query
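Your original query has INNER JOIN, which should probably be LEFT JOIN ... ON true to avoid excluding users without any qualifying entries. In my solution it is safe to convert that to CROSS JOIN, because each subquery level is guaranteed to return exactly one row (aggregate functions without GROUP BY).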
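The "exactly one row" guarantee is easy to check: an aggregate query without GROUP BY returns one row even when no input rows match, whereas the same query with GROUP BY returns no rows for empty input. A small check with sqlite3 (table and data hypothetical; PostgreSQL behaves the same way here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE caches (aid INTEGER, name TEXT)")
con.execute("INSERT INTO caches VALUES (1, 'Contact')")  # no Ticket rows at all

# Aggregate without GROUP BY: always exactly one row (count 0, sum NULL).
no_group = con.execute(
    "SELECT count(*), sum(aid) FROM caches WHERE name = 'Ticket'"
).fetchall()

# Same filter with GROUP BY: no groups, so no rows.
with_group = con.execute(
    "SELECT name, count(*) FROM caches WHERE name = 'Ticket' GROUP BY name"
).fetchall()

print(no_group)    # [(0, None)]
print(with_group)  # []
```

That single row is why a CROSS JOIN LATERAL over an aggregate subquery cannot drop a contact; a subquery with GROUP BY can return zero rows and would then need LEFT JOIN ... ON true. Note also that sum() over zero rows is NULL, not 0, so entries comes out NULL for a contact without tickets unless it is wrapped in coalesce().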
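Casting to integer (::int) in the subqueries is optional (and assumes counts never exceed the integer range). It avoids promotion to numeric, which is more expensive to sum: count() returns bigint, sum() over bigint returns numeric, but sum() over integer returns bigint.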
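Why concat_ws()? Unlike ||, it skips NULL arguments, so a missing first or last name does not turn the whole username into NULL. See: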
- How to concatenate columns in a Postgres SELECT?
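The NULL behaviour behind that choice, sketched with sqlite3 (whose || propagates NULL like PostgreSQL's) plus a small Python helper that mimics concat_ws() for a non-NULL separator. The helper is a hypothetical illustration, not a library function.

```python
import sqlite3

def concat_ws(sep, *parts):
    """Mimic PostgreSQL concat_ws() with a non-NULL separator: None parts are skipped."""
    return sep.join(str(p) for p in parts if p is not None)

con = sqlite3.connect(":memory:")
# With ||, a single NULL operand makes the whole result NULL.
(piped,) = con.execute("SELECT 'Ann' || ' ' || NULL").fetchone()

print(piped)                         # None
print(concat_ws(' ', 'Ann', None))   # Ann
print(concat_ws(' ', 'Ann', 'Lee'))  # Ann Lee
```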
Do data ->> 'TicketID' and data ->> 'CreatorResourceID' really have to be cast to numeric? It looks like they should be integer.
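Aside: normalizing your data model (at least to some degree) would probably help your cause. Joining tables on values nested in jsonb columns is comparatively expensive and can typically be made more efficient.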