LIKE search of joined and concatenated records is really slow (PostgreSQL)

Question

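I'm returning a unique list of ids from the users table, where specific columns in a related table (positions) contain a matching string.

The related table may have multiple records for each user record.

The query takes a really, really long time (it doesn't scale), so I'm wondering whether I'm structuring the query wrong in some fundamental way?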

Users table:

id | name
-----------
1  | frank
2  | kim
3  | jane

Positions table:

id | user_id | title     | company | description
--------------------------------------------------
1  | 1       | manager   | apple   | 'Managed a team of...'
2  | 1       | assistant | apple   | 'Assisted the...'
3  | 2       | developer | huawei  | 'Build a feature that...'

For example: I want to return a user's id if a related positions record contains "apple" in the title, company, or description column.

Query:

select
  distinct on (users.id) users.id,
  users.name,
  ...
from users
where (
    select
        string_agg(distinct users.description, ', ') ||
        string_agg(distinct users.title, ', ') ||
        string_agg(distinct users.company, ', ')
    from positions
    where positions.users_id::int = users.id
    group by positions.users_id::int) like '%apple%'

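Update

I like the idea of moving this into a join clause. But what I'm trying to do is filter users on the conditions below, and I'm not sure how to do both in a join.

1) find the keyword in title, company, description

or

2) find the keyword with a full-text search in the associated string version of a document in another table.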

select
    to_tsvector(string_agg(distinct documents.content, ', '))
from documents
where users.id = documents.user_id
group by documents.user_id) @@ to_tsquery('apple')

So I initially thought it might look like this:

select
  distinct on (users.id) users.id,
  users.name,
  ...
from users
where (
    (select
        string_agg(distinct users.description, ', ') ||
        string_agg(distinct users.title, ', ') ||
        string_agg(distinct users.company, ', ')
    from positions
    where positions.users_id::int = users.id
    group by positions.users_id::int) like '%apple%')
    or
    (select
        to_tsvector(string_agg(distinct documents.content, ', '))
    from documents
    where users.id = documents.user_id
    group by documents.user_id) @@ to_tsquery('apple'))

But then it was really slow - I can confirm the slowness comes from the first condition, not the full-text search.

Answer 1

Might not be the best solution, but a quick option is:

SELECT  DISTINCT ON ( u.id ) u.id,
        u.name
FROM    users u
JOIN    positions p ON (
                 p.user_id = u.id
            AND  ( description || title || company )
            LIKE '%apple%'
        );

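This basically gets rid of the subquery, the unnecessary string_agg usage, the grouping on the positions table, etc.

What it does is perform a conditional join; the removal of duplicates is covered by distinct on.

PS! I used the table aliases u and p to shorten the example.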
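One caveat worth checking against your data: as far as I know, the || operator in PostgreSQL returns NULL as soon as any operand is NULL, so a position with a NULL description would never match. Concatenating without a separator can also produce a match that spans two columns. A minimal sketch of a NULL-safe variant, assuming positions.user_id is the foreign key as in the sample table:

```sql
-- Sketch only: same conditional join as above, but concat_ws() skips NULL
-- arguments and puts a space between the columns.
SELECT  DISTINCT ON ( u.id ) u.id,
        u.name
FROM    users u
JOIN    positions p ON (
                 p.user_id = u.id
            AND  concat_ws(' ', p.description, p.title, p.company)
            LIKE '%apple%'
        );
```

This changes which rows match only when one of the three columns is NULL or when the keyword would otherwise straddle a column boundary.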

EDIT: added a WHERE example as requested

SELECT  DISTINCT ON ( u.id ) u.id,
        u.name
FROM    users u
JOIN    positions p ON ( p.user_id = u.id )
WHERE   ( p.description || p.title || p.company ) LIKE '%apple%'
OR      ...your other conditions...;

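EDIT2: New details revealed new requirements beyond the original question, so here is a new example for the updated question:

Since you are using an OR condition to look up 2 different tables (positions and uploads), a simple JOIN won't work. But both lookups are verification-type lookups: they only check whether %apple% exists, so there is no need to aggregate, group, or convert the data. Using EXISTS, which returns TRUE at the first match, seems to be what you need. So removing all the unnecessary parts and using LIMIT 1 to return a row as soon as the first match is found, and no row if nothing is found (the latter makes EXISTS evaluate to FALSE), will give you the same result.

Here is the solution: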

SELECT  DISTINCT ON ( u.id ) u.id,
        u.name
FROM    users u
WHERE   EXISTS (
            SELECT  1
            FROM    positions p
            WHERE   p.users_id = u.id::int
            AND     ( description || title || company ) LIKE '%apple%'
            LIMIT   1
        )
OR      EXISTS (
            SELECT  1
            FROM    uploads up
            WHERE   up.user_id = u.id::int -- you had a reference to table 'documents' here, but it doesn't exist in your example query, so I just added a relation to the 'uploads' table as you have in FROM, assuming a 'content' column exists there
            AND     up.content LIKE '%apple%'
            LIMIT   1
        );

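NB! In your example query there are references to tables/aliases, such as documents, that are not reflected anywhere in the FROM part. So either you cut your example down from the real query with incorrect naming, or you mistyped it some other way; you need to verify and adjust my example query accordingly.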
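If the LIKE '%apple%' condition is still slow after the rewrite, a trigram index may be worth trying, since a plain b-tree index cannot help with a pattern that starts with a wildcard. The following is only a sketch under assumptions: the pg_trgm extension is available on your server, the three columns are text types, positions.user_id is the foreign key, and the index name positions_trgm_idx is made up for illustration.

```sql
-- Sketch only: requires the pg_trgm contrib extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- One multicolumn GIN index with trigram operator classes on the searched columns.
-- (positions_trgm_idx is a hypothetical name.)
CREATE INDEX positions_trgm_idx
    ON positions
 USING gin (title gin_trgm_ops, company gin_trgm_ops, description gin_trgm_ops);

-- Test each column on its own instead of concatenating them, so the planner
-- has a chance to use the index. No DISTINCT ON is needed here, assuming
-- users.id is unique.
SELECT  u.id,
        u.name
FROM    users u
WHERE   EXISTS (
            SELECT  1
            FROM    positions p
            WHERE   p.user_id = u.id
            AND     (   p.title       LIKE '%apple%'
                     OR p.company     LIKE '%apple%'
                     OR p.description LIKE '%apple%' )
        );
```

Whether the planner actually picks the index depends on table size and statistics, so compare the plans with EXPLAIN ANALYZE before and after creating it.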
