Fetch rows with maximum length of string in another column in Postgres

I have the following table in Postgres 11:

trial_id     name_split                drug_name_who
NCT01877395  imovax® rabies            imovax
NCT01877395  imovax® rabies            imovax rabies
NCT01877395  imovax® rabies            rabies
NCT01877395  imovax® rabies            rabies imovax
NCT00000374  olanzapine                olanzapine
NCT00000390  imipramine hydrochloride  imipramine hydrochloride
NCT00000390  imipramine hydrochloride  imipramine

For each (trial_id, name_split) pair, I would like to fetch the rows whose drug_name_who value has the maximum length.

I tried the following query:

with x as (
    SELECT distinct on (trial_id, name_split) *
    FROM table
    WHERE
        (
            regexp_replace(name_split, '[^\w]', '#', 'g') ~* ('\y'||regexp_replace(drug_name_who, '[^\w]', '#', 'g')||'\y')
            and (length(drug_name_who) > 2)
        )
        or (drug_name_who is null)
    ORDER BY trial_id, name_split, length(drug_name_who) DESC NULLS LAST
)
select * from x;

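The query correctly fetches the rows when the lengths of 'drug_name_who' per trial_id differ, but when the lengths of 'drug_name_who' per trial_id are equal, it selects only one row (for example, for NCT01877395 the following row is missing: NCT01877395 imovax® rabies imovax).

The expected output is: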

trial_id     name_split                drug_name_who
NCT01877395  imovax® rabies            imovax
NCT01877395  imovax® rabies            rabies
NCT00000374  olanzapine                olanzapine
NCT00000390  imipramine hydrochloride  imipramine hydrochloride

Any help here is highly appreciated.

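distinct on always returns exactly one row per group. If the order by clause is not deterministic (that is, several rows tie on the sort keys), you get an arbitrary one of the tied rows.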
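
To make the tie behavior concrete, here is a minimal sketch. It assumes a hypothetical inline data set named sample, built with VALUES from two of the question's rows, rather than the real table. Both drug names are 6 characters long, so the sort keys tie.

```sql
-- Hypothetical inline rows (not the real table); both drug names have length 6.
with sample (trial_id, name_split, drug_name_who) as (
    values
        ('NCT01877395', 'imovax® rabies', 'imovax'),
        ('NCT01877395', 'imovax® rabies', 'rabies')
)
select distinct on (trial_id, name_split) *
from sample
-- The last sort key ties, so which of the two rows survives is not guaranteed.
order by trial_id, name_split, length(drug_name_who) desc;
```

Adding drug_name_who as a final order by key would make the choice repeatable, but distinct on would still return only one of the two rows.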

If you want to allow ties, you can use rank() and a subquery instead:

select *
from (
    select 
        t.*, 
        rank() over(
            partition by trial_id, name_split 
            order by length(drug_name_who) desc
        ) rn
    from mytable t
    where ...
) t
where rn = 1
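
As a rough end-to-end sketch of the same idea, the query below fills in the elided where clause with the filter from the question and runs against the question's sample rows. The CTE name sample and the subquery alias ranked are placeholders introduced here for illustration. One detail worth noting: in Postgres, DESC sorts NULLs first by default, so nulls last is added to match the question's original ordering; without it, a row with a NULL drug_name_who would get rank 1.

```sql
-- "sample" is a hypothetical stand-in for the real table, using the rows from the question.
with sample (trial_id, name_split, drug_name_who) as (
    values
        ('NCT01877395', 'imovax® rabies', 'imovax'),
        ('NCT01877395', 'imovax® rabies', 'imovax rabies'),
        ('NCT01877395', 'imovax® rabies', 'rabies'),
        ('NCT01877395', 'imovax® rabies', 'rabies imovax'),
        ('NCT00000374', 'olanzapine', 'olanzapine'),
        ('NCT00000390', 'imipramine hydrochloride', 'imipramine hydrochloride'),
        ('NCT00000390', 'imipramine hydrochloride', 'imipramine')
)
select trial_id, name_split, drug_name_who
from (
    select
        s.*,
        -- rank() gives every row tied for the longest value the same rank (1).
        rank() over (
            partition by trial_id, name_split
            order by length(drug_name_who) desc nulls last
        ) as rn
    from sample s
    -- Filter copied from the question; parentheses make the and/or grouping explicit.
    where (
            regexp_replace(name_split, '[^\w]', '#', 'g')
                ~* ('\y' || regexp_replace(drug_name_who, '[^\w]', '#', 'g') || '\y')
            and length(drug_name_who) > 2
          )
       or drug_name_who is null
) ranked
where rn = 1;
```

Because the filter runs before the window function, only rows that pass it compete for rank 1 within each (trial_id, name_split) group. If the filter behaves as described in the question, the tied values imovax and rabies should both be kept for NCT01877395.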