Group by columns and fetch rows with maximum length of string in another column
I have the following table, table1, in Postgres 11:
id           col1  col2                      col3                      col4
NCT00000374  Drug  olanzapine                olanzapine                olanzapine
NCT00000390  Drug  imipramine hydrochloride  imipramine hydrochloride  imipramine hydrochloride
NCT00000390  Drug  imipramine hydrochloride  imipramine hydrochloride  imipramine
NCT00000412  Drug  placebo calcitriol        placebo calcitriol        calcitriol
For each (id, col1, col2, col3), I would like to fetch the row with the longest value in col4.
The desired output is:
id           col1  col2                      col3                      col4
NCT00000374  Drug  olanzapine                olanzapine                olanzapine
NCT00000390  Drug  imipramine hydrochloride  imipramine hydrochloride  imipramine hydrochloride
NCT00000412  Drug  placebo calcitriol        placebo calcitriol        calcitriol
So far I have tried the following query, without success:
select * from table1
where length(col4) = max(length(col4))
group by id, col1, col2, col3
order by id
A case for DISTINCT ON:
SELECT DISTINCT ON (id, col1, col2, col3)
*
FROM table1
ORDER BY id, col1, col2, col3, length(col4) DESC NULLS LAST;
This is the simplest solution, and for few rows per (id, col1, col2, col3) it is typically also the fastest. Detailed explanation:
- Select first row in each GROUP BY group?
For big tables and many rows per group, there are (much) faster techniques:
- Optimize GROUP BY query to retrieve latest row per user
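As for why the attempt in the question fails: Postgres does not allow aggregate functions such as max() in a WHERE clause, so the query is rejected before any grouping happens. For comparison, here is a minimal sketch of the same result using the row_number() window function instead of DISTINCT ON. The text column types and the split of the sample values into columns are assumptions read off the question's tables, not something the question states.

```sql
-- Sample data from the question; all columns are assumed to be text.
CREATE TEMP TABLE table1 (id text, col1 text, col2 text, col3 text, col4 text);

INSERT INTO table1 VALUES
  ('NCT00000374', 'Drug', 'olanzapine', 'olanzapine', 'olanzapine'),
  ('NCT00000390', 'Drug', 'imipramine hydrochloride', 'imipramine hydrochloride', 'imipramine hydrochloride'),
  ('NCT00000390', 'Drug', 'imipramine hydrochloride', 'imipramine hydrochloride', 'imipramine'),
  ('NCT00000412', 'Drug', 'placebo calcitriol', 'placebo calcitriol', 'calcitriol');

-- Number the rows within each (id, col1, col2, col3) group,
-- longest col4 first, then keep only the first row per group.
SELECT id, col1, col2, col3, col4
FROM  (
   SELECT *
        , row_number() OVER (PARTITION BY id, col1, col2, col3
                             ORDER BY length(col4) DESC NULLS LAST) AS rn
   FROM   table1
   ) sub
WHERE  rn = 1
ORDER  BY id;
```

Like DISTINCT ON, this picks an arbitrary row when several rows in a group tie for the longest col4. Replacing row_number() with rank() would keep all tied rows instead.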