Select lexemes from unnested ts_vectors

Question

I am trying to select only the lexemes from an unnested ts_vector column:

select lexeme
from 
    (select unnest(to_tsvector('russian', description))
     from cards) as roots;

But this doesn't work, because SQL knows nothing about a lexeme column. How can I select only the lexemes from unnested ts_vectors?

Answer 1

I found a concise way to do it:

SELECT (unnest(to_tsvector(description))).lexeme
FROM cards

Answer 2

What you found yourself works:

SELECT (unnest(to_tsvector(description))).lexeme
FROM   cards;

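The equivalent standard SQL form, with the set-returning function in the FROM list, is slightly more verbose but easier to integrate into bigger queries: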

SELECT d.lexeme
FROM   cards c
LEFT   JOIN LATERAL unnest(to_tsvector(c.description)) d ON true;
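As an illustration of what "integrating into a bigger query" can look like, here is a minimal sketch that counts how many cards contain each lexeme. It reuses the cards table and description column from the question; the 'russian' configuration is taken from the question as well, and the card_count alias is only an illustrative name. It uses CROSS JOIN LATERAL, so cards whose tsvector is empty contribute no rows.

```sql
-- Sketch: number of cards in which each lexeme occurs.
-- A tsvector lists every distinct lexeme once, so count(*) per lexeme
-- equals the number of cards containing it.
SELECT d.lexeme, count(*) AS card_count
FROM   cards c
CROSS  JOIN LATERAL unnest(to_tsvector('russian', c.description)) d
GROUP  BY d.lexeme
ORDER  BY card_count DESC, d.lexeme;
```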

Why? How?

Since Postgres 9.6, there is a second, "overloaded" variant of unnest(). Quoting the release notes:

Add new functions for tsvector data (Stas Kelvich)

The new functions are ts_delete(), ts_filter(), unnest(), tsvector_to_array(), array_to_tsvector(), and a variant of setweight() that sets the weight only for specified lexeme(s).

Of the functions listed, unnest() is the one that matters here.

See:

SELECT proname, proargtypes::regtype[], prorettype::regtype
FROM   pg_proc
WHERE  proname = 'unnest';

proname | proargtypes      | prorettype
--------+------------------+-----------
unnest  | [0:0]={anyarray} | anyelement
unnest  | [0:0]={tsvector} | record    
(2 rows)

db<>fiddle here

The function is documented in the manual among the text search functions:

unnest(tsvector, OUT lexeme text, OUT positions smallint[], OUT weights text)

It returns setof record with named output columns. Hence, we can reference the lexeme column directly, as we did above.
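To see all three output columns at once, the function can be called directly in the FROM list on a literal value. This is only a small sketch: the sample sentence and the 'simple' text search configuration are assumptions chosen for illustration and are not part of the question.

```sql
-- Sketch: expand one tsvector into one row per lexeme.
-- Each row carries the lexeme, its positions, and its weights.
SELECT lexeme, positions, weights
FROM   unnest(to_tsvector('simple', 'a fat cat sat on a fat mat'));
```

Because the output columns are named, any of them can be selected, filtered on, or joined against like ordinary table columns.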
