PostgreSQL ts_stat in SQLAlchemy

Postgres has a rather strange syntax for ts_stat queries: the statement you want statistics over is passed as a literal string, for example:

SELECT * FROM ts_stat('SELECT content_ts FROM document_contents')
ORDER BY nentry DESC, ndoc DESC, word;

I would like to use a Query object in SQLAlchemy to build the inner statement, since it is a complex query with many optional filters, for example:

SELECT content_ts 
FROM document_contents
JOIN fact_api ON document_contents.id = fact_api.content_id 
WHERE fact_api.day >= %(day_1)s
AND fact_api.day <= %(day_2)s
AND fact_api.unit IN (%(unit_1)s)
AND fact_api.term IN (%(term_1)s, %(term_2)s)

I already have SQLAlchemy code that generates that inner query. Is there a good way to generate the ts_stat query around it?

This seems to work:

from sqlalchemy import text

query = session.query( ... lots of joins ... )
literal_query = str(query.statement.compile(engine, compile_kwargs={"literal_binds": True}))
ts_stat = text('SELECT * FROM ts_stat($$' + 
               literal_query + 
               '$$) ORDER BY nentry DESC, ndoc DESC, word')
for row in session.execute(ts_stat):
    print(row)

See this for how to get the query as a string: http://docs.sqlalchemy.org/en/latest/faq/sqlexpressions.html

And this for the $$ (dollar quoting): https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING
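To see what the dollar quoting saves you from, here is a minimal sketch of the alternative: embedding the inner statement as an ordinary single-quoted literal by doubling every single quote. The helper name quote_sql_literal and the lang = 'en' filter are made up for illustration, and the sketch assumes standard_conforming_strings is on (the PostgreSQL default), so backslashes need no extra handling.

```python
def quote_sql_literal(sql):
    """Render sql as a single-quoted SQL string literal.

    Assumes standard_conforming_strings is on, so only single
    quotes need escaping (by doubling them).
    """
    return "'" + sql.replace("'", "''") + "'"


# Hypothetical inner statement containing an inlined string value.
inner = "SELECT content_ts FROM document_contents WHERE lang = 'en'"
outer = "SELECT * FROM ts_stat(" + quote_sql_literal(inner) + ")"
print(outer)
# SELECT * FROM ts_stat('SELECT content_ts FROM document_contents WHERE lang = ''en''')
```

With $$ the inner statement can be pasted in unchanged, which is easier to read in logs, as long as the statement itself never contains $$.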

You can hide the actual compilation in a custom FunctionElement:

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import FunctionElement, column
from sqlalchemy.sql.base import ColumnCollection
from sqlalchemy.types import TEXT, INTEGER


class ts_stat(FunctionElement):
    name = "ts_stat"

    @property
    def columns(self):
        # Using (undocumented) `_selectable=self` would allow
        # omitting the explicit `select_from(ts_stat_obj)` in
        # every query using `ts_stat`.
        return ColumnCollection(
            column("word", TEXT),
            column("ndoc", INTEGER),
            column("nentry", INTEGER))

@compiles(ts_stat, 'postgresql')
def pg_ts_stat(element, compiler, **kw):
    kw.pop("asfrom", None)  # Ignore and set explicitly
    arg1, = element.clauses
    # arg1 is a FromGrouping, which would force parens around the SELECT.
    stmt = compiler.process(
        arg1.element, asfrom=False, literal_binds=True, **kw)
    # TODO: Choose a random tag for dollar quoting. Another option
    # would be to wrap the stmt in `literal()`, compiling that, and
    # letting the driver worry about quoting.
    return f"ts_stat($${stmt}$$)"
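The TODO about the dollar-quoting tag could be handled along these lines. This is only a sketch, not part of the compiler hook above: dollar_quote is a hypothetical helper that keeps plain $$ when that is safe and otherwise picks a random tag that does not collide with the statement text. The tag is a letter followed by hex digits, which fits PostgreSQL's rule that a tag looks like an unquoted identifier without a dollar sign.

```python
import secrets


def _is_safe(sql, delimiter):
    # The closing delimiter must be the first occurrence of the
    # delimiter in "sql + delimiter"; otherwise the literal would
    # end early (this also covers sql ending in "$").
    return (sql + delimiter).find(delimiter) == len(sql)


def dollar_quote(sql, tag_bytes=4):
    """Wrap sql in a dollar-quoted literal that cannot end early."""
    delimiter = "$$"
    while not _is_safe(sql, delimiter):
        # Random tag: a leading letter plus hex digits, no "$" inside.
        delimiter = "$q" + secrets.token_hex(tag_bytes) + "$"
    return delimiter + sql + delimiter


print(dollar_quote("SELECT 1"))
# $$SELECT 1$$
```

In pg_ts_stat the last line would then become something like return f"ts_stat({dollar_quote(stmt)})", assuming the helper is defined in the same module.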

Usage is simple: you pass a Select or a Query as the sole argument:

from sqlalchemy import select, column, literal
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import sessionmaker

d = postgresql.dialect()

s = select([1])
f = ts_stat(s)
stmt = select([f.c.word, f.c.ndoc, f.c.nentry]).\
    select_from(f).\
    order_by(f.c.nentry.desc(),
             f.c.ndoc.desc(),
             f.c.word).\
    compile(dialect=d)
print(stmt)
# SELECT word, ndoc, nentry 
# FROM ts_stat($$SELECT 1$$) ORDER BY nentry DESC, ndoc DESC, word

Session = sessionmaker()
session = Session()

q = session.query(literal(1))
f2 = ts_stat(q)
stmt2 = select(['*']).\
    select_from(f2).\
    order_by(f2.c.nentry.desc(),
             f2.c.ndoc.desc(),
             f2.c.word).\
    compile(dialect=d)
print(stmt2)
# SELECT * 
# FROM ts_stat($$SELECT 1 AS param_1$$) ORDER BY nentry DESC, ndoc DESC, word

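Note that using literal_binds=True restricts what you can pass as parameters to the inner select, as explained in "How do I render SQL expressions as strings, possibly with bound parameters inlined?".

Of course, a construct like this makes it non-obvious to other readers that the DB function ts_stat() actually takes a string argument, but in this case the convenience perhaps outweighs that.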