SQL function very slow compared to query without function wrapper

Question

I have this PostgreSQL 9.4 query that runs very fast (~12 ms):

SELECT 
  auth_web_events.id, 
  auth_web_events.time_stamp, 
  auth_web_events.description, 
  auth_web_events.origin,  
  auth_user.email, 
  customers.name,
  auth_web_events.client_ip
FROM 
  public.auth_web_events, 
  public.auth_user, 
  public.customers
WHERE 
  auth_web_events.user_id_fk = auth_user.id AND
  auth_user.customer_id_fk = customers.id AND
  auth_web_events.user_id_fk = 2
ORDER BY
  auth_web_events.id DESC;

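But if I embed it into a function, the query runs very slowly through all the data; it seems to be running through every record. What am I missing? I have ~1M rows of data, and I want to simplify my database layer by storing large queries in functions and views.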

CREATE OR REPLACE FUNCTION get_web_events_by_userid(int) RETURNS TABLE(
    id int,
    time_stamp timestamp with time zone,
    description text,
    origin text,
    userlogin text,
    customer text,
    client_ip inet
     ) AS
$func$
SELECT 
  auth_web_events.id, 
  auth_web_events.time_stamp, 
  auth_web_events.description, 
  auth_web_events.origin,  
  auth_user.email AS user, 
  customers.name AS customer,
  auth_web_events.client_ip
FROM 
  public.auth_web_events, 
  public.auth_user, 
  public.customers
WHERE 
  auth_web_events.user_id_fk = auth_user.id AND
  auth_user.customer_id_fk = customers.id AND
  auth_web_events.user_id_fk = $1
ORDER BY
  auth_web_events.id DESC;
  $func$ LANGUAGE SQL;

The query plan is:

"Sort  (cost=20.94..20.94 rows=1 width=791) (actual time=61.905..61.906 rows=2 loops=1)"
"  Sort Key: auth_web_events.id"
"  Sort Method: quicksort  Memory: 25kB"
"  ->  Nested Loop  (cost=0.85..20.93 rows=1 width=791) (actual time=61.884..61.893 rows=2 loops=1)"
"        ->  Nested Loop  (cost=0.71..12.75 rows=1 width=577) (actual time=61.874..61.879 rows=2 loops=1)"
"              ->  Index Scan using auth_web_events_fk1 on auth_web_events  (cost=0.57..4.58 rows=1 width=61) (actual time=61.860..61.860 rows=2 loops=1)"
"                    Index Cond: (user_id_fk = 2)"
"              ->  Index Scan using auth_user_pkey on auth_user  (cost=0.14..8.16 rows=1 width=524) (actual time=0.005..0.005 rows=1 loops=2)"
"                    Index Cond: (id = 2)"
"        ->  Index Scan using customers_id_idx on customers  (cost=0.14..8.16 rows=1 width=222) (actual time=0.004..0.005 rows=1 loops=2)"
"              Index Cond: (id = auth_user.customer_id_fk)"
"Planning time: 0.369 ms"
"Execution time: 61.965 ms"

This is how I call the function:

SELECT * from get_web_events_by_userid(2)

Query plan of the function:

"Function Scan on get_web_events_by_userid  (cost=0.25..10.25 rows=1000 width=172) (actual time=279107.142..279107.144 rows=2 loops=1)"
"Planning time: 0.038 ms"
"Execution time: 279107.175 ms"

EDIT: I just changed the parameters, but the problem persists.
EDIT2: Query plan for Erwin's answer:

"Sort  (cost=20.94..20.94 rows=1 width=791) (actual time=0.048..0.049 rows=2 loops=1)"
"  Sort Key: w.id"
"  Sort Method: quicksort  Memory: 25kB"
"  ->  Nested Loop  (cost=0.85..20.93 rows=1 width=791) (actual time=0.030..0.037 rows=2 loops=1)"
"        ->  Nested Loop  (cost=0.71..12.75 rows=1 width=577) (actual time=0.023..0.025 rows=2 loops=1)"
"              ->  Index Scan using auth_user_pkey on auth_user u  (cost=0.14..8.16 rows=1 width=524) (actual time=0.011..0.012 rows=1 loops=1)"
"                    Index Cond: (id = 2)"
"              ->  Index Scan using auth_web_events_fk1 on auth_web_events w  (cost=0.57..4.58 rows=1 width=61) (actual time=0.008..0.008 rows=2 loops=1)"
"                    Index Cond: (user_id_fk = 2)"
"        ->  Index Scan using customers_id_idx on customers c  (cost=0.14..8.16 rows=1 width=222) (actual time=0.003..0.004 rows=1 loops=2)"
"              Index Cond: (id = u.customer_id_fk)"
"Planning time: 0.541 ms"
"Execution time: 0.101 ms"

Answer 1

通过使此查询动态化并使用 plpgsql，您将获得更好的性能。

CREATE OR REPLACE FUNCTION get_web_events_by_userid(uid int) RETURNS TABLE(
    id int,
    time_stamp timestamp with time zone,
    description text,
    origin text,
    userlogin text,
    customer text,
    client_ip inet
     ) AS $$
BEGIN

RETURN QUERY EXECUTE
'SELECT 
  auth_web_events.id, 
  auth_web_events.time_stamp, 
  auth_web_events.description, 
  auth_web_events.origin,  
  auth_user.email AS user, 
  customers.name AS customer,
  auth_web_events.client_ip
FROM 
  public.auth_web_events, 
  public.auth_user, 
  public.customers
WHERE 
  auth_web_events.user_id_fk = auth_user.id AND
  auth_user.customer_id_fk = customers.id AND
  auth_web_events.user_id_fk = ' || uid || '
ORDER BY
  auth_web_events.id DESC';  -- the ; terminating RETURN QUERY goes outside the string

END;
$$ LANGUAGE plpgsql;
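A variant of the same idea, sketched under assumptions: pass the id with EXECUTE ... USING instead of concatenating it into the query string. The PL/pgSQL documentation states that EXECUTE re-plans the command on each execution, generating a plan specific to the current parameter values, which is the property this approach relies on. The name get_web_events_by_userid_dyn is hypothetical (chosen so the existing function is not replaced), and the ::text casts assume some of these columns may be varchar, since RETURN QUERY requires the result column types to match the declared ones.

```sql
-- Hypothetical name, so the existing get_web_events_by_userid(int) is left alone.
CREATE OR REPLACE FUNCTION get_web_events_by_userid_dyn(uid int)
  RETURNS TABLE (
     id int
   , time_stamp timestamptz
   , description text
   , origin text
   , userlogin text
   , customer text
   , client_ip inet
  )
  LANGUAGE plpgsql STABLE AS
$func$
BEGIN
   -- EXECUTE plans this statement on every call, with $1 bound to uid.
   -- ::text casts are an assumption: they only matter if the columns are varchar.
   RETURN QUERY EXECUTE
     'SELECT w.id, w.time_stamp, w.description::text, w.origin::text
           , u.email::text, c.name::text, w.client_ip
      FROM   public.auth_web_events w
      JOIN   public.auth_user       u ON u.id = w.user_id_fk
      JOIN   public.customers       c ON c.id = u.customer_id_fk
      WHERE  w.user_id_fk = $1
      ORDER  BY w.id DESC'
   USING uid;
END
$func$;
```

Usage: SELECT * FROM get_web_events_by_userid_dyn(2);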

Answer 2


While rewriting your function, I noticed that you added column aliases here:

SELECT 
  ...
  auth_user.email AS user, 
  customers.name AS customer,

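.. which wouldn't do anything to begin with, since those aliases are not visible outside the function and are not referenced inside it. So they are simply ignored. For documentation purposes, better use a comment.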

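It is also questionable because user is a reserved word. As a column alias, a reserved word is only accepted after an explicit AS (as here) or when double-quoted; without AS it would make the query invalid.

That is why, in my tests, the function works with this alias. Still, I would not rely on it and would rather avoid reserved words as names altogether.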
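A two-line check of how PostgreSQL treats such an alias, as a minimal sketch with a made-up literal value: after an explicit AS, any keyword (even a reserved one) is accepted as the output column name, while double-quoting works in every position.

```sql
-- Explicit AS: the reserved word is accepted as the output column name.
SELECT 'someone@example.com' AS user;

-- Double-quoted identifier: unambiguous with or without AS.
SELECT 'someone@example.com' AS "user";
```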

Your function rewritten (otherwise equivalent):

CREATE OR REPLACE FUNCTION get_web_events_by_userid(int)
  RETURNS TABLE (
     id int
   , time_stamp timestamptz
   , description text
   , origin text
   , userlogin text
   , customer text
   , client_ip inet
  )
  LANGUAGE sql STABLE AS
$func$
SELECT w.id
     , w.time_stamp
     , w.description 
     , w.origin  
     , u.email     -- AS user   -- make this a comment!
     , c.name      -- AS customer
     , w.client_ip
FROM   public.auth_user       u
JOIN   public.auth_web_events w ON w.user_id_fk = u.id
JOIN   public.customers       c ON c.id = u.customer_id_fk 
WHERE  u.id = $1   -- reverted the logic here
ORDER  BY w.id DESC
$func$;
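A likely reason the STABLE declaration matters, offered as an explanation rather than a certainty: PostgreSQL inlines a LANGUAGE sql set-returning function into the calling query only if, among other conditions, the function is not VOLATILE (the default when no volatility is declared). Once inlined, the planner sees the actual value 2 and can use the index on user_id_fk, which is consistent with the EDIT2 plan (plain index scans, no Function Scan node). A sketch to check this, assuming the function exists with the (int) signature used above:

```sql
-- Without an explicit declaration a function is VOLATILE; mark it STABLE.
ALTER FUNCTION get_web_events_by_userid(int) STABLE;

-- If the function is inlined, this plan shows the underlying index scans
-- and joins instead of a single "Function Scan on get_web_events_by_userid".
EXPLAIN ANALYZE
SELECT * FROM get_web_events_by_userid(2);
```

If the output still shows only a Function Scan node, the function was not inlined; declaring it STRICT or SECURITY DEFINER, for example, also prevents inlining.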

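Obviously, the STABLE keyword changed the outcome, even though function volatility would not normally be expected to matter in the test situation you describe: the setting does not usually benefit a single, isolated function call (read details in the manual). Also, standard EXPLAIN does not show query plans for what happens inside functions. You could use the additional module auto-explain for that: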

Postgres query plan of a UDF invocation written in pgpsql
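A minimal session-level sketch of that approach, assuming you have the privileges to load the module (LOAD and these settings typically require a superuser). With log_nested_statements enabled, statements executed inside functions are logged with their plans as well.

```sql
-- Load auto_explain for the current session only.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_analyze = on;            -- include actual times and row counts
SET auto_explain.log_nested_statements = on;  -- also log statements run inside functions

SELECT * FROM get_web_events_by_userid(2);
-- The plans are written to the server log, not returned to the client.
```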

You have a very odd data distribution:

auth_web_events table has 100000000 records, auth_user->2 records, customers-> 1 record

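Since you did not define otherwise, the function assumes an estimate of 1000 rows to be returned. But your function actually returns only 2 rows. If all your calls only return (in the vicinity of) 2 rows, just declare that by adding ROWS 2. It might also change the query plan for the VOLATILE variant (even if STABLE is the right choice here anyway).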
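A short sketch of declaring the row estimate on the existing function, assuming most calls really do return about 2 rows. ROWS is only allowed for set-returning functions, which this one is (RETURNS TABLE). The same clause can also go straight into the definition, e.g. LANGUAGE sql STABLE ROWS 2 AS ...

```sql
-- Replace the default estimate of 1000 result rows with 2.
ALTER FUNCTION get_web_events_by_userid(int) ROWS 2;

-- Inspect the declared estimate and volatility ('s' = STABLE, 'v' = VOLATILE).
SELECT proname, prorows, provolatile
FROM   pg_proc
WHERE  proname = 'get_web_events_by_userid';
```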
