SQL 与没有函数包装器的查询相比,函数非常慢
SQL function very slow compared to query without function wrapper
我有这个运行速度非常快(~12 毫秒)的 PostgreSQL 9.4 查询:
SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email,
customers.name,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = 2
ORDER BY
auth_web_events.id DESC;
但是如果我将它嵌入到一个函数中,查询在所有数据中运行速度非常慢,似乎是 运行 通过每条记录,我错过了什么?,我有 ~1M 的数据,我想要简化我的数据库层,将大型查询存储到函数和视图中。
CREATE OR REPLACE FUNCTION get_web_events_by_userid(int) RETURNS TABLE(
id int,
time_stamp timestamp with time zone,
description text,
origin text,
userlogin text,
customer text,
client_ip inet
) AS
$func$
SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email AS user,
customers.name AS customer,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk =
ORDER BY
auth_web_events.id DESC;
$func$ LANGUAGE SQL;
查询计划是:
"Sort (cost=20.94..20.94 rows=1 width=791) (actual time=61.905..61.906 rows=2 loops=1)"
" Sort Key: auth_web_events.id"
" Sort Method: quicksort Memory: 25kB"
" -> Nested Loop (cost=0.85..20.93 rows=1 width=791) (actual time=61.884..61.893 rows=2 loops=1)"
" -> Nested Loop (cost=0.71..12.75 rows=1 width=577) (actual time=61.874..61.879 rows=2 loops=1)"
" -> Index Scan using auth_web_events_fk1 on auth_web_events (cost=0.57..4.58 rows=1 width=61) (actual time=61.860..61.860 rows=2 loops=1)"
" Index Cond: (user_id_fk = 2)"
" -> Index Scan using auth_user_pkey on auth_user (cost=0.14..8.16 rows=1 width=524) (actual time=0.005..0.005 rows=1 loops=2)"
" Index Cond: (id = 2)"
" -> Index Scan using customers_id_idx on customers (cost=0.14..8.16 rows=1 width=222) (actual time=0.004..0.005 rows=1 loops=2)"
" Index Cond: (id = auth_user.customer_id_fk)"
"Planning time: 0.369 ms"
"Execution time: 61.965 ms"
我是这样调用函数的:
SELECT * from get_web_events_by_userid(2)
函数的查询计划:
"Function Scan on get_web_events_by_userid (cost=0.25..10.25 rows=1000 width=172) (actual time=279107.142..279107.144 rows=2 loops=1)"
"Planning time: 0.038 ms"
"Execution time: 279107.175 ms"
编辑:我只是更改了参数,但问题仍然存在。
EDIT2:欧文答案的查询计划:
"Sort (cost=20.94..20.94 rows=1 width=791) (actual time=0.048..0.049 rows=2 loops=1)"
" Sort Key: w.id"
" Sort Method: quicksort Memory: 25kB"
" -> Nested Loop (cost=0.85..20.93 rows=1 width=791) (actual time=0.030..0.037 rows=2 loops=1)"
" -> Nested Loop (cost=0.71..12.75 rows=1 width=577) (actual time=0.023..0.025 rows=2 loops=1)"
" -> Index Scan using auth_user_pkey on auth_user u (cost=0.14..8.16 rows=1 width=524) (actual time=0.011..0.012 rows=1 loops=1)"
" Index Cond: (id = 2)"
" -> Index Scan using auth_web_events_fk1 on auth_web_events w (cost=0.57..4.58 rows=1 width=61) (actual time=0.008..0.008 rows=2 loops=1)"
" Index Cond: (user_id_fk = 2)"
" -> Index Scan using customers_id_idx on customers c (cost=0.14..8.16 rows=1 width=222) (actual time=0.003..0.004 rows=1 loops=2)"
" Index Cond: (id = u.customer_id_fk)"
"Planning time: 0.541 ms"
"Execution time: 0.101 ms"
通过使此查询动态化并使用 plpgsql,您将获得更好的性能。
CREATE OR REPLACE FUNCTION get_web_events_by_userid(uid int) RETURNS TABLE(
id int,
time_stamp timestamp with time zone,
description text,
origin text,
userlogin text,
customer text,
client_ip inet
) AS $$
BEGIN
RETURN QUERY EXECUTE
'SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email AS user,
customers.name AS customer,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = ' || uid ||
'ORDER BY
auth_web_events.id DESC;'
END;
$$ LANGUAGE plpgsql;
user
在重写您的函数时,我发现您在此处添加了列别名:
SELECT
...
auth_user.email <b>AS user</b>,
customers.name AS customer,
.. 一开始不会做任何事情,因为这些别名在函数外部不可见,并且在函数内部没有被引用。所以他们会被忽略。出于文档目的,最好使用评论。
但它也会使您的查询 无效 ,因为 user
完全是 reserved word 并且不能用作列别名,除非用双引号引起来。
奇怪的是,在我的测试中,该函数似乎可以使用无效的别名。可能是因为被忽略了(?)。但我不确定这不会有副作用。
你的函数重写了(否则等价):
CREATE OR REPLACE FUNCTION get_web_events_by_userid(int)
RETURNS TABLE (
id int
, time_stamp timestamptz
, description text
, origin text
, userlogin text
, customer text
, client_ip inet
)
LANGUAGE sql STABLE AS
$func$
SELECT w.id
, w.time_stamp
, w.description
, w.origin
, u.email -- AS user -- make this a comment!
, c.name -- AS customer
, w.client_ip
FROM public.auth_user u
JOIN public.auth_web_events w ON w.user_id_fk = u.id
JOIN public.customers c ON c.id = u.customer_id_fk
WHERE u.id = -- reverted the logic here
ORDER BY w.id DESC
$func$;
显然,STABLE
关键字改变了结果。 Function volatility should not be an issue in the test situation you describe. The setting does not normally profit a single, isolated function call. Read details in the manual. 此外,标准 EXPLAIN
不显示 内部 函数的查询计划。您可以为此使用附加模块 auto-explain:
您的数据分布非常奇怪:
auth_web_events table has 100000000 records, auth_user->2 records, customers-> 1 record
由于您没有另外定义,该函数假设 1000 行 的估计值被 return 编辑。但是您的函数实际上 returning 只有 2 行 。如果您的所有调用仅 return(在附近)2 行,只需声明添加 ROWS 2
。也可能更改 VOLATILE
变体的查询计划(即使 STABLE
在这里仍然是正确的选择)。
我有这个运行速度非常快(~12 毫秒)的 PostgreSQL 9.4 查询:
SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email,
customers.name,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = 2
ORDER BY
auth_web_events.id DESC;
但是如果我将它嵌入到一个函数中,查询在所有数据中运行速度非常慢,似乎是 运行 通过每条记录,我错过了什么?,我有 ~1M 的数据,我想要简化我的数据库层,将大型查询存储到函数和视图中。
CREATE OR REPLACE FUNCTION get_web_events_by_userid(int) RETURNS TABLE(
id int,
time_stamp timestamp with time zone,
description text,
origin text,
userlogin text,
customer text,
client_ip inet
) AS
$func$
SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email AS user,
customers.name AS customer,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk =
ORDER BY
auth_web_events.id DESC;
$func$ LANGUAGE SQL;
查询计划是:
"Sort (cost=20.94..20.94 rows=1 width=791) (actual time=61.905..61.906 rows=2 loops=1)"
" Sort Key: auth_web_events.id"
" Sort Method: quicksort Memory: 25kB"
" -> Nested Loop (cost=0.85..20.93 rows=1 width=791) (actual time=61.884..61.893 rows=2 loops=1)"
" -> Nested Loop (cost=0.71..12.75 rows=1 width=577) (actual time=61.874..61.879 rows=2 loops=1)"
" -> Index Scan using auth_web_events_fk1 on auth_web_events (cost=0.57..4.58 rows=1 width=61) (actual time=61.860..61.860 rows=2 loops=1)"
" Index Cond: (user_id_fk = 2)"
" -> Index Scan using auth_user_pkey on auth_user (cost=0.14..8.16 rows=1 width=524) (actual time=0.005..0.005 rows=1 loops=2)"
" Index Cond: (id = 2)"
" -> Index Scan using customers_id_idx on customers (cost=0.14..8.16 rows=1 width=222) (actual time=0.004..0.005 rows=1 loops=2)"
" Index Cond: (id = auth_user.customer_id_fk)"
"Planning time: 0.369 ms"
"Execution time: 61.965 ms"
我是这样调用函数的:
SELECT * from get_web_events_by_userid(2)
函数的查询计划:
"Function Scan on get_web_events_by_userid (cost=0.25..10.25 rows=1000 width=172) (actual time=279107.142..279107.144 rows=2 loops=1)"
"Planning time: 0.038 ms"
"Execution time: 279107.175 ms"
编辑:我只是更改了参数,但问题仍然存在。
EDIT2:欧文答案的查询计划:
"Sort (cost=20.94..20.94 rows=1 width=791) (actual time=0.048..0.049 rows=2 loops=1)"
" Sort Key: w.id"
" Sort Method: quicksort Memory: 25kB"
" -> Nested Loop (cost=0.85..20.93 rows=1 width=791) (actual time=0.030..0.037 rows=2 loops=1)"
" -> Nested Loop (cost=0.71..12.75 rows=1 width=577) (actual time=0.023..0.025 rows=2 loops=1)"
" -> Index Scan using auth_user_pkey on auth_user u (cost=0.14..8.16 rows=1 width=524) (actual time=0.011..0.012 rows=1 loops=1)"
" Index Cond: (id = 2)"
" -> Index Scan using auth_web_events_fk1 on auth_web_events w (cost=0.57..4.58 rows=1 width=61) (actual time=0.008..0.008 rows=2 loops=1)"
" Index Cond: (user_id_fk = 2)"
" -> Index Scan using customers_id_idx on customers c (cost=0.14..8.16 rows=1 width=222) (actual time=0.003..0.004 rows=1 loops=2)"
" Index Cond: (id = u.customer_id_fk)"
"Planning time: 0.541 ms"
"Execution time: 0.101 ms"
通过使此查询动态化并使用 plpgsql,您将获得更好的性能。
CREATE OR REPLACE FUNCTION get_web_events_by_userid(uid int) RETURNS TABLE(
id int,
time_stamp timestamp with time zone,
description text,
origin text,
userlogin text,
customer text,
client_ip inet
) AS $$
BEGIN
RETURN QUERY EXECUTE
'SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email AS user,
customers.name AS customer,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = ' || uid ||
'ORDER BY
auth_web_events.id DESC;'
END;
$$ LANGUAGE plpgsql;
user
在重写您的函数时,我发现您在此处添加了列别名:
SELECT
...
auth_user.email <b>AS user</b>,
customers.name AS customer,
.. 一开始不会做任何事情,因为这些别名在函数外部不可见,并且在函数内部没有被引用。所以他们会被忽略。出于文档目的,最好使用评论。
但它也会使您的查询 无效 ,因为 user
完全是 reserved word 并且不能用作列别名,除非用双引号引起来。
奇怪的是,在我的测试中,该函数似乎可以使用无效的别名。可能是因为被忽略了(?)。但我不确定这不会有副作用。
你的函数重写了(否则等价):
CREATE OR REPLACE FUNCTION get_web_events_by_userid(int)
RETURNS TABLE (
id int
, time_stamp timestamptz
, description text
, origin text
, userlogin text
, customer text
, client_ip inet
)
LANGUAGE sql STABLE AS
$func$
SELECT w.id
, w.time_stamp
, w.description
, w.origin
, u.email -- AS user -- make this a comment!
, c.name -- AS customer
, w.client_ip
FROM public.auth_user u
JOIN public.auth_web_events w ON w.user_id_fk = u.id
JOIN public.customers c ON c.id = u.customer_id_fk
WHERE u.id = -- reverted the logic here
ORDER BY w.id DESC
$func$;
显然,STABLE
关键字改变了结果。 Function volatility should not be an issue in the test situation you describe. The setting does not normally profit a single, isolated function call. Read details in the manual. 此外,标准 EXPLAIN
不显示 内部 函数的查询计划。您可以为此使用附加模块 auto-explain:
您的数据分布非常奇怪:
auth_web_events table has 100000000 records, auth_user->2 records, customers-> 1 record
由于您没有另外定义,该函数假设 1000 行 的估计值被 return 编辑。但是您的函数实际上 returning 只有 2 行 。如果您的所有调用仅 return(在附近)2 行,只需声明添加 ROWS 2
。也可能更改 VOLATILE
变体的查询计划(即使 STABLE
在这里仍然是正确的选择)。