分析 PostgreSQL 执行计划
Analysing the PostgreSQL execution Plan
我试图了解 PostgreSQL 如何使用索引以及它们的执行计划如何。
我执行了以下查询,
EXPLAIN(VERBOSE,ANALYZE)
SELECT SUM("Orders"."order_subtotal_net_after_discount") AS "Total sum of Order Subtotal Net After Discount"
FROM "ol"."orders_test" AS "Orders"
LEFT OUTER JOIN "ol"."ga_transactions" AS "Ga Transactions" ON "Ga Transactions"."transaction_id" = "Orders"."order_number"
WHERE ( to_char(created_at_order,'YYYY-MM-DD')=to_char(now(),'YYYY-MM-DD')
AND "Orders"."order_state_2"<>'canceled'
AND "Orders"."order_state_2" <> 'pending_payment'
AND "Orders"."order_state_1" <> 'canceled'
AND "Orders"."order_state_1" <> 'pending_payment'
AND "Orders"."order_state_1" <> 'closed'
AND "Orders"."order_state_2" <> 'closed'
)
得到下面的执行计划,我发现下面的过滤器发生了顺序扫描
Filter: ((("Orders".order_state_2)::text <> 'canceled'::text) AND (("Orders".order_state_2)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'canceled'::text) AND (("Orders".order_state_1)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'closed'::text) AND (("Orders".order_state_2)::text <> 'closed'::text) AND (to_char("Orders".created_at_order, 'YYYY-MM-DD'::text) = to_char(now(), 'YYYY-MM-DD'::text)))"
整个查询计划
Aggregate (cost=157385.36..157385.37 rows=1 width=32) (actual time=840.221..840.221 rows=1 loops=1)
Output: sum("Orders".order_subtotal_net_after_discount)
-> Nested Loop Left Join (cost=0.42..157378.56 rows=2717 width=6) (actual time=0.017..840.095 rows=470 loops=1)
Output: "Orders".order_subtotal_net_after_discount
-> Seq Scan on ol.orders_test "Orders" (cost=0.00..148623.91 rows=2717 width=16) (actual time=0.009..838.144 rows=470 loops=1)
Output: "Orders".rno, "Orders".order_id, "Orders".order_number, "Orders".invoice_id, "Orders".invoice_number, "Orders".store_id, "Orders".customer_id, "Orders".real_customer_id, "Orders".shipping_address_id, "Orders".order_state_1, "Orders".order_state_2, "Orders".invoice_state, "Orders".shipping_street, "Orders".shipping_city, "Orders".shipping_postcode, "Orders".shipping_country_id, "Orders".shipping_description, "Orders".coupon_code, "Orders".first_order_of_customer, "Orders".payment_method, "Orders".order_subtotal_net, "Orders".order_subtotal_net_after_discount, "Orders".order_subtotal, "Orders".order_shipment, "Orders".order_shipping_tax, "Orders".order_discount, "Orders".order_tax_total, "Orders".order_grand_total, "Orders".order_total_paid, "Orders".order_refunded_total, "Orders".order_total_open, "Orders".invoice_subtotal_net, "Orders".invoice_subtotal_net_after_discount, "Orders".invoice_subtotal, "Orders".invoice_shipment, "Orders".invoice_shipping_tax, "Orders".invoice_discount, "Orders".invoice_tax_total, "Orders".invoice_grand_total, "Orders".invoice_refunded_total, "Orders".created_at_order, "Orders".created_at_invoice, "Orders".updated_at_invoice, "Orders".customer_email, "Orders".row_number, "Orders".nthorder, "Orders".time_since_last_order, "Orders".time_since_first_order
Filter: ((("Orders".order_state_2)::text <> 'canceled'::text) AND (("Orders".order_state_2)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'canceled'::text) AND (("Orders".order_state_1)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'closed'::text) AND (("Orders".order_state_2)::text <> 'closed'::text) AND (to_char("Orders".created_at_order, 'YYYY-MM-DD'::text) = to_char(now(), 'YYYY-MM-DD'::text)))
Rows Removed by Filter: 654356
-> Index Only Scan using ga_transactions_transaction_id_idx on ol.ga_transactions "Ga Transactions" (cost=0.42..3.21 rows=1 width=10) (actual time=0.004..0.004 rows=0 loops=470)
Output: "Ga Transactions".transaction_id
Index Cond: ("Ga Transactions".transaction_id = ("Orders".order_number)::text)
Heap Fetches: 0
Planning Time: 0.540 ms
Execution Time: 840.255 ms
我创建了以下索引,查询似乎没有使用这个索引,
是否必须在 INCLUDE 列表中包含所有输出列?
如果我添加更多索引,将会增加更新时间。那么有没有什么最好的方法来识别 table 的有效索引?
CREATE INDEX "IX_states"
ON ol.orders_test USING btree
(order_state_2 ASC NULLS LAST, order_state_1 ASC NULLS LAST, created_at_order ASC NULLS LAST)
INCLUDE(rno, order_id, order_number, invoice_id, invoice_number, store_id, customer_id, real_customer_id, shipping_address_id, invoice_state, shipping_street, shipping_city, shipping_postcode, shipping_country_id, shipping_description, coupon_code, first_order_of_customer, payment_method, order_subtotal_net_after_discount, created_at_invoice, customer_email, nthorder, time_since_last_order, time_since_first_order)
WITH (FILLFACTOR=90)
TABLESPACE pg_default;
否,不需要在索引中包含所有列。索引用于快速查找 几行。没有灵丹妙药可以让一切突然变快。如果您在索引中包含所有 table 列,则扫描索引将花费与执行 Seq Scan
相同的时间
你肯定想要索引 created_at_order
。
create index on orders_test (created_at_order);
但是您的表达式 to_char(created_at_order,'YYYY-MM-DD')
不会使用该索引。最好的办法是改变你的状态:
where created_at_order >= current_date and created_at_order < current_date + 1
那就要利用上面的索引。如果你不想使用这样的表达式,那么你需要一个专门的表达式索引。我不会将其转换为文本值,而是索引日期值:
create index on orders_test ( (created_at_order::date) );
索引日期值还有一个好处就是索引更小。一个 date
只占用 4 个字节,而字符串 '2019-10-15'
占用十个字节的存储空间。
然后以下将使用该索引:
where created_at_order::date = current_date
如果对状态列的额外限制进一步减少了行,您可以为此使用过滤索引:
create index on orders_test ( (created_at_order::date) )
WHERE order_state_2 <>'canceled'
AND order_state_2 <> 'pending_payment'
AND order_state_1 <> 'canceled'
AND order_state_1 <> 'pending_payment'
AND order_state_1 <> 'closed'
AND order_state_2 <> 'closed';
这将使索引更小并更快地在索引中查找行。但是,该索引将不再用于不包含这些条件的查询。
如果您不一直使用所有这些条件,请将它们减少到您经常使用的条件。
如果这些附加条件并没有真正减少行数,因为 "created today" 上的条件已经非常严格,那么您可以将它们排除在外。
您使用 created_at_order
作为时间戳创建了索引,但在 WHERE
中您正在使用函数转换和比较字符串。
我建议您将 WHERE
条款编辑为如下内容:
WHERE created_at_order BETWEEN now() AND TIMESTAMP 'today'
我试图了解 PostgreSQL 如何使用索引以及它们的执行计划如何。
我执行了以下查询,
EXPLAIN(VERBOSE,ANALYZE)
SELECT SUM("Orders"."order_subtotal_net_after_discount") AS "Total sum of Order Subtotal Net After Discount"
FROM "ol"."orders_test" AS "Orders"
LEFT OUTER JOIN "ol"."ga_transactions" AS "Ga Transactions" ON "Ga Transactions"."transaction_id" = "Orders"."order_number"
WHERE ( to_char(created_at_order,'YYYY-MM-DD')=to_char(now(),'YYYY-MM-DD')
AND "Orders"."order_state_2"<>'canceled'
AND "Orders"."order_state_2" <> 'pending_payment'
AND "Orders"."order_state_1" <> 'canceled'
AND "Orders"."order_state_1" <> 'pending_payment'
AND "Orders"."order_state_1" <> 'closed'
AND "Orders"."order_state_2" <> 'closed'
)
得到下面的执行计划,我发现下面的过滤器发生了顺序扫描
Filter: ((("Orders".order_state_2)::text <> 'canceled'::text) AND (("Orders".order_state_2)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'canceled'::text) AND (("Orders".order_state_1)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'closed'::text) AND (("Orders".order_state_2)::text <> 'closed'::text) AND (to_char("Orders".created_at_order, 'YYYY-MM-DD'::text) = to_char(now(), 'YYYY-MM-DD'::text)))"
整个查询计划
Aggregate (cost=157385.36..157385.37 rows=1 width=32) (actual time=840.221..840.221 rows=1 loops=1)
Output: sum("Orders".order_subtotal_net_after_discount)
-> Nested Loop Left Join (cost=0.42..157378.56 rows=2717 width=6) (actual time=0.017..840.095 rows=470 loops=1)
Output: "Orders".order_subtotal_net_after_discount
-> Seq Scan on ol.orders_test "Orders" (cost=0.00..148623.91 rows=2717 width=16) (actual time=0.009..838.144 rows=470 loops=1)
Output: "Orders".rno, "Orders".order_id, "Orders".order_number, "Orders".invoice_id, "Orders".invoice_number, "Orders".store_id, "Orders".customer_id, "Orders".real_customer_id, "Orders".shipping_address_id, "Orders".order_state_1, "Orders".order_state_2, "Orders".invoice_state, "Orders".shipping_street, "Orders".shipping_city, "Orders".shipping_postcode, "Orders".shipping_country_id, "Orders".shipping_description, "Orders".coupon_code, "Orders".first_order_of_customer, "Orders".payment_method, "Orders".order_subtotal_net, "Orders".order_subtotal_net_after_discount, "Orders".order_subtotal, "Orders".order_shipment, "Orders".order_shipping_tax, "Orders".order_discount, "Orders".order_tax_total, "Orders".order_grand_total, "Orders".order_total_paid, "Orders".order_refunded_total, "Orders".order_total_open, "Orders".invoice_subtotal_net, "Orders".invoice_subtotal_net_after_discount, "Orders".invoice_subtotal, "Orders".invoice_shipment, "Orders".invoice_shipping_tax, "Orders".invoice_discount, "Orders".invoice_tax_total, "Orders".invoice_grand_total, "Orders".invoice_refunded_total, "Orders".created_at_order, "Orders".created_at_invoice, "Orders".updated_at_invoice, "Orders".customer_email, "Orders".row_number, "Orders".nthorder, "Orders".time_since_last_order, "Orders".time_since_first_order
Filter: ((("Orders".order_state_2)::text <> 'canceled'::text) AND (("Orders".order_state_2)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'canceled'::text) AND (("Orders".order_state_1)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'closed'::text) AND (("Orders".order_state_2)::text <> 'closed'::text) AND (to_char("Orders".created_at_order, 'YYYY-MM-DD'::text) = to_char(now(), 'YYYY-MM-DD'::text)))
Rows Removed by Filter: 654356
-> Index Only Scan using ga_transactions_transaction_id_idx on ol.ga_transactions "Ga Transactions" (cost=0.42..3.21 rows=1 width=10) (actual time=0.004..0.004 rows=0 loops=470)
Output: "Ga Transactions".transaction_id
Index Cond: ("Ga Transactions".transaction_id = ("Orders".order_number)::text)
Heap Fetches: 0
Planning Time: 0.540 ms
Execution Time: 840.255 ms
我创建了以下索引,查询似乎没有使用这个索引,
是否必须在 INCLUDE 列表中包含所有输出列?
如果我添加更多索引,将会增加更新时间。那么有没有什么最好的方法来识别 table 的有效索引?
CREATE INDEX "IX_states"
ON ol.orders_test USING btree
(order_state_2 ASC NULLS LAST, order_state_1 ASC NULLS LAST, created_at_order ASC NULLS LAST)
INCLUDE(rno, order_id, order_number, invoice_id, invoice_number, store_id, customer_id, real_customer_id, shipping_address_id, invoice_state, shipping_street, shipping_city, shipping_postcode, shipping_country_id, shipping_description, coupon_code, first_order_of_customer, payment_method, order_subtotal_net_after_discount, created_at_invoice, customer_email, nthorder, time_since_last_order, time_since_first_order)
WITH (FILLFACTOR=90)
TABLESPACE pg_default;
否,不需要在索引中包含所有列。索引用于快速查找 几行。没有灵丹妙药可以让一切突然变快。如果您在索引中包含所有 table 列,则扫描索引将花费与执行 Seq Scan
相同的时间你肯定想要索引 created_at_order
。
create index on orders_test (created_at_order);
但是您的表达式 to_char(created_at_order,'YYYY-MM-DD')
不会使用该索引。最好的办法是改变你的状态:
where created_at_order >= current_date and created_at_order < current_date + 1
那就要利用上面的索引。如果你不想使用这样的表达式,那么你需要一个专门的表达式索引。我不会将其转换为文本值,而是索引日期值:
create index on orders_test ( (created_at_order::date) );
索引日期值还有一个好处就是索引更小。一个 date
只占用 4 个字节,而字符串 '2019-10-15'
占用十个字节的存储空间。
然后以下将使用该索引:
where created_at_order::date = current_date
如果对状态列的额外限制进一步减少了行,您可以为此使用过滤索引:
create index on orders_test ( (created_at_order::date) )
WHERE order_state_2 <>'canceled'
AND order_state_2 <> 'pending_payment'
AND order_state_1 <> 'canceled'
AND order_state_1 <> 'pending_payment'
AND order_state_1 <> 'closed'
AND order_state_2 <> 'closed';
这将使索引更小并更快地在索引中查找行。但是,该索引将不再用于不包含这些条件的查询。
如果您不一直使用所有这些条件,请将它们减少到您经常使用的条件。
如果这些附加条件并没有真正减少行数,因为 "created today" 上的条件已经非常严格,那么您可以将它们排除在外。
您使用 created_at_order
作为时间戳创建了索引,但在 WHERE
中您正在使用函数转换和比较字符串。
我建议您将 WHERE
条款编辑为如下内容:
WHERE created_at_order BETWEEN now() AND TIMESTAMP 'today'