如何使用多个 CTE 加速查询

How to speed up query with multiple CTEs

我有多个 CTE 的以下查询。如果没有 departure_dates 和 arrival_dates CTE,查询将在大约 12 秒内完成。当我添加 departure_dates 和 arrival_dates CTE 时,查询花费的时间太长以至于我不得不取消它。我不能运行解释同样原因的analyze bc。使用 运行ning 解释,它表示成本最高的节点是末尾的连接 a.arrival_date = r.arrival_date。从我在解释中看到的内容来看,它使用的是我拥有的日期 CTE 索引。所以我的问题是,添加这两个会导致速度减慢这么多,有什么办法可以 optimize/fix 它吗?

WITH
  relationships AS (
    SELECT rt.from_location_code,
           rt.to_location_code,
           rt.departure_datetime                                                                              AS departure_date,
           rt.arrival_datetime                                                                                AS arrival_date,
           rt.ticketno,
           rt.leg_number,
           rt.leg_direction,
           rt.service_direction,
           rt.booking_reference,
           fj.transaction_id,
           fj.ticket_id,
           fj.ticket_number,
           rt.batch_id
      FROM sch1.booked_travel rt
        LEFT JOIN (
        SELECT DISTINCT t.transaction_id, sdt.ticket_number, sdt.ticket_id
          FROM sch2.ticket_transactions t
            JOIN sch2.tickets sdt USING (ticket_id)
            JOIN sch1.reserved_travels ert ON ert.ticketno = sdt.ticket_number
                  ) fj ON fj.ticket_number = rt.ticketno
      WHERE batch_id = 234
  ),
  xaction AS (
    SELECT DISTINCT t.transaction_id, sdt.ticket_number, sdt.ticket_id
      FROM sch2.ticket_transactions t
        JOIN sch2.tickets sdt USING (ticket_id)
        JOIN relationships ert ON ert.ticketno = sdt.ticket_number
      WHERE ert.batch_id = 234
  ),
  from_location AS (
    SELECT DISTINCT l.location_id AS from_location_id, rt.from_location_code
      FROM sch2.locations l
        JOIN relationships rt ON rt.from_location_code = l.code
      WHERE rt.batch_id = 234
  ),
  to_location AS (
    SELECT DISTINCT d.location_id AS to_location_id, rt.to_location_code
      FROM sch2.locations d
        JOIN relationships rt ON rt.to_location_code = d.code
      WHERE rt.batch_id = 234
  ),
  departure_dates as (
    select distinct on (d.date_id, d.the_date) d.date_id as departure_date_id, d.the_date as departure_date
      from sch2.dates d
        join relationships r on r.departure_date = d.the_date
      where r.batch_id = 234
  ),
  arrival_dates as (
    select distinct on (d.date_id, d.the_date) d.date_id as arrival_date_id, d.the_date as arrival_date
      from sch2.dates d
        join relationships r on r.arrival_date = d.the_date
      where r.batch_id = 234
  )
INSERT
  INTO sch2.travel_legs (transaction_id, ticket_id, leg_number, leg_direction,
                              from_location_id, to_location_id, service_direction,
                              departure_date_id, departure_date, arrival_date, arrival_date_id, booking_reference)
SELECT r.transaction_id,
       r.ticket_id,
       r.leg_number,
       r.leg_direction,
       f.from_location_id,
       t.to_location_id,
       r.service_direction,
       d.departure_date_id,
       r.departure_date,
       r.arrival_date,
       a.arrival_date_id,
       r.booking_reference
  FROM relationships r
    LEFT JOIN xaction USING (transaction_id)
    LEFT JOIN from_location f ON f.from_location_code = r.from_location_code
    LEFT JOIN to_location t ON t.to_location_code = r.to_location_code
    left join departure_dates d on r.departure_date = d.departure_date
    left join arrival_dates a on a.arrival_date = r.arrival_date
  RETURNING *;

计划:

"Insert on sch2.travle_legs  (cost=6500820.62..19312607.17 rows=425901878 width=678)"
"  Output: travle_legs.journey_leg_id, travle_legs.transaction_id, travle_legs.customer_id, travle_legs.ticket_id, travle_legs.brand_id, travle_legs.leg_number, travle_legs.leg_direction, travle_legs.from_location_id, travle_legs.to_location_id, travle_legs.service_id, travle_legs.service_direction, travle_legs.departure_date, travle_legs.arrival_date, travle_legs.departure_date_id, travle_legs.arrival_date_id, travle_legs.journey_type, travle_legs.flight_id, travle_legs.booking_reference, travle_legs.scheduled_depature_date, travle_legs.scheduled_departure_date_id, travle_legs.actual_departure_date, travle_legs.actual_departure_date_id, travle_legs.scheduled_arrival_date, travle_legs.scheduled_arrival_date_id, travle_legs.actual_arrival_date, travle_legs.actual_arrival_date_id, travle_legs.departure_offset_minutes, travle_legs.departure_offset_time, travle_legs.departure_late, travle_legs.arrival_offset_minutes, travle_legs.arrival_offset_time, travle_legs.arrival_late"
"  CTE relationships"
"    ->  Hash Right Join  (cost=17457.33..769694.82 rows=73016 width=90)"
"          Output: rt.from_location_code, rt.to_location_code, rt.departure_datetime, rt.arrival_datetime, COALESCE([=11=], '-1'::integer), (SubPlan 2), rt.ticketno, rt.leg_number, rt.leg_direction, COALESCE(, '-1'::integer), rt.service_direction, , rt.booking_reference, fj.transaction_id, fj.ticket_id, fj.ticket_number, rt.sch1_batch_file_id"
"          Hash Cond: ((fj.ticket_number)::text = (rt.ticketno)::text)"
"          InitPlan 1 (returns [=11=])"
"            ->  Limit  (cost=1.02..59.51 rows=1 width=4)"
"                  Output: c.service_id"
"                  ->  Hash Join  (cost=1.02..11112.93 rows=190 width=4)"
"                        Output: c.service_id"
"                        Inner Unique: true"
"                        Hash Cond: ((e.service_number)::text = (c.code)::text)"
"                        ->  Seq Scan on sch1.booked_travels e  (cost=0.00..10989.21 rows=45937 width=3)"
"                              Output: e.id, e.sale_id, e.ticketno, e.transaction_date, e.leg_number, e.leg_direction, e.from_location_code, e.to_location_code, e.company_brand_code, e.service_number, e.service_direction, e.departure_datetime, e.arrival_datetime, e.flight_code, e.booking_reference, e.sch1_batch_file_id, e.imported, e.created_at, e.updated_at, e.customer_id"
"                              Filter: (e.sch1_batch_file_id = 234)"
"                        ->  Hash  (cost=1.01..1.01 rows=1 width=72)"
"                              Output: c.service_id, c.code"
"                              ->  Seq Scan on sch2.services c  (cost=0.00..1.01 rows=1 width=72)"
"                                    Output: c.service_id, c.code"
"          InitPlan 3 (returns )"
"            ->  Function Scan on public.find_brand  (cost=0.25..10.25 rows=1000 width=4)"
"                  Output: find_brand.brand_id"
"                  Function Call: find_brand('sch1.booked_travels'::regclass, 152, 'company_brand_code'::text)"
"          InitPlan 4 (returns )"
"            ->  Limit  (cost=1.02..219.88 rows=1 width=4)"
"                  Output: c_1.flight_id"
"                  ->  Hash Join  (cost=1.02..11163.01 rows=51 width=4)"
"                        Output: c_1.flight_id"
"                        Hash Cond: ((e_1.flight_code)::text = (c_1.flight_code)::text)"
"                        ->  Seq Scan on sch1.booked_travels e_1  (cost=0.00..10989.21 rows=45937 width=3)"
"                              Output: e_1.id, e_1.sale_id, e_1.ticketno, e_1.transaction_date, e_1.leg_number, e_1.leg_direction, e_1.from_location_code, e_1.to_location_code, e_1.company_brand_code, e_1.service_number, e_1.service_direction, e_1.departure_datetime, e_1.arrival_datetime, e_1.flight_code, e_1.booking_reference, e_1.sch1_batch_file_id, e_1.imported, e_1.created_at, e_1.updated_at, e_1.customer_id"
"                              Filter: (e_1.sch1_batch_file_id = 234)"
"                        ->  Hash  (cost=1.01..1.01 rows=1 width=520)"
"                              Output: c_1.flight_id, c_1.flight_code"
"                              ->  Seq Scan on sch2.flights c_1  (cost=0.00..1.01 rows=1 width=520)"
"                                    Output: c_1.flight_id, c_1.flight_code"
"          ->  Subquery Scan on fj  (cost=5110.26..6047.10 rows=46842 width=17)"
"                Output: fj.transaction_id, fj.ticket_id, fj.ticket_number"
"                ->  HashAggregate  (cost=5110.26..5578.68 rows=46842 width=17)"
"                      Output: t_1.transaction_id, sdt.ticket_number, sdt.ticket_id"
"                      Group Key: t_1.transaction_id, sdt.ticket_number, sdt.ticket_id"
"                      ->  Hash Join  (cost=2505.21..4758.94 rows=46842 width=17)"
"                            Output: t_1.transaction_id, sdt.ticket_number, sdt.ticket_id"
"                            Hash Cond: ((ert.ticketno)::text = (sdt.ticket_number)::text)"
"                            ->  Index Only Scan using idx_booked_travels_batch_id_ticketno on sch1.booked_travels ert  (cost=0.41..1613.47 rows=45937 width=9)"
"                                  Output: ert.ticketno, ert.sch1_batch_file_id"
"                            ->  Hash  (cost=2059.58..2059.58 rows=35617 width=17)"
"                                  Output: t_1.transaction_id, sdt.ticket_number, sdt.ticket_id"
"                                  ->  Hash Join  (cost=1077.90..2059.58 rows=35617 width=17)"
"                                        Output: t_1.transaction_id, sdt.ticket_number, sdt.ticket_id"
"                                        Inner Unique: true"
"                                        Hash Cond: (t_1.ticket_id = sdt.ticket_id)"
"                                        ->  Seq Scan on sch2.fact_ticket_transactions t_1  (cost=0.00..888.17 rows=35617 width=8)"
"                                              Output: t_1.transaction_id, t_1.customer_id, t_1.ticket_id, t_1.agent_id, t_1.sale_date, t_1.sale_date_id, t_1.sales_channel, t_1.transaction_type, t_1.ticket_type, t_1.ticket_queue, t_1.fare_type, t_1.fare_value, t_1.adults, t_1.children, t_1.concessionary, t_1.young_person_coach_card, t_1.eurolines_young_person, t_1.hm_forces, t_1.infants, t_1.eurolines_seniors, t_1.over_sixties, t_1.coach_card_id, t_1.coach_card_consumer_type_id, t_1.account_id, t_1.distribution_type, t_1.ticket_email_id"
"                                        ->  Hash  (cost=641.29..641.29 rows=34929 width=13)"
"                                              Output: sdt.ticket_number, sdt.ticket_id"
"                                              ->  Seq Scan on sch2.tickets sdt  (cost=0.00..641.29 rows=34929 width=13)"
"                                                    Output: sdt.ticket_number, sdt.ticket_id"
"          ->  Hash  (cost=10989.21..10989.21 rows=45937 width=61)"
"                Output: rt.from_location_code, rt.to_location_code, rt.departure_datetime, rt.arrival_datetime, rt.customer_id, rt.ticketno, rt.leg_number, rt.leg_direction, rt.service_direction, rt.booking_reference, rt.sch1_batch_file_id"
"                ->  Seq Scan on sch1.booked_travels rt  (cost=0.00..10989.21 rows=45937 width=61)"
"                      Output: rt.from_location_code, rt.to_location_code, rt.departure_datetime, rt.arrival_datetime, rt.customer_id, rt.ticketno, rt.leg_number, rt.leg_direction, rt.service_direction, rt.booking_reference, rt.sch1_batch_file_id"
"                      Filter: (rt.sch1_batch_file_id = 234)"
"          SubPlan 2"
"            ->  Function Scan on public.find_customer  (cost=0.25..10.25 rows=1000 width=4)"
"                  Output: rt.customer_id"
"                  Function Call: find_customer('sch1.booked_travels'::regclass, 152)"
"  CTE xaction"
"    ->  Unique  (cost=2879.91..2883.63 rows=372 width=17)"
"          Output: t_2.transaction_id, sdt_1.ticket_number, sdt_1.ticket_id"
"          ->  Sort  (cost=2879.91..2880.84 rows=372 width=17)"
"                Output: t_2.transaction_id, sdt_1.ticket_number, sdt_1.ticket_id"
"                Sort Key: t_2.transaction_id, sdt_1.ticket_number, sdt_1.ticket_id"
"                ->  Nested Loop  (cost=1078.19..2864.03 rows=372 width=17)"
"                      Output: t_2.transaction_id, sdt_1.ticket_number, sdt_1.ticket_id"
"                      ->  Hash Join  (cost=1077.90..2721.72 rows=365 width=13)"
"                            Output: sdt_1.ticket_number, sdt_1.ticket_id"
"                            Inner Unique: true"
"                            Hash Cond: ((ert_1.ticketno)::text = (sdt_1.ticket_number)::text)"
"                            ->  CTE Scan on relationships ert_1  (cost=0.00..1642.86 rows=365 width=32)"
"                                  Output: ert_1.from_location_code, ert_1.to_location_code, ert_1.departure_date, ert_1.arrival_date, ert_1.service_id, ert_1.customer_id, ert_1.ticketno, ert_1.leg_number, ert_1.leg_direction, ert_1.brand_id, ert_1.service_direction, ert_1.flight_id, ert_1.booking_reference, ert_1.transaction_id, ert_1.ticket_id, ert_1.ticket_number, ert_1.sch1_batch_file_id"
"                                  Filter: (ert_1.sch1_batch_file_id = 234)"
"                            ->  Hash  (cost=641.29..641.29 rows=34929 width=13)"
"                                  Output: sdt_1.ticket_number, sdt_1.ticket_id"
"                                  ->  Seq Scan on sch2.tickets sdt_1  (cost=0.00..641.29 rows=34929 width=13)"
"                                        Output: sdt_1.ticket_number, sdt_1.ticket_id"
"                      ->  Index Only Scan using idx_ticket_xaction_ticket_id_transaction_id on sch2.fact_ticket_transactions t_2  (cost=0.29..0.38 rows=1 width=8)"
"                            Output: t_2.ticket_id, t_2.transaction_id"
"                            Index Cond: (t_2.ticket_id = sdt_1.ticket_id)"
"  CTE from_location"
"    ->  Unique  (cost=1676.67..1679.41 rows=365 width=36)"
"          Output: l.location_id, rt_1.from_location_code"
"          ->  Sort  (cost=1676.67..1677.58 rows=365 width=36)"
"                Output: l.location_id, rt_1.from_location_code"
"                Sort Key: l.location_id, rt_1.from_location_code"
"                ->  Hash Join  (cost=17.31..1661.14 rows=365 width=36)"
"                      Output: l.location_id, rt_1.from_location_code"
"                      Inner Unique: true"
"                      Hash Cond: ((rt_1.from_location_code)::text = (l.code)::text)"
"                      ->  CTE Scan on relationships rt_1  (cost=0.00..1642.86 rows=365 width=32)"
"                            Output: rt_1.from_location_code, rt_1.to_location_code, rt_1.departure_date, rt_1.arrival_date, rt_1.service_id, rt_1.customer_id, rt_1.ticketno, rt_1.leg_number, rt_1.leg_direction, rt_1.brand_id, rt_1.service_direction, rt_1.flight_id, rt_1.booking_reference, rt_1.transaction_id, rt_1.ticket_id, rt_1.ticket_number, rt_1.sch1_batch_file_id"
"                            Filter: (rt_1.sch1_batch_file_id = 234)"
"                      ->  Hash  (cost=10.47..10.47 rows=547 width=10)"
"                            Output: l.location_id, l.code"
"                            ->  Seq Scan on sch2.locations l  (cost=0.00..10.47 rows=547 width=10)"
"                                  Output: l.location_id, l.code"
"  CTE to_location"
"    ->  Unique  (cost=1676.67..1679.41 rows=365 width=36)"
"          Output: d_1.location_id, rt_2.to_location_code"
"          ->  Sort  (cost=1676.67..1677.58 rows=365 width=36)"
"                Output: d_1.location_id, rt_2.to_location_code"
"                Sort Key: d_1.location_id, rt_2.to_location_code"
"                ->  Hash Join  (cost=17.31..1661.14 rows=365 width=36)"
"                      Output: d_1.location_id, rt_2.to_location_code"
"                      Inner Unique: true"
"                      Hash Cond: ((rt_2.to_location_code)::text = (d_1.code)::text)"
"                      ->  CTE Scan on relationships rt_2  (cost=0.00..1642.86 rows=365 width=32)"
"                            Output: rt_2.from_location_code, rt_2.to_location_code, rt_2.departure_date, rt_2.arrival_date, rt_2.service_id, rt_2.customer_id, rt_2.ticketno, rt_2.leg_number, rt_2.leg_direction, rt_2.brand_id, rt_2.service_direction, rt_2.flight_id, rt_2.booking_reference, rt_2.transaction_id, rt_2.ticket_id, rt_2.ticket_number, rt_2.sch1_batch_file_id"
"                            Filter: (rt_2.sch1_batch_file_id = 234)"
"                      ->  Hash  (cost=10.47..10.47 rows=547 width=10)"
"                            Output: d_1.location_id, d_1.code"
"                            ->  Seq Scan on sch2.locations d_1  (cost=0.00..10.47 rows=547 width=10)"
"                                  Output: d_1.location_id, d_1.code"
"  CTE departure_dates"
"    ->  Unique  (cost=8103.16..8149.19 rows=6137 width=12)"
"          Output: d_2.date_id, d_2.the_date"
"          ->  Sort  (cost=8103.16..8118.51 rows=6137 width=12)"
"                Output: d_2.date_id, d_2.the_date"
"                Sort Key: d_2.date_id, d_2.the_date"
"                ->  Nested Loop  (cost=0.42..7717.04 rows=6137 width=12)"
"                      Output: d_2.date_id, d_2.the_date"
"                      ->  CTE Scan on relationships r_1  (cost=0.00..1642.86 rows=365 width=8)"
"                            Output: r_1.from_location_code, r_1.to_location_code, r_1.departure_date, r_1.arrival_date, r_1.service_id, r_1.customer_id, r_1.ticketno, r_1.leg_number, r_1.leg_direction, r_1.brand_id, r_1.service_direction, r_1.flight_id, r_1.booking_reference, r_1.transaction_id, r_1.ticket_id, r_1.ticket_number, r_1.sch1_batch_file_id"
"                            Filter: (r_1.sch1_batch_file_id = 234)"
"                      ->  Index Scan using idx_lkup_dates_date on sch2.lkup_dates d_2  (cost=0.42..16.47 rows=17 width=12)"
"                            Output: d_2.date_id, d_2.the_date, d_2.date_iso, d_2.date_text, d_2.date_abbreviation, d_2.day_of_week_number, d_2.day_of_week_name, d_2.day_of_week_short_name, d_2.first_date_of_week, d_2.last_date_of_week, d_2.day_of_month, d_2.day_of_month_number, d_2.month_name, d_2.month_name_with_year, d_2.month_short_name, d_2.month_short_name_with_year, d_2.day_of_month_name, d_2.first_date_of_month, d_2.last_date_of_month, d_2.calendar_week_number, d_2.calendar_week_name, d_2.calendar_week_name_with_year, d_2.calendar_week_short_name, d_2.calendar_week_short_name_with_year, d_2.calendar_week_yy_wk, d_2.calendar_month_number, d_2.calendar_month_yy_mm, d_2.calendar_year_month"
"                            Index Cond: (d_2.the_date = r_1.departure_date)"
"  CTE arrival_dates"
"    ->  Unique  (cost=8103.16..8149.19 rows=6137 width=12)"
"          Output: d_3.date_id, d_3.the_date"
"          ->  Sort  (cost=8103.16..8118.51 rows=6137 width=12)"
"                Output: d_3.date_id, d_3.the_date"
"                Sort Key: d_3.date_id, d_3.the_date"
"                ->  Nested Loop  (cost=0.42..7717.04 rows=6137 width=12)"
"                      Output: d_3.date_id, d_3.the_date"
"                      ->  CTE Scan on relationships r_2  (cost=0.00..1642.86 rows=365 width=8)"
"                            Output: r_2.from_location_code, r_2.to_location_code, r_2.departure_date, r_2.arrival_date, r_2.service_id, r_2.customer_id, r_2.ticketno, r_2.leg_number, r_2.leg_direction, r_2.brand_id, r_2.service_direction, r_2.flight_id, r_2.booking_reference, r_2.transaction_id, r_2.ticket_id, r_2.ticket_number, r_2.sch1_batch_file_id"
"                            Filter: (r_2.sch1_batch_file_id = 234)"
"                      ->  Index Scan using idx_lkup_dates_date on sch2.lkup_dates d_3  (cost=0.42..16.47 rows=17 width=12)"
"                            Output: d_3.date_id, d_3.the_date, d_3.date_iso, d_3.date_text, d_3.date_abbreviation, d_3.day_of_week_number, d_3.day_of_week_name, d_3.day_of_week_short_name, d_3.first_date_of_week, d_3.last_date_of_week, d_3.day_of_month, d_3.day_of_month_number, d_3.month_name, d_3.month_name_with_year, d_3.month_short_name, d_3.month_short_name_with_year, d_3.day_of_month_name, d_3.first_date_of_month, d_3.last_date_of_month, d_3.calendar_week_number, d_3.calendar_week_name, d_3.calendar_week_name_with_year, d_3.calendar_week_short_name, d_3.calendar_week_short_name_with_year, d_3.calendar_week_yy_wk, d_3.calendar_month_number, d_3.calendar_month_yy_mm, d_3.calendar_year_month"
"                            Index Cond: (d_3.the_date = r_2.arrival_date)"
"  ->  Merge Right Join  (cost=5708584.97..18520371.52 rows=425901878 width=678)"
"        Output: nextval('sch2.travle_legs_journey_leg_id_seq'::regclass), r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, f.from_location_id, t.to_location_id, r.service_id, r.service_direction, r.departure_date, r.arrival_date, d.departure_date_id, a.arrival_date_id, NULL::integer, r.flight_id, r.booking_reference, NULL::timestamp without time zone, NULL::integer, NULL::timestamp without time zone, NULL::integer, NULL::timestamp without time zone, NULL::integer, NULL::timestamp without time zone, NULL::integer, NULL::integer, NULL::integer, NULL::boolean, NULL::integer, NULL::integer, NULL::boolean"
"        Merge Cond: (a.arrival_date = r.arrival_date)"
"        ->  Sort  (cost=508.86..524.20 rows=6137 width=12)"
"              Output: a.arrival_date_id, a.arrival_date"
"              Sort Key: a.arrival_date"
"              ->  CTE Scan on arrival_dates a  (cost=0.00..122.74 rows=6137 width=12)"
"                    Output: a.arrival_date_id, a.arrival_date"
"        ->  Materialize  (cost=5708076.12..5777475.15 rows=13879807 width=180)"
"              Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id, d.departure_date_id"
"              ->  Sort  (cost=5708076.12..5742775.63 rows=13879807 width=180)"
"                    Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id, d.departure_date_id"
"                    Sort Key: r.arrival_date"
"                    ->  Merge Right Join  (cost=151743.89..361102.51 rows=13879807 width=180)"
"                          Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id, d.departure_date_id"
"                          Merge Cond: (d.departure_date = r.departure_date)"
"                          ->  Sort  (cost=508.86..524.20 rows=6137 width=12)"
"                                Output: d.departure_date_id, d.departure_date"
"                                Sort Key: d.departure_date"
"                                ->  CTE Scan on departure_dates d  (cost=0.00..122.74 rows=6137 width=12)"
"                                      Output: d.departure_date_id, d.departure_date"
"                          ->  Materialize  (cost=151235.03..153496.69 rows=452332 width=176)"
"                                Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id"
"                                ->  Sort  (cost=151235.03..152365.86 rows=452332 width=176)"
"                                      Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id"
"                                      Sort Key: r.departure_date"
"                                      ->  Hash Left Join  (cost=35.82..31437.17 rows=452332 width=176)"
"                                            Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id"
"                                            Hash Cond: (r.transaction_id = xaction.transaction_id)"
"                                            ->  Hash Left Join  (cost=23.73..15046.30 rows=243189 width=176)"
"                                                  Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id"
"                                                  Hash Cond: ((r.to_location_code)::text = (t.to_location_code)::text)"
"                                                  ->  Hash Left Join  (cost=11.86..6272.98 rows=133254 width=204)"
"                                                        Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, r.to_location_code, f.from_location_id"
"                                                        Hash Cond: ((r.from_location_code)::text = (f.from_location_code)::text)"
"                                                        ->  CTE Scan on relationships r  (cost=0.00..1460.32 rows=73016 width=232)"
"                                                              Output: r.from_location_code, r.to_location_code, r.departure_date, r.arrival_date, r.service_id, r.customer_id, r.ticketno, r.leg_number, r.leg_direction, r.brand_id, r.service_direction, r.flight_id, r.booking_reference, r.transaction_id, r.ticket_id, r.ticket_number, r.sch1_batch_file_id"
"                                                        ->  Hash  (cost=7.30..7.30 rows=365 width=36)"
"                                                              Output: f.from_location_id, f.from_location_code"
"                                                              ->  CTE Scan on from_location f  (cost=0.00..7.30 rows=365 width=36)"
"                                                                    Output: f.from_location_id, f.from_location_code"
"                                                  ->  Hash  (cost=7.30..7.30 rows=365 width=36)"
"                                                        Output: t.to_location_id, t.to_location_code"
"                                                        ->  CTE Scan on to_location t  (cost=0.00..7.30 rows=365 width=36)"
"                                                              Output: t.to_location_id, t.to_location_code"
"                                            ->  Hash  (cost=7.44..7.44 rows=372 width=4)"
"                                                  Output: xaction.transaction_id"
"                                                  ->  CTE Scan on xaction  (cost=0.00..7.44 rows=372 width=4)"
"                                                        Output: xaction.transaction_id"                                                      Output: xaction.transaction_id"

很难说您可以采取哪一种明确的措施来解决运行时问题,但我会建议您尝试做一些事情,看看它们是否会提高查询的性能。

  1. relationships CTE 中,您应该使用 GROUP BY 子句来确保返回的记录是唯一的。因为所有后续 CTE 都加入回 relationships,如果 relationships 包含重复记录,这可能会给以后的 CTE 带来麻烦,具体取决于连接类型查询规划器选择。

  2. 您的查询计划列出了三个嵌套循环连接;以下每个 CTE 中的一个:xactiondeparture_datesarrival_dates.我会检查嵌套循环连接中涉及的 tables/CTEs 实际返回了多少记录(大概是 sch2.lkup_dates 关系、 sch2.fact_ticket_transactions,sch2.tickets)。当查询计划器认为涉及的记录总数很小时,它通常会 select 嵌套循环连接,但有时这种假设非常不正确,这可能导致运行时间呈指数增长,具体取决于嵌套循环结束的位置在执行计划中。

  3. 这更像是一个实践问题,而不是性能提升,但因为您正在过滤 关系 以仅包括包含 batch_id = 234 的记录,您不需要在所有已加入 relationships CTE 的后续 CTE 中包含类似的 WHERE 子句。

如果您发现嵌套循环连接中涉及的 table 非常大,我建议创建一个临时 table 包含连接 table 的输出在执行此查询之前,确保嵌套循环不会出现在查询计划中。


编辑以添加温度 Table 示例:
使用临时 table 背后的一般想法是,它强制在执行查询主体之前构建数据集,这将迫使查询规划器选择更好的计划:

DROP TABLE IF EXISTS temp_departure_dates;

CREATE TEMPORARY TABLE temp_departure_dates as (
    SELECT d.date_id  AS departure_date_id
         , d.the_date AS departure_date
    FROM sch2.dates d
        JOIN sch1.booked_travel b ON b.departure_date = d.the_date
    WHERE b.batch_id = 234
    GROUP BY 1, 2
);

WITH relationships AS (
    -- ...
)

SELECT r.transaction_id,
       r.ticket_id,
       r.leg_number,
       r.leg_direction,
       f.from_location_id,
       t.to_location_id,
       r.service_direction,
       d.departure_date_id,
       r.departure_date,
       r.arrival_date,
       a.arrival_date_id,
       r.booking_reference
  FROM relationships r
    LEFT JOIN xaction USING (transaction_id)
    LEFT JOIN from_location f ON f.from_location_code = r.from_location_code
    LEFT JOIN to_location t ON t.to_location_code = r.to_location_code
    left join temp_departure_dates d on r.departure_date = d.departure_date
    left join arrival_dates a on a.arrival_date = r.arrival_date
  RETURNING *;