Postgres upsert with composite unique key to allow only single null value

As part of an ETL, table continuous_trips receives a continuous stream of incoming records.
New records are aggregated and inserted into a temporary table called trips_agg every 5 minutes.

CREATE TABLE IF NOT EXISTS trips_agg AS (
SELECT start_time, station_id, from_station, to_station, from_terminus, end_terminus, previous_station, next_station,
    AVG(wait_span) AS wait_span,
    AVG(walk_span) AS walk_span,
    AVG(delay_span) AS delay_span,
    SUM(passengers_requests) AS passengers_requests
  FROM continuous_trips
  GROUP BY start_time, station_id, from_station, to_station, from_terminus, end_terminus, previous_station, next_station
)

Table trips_agg is dropped after all of its records have been inserted into table daily_trips, and is re-created in the next cycle.
daily_trips and trips_agg have the same columns.

CREATE TABLE IF NOT EXISTS daily_trips  ( 
             start_time timestamp without time zone NOT NULL,
             station_id text NOT NULL, 
             from_station text NOT NULL,
             to_station text NOT NULL,
             from_terminus text NOT NULL,
             end_terminus text NOT NULL,
             previous_station text,                                
             next_station text,                                
             wait_span interval NOT NULL,
             walk_span interval NOT NULL,
             delay_span interval NOT NULL,
             passengers_requests numeric NOT NULL
             )

Note: columns 'previous_station' and 'next_station' allow null values.
A composite unique key was added as follows:

ALTER TABLE daily_trips ADD CONSTRAINT daily_trips_unique_row UNIQUE   
(start_time, station_id, from_station, to_station, from_terminus, end_terminus, previous_station, next_station);

If the unique key is violated on insert, the existing record should be updated, so an upsert strategy is used.

INSERT INTO daily_trips SELECT * FROM trips_agg  
ON CONFLICT (start_time, station_id, from_station, to_station, from_terminus, end_terminus,  
previous_station, next_station) DO UPDATE     
set wait_span = (daily_trips.wait_span + EXCLUDED.wait_span)/2,   
walk_span = (daily_trips.walk_span + EXCLUDED.walk_span)/2 ,  
delay_span = (daily_trips.delay_span + EXCLUDED.delay_span)/2,   
passengers_requests =(daily_trips.passengers_requests + EXCLUDED.passengers_requests); 

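This setup works perfectly when values are present in all columns. It does not, however, when any of the nullable columns holds a null value.
Because Postgres does not treat null values as equal when enforcing a unique constraint, a new row is inserted instead of an update whenever any of the nullable columns holds a null value. As a result, the same unique key ends up in multiple rows.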
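
The behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming a throwaway table named null_demo (a hypothetical name, not part of the ETL) and the default handling of nulls in unique constraints:

```sql
-- Hypothetical scratch table: one NOT NULL column and one nullable column,
-- covered together by a composite unique constraint.
CREATE TEMP TABLE null_demo (
    a text NOT NULL,
    b text,
    UNIQUE (a, b)
);

-- Both inserts succeed: the two NULLs in column b are not considered equal,
-- so the unique constraint sees no conflict.
INSERT INTO null_demo VALUES ('x', NULL);
INSERT INTO null_demo VALUES ('x', NULL);

-- Both rows are present.
SELECT count(*) FROM null_demo;

-- With non-null values the constraint behaves as expected:
-- the second of these two inserts fails with a duplicate key error.
INSERT INTO null_demo VALUES ('x', 'y');
INSERT INTO null_demo VALUES ('x', 'y');
```

Since no conflict is detected for the rows containing NULL, an ON CONFLICT clause on the same columns never reaches its DO UPDATE branch for them.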

To overcome this, after referring to this article, an index was added on table daily_trips:
create unique index daily_trips_unique_trip_idx ON daily_trips  
(start_time, station_id, from_station, to_station, from_terminus, end_terminus,   
(previous_station IS NULL), (next_station IS NULL))
where previous_station IS NULL or next_station IS NULL;

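However, with this index a null value can be added in only one row for each nullable column. For the next row that has a null value in either nullable column, the update does not happen and the following error is raised instead: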

ERROR:  duplicate key value violates unique constraint "daily_trips_unique_trip_idx"

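What is needed?
The unique constraint should be respected and the update should take place when there is a null value in either of the nullable columns 'previous_station' or 'next_station'.
Any help is appreciated.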

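The solution is to convert NULL to some other value, more specifically to a zero-length string (''). The coalesce function does exactly that when used as coalesce(column_name, ''). The problem is that creating a unique constraint with that expression produces a syntax error, so you cannot create that constraint. There is a workaround, however, although it is not entirely straightforward. Postgres enforces unique constraints through a unique index, so just create the index directly.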

create unique index daily_trips_unique_row on daily_trips 
           ( start_time
           , station_id
           , from_station
           , to_station
           , from_terminus
           , end_terminus
           , coalesce(previous_station , '')
           , coalesce(next_station, '')
           );  

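However, while the above respects the nullability of the indexed columns, it is no longer recognized by INSERT ... ON CONFLICT (see example here). You will need a function/procedure that handles the exception, or use Select ... if exists then Update else Insert logic.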
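
The "update if exists, else insert" logic could look roughly like the following. This is a sketch under assumptions rather than a tested implementation: it assumes trips_agg holds at most one row per key (which the GROUP BY in the question suggests), that no other session writes to daily_trips at the same time, and it uses IS NOT DISTINCT FROM so that two NULLs compare as equal.

```sql
BEGIN;

-- Step 1: fold the new aggregates into rows that already exist.
-- IS NOT DISTINCT FROM treats NULL and NULL as a match, unlike "=".
UPDATE daily_trips AS d
SET wait_span           = (d.wait_span  + t.wait_span)  / 2,
    walk_span           = (d.walk_span  + t.walk_span)  / 2,
    delay_span          = (d.delay_span + t.delay_span) / 2,
    passengers_requests = d.passengers_requests + t.passengers_requests
FROM trips_agg AS t
WHERE d.start_time    = t.start_time
  AND d.station_id    = t.station_id
  AND d.from_station  = t.from_station
  AND d.to_station    = t.to_station
  AND d.from_terminus = t.from_terminus
  AND d.end_terminus  = t.end_terminus
  AND d.previous_station IS NOT DISTINCT FROM t.previous_station
  AND d.next_station     IS NOT DISTINCT FROM t.next_station;

-- Step 2: insert only the rows that still have no counterpart.
-- Rows updated in step 1 are skipped because they already exist.
INSERT INTO daily_trips
SELECT t.*
FROM trips_agg AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM daily_trips AS d
    WHERE d.start_time    = t.start_time
      AND d.station_id    = t.station_id
      AND d.from_station  = t.from_station
      AND d.to_station    = t.to_station
      AND d.from_terminus = t.from_terminus
      AND d.end_terminus  = t.end_terminus
      AND d.previous_station IS NOT DISTINCT FROM t.previous_station
      AND d.next_station     IS NOT DISTINCT FROM t.next_station
);

COMMIT;
```

If several writers can touch daily_trips concurrently, this two-step form can still race, and the unique index above then acts as the final safeguard by raising a duplicate key error. Separately, the INSERT documentation describes conflict targets that may contain index expressions, so it may be worth testing on your version whether repeating the two coalesce(...) expressions in the ON CONFLICT list lets Postgres infer the index above. That variant is untested here.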