How to find the bad data causing this insert to fail

I have a database (Postgres 9.3.5) with 80 million records, and the insert query below fails with this error:

ERROR:  invalid input syntax for integer: ""

INSERT INTO DISCOGS.TRACK_DURATION
     SELECT
        track_id,
        duration,
        hours_as_seconds + minutes_as_seconds + seconds as total_seconds
    FROM (
            select
            track_id,
            duration,
            CASE
                WHEN duration like '%:%:%' THEN (split_part(duration, ':', 1))::bigint * 60 * 60
                ELSE 0
            END  as hours_as_seconds,
            CASE
                WHEN duration like '%:%:%' THEN (split_part(duration, ':', 2))::bigint * 60
                WHEN duration like '%:%'  THEN  (split_part(duration, ':', 1))::bigint * 60
                ELSE 0
            END as minutes_as_seconds,
            CASE
                WHEN duration like '%:%:%' THEN (split_part(duration, ':', 3))::bigint
                WHEN duration like '%:%'   THEN (split_part(duration, ':', 2))::bigint
                ELSE 0
            END as seconds
            from discogs.track t1
            where release_id < 10000000
            and t1.duration!='' and t1.duration is not null
            and t1.position!=''
    ) as s1

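I can use the where release_id condition to limit the number of records checked, and with lower values it works fine, so the cause must be bad data. But with so many records, how do I find the offending rows? Note that I already filter out rows where duration is an empty string, and I also found a few records with bad data (for example %%%%), which I have fixed, but the insert still fails.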

I would search for malformed durations with a regular expression, like this:

create table duration (
  d varchar(20)
);

insert into duration (d) values ('12:34:56');
insert into duration (d) values ('34:56');
insert into duration (d) values ('15::'); -- bad one
insert into duration (d) values (':34:56'); -- bad one
insert into duration (d) values (':34:'); -- bad one
insert into duration (d) values ('12:34:'); -- bad one
insert into duration (d) values ('34:'); -- bad one
insert into duration (d) values (':56'); -- bad one

select *
  from duration 
  where d not similar to '([0-9]+:)?[0-9]+:[0-9]+';

Result:

d                     
------
15::                  
:34:56                
:34:                  
12:34:                
34:                   
:56 
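
SIMILAR TO always matches against the whole string, which is why the pattern above needs no anchors. If you prefer the POSIX regex operators, the following sketch should be equivalent on the same test table; note that `!~` matches anywhere in the string by default, so the `^` and `$` anchors have to be written explicitly.

```sql
-- Same check with the POSIX "does not match" operator.
-- The pattern is anchored by hand because !~ is unanchored by default.
select *
  from duration
  where d !~ '^([0-9]+:)?[0-9]+:[0-9]+$';
```

On the eight sample rows this should flag the same six values marked "bad one" in the inserts.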

In your case, the query should look like this:

select track_id, duration 
  from discogs.track
  where duration not similar to '([0-9]+:)?[0-9]+:[0-9]+';
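
The query above also flags durations that contain no colon at all. The original insert never casts those (they fall through to ELSE 0), so they are not what raises the error. A possible narrower check, assuming the failing value is exactly the empty string quoted in the error message, is to look for rows where one of the colon-separated parts is empty. This is only a sketch; the LIMIT is an arbitrary choice to keep the output small on an 80-million-row table.

```sql
-- Rows where at least one colon-separated part is the empty string,
-- e.g. '15::' or ':56'. Casting such a part to bigint raises
-- ERROR:  invalid input syntax for integer: ""
select track_id, duration
  from discogs.track
  where duration like '%:%'
    and '' = any (string_to_array(duration, ':'))
  limit 100;  -- arbitrary cap, adjust or remove as needed
```

Parts that are non-empty but not numeric (a stray space or letter, say) would produce a different quoted value in the error message and are caught by the regex check rather than by this one.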