Vertica 数据库添加了具有相同主键的重复条目
Vertica database added duplicate entry with same primary key
我是 运行 windows
上 Vertica
的 docker 图片。我用这个模式在 vertica
中创建了一个 table(student_id
是 primary key
)
dbadmin@d1f942c8c1e0(*)=> \d testschema.student;
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
------------+---------+------------+-------------+------+---------+----------+-------------+-------------
testschema | student | student_id | int | 8 | | t | t |
testschema | student | name | varchar(20) | 20 | | f | f |
testschema | student | major | varchar(20) | 20 | | f | f |
(3 rows)
student_id
是主键。我正在使用 copy
命令测试从 csv
文件加载数据。
首先我使用了insert
- insert into testschema.student values (1,'Jack','Biology');
然后我在 /home/dbadmin/vertica_test
目录下创建了一个 csv
文件 -
vi student.csv
2,Kate,Sociology
3,Claire,English
4,Jack,Biology
5,Mike,Comp. Sci
然后我运行这个命令
copy testschema.students from '/home/dbadmin/vertica_test/student.csv' delimiter ',' rejected data as table students_rejected;
我测试了结果
select * from testschema.student - shows 5 rows
select * from students_rejected; - no rows
然后我在 /home/dbadmin/vertica_test directory
处创建了另一个包含错误数据的 csv
文件
vi student_bad.csv
bad_data_type_for_student_id,UnaddedStudent, UnaddedSubject
6,Cassey,Physical Education
我从错误的 csv
文件中添加了数据
copy testschema.students from '/home/dbadmin/vertica_test/student.csv' delimiter ',' rejected data as table students_rejected;
然后我测试了输出
select * from testschema.student - shows 6 rows <-- only one row got added. all ok
select * from students_rejected; - shows 1 row <-- bad row's entry is here. all ok
看起来不错
然后我在没有拒绝数据选项的情况下再次添加了错误数据
copy testschema.students from '/home/dbadmin/vertica_test/student_bad.csv' delimiter ',' ;
但是现在又添加了学号为6的条目!!
student_id | name | major
------------+--------+--------------------
1 | Jack | Biology
2 | Kate | Sociology
3 | Claire | English
4 | Jack | Biology
5 | Mike | Comp. Sci
6 | Cassey | Physical Education <--
6 | Cassey | Physical Education <--
这个应该不会被拒吧?
如果您使用这种类型的命令创建了 students
:
DROP TABLE IF EXISTS students;
CREATE TABLE students (
student_id int
, name varchar(20)
, major varchar(20)
, CONSTRAINT pk_students PRIMARY KEY(student_id)
);
即没有显式关键字ENABLED
,则主键约束被禁用。也就是说,您可以愉快地插入重复项,但如果您稍后想通过主键列加入 students
table,则会 运行 出错。
启用主键约束...
[...]
, CONSTRAINT pk_students PRIMARY KEY(student_id) ENABLED
[...]
我想你得到了想要的效果。
整个场景:
DROP TABLE IF EXISTS students;
CREATE TABLE students (
student_id int
, name varchar(20)
, major varchar(20)
, CONSTRAINT pk_students PRIMARY KEY(student_id) ENABLED
);
INSERT INTO students
SELECT 1,'Jack' ,'Biology'
UNION ALL SELECT 2,'Kate' ,'Sociology'
UNION ALL SELECT 3,'Claire','English'
UNION ALL SELECT 4,'Jack' ,'Biology'
UNION ALL SELECT 5,'Mike' ,'Comp. Sci'
UNION ALL SELECT 6,'Cassey','Physical Education'
;
-- out OUTPUT
-- out --------
-- out 6
COMMIT;
COPY students FROM STDIN DELIMITER ','
REJECTED DATA AS TABLE students_rejected;
6,Cassey,Physical Education
\.
-- out vsql:/home/gessnerm/._vfv.sql:4: ERROR 6745:
-- out Duplicate key values: 'student_id=6'
-- out -- violates constraint 'dbadmin.students.pk_students'
SELECT * FROM students;
-- out student_id | name | major
-- out ------------+--------+--------------------
-- out 1 | Jack | Biology
-- out 2 | Kate | Sociology
-- out 3 | Claire | English
-- out 4 | Jack | Biology
-- out 5 | Mike | Comp. Sci
-- out 6 | Cassey | Physical Education
SELECT * FROM students_rejected;
-- out node_name | file_name | session_id | transaction_id | statement_id | batch_number | row_number | rejected_data | rejected_data_orig_length | rejected_reason
-- out -----------+-----------+------------+----------------+--------------+--------------+------------+---------------+---------------------------+-----------------
-- out (0 rows)
唯一可靠的检查似乎是 ANALYZE_CONSTRAINTS()
调用 ...
ALTER TABLE students ALTER CONSTRAINT pk_students DISABLED;
-- out Time: First fetch (0 rows): 7.618 ms. All rows formatted: 7.632 ms
COPY students FROM STDIN DELIMITER ','
REJECTED DATA AS TABLE students_rejected;
6,Cassey,Physical Education
\.
-- out Time: First fetch (0 rows): 31.790 ms. All rows formatted: 31.791 ms
SELECT * FROM students;
-- out student_id | name | major
-- out ------------+--------+--------------------
-- out 1 | Jack | Biology
-- out 2 | Kate | Sociology
-- out 3 | Claire | English
-- out 4 | Jack | Biology
-- out 5 | Mike | Comp. Sci
-- out 6 | Cassey | Physical Education
-- out 6 | Cassey | Physical Education
SELECT * FROM students_rejected;
-- out node_name | file_name | session_id | transaction_id | statement_id | batch_number | row_number | rejected_data | rejected_data_orig_length | rejected_reason
-- out -----------+-----------+------------+----------------+--------------+--------------+------------+---------------+---------------------------+-----------------
-- out (0 rows)
SELECT ANALYZE_CONSTRAINTS('students');
-- out Schema Name | Table Name | Column Names | Constraint Name | Constraint Type | Column Values
-- out -------------+------------+--------------+-----------------+-----------------+---------------
-- out dbadmin | students | student_id | pk_students | PRIMARY | ('6')
-- out (1 row)
我是 运行 windows
上 Vertica
的 docker 图片。我用这个模式在 vertica
中创建了一个 table(student_id
是 primary key
)
dbadmin@d1f942c8c1e0(*)=> \d testschema.student;
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
------------+---------+------------+-------------+------+---------+----------+-------------+-------------
testschema | student | student_id | int | 8 | | t | t |
testschema | student | name | varchar(20) | 20 | | f | f |
testschema | student | major | varchar(20) | 20 | | f | f |
(3 rows)
student_id
是主键。我正在使用 copy
命令测试从 csv
文件加载数据。
首先我使用了insert
- insert into testschema.student values (1,'Jack','Biology');
然后我在 /home/dbadmin/vertica_test
目录下创建了一个 csv
文件 -
vi student.csv
2,Kate,Sociology
3,Claire,English
4,Jack,Biology
5,Mike,Comp. Sci
然后我运行这个命令
copy testschema.students from '/home/dbadmin/vertica_test/student.csv' delimiter ',' rejected data as table students_rejected;
我测试了结果
select * from testschema.student - shows 5 rows
select * from students_rejected; - no rows
然后我在 /home/dbadmin/vertica_test directory
csv
文件
vi student_bad.csv
bad_data_type_for_student_id,UnaddedStudent, UnaddedSubject
6,Cassey,Physical Education
我从错误的 csv
文件中添加了数据
copy testschema.students from '/home/dbadmin/vertica_test/student.csv' delimiter ',' rejected data as table students_rejected;
然后我测试了输出
select * from testschema.student - shows 6 rows <-- only one row got added. all ok
select * from students_rejected; - shows 1 row <-- bad row's entry is here. all ok
看起来不错
然后我在没有拒绝数据选项的情况下再次添加了错误数据
copy testschema.students from '/home/dbadmin/vertica_test/student_bad.csv' delimiter ',' ;
但是现在又添加了学号为6的条目!!
student_id | name | major
------------+--------+--------------------
1 | Jack | Biology
2 | Kate | Sociology
3 | Claire | English
4 | Jack | Biology
5 | Mike | Comp. Sci
6 | Cassey | Physical Education <--
6 | Cassey | Physical Education <--
这个应该不会被拒吧?
如果您使用这种类型的命令创建了 students
:
DROP TABLE IF EXISTS students;
CREATE TABLE students (
student_id int
, name varchar(20)
, major varchar(20)
, CONSTRAINT pk_students PRIMARY KEY(student_id)
);
即没有显式关键字ENABLED
,则主键约束被禁用。也就是说,您可以愉快地插入重复项,但如果您稍后想通过主键列加入 students
table,则会 运行 出错。
启用主键约束...
[...]
, CONSTRAINT pk_students PRIMARY KEY(student_id) ENABLED
[...]
我想你得到了想要的效果。
整个场景:
DROP TABLE IF EXISTS students;
CREATE TABLE students (
student_id int
, name varchar(20)
, major varchar(20)
, CONSTRAINT pk_students PRIMARY KEY(student_id) ENABLED
);
INSERT INTO students
SELECT 1,'Jack' ,'Biology'
UNION ALL SELECT 2,'Kate' ,'Sociology'
UNION ALL SELECT 3,'Claire','English'
UNION ALL SELECT 4,'Jack' ,'Biology'
UNION ALL SELECT 5,'Mike' ,'Comp. Sci'
UNION ALL SELECT 6,'Cassey','Physical Education'
;
-- out OUTPUT
-- out --------
-- out 6
COMMIT;
COPY students FROM STDIN DELIMITER ','
REJECTED DATA AS TABLE students_rejected;
6,Cassey,Physical Education
\.
-- out vsql:/home/gessnerm/._vfv.sql:4: ERROR 6745:
-- out Duplicate key values: 'student_id=6'
-- out -- violates constraint 'dbadmin.students.pk_students'
SELECT * FROM students;
-- out student_id | name | major
-- out ------------+--------+--------------------
-- out 1 | Jack | Biology
-- out 2 | Kate | Sociology
-- out 3 | Claire | English
-- out 4 | Jack | Biology
-- out 5 | Mike | Comp. Sci
-- out 6 | Cassey | Physical Education
SELECT * FROM students_rejected;
-- out node_name | file_name | session_id | transaction_id | statement_id | batch_number | row_number | rejected_data | rejected_data_orig_length | rejected_reason
-- out -----------+-----------+------------+----------------+--------------+--------------+------------+---------------+---------------------------+-----------------
-- out (0 rows)
唯一可靠的检查似乎是 ANALYZE_CONSTRAINTS()
调用 ...
ALTER TABLE students ALTER CONSTRAINT pk_students DISABLED;
-- out Time: First fetch (0 rows): 7.618 ms. All rows formatted: 7.632 ms
COPY students FROM STDIN DELIMITER ','
REJECTED DATA AS TABLE students_rejected;
6,Cassey,Physical Education
\.
-- out Time: First fetch (0 rows): 31.790 ms. All rows formatted: 31.791 ms
SELECT * FROM students;
-- out student_id | name | major
-- out ------------+--------+--------------------
-- out 1 | Jack | Biology
-- out 2 | Kate | Sociology
-- out 3 | Claire | English
-- out 4 | Jack | Biology
-- out 5 | Mike | Comp. Sci
-- out 6 | Cassey | Physical Education
-- out 6 | Cassey | Physical Education
SELECT * FROM students_rejected;
-- out node_name | file_name | session_id | transaction_id | statement_id | batch_number | row_number | rejected_data | rejected_data_orig_length | rejected_reason
-- out -----------+-----------+------------+----------------+--------------+--------------+------------+---------------+---------------------------+-----------------
-- out (0 rows)
SELECT ANALYZE_CONSTRAINTS('students');
-- out Schema Name | Table Name | Column Names | Constraint Name | Constraint Type | Column Values
-- out -------------+------------+--------------+-----------------+-----------------+---------------
-- out dbadmin | students | student_id | pk_students | PRIMARY | ('6')
-- out (1 row)