Should I index my foreign keys if they are already part of a unique constraint?
In PostgreSQL, I have a table with a unique constraint on two foreign keys, like this:
CREATE TABLE project_ownerships(
id BIGSERIAL PRIMARY KEY,
project_id BIGINT REFERENCES projects ON DELETE CASCADE,
user_id BIGINT REFERENCES users ON DELETE CASCADE,
role SMALLINT,
CONSTRAINT project_user_unique UNIQUE (project_id, user_id)
);
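With the unique constraint on the two foreign keys project_id and user_id in place, does PostgreSQL also create an index for each of them automatically? Or should I still create those indexes manually, like this: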
CREATE TABLE project_ownerships(
id BIGSERIAL PRIMARY KEY,
project_id BIGINT REFERENCES projects ON DELETE CASCADE,
user_id BIGINT REFERENCES users ON DELETE CASCADE,
role SMALLINT,
CONSTRAINT project_user_unique UNIQUE (project_id, user_id)
);
CREATE INDEX po_project_id_idx ON project_ownerships (project_id);
CREATE INDEX po_user_id_idx ON project_ownerships (user_id);
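I have read the text here and just want to make sure I understand the actual implementation details correctly. Specifically, does the unique constraint's index help when I join on project_id, on user_id, or on both? Do I still need to create a separate index for each foreign key?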
The unique constraint on (project_id, user_id) creates an index, equivalent to:
create unique index unq_project_ownerships_project_user
on project_ownerships(project_id, user_id);
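With this index in place, a separate index on project_ownerships(project_id) would be redundant, because project_id is the leading column of the unique index. Wherever the single-column index could be used, the unique index can be used as well.
However, an index on project_ownerships(user_id) may still be useful, depending on your queries. It can serve lookups on user_id alone, which the unique index cannot serve efficiently.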
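One way to check this on your own data is to list the indexes on the table and compare the plans for lookups on each column. This is only a minimal sketch, assuming the table from the question with just the unique constraint in place. The value 42 is an arbitrary example, and the plans you actually get depend on table size and statistics.

```sql
-- List the indexes PostgreSQL created for the table
-- (the primary key and the unique constraint each get one).
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'project_ownerships';

-- Filter on the leading column: the planner can use project_user_unique.
EXPLAIN
SELECT * FROM project_ownerships WHERE project_id = 42;

-- Filter on the second column only: the unique index is of little help,
-- so on a large table the planner may fall back to a sequential scan
-- (or a scan of the whole index) unless an index led by user_id exists.
EXPLAIN
SELECT * FROM project_ownerships WHERE user_id = 42;
```

On a very small table the planner may choose a sequential scan in both cases, so it is worth testing with a realistic number of rows.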
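It is a matter of taste, but personally I don't think you need the surrogate key (id) here. (Is it ever used?) Also: role is a (non-reserved) keyword. Avoid using it as an identifier.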
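For foreign keys, a supporting index is practically a necessity (PostgreSQL does not create one on the referencing columns automatically). Without it, a cascading delete or update will (internally) cause a sequential scan of the referencing table for every deleted/updated user or project tuple.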
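The cost of the cascade can be made visible with EXPLAIN ANALYZE, which reports the time spent in each foreign-key trigger. This is a sketch only, assuming a users row with id = 1 exists and has matching rows in project_ownerships. It runs inside a transaction that is rolled back, because EXPLAIN ANALYZE really executes the statement.

```sql
BEGIN;

-- EXPLAIN ANALYZE actually performs the DELETE, hence the transaction.
EXPLAIN ANALYZE
DELETE FROM users WHERE id = 1;
-- The output ends with one line per fired foreign-key trigger, of the form
--   Trigger for constraint <constraint name>: time=... calls=...
-- Compare that time with and without an index led by user_id.

ROLLBACK;  -- undo the delete
```

Running this once with and once without an index that starts with user_id should show the difference in trigger time on a table of realistic size.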
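For a junction (bridge) table like this one, it is sufficient to create one additional index (or UNIQUE constraint) with the key elements in reversed order. It also serves as the supporting index for the second FK. [The leading element(s) of a composite index can be used as if an index on only those fields existed.]
The extra key field in the index can enable index-only scans (e.g. when the the_role field is not needed).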
CREATE TABLE project_ownerships
( project_id BIGINT REFERENCES projects (id) ON DELETE CASCADE
, user_id BIGINT REFERENCES users(id) ON DELETE CASCADE
, the_role INTEGER
, PRIMARY KEY (project_id, user_id)
, CONSTRAINT reversed_pk UNIQUE (user_id, project_id)
);
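To see whether the reversed index gives an index-only scan, the two queries below can be compared once the table is populated and vacuumed. This is a sketch under assumptions: 42 is an arbitrary example value, and whether the planner actually picks an index-only scan depends on statistics and on the visibility map being up to date.

```sql
-- Both referenced columns are stored in reversed_pk, so an index-only scan
-- is possible for pages marked all-visible (run VACUUM first).
EXPLAIN
SELECT project_id
FROM project_ownerships
WHERE user_id = 42;

-- the_role is not stored in either index, so an index-only scan is not
-- possible here; the rows have to be fetched from the table itself.
EXPLAIN
SELECT project_id, the_role
FROM project_ownerships
WHERE user_id = 42;
```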
A small test setup (I needed to disable sort and hashjoin, because for small tables like these they would actually result in cheaper plans ;-)
SET search_path=tmp;
SELECT version();
CREATE TABLE projects
( id bigserial not NULL PRIMARY KEY
, the_name text UNIQUE
);
CREATE TABLE users
( id bigserial not NULL PRIMARY KEY
, the_name text UNIQUE
);
CREATE TABLE project_ownerships
( project_id BIGINT REFERENCES projects (id) ON DELETE CASCADE
, user_id BIGINT REFERENCES users(id) ON DELETE CASCADE
, the_role INTEGER
, PRIMARY KEY (project_id, user_id)
, CONSTRAINT reversed_pk UNIQUE (user_id, project_id)
);
INSERT INTO projects( the_name)
SELECT 'project-' || gs::text
FROM generate_series(1,1000) gs
;
INSERT INTO users( the_name)
SELECT 'name_' || gs::text
FROM generate_series(1,1000) gs
;
INSERT INTO project_ownerships (project_id,user_id,the_role)
SELECT p.id, u.id , (random()* 100)::integer
FROM projects p
JOIN users u ON random() < .10
;
VACUUM ANALYZE projects,users,project_ownerships;
SET enable_hashjoin = 0;
SET enable_sort = 0;
-- SET enable_seqscan = 0;
EXPLAIN ANALYZE
SELECT p.the_name AS project_name
, po.the_role AS the_role
FROM projects p
JOIN project_ownerships po ON po.project_id = p.id
AND EXISTS (
SELECT *
FROM users u
WHERE u.id = po.user_id
AND u.the_name >= 'name_10'
AND u.the_name < 'name_20'
);
EXPLAIN ANALYZE
SELECT u.the_name AS user_name
, po.the_role AS the_role
FROM users u
JOIN project_ownerships po ON po.user_id = u.id
AND EXISTS (
SELECT *
FROM projects p
WHERE p.id = po.project_id
AND p.the_name >= 'project-10'
AND p.the_name < 'project-20'
);
The resulting query plans:
SET
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 11.6 on armv7l-unknown-linux-gnueabihf, compiled by gcc (Raspbian 8.3.0-6+rpi1) 8.3.0, 32-bit
(1 row)
SET
SET
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.97..4693.68 rows=11924 width=15) (actual time=0.333..153.660 rows=11157 loops=1)
-> Nested Loop (cost=0.69..1204.55 rows=11924 width=12) (actual time=0.268..53.192 rows=11157 loops=1)
-> Index Scan using users_the_name_key on users u (cost=0.28..7.02 rows=119 width=8) (actual time=0.126..0.317 rows=112 loops=1)
Index Cond: ((the_name >= 'name_10'::text) AND (the_name < 'name_20'::text))
-> Index Scan using reversed_pk on project_ownerships po (cost=0.42..9.06 rows=100 width=20) (actual time=0.015..0.308 rows=100 loops=112)
Index Cond: (user_id = u.id)
-> Index Scan using projects_pkey on projects p (cost=0.28..0.29 rows=1 width=19) (actual time=0.005..0.005 rows=1 loops=11157)
Index Cond: (id = po.project_id)
Planning Time: 6.218 ms
Execution Time: 162.319 ms
(10 rows)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.97..4057.79 rows=11022 width=12) (actual time=0.084..93.584 rows=11236 loops=1)
-> Nested Loop (cost=0.69..832.59 rows=11022 width=12) (actual time=0.063..25.260 rows=11236 loops=1)
-> Index Scan using projects_the_name_key on projects p (cost=0.28..6.84 rows=110 width=8) (actual time=0.037..0.163 rows=112 loops=1)
Index Cond: ((the_name >= 'project-10'::text) AND (the_name < 'project-20'::text))
-> Index Scan using project_ownerships_pkey on project_ownerships po (cost=0.42..6.51 rows=100 width=20) (actual time=0.010..0.111 rows=100 loops=112)
Index Cond: (project_id = p.id)
-> Index Scan using users_pkey on users u (cost=0.28..0.29 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=11236)
Index Cond: (id = po.user_id)
Planning Time: 0.971 ms
Execution Time: 99.671 ms
(10 rows)