What would be the best table structure for a variable number of combinations?

Question

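I need some advice on choosing my table structure.

I am working on a project where I need to store values that are each composed of a variable number of other values.

For example: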

A = b,c,d
B = z,r

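I was thinking of saving the combination as a JSON object inside a column, but I am afraid it could get long for big requests and would not be easy to filter.

There is also the multiple-column solution (with NULL in the columns that are not needed), but that is not a good representation of the data, and filtering would be hard as well.

Finally, I think the best option is a many-to-many relationship, but the joins might be too heavy. Is that right?

Do you see any other alternative (besides switching to NoSQL)?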

Answer 1

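This shows the use of junction tables to avoid saving data in comma-separated lists, JSON, or other mechanisms that would be problematic in at least these areas:

Table scans (slow, no use of fast indexes)
Maintenance of data
Data integrity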

Schema

create table cat
(   -- categories
    id int auto_increment primary key,
    code varchar(20) not null,
    description varchar(255) not null
);

create table subcat
(   -- sub categories
    id int auto_increment primary key,
    code varchar(20) not null,
    description varchar(255) not null
);

create table csJunction
(   -- JUNCTION table for cat / sub categories
    -- Note: you could ditch the id below, and go with composite PK on (catId,subCatId)
    -- but this makes the PK (primary key) thinner when used elsewhere
    id int auto_increment primary key,
    catId int not null,
    subCatId int not null,
    CONSTRAINT fk_csj_cat FOREIGN KEY (catId) REFERENCES cat(id),
    CONSTRAINT fk_csj_subcat FOREIGN KEY (subCatId) REFERENCES subcat(id),
    unique key (catId,subCatId) -- prevents duplicates
);


insert cat(code,description) values('A','descr for A'),('B','descr for B'); -- id's 1,2 respectively

insert subcat(code,description) values('b','descr for b'),('c','descr for c'),('d','descr for d');  -- id's 1,2,3
insert subcat(code,description) values('r','descr for r'),('z','descr for z'); -- id's 4,5

-- Note that due to the thinness of PK's, chosen for performance, the below is by ID
insert csJunction(catId,subCatId) values(1,1),(1,2),(1,3); -- A gets b,c,d
insert csJunction(catId,subCatId) values(2,4),(2,5);    -- B gets r,z
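
To read the combinations back in the "A = b,c,d" shape from the question, two joins and GROUP_CONCAT are enough. This is a minimal sketch that assumes the schema and sample rows above are loaded as written; the column aliases cat and subcats are only illustrative names.

```sql
-- Rebuild the "A = b,c,d" view from the normalized tables.
select c.code as cat,
       group_concat(s.code order by s.code separator ',') as subcats
from cat c
join csJunction j on j.catId = c.id
join subcat s on s.id = j.subCatId
group by c.id, c.code
order by c.code;
-- With the sample rows above:
--   A | b,c,d
--   B | r,z
```

The comma-separated string exists only in the query result. The stored rows stay one pair per line, so they remain easy to filter and index.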

Good Errors

The following errors are good and expected; the data stays clean.

insert csJunction(catId,subCatId) values(2,4); -- duplicates not allowed (Error: 1062)
insert csJunction(catId,subCatId) values(13,4); -- junk data violates FK constraint (Error: 1452)

Other comments

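In response to your comments: data is cached only insofar as mysql has a Most Recently Used (MRU) strategy, no more and no less than any other data that is served from memory instead of a physical lookup.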

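The fact that B may contain not only z,r at the moment, but could also contain c as A does, does not mean there is a duplicate. And as seen in the schema, no parent can duplicate its containment of a child, which would be a data problem anyway.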
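
The point about shared children can be sketched against the sample data, assuming the ids noted in the schema comments (B is cat id 2, c is subcat id 2):

```sql
-- B also takes subcategory c, which A already has.
-- This succeeds: the unique key covers the pair (catId,subCatId), and (2,2) is new.
insert csJunction(catId,subCatId) values(2,2);

-- The same pair a second time is rejected by the unique key.
insert csJunction(catId,subCatId) values(2,2); -- duplicates not allowed (Error: 1062)

-- Undo the extra row so the sample data matches the rest of the answer.
delete from csJunction where catId = 2 and subCatId = 2;
```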

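Note that one could easily go the route of using the code column as the PK in cat and subcat. Unfortunately, that would cause wide indexes, and even wider composite indexes for the junction table, which would slow operations down considerably. Although data maintenance could be visually more appealing that way, I lean toward performance over appearance any day.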
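
For comparison, here is a rough sketch of that natural-key variant. The table names catNK, subcatNK and csJunctionNK and the constraint names are hypothetical, chosen only so this does not clash with the schema above.

```sql
-- Natural-key variant: code is the primary key, so no surrogate ids.
create table catNK
(   code varchar(20) primary key,
    description varchar(255) not null
);

create table subcatNK
(   code varchar(20) primary key,
    description varchar(255) not null
);

create table csJunctionNK
(   -- each key column can hold up to 20 characters, versus a 4-byte int per column above
    catCode varchar(20) not null,
    subCatCode varchar(20) not null,
    primary key (catCode,subCatCode),
    CONSTRAINT fk_csjnk_cat FOREIGN KEY (catCode) REFERENCES catNK(code),
    CONSTRAINT fk_csjnk_subcat FOREIGN KEY (subCatCode) REFERENCES subcatNK(code)
);

insert catNK(code,description) values('A','descr for A'),('B','descr for B');
insert subcatNK(code,description) values('b','descr for b'),('c','descr for c'),('d','descr for d');

-- Rows are readable without a join, which is the "appearance" side of the trade-off.
insert csJunctionNK(catCode,subCatCode) values('A','b'),('A','c'),('A','d');
```

How much the wider keys cost in practice depends on the data volume and the code lengths actually stored, so treat this as an illustration of the trade-off and not a measurement.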

When time permits, I will add to this answer to show such things as "What categories contain a certain subcategory", deletes, etc.
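
A minimal sketch of those two operations, assuming the schema and sample rows above. The lookups go by code for readability; going by id would skip one join.

```sql
-- Which categories contain subcategory 'c'?
select c.id, c.code, c.description
from subcat s
join csJunction j on j.subCatId = s.id
join cat c on c.id = j.catId
where s.code = 'c';

-- Remove subcategory 'd' from category 'A' (MySQL multi-table delete syntax).
-- Only the junction row goes away; the cat and subcat rows are untouched.
delete j
from csJunction j
join cat c on c.id = j.catId
join subcat s on s.id = j.subCatId
where c.code = 'A' and s.code = 'd';
```

The code columns carry no index in the schema as written, so the filters on code scan the small parent tables. Adding a unique index on cat(code) and subcat(code) would be a reasonable step, though that is my assumption and not part of the schema above.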


