Structuring Relational databases - combining similar and related tables

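I'm used to seeing relational databases in which different entities are stored in different tables (simple example: country, state, city). Lately I've been seeing more cases where different but similar entities are bundled into the same table and combined with different views. I suppose this saves on tables and data-access routines (possibly at the expense of clarity and flexibility). Rereading the definition of a normalized database, I don't think this violates any rules, but it seems less intuitive, and it harks back to the old mainframe "Miscellaneous" tables, where you put anything you had forgotten during the design phase. See the 2 examples below: multi-table solution vs. single-table solution. Is this phenomenon part of a data or programming design pattern, and does it have a name?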

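If you have small, dedicated tables, the database can easily cache what it needs in memory.

If you cram otherwise small tables into one, the database can't tell which entries are important to cache and which aren't.

More importantly, there is more opportunity for error, because you might inadvertently enter the wrong type code and end up joining to something unrelated, with no RI (referential integrity) or type checking to warn you. If you use small, dedicated tables, you can specify RI constraints.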
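To make the RI point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. All table and column names (`country`, `customer`, `lookup`, `customer_otlt`, `lookup_type_code`, `lookup_key`) are made up for illustration and do not come from the question. It assumes SQLite, where foreign keys are enforced only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
    -- Dedicated lookup table: customer rows can reference it directly.
    CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        country_code TEXT NOT NULL REFERENCES country (code)
    );

    -- Shared lookup table (hypothetical layout): nothing ties
    -- customer_otlt.country_code to the 'COUNTRY' rows.
    CREATE TABLE lookup (
        lookup_type_code TEXT NOT NULL,
        lookup_key       TEXT NOT NULL,
        lookup_value     TEXT NOT NULL,
        PRIMARY KEY (lookup_type_code, lookup_key)
    );
    CREATE TABLE customer_otlt (
        id INTEGER PRIMARY KEY,
        country_code TEXT NOT NULL
    );

    INSERT INTO country VALUES ('US', 'United States');
    INSERT INTO lookup VALUES ('COUNTRY', 'US', 'United States');
    INSERT INTO lookup VALUES ('CURRENCY', 'USD', 'US dollar');
""")


def try_insert(sql, params):
    """Run one INSERT and report whether the database accepted it."""
    try:
        with conn:
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# 'USD' is a currency key mistakenly used as a country code.
dedicated = try_insert("INSERT INTO customer (country_code) VALUES (?)", ("USD",))
shared = try_insert("INSERT INTO customer_otlt (country_code) VALUES (?)", ("USD",))

# A join with the wrong type code quietly matches the unrelated currency row.
wrong_join = conn.execute("""
    SELECT l.lookup_value
    FROM customer_otlt AS c
    JOIN lookup AS l
      ON l.lookup_key = c.country_code
     AND l.lookup_type_code = 'CURRENCY'
""").fetchone()

print(dedicated, shared, wrong_join)
```

In this sketch the dedicated schema rejects the bad value at insert time, while the shared table accepts it and the mistyped join returns a row without any warning.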

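Thinking back to the places where I've seen the single monster lookup-table pattern, I think the appeal was that developers could add more kinds of entries without DBA intervention to create more tables. With many developers and only a few DBAs, this was how the DBAs avoided being bogged down creating a dedicated lookup table every time a new type of lookup entry was introduced. (Apparently, granting create-table privileges in dev was not acceptable to the DBAs there.)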

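This seems like a workaround for environments where it is hard to change the database schema. But another consideration is that it might be easier to internationalize if all of your entries are in one table.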

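There is an established name for this pattern: One True Lookup Table (OTLT). The linked article calls it an anti-pattern and lists more pitfalls of the technique. Here is the bulleted list from the article: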

  • It makes the SQL look ugly.

  • Many statements will require multiple joins to the lookup table. The extra join columns make the statements look bigger and scarier. There will be the same number of joins when using separate lookup tables, but those joins will be simpler.

  • Multiple references to the same table can make it hard to determine what is happening in the execution plan, as you will see those repeated references there, and have to refer to the predicates to understand the context of table reference. If you were using separate lookup tables, it would be clear which table you were referring to at any point of the execution plan.

  • You can't foreign key to this type of table. Technically you can if you are willing to put both columns (lookup_type_code and lookup_key) in the table, but you won't because it is ugly. This means there is a good chance your data integrity will be compromised over time. It's really easy to foreign key to individual lookup tables, and therefore protect your data.

  • It's hard to control the contents of the table. It's a shared resource, so check constraints and triggers are problematic. If you need users to have different privileges, depending on which lookup they are dealing with, things are going to get messy. That would be really easy with separate lookup tables.

  • If you need to make a change for one reference type, like extending the size of the key or value, it affects all reference data. Using separate lookup tables isolates the change.

  • Over time, many reference tables take on additional data. To model that you would need to either split out that reference data from this shared lookup table, or start adding optional columns to cope with the "one-off" issues. A change like this is really simple for separate lookup tables.

  • Data types matter. You should always use the correct data type, as it will reduce the number of data type conversions needed. Implicit data type conversions are bugs waiting to happen!

  • Performance can be a problem with the OTLT approach as it's hard for the optimizer to make sound judgements about the data. The optimizer cares about cardinality, but it may be hard to make that decision if you are dealing with a large number of rows, most of which are irrelevant in any one specific context. The optimizer also cares about high/low values, but these are not relevant to any one lookup, but shared. We've also mentioned you probably won't foreign key to this data, which will reduce the amount of information the optimizer has when making its decision. You may have artificially made columns optional, that are actually mandatory, a key must have a value, but which column? I think you get the message.
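The foreign-key bullet above mentions that you technically can reference the shared table if you carry both columns. A rough sketch of what that could look like follows, again with Python's `sqlite3`. The `orders` table, its `status_type` and `status_key` columns, and the `CHECK` that pins the type code are assumptions made for illustration, not something the article prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
    CREATE TABLE lookup (
        lookup_type_code TEXT NOT NULL,
        lookup_key       TEXT NOT NULL,
        lookup_value     TEXT NOT NULL,
        PRIMARY KEY (lookup_type_code, lookup_key)
    );

    -- Hypothetical referencing table: it has to carry the type code as well,
    -- pinned to a constant, so the composite foreign key can be declared.
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        status_type TEXT NOT NULL DEFAULT 'ORDER_STATUS'
                    CHECK (status_type = 'ORDER_STATUS'),
        status_key  TEXT NOT NULL,
        FOREIGN KEY (status_type, status_key)
            REFERENCES lookup (lookup_type_code, lookup_key)
    );

    INSERT INTO lookup VALUES ('ORDER_STATUS', 'NEW', 'New order');
    INSERT INTO lookup VALUES ('COUNTRY', 'US', 'United States');
""")

results = {}
for key in ("NEW", "US"):  # 'US' exists in lookup, but only as a COUNTRY key
    try:
        with conn:
            conn.execute("INSERT INTO orders (status_key) VALUES (?)", (key,))
        results[key] = "accepted"
    except sqlite3.IntegrityError:
        results[key] = "rejected"

print(results)
```

The integrity check works here, but every referencing table needs an extra constant column plus a check constraint per lookup, which is the ugliness the article has in mind.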

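I think the second approach is good enough if you only need a dictionary of names (for spell-checking or something similar). Otherwise, if the objects have some additional specific fields, the second approach is terrible.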
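As a small illustration of the extra-fields problem, assuming "the second approach" means the single-table solution, the sketch below adds a country-specific attribute to a shared lookup table and to a dedicated one. The `dialing_prefix` column and all table names are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lookup (
        lookup_type_code TEXT NOT NULL,
        lookup_key       TEXT NOT NULL,
        lookup_value     TEXT NOT NULL,
        PRIMARY KEY (lookup_type_code, lookup_key)
    );
    INSERT INTO lookup VALUES ('COUNTRY', 'US', 'United States');
    INSERT INTO lookup VALUES ('COLOR', 'RED', 'Red');
    INSERT INTO lookup VALUES ('ORDER_STATUS', 'NEW', 'New order');

    -- Countries now need a dialing prefix. In the shared table the column
    -- has to be nullable, because it is meaningless for every other type.
    ALTER TABLE lookup ADD COLUMN dialing_prefix TEXT;
    UPDATE lookup SET dialing_prefix = '+1'
     WHERE lookup_type_code = 'COUNTRY' AND lookup_key = 'US';

    -- A dedicated table can simply declare the attribute as mandatory.
    CREATE TABLE country (
        code           TEXT PRIMARY KEY,
        name           TEXT NOT NULL,
        dialing_prefix TEXT NOT NULL
    );
""")

# Rows in the shared table for which the new column does not apply.
rows_without_prefix = conn.execute(
    "SELECT COUNT(*) FROM lookup WHERE dialing_prefix IS NULL"
).fetchone()[0]

# The dedicated table refuses a country that lacks the mandatory attribute.
try:
    with conn:
        conn.execute("INSERT INTO country (code, name) VALUES ('FR', 'France')")
    missing_prefix = "accepted"
except sqlite3.IntegrityError:
    missing_prefix = "rejected"

print(rows_without_prefix, missing_prefix)
```

In the shared table the database cannot tell a forgotten prefix from one that does not apply, whereas the dedicated table can enforce it.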