如果列数太多,是否应该将 table 除以 OneToOneField?

Should I divide a table by OneToOneField if the number of columns is too many?

我有一个学生模型,已经有太多字段,包括学生的姓名、国籍、地址、语言、旅行历史等。如下:

class Student(Model):
    user = OneToOneField(CustomUser, on_delete=CASCADE)
    #  Too many other fields

一个学生有更多我存储在其他 table 中的信息,这些信息与学生模型具有 OneToOne 关系,例如:

class StudentIelts(Model):

    student = OneToOneField(Student, on_delete=CASCADE)
    has_ielts = BooleanField(default=False,)
    # 8 other fields for IELTS including the scores and the date
    # and file field for uploading the IELTS result

# I have other models for Toefl, GMAT, GRE, etc that 
# are related to the student model in the same manner through 
# a OneToOne relationship such as:

class StudentIBT(Model):

    student = OneToOneField(Student, on_delete=CASCADE)
    has_ibt = BooleanField(default=False,)
    # other fields

我应该将 table 合并为一个 table 还是当前的数据库架构好?

我选择这个模式的原因是因为我不喜欢 table 使用具有太多列的 table。关键是对于每个学生,雅思和其他模型都应该有一个 table,因此,学生 table 中的行数与雅思中的行数相同table,举个例子。

这是一个很难回答的问题,有很多不同的意见,但我认为你将你们的关系分成两个独立的模型是正确的。

但是,有几个注意事项需要考虑。

从数据库设计的角度来看,几乎没有任何理由拆分您的数据库 tables。凡是一直存在的一对一关系,就应该合并成一个table。除非您正在优化数据库,否则列的数量几乎不重要。

An answer from this question sums up the actual physical reasons to split up a 1-to-1 relationship quite nicely:

  • You might want to cluster or partition the two "endpoint" tables of a 1:1 relationship differently.
  • If your DBMS allows it, you might want to put them on different physical disks (e.g. more performance-critical on an SSD and the other on a cheap HDD).
  • You have measured the effect on caching and you want to make sure the "hot" columns are kept in cache, without "cold" columns "polluting" it.
  • You need a concurrency behavior (such as locking) that is "narrower" than the whole row. This is highly DBMS-specific.
  • You need different security on different columns, but your DBMS does not support column-level permissions.
  • Triggers are typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do. For example, Oracle doesn't let you modify the so called "mutating" table from a row-level trigger - by having separate tables, only one of them may be mutating so you can still modify the other from your trigger (but there are other ways to work-around that).

Databases are very good at manipulating the data, so I wouldn't split the table just for the update performance, unless you have performed the actual benchmarks on representative amounts of data and concluded the performance difference is actually there and significant enough (e.g. to offset the increased need for JOINing).

Django 的立场

如果您查看 Django 的设计方式,将 table 拆分为一对一的关系有一些优点。 One of Django's design philosophies is 'Loose coupling'。在 Django 的生态系统中,这意味着独立的应用程序不应该相互了解才能正常运行。在你的情况下,可以说学生模型不应该知道任何关于它的雅思考试,因为如果你将这两者分开,学生模型可以在其他一些应用程序中重复使用。 此外,一些对雅思考试进行某种分析的功能,不应该 'know' 关于参加该考试的学生的任何信息。

尽管如此,请谨慎使用此设计模式。问自己的一个好问题不一定是 "How may columns do I have in my model?",因为有时一个模型中有大量数据是有充分理由的。因此,仅对这个问题回答“是”并不一定值得拆分您的 table。问自己一个更好的问题是 "Do I want to separate responsibilities/functionality of these two types of data?",这可能出于任何原因,例如可重用性或安全性。