Best practices for structuring hierarchical/classified data in mongodb

Question

Summary:

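I'm building my first large full-stack application (MERN stack), which tries to mimic a large clothing store. Each article of clothing has many 'tags' that describe it: top/bottom/accessory/shoes/etc., with subcategories such as shirt/outerwear/sweatshirt/etc. under top, and further subcategories inside those, such as blouse/t-shirt/etc. under shirt. Each article also has tags for primary color, hemline, pockets, technical features, and so on.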

Main question:

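Since I plan to have 50,000 or more articles, how should I best organize the data in MongoDB with mongoose schemas so that it can be searched quickly? I'm also genuinely curious how large clothing retailers usually design their databases so that customers can search easily when every item has so many identifying features.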

Things I have tried or thought about:

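The MongoDB website suggests using a tree structure with child references. Here is the link: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-child-references/ I like this idea, but I read here: https://developer.mongodb.com/article/mongodb-schema-design-best-practices/ that once you store more than a few thousand references, an array of ObjectIDs is no longer adequate and can run into problems because of the document size limit.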
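For reference, the child-references pattern from the first link can be sketched in plain JavaScript. This is an in-memory illustration only: the `categories` array stands in for a MongoDB collection, the category names come from the question, and `parentOf`/`descendantsOf` are hypothetical helper names. Note that it only models the category tree itself; it says nothing about where the articles go, which is the concern raised in the next paragraph.

```javascript
// Sketch of the "child references" tree pattern: every category node stores
// its own _id plus an array with the _ids of its direct children.
// A plain array stands in for the collection (assumption, not real MongoDB).
const categories = [
  { _id: "Category", children: ["top"] },
  { _id: "top", children: ["Shirt", "T-Shirt", "Sweater", "Sweatshirt and hoodie"] },
  { _id: "Shirt", children: ["Blouse", "polo", "button down"] },
  { _id: "Blouse", children: [] },
  { _id: "polo", children: [] },
  { _id: "button down", children: [] },
  { _id: "T-Shirt", children: [] },
  { _id: "Sweater", children: [] },
  { _id: "Sweatshirt and hoodie", children: [] },
];

// Index the nodes by _id, similar to looking a document up by its _id.
const byId = new Map(categories.map((node) => [node._id, node]));

// Parent lookup: the parent is the node whose children array contains id.
// In MongoDB this corresponds to a query on the children field.
function parentOf(id) {
  const parent = categories.find((node) => node.children.includes(id));
  return parent ? parent._id : null;
}

// Collect every descendant of a node with an iterative depth-first walk.
function descendantsOf(id) {
  const start = byId.get(id);
  const stack = start ? [...start.children] : [];
  const result = [];
  while (stack.length > 0) {
    const childId = stack.pop();
    result.push(childId);
    stack.push(...byId.get(childId).children);
  }
  return result;
}

console.log(parentOf("Blouse"));
console.log(descendantsOf("top"));
```

Each walk down the tree costs one lookup per node, which is why this pattern suits a small, slowly changing category tree better than tens of thousands of articles.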

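Also, each article of clothing would fall into many different parts of the tree. For example, it might be a shirt, so it would sit in the tree's "shirt" leaf; if it is blue, it would also sit in the "blue" leaf; and if it is sustainably sourced, it would fall into that leaf as well. Given this, a tree data structure doesn't seem like the right approach, because it would store the same ObjectID in many different leaves.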

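My other idea was to store the article information (description, price and pictures) separately from the tagging/hierarchical information. Each tag object would then hold an ObjectID reference to the item, so I could use mongoose's populate method to gather the article information.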
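The idea above can be sketched in memory to show how a multi-tag search would work. This is a hypothetical illustration: plain arrays stand in for the two collections, the item ids and descriptions are made up, and `findItemsWithAllTags` is an invented helper, not a mongoose API.

```javascript
// Article data lives in one "collection"...
const items = [
  { _id: "item1", description: "Blue silk blouse", price: 49 },
  { _id: "item2", description: "Blue polo", price: 29 },
  { _id: "item3", description: "White button down", price: 39 },
];

// ...and each tag document holds a reference (itemId) to an article.
// Assumes each (tag, itemId) pair is stored only once.
const tags = [
  { tag: "Blouse", itemId: "item1" },
  { tag: "blue", itemId: "item1" },
  { tag: "sustainably sourced", itemId: "item1" },
  { tag: "polo", itemId: "item2" },
  { tag: "blue", itemId: "item2" },
  { tag: "button down", itemId: "item3" },
];

// Return the articles that carry every requested tag.
function findItemsWithAllTags(requiredTags) {
  // Step 1: count how many of the requested tags each item has.
  const counts = new Map();
  for (const { tag, itemId } of tags) {
    if (requiredTags.includes(tag)) {
      counts.set(itemId, (counts.get(itemId) || 0) + 1);
    }
  }
  // Step 2: keep the ids that matched all tags, then swap each id for the
  // full article document (the part that populate would do in mongoose).
  const matchingIds = [...counts]
    .filter(([, count]) => count === requiredTags.length)
    .map(([id]) => id);
  return items.filter((item) => matchingIds.includes(item._id));
}

console.log(findItemsWithAllTags(["blue"]).map((item) => item.description));
```

In this design every search needs two lookups (tags first, then the matching articles), which is the cost of keeping the tags out of the article documents.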

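I have also built part of the big tree structure as a proof of concept for my design. It currently only works on the front end, and it would also make searching awkward, because reaching 'blouse' looks like taxonomy[0].options[0].options[0].options[0].title. From what I learned in my classes, that doesn't seem like a good way to keep code readable. This is just a fragment of a very long branching object. I was going to try to turn it into a mongoose schema, but it is a lot of work and I want to make sure I do it well.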

const taxonomy = [
    {
        title: 'Category',
        selected: false,
        options: [
            {
                title: 'top',
                selected: false,
                options: [
                    {
                        title: 'Shirt',
                        selected: false,
                        options: [
                            {
                                title: 'Blouse',
                                selected: false,
                            },
                            {
                                title: 'polo',
                                selected: false,
                            },
                            {
                                title: 'button down',
                                selected: false,
                            },
                        ],
                    },
                    {
                        title: 'T-Shirt',
                        selected: false,
                    },
                    {
                        title: 'Sweater',
                        selected: false,
                    },
                    {
                        title: 'Sweatshirt and hoodie',
                        selected: false,
                    },
                ],
            },
            // ...bottom, accessory, shoes, etc. continue here
        ],
    },
];
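One way to avoid index chains like `taxonomy[0].options[0].options[0].options[0].title` is to look nodes up by their titles. The sketch below uses a trimmed copy of the question's taxonomy; `findByPath` and `findByTitle` are hypothetical helpers, and title matching is case-sensitive, so `'top'` and `'Top'` would be different nodes.

```javascript
// Trimmed copy of the taxonomy from the question.
const taxonomy = [
  {
    title: "Category",
    selected: false,
    options: [
      {
        title: "top",
        selected: false,
        options: [
          {
            title: "Shirt",
            selected: false,
            options: [
              { title: "Blouse", selected: false },
              { title: "polo", selected: false },
            ],
          },
          { title: "T-Shirt", selected: false },
        ],
      },
    ],
  },
];

// Follow one title per level; return the node, or undefined if a step is missing.
function findByPath(nodes, titles) {
  let current;
  let level = nodes;
  for (const title of titles) {
    current = (level || []).find((node) => node.title === title);
    if (!current) return undefined;
    level = current.options; // undefined at a leaf, which ends the walk
  }
  return current;
}

// Depth-first search of the whole tree for the first node with a given title.
function findByTitle(nodes, title) {
  for (const node of nodes || []) {
    if (node.title === title) return node;
    const hit = findByTitle(node.options, title);
    if (hit) return hit;
  }
  return undefined;
}

console.log(findByPath(taxonomy, ["Category", "top", "Shirt", "Blouse"]));
console.log(findByTitle(taxonomy, "polo"));
```

This makes the front-end code readable, but it still walks the tree in memory, so it does not by itself answer how to search 50,000 articles in the database.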

Going forward:

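I'm not looking for a perfect answer, but I'm sure someone has solved this before (every large business that sells lots of categorized products has). It would be great if someone could point me in the right direction, for example by giving me some terms to google, some articles to read, or some videos to watch.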

Thanks for any guidance you can provide.

Answer 1

MongoDB is a document-based database. Each record in a collection is a document, and each document should be self-contained (it should contain all the information you need).

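Best practice is to create a collection for each logical entity you can think of. This matters most when your documents hold a lot of data, because it keeps each document small and lets the data scale.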

For example, you should create collections for Products, Subproducts, Categories, Items, Providers, ...

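When you create the schemas, you don't need a nested structure: you simply store a reference to a document in one collection as a property of a document in another collection.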

Note: the maximum document size is 16 megabytes.

Bad practice

Let's first look at what bad practice looks like. Consider this structure:

const Product = {
  "name": "Product_name",
  "sub_products": [
    {
      "sub_product_name": "Subproduct_name_1",
      "sub_product_description": "Description",
      "items": [
        {
          "item_name": "item_name_1",
          "item_description": "Description",
          "discounts": [
            { "discount_name": "Discount_1", "percentage": 25 }
          ]
        },
        {
          "item_name": "item_name_2",
          "item_description": "Description",
          "discounts": [
            { "discount_name": "Discount_1", "percentage": 25 },
            { "discount_name": "Discount_2", "percentage": 50 }
          ]
        }
      ]
    }
    // ...more sub_products, each with its own items and discounts
  ]
};

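Here the product document has a sub_products property, which is an array of sub_products. Each sub_product has items, and each item has discounts. Because everything is nested inside one document, it grows with every sub_product, item and discount, and can quickly exceed the maximum document size.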
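To make the size argument concrete, the sketch below builds the nested product at a few sizes and compares an approximate byte count with the 16 MB limit. Assumptions: the fan-out (100 items per sub_product, 5 discounts per item) is arbitrary, and JSON byte length is only a rough stand-in for BSON size, since BSON adds type tags, lengths and array index keys.

```javascript
// MongoDB's maximum document size: 16 MB.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Build a hypothetical nested product with the given fan-out at each level.
function buildNestedProduct(subProducts, itemsPerSub, discountsPerItem) {
  return {
    name: "Product_name",
    sub_products: Array.from({ length: subProducts }, (_, s) => ({
      sub_product_name: `Subproduct_name_${s + 1}`,
      sub_product_description: "Description",
      items: Array.from({ length: itemsPerSub }, (_, i) => ({
        item_name: `item_name_${i + 1}`,
        item_description: "Description",
        discounts: Array.from({ length: discountsPerItem }, (_, d) => ({
          discount_name: `Discount_${d + 1}`,
          percentage: 25,
        })),
      })),
    })),
  };
}

// Approximate size: UTF-8 byte length of the JSON text (Node.js Buffer).
function approxSizeBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

for (const subProducts of [10, 100, 1000]) {
  const size = approxSizeBytes(buildNestedProduct(subProducts, 100, 5));
  console.log(`${subProducts} sub_products: ~${size} bytes, over limit: ${size > MAX_DOCUMENT_BYTES}`);
}
```

With these assumed numbers the document stays well under the limit at 10 sub_products but passes it somewhere between 100 and 1000, long before a large catalog would be complete.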

Good practice

Consider this structure:

// Each object lives in its own collection; the string ids are
// placeholders for the referenced documents' ObjectIds.
const Product = {
  "name": "Product_name",
  "sub_products": [
    'sub_product_1_id',
    'sub_product_2_id',
    'sub_product_3_id',
    'sub_product_4_id',
    'sub_product_5_id',
    // ...
  ]
};

const Subproduct = {
  "_id": "sub_product_1_id",
  "sub_product_name": "Subproduct_name",
  "sub_product_description": "Description",
  "items": [
    'item_1_id',
    'item_2_id',
    'item_3_id',
    'item_4_id',
    'item_5_id',
    // ...
  ]
};

const Item = {
  "_id": "item_1_id",
  "item_name": "item_name_1",
  "item_description": "Description",
  "discounts": [
    'discount_1_id',
    'discount_2_id',
    'discount_3_id',
    'discount_4_id',
    'discount_5_id',
    // ...
  ]
};

const Discount = {
  "_id": "discount_1_id",
  "discount_name": "Discount_1",
  "percentage": 25
};

Now each logical entity has its own collection, and you simply store a reference to a document in one collection as a property of a document in another collection.

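Now you can use one of Mongoose's best features: population. If you store a reference to a document in one collection as a property of a document in another collection, Mongoose will replace the reference with the actual document when you call populate() on the query.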
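The effect of population can be imitated in plain JavaScript to show what happens to the stored ids. This is not Mongoose's implementation: `populateField` is a hypothetical helper, a `Map` stands in for the Subproduct collection, and in this sketch ids with no matching document are simply dropped.

```javascript
// Stand-in for the Subproduct collection, keyed by _id.
const subproducts = new Map([
  ["sub_product_1_id", { _id: "sub_product_1_id", sub_product_name: "Subproduct_name_1" }],
  ["sub_product_2_id", { _id: "sub_product_2_id", sub_product_name: "Subproduct_name_2" }],
]);

// A product that only stores references (one of them points nowhere).
const product = {
  _id: "product_1_id",
  name: "Product_name",
  sub_products: ["sub_product_1_id", "sub_product_2_id", "missing_id"],
};

// Return a copy of doc in which doc[field] (an array of ids) is replaced by
// the referenced documents; the original document is left unchanged.
function populateField(doc, field, collection) {
  return {
    ...doc,
    [field]: doc[field]
      .map((id) => collection.get(id))
      .filter((found) => found !== undefined),
  };
}

const populatedProduct = populateField(product, "sub_products", subproducts);
console.log(populatedProduct.sub_products.map((sub) => sub.sub_product_name));
```

With real Mongoose, the schema field would be declared as an array of `mongoose.Schema.Types.ObjectId` with a `ref` to the Subproduct model, and the query would call `.populate('sub_products')`; Mongoose then issues the extra query for you.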
