MongoDB schema optimization

Question

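This is an extension of an earlier question.

In a given season, different vendors sell different fruits, which they arrange by shelf number. Below are some records I inserted into MongoDB (vendor.json):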

{
  "_id" : "vendor1",
  "shelf_1" : ["Pear", "Banana"],
  "shelf_2" : ["Grapes", "MuskMelon", "Apricot"],
  "shelf_3" : ["Pineapple", "Kiwi fruit"],
  "shelf_4" : ["Orange"],
  "shelf_5" : ["Guava", "Lemon"]
}

{
  "_id" : "vendor2",
  "shelf_1" : ["Mango", "Banana"],
  "shelf_2" : ["Grapes", "MuskMelon", "Peach"],
  "shelf_3" : ["Pear", "Plum"],
  "shelf_4" : ["Jackfruit"],
  "shelf_5" : ["Apple", "Apricot"],
  "shelf_6" : ["Avocado", "Cherry"],
  "shelf_7" : ["Clementine", "Date", "Fig"],
  "shelf_8" : ["Guava", "Honeydew melon"],
  "shelf_9" : ["Lemon"],
  "shelf_10" : ["Kiwi fruit", "Elderberry"],
  "shelf_11" : ["Mysore Raspberry", "Mountain Apple"],
  "shelf_12" : ["Starfruit", "Scrub Cherry", "Pomegranate"],
  "shelf_13" : ["Sugar Apple", "Tropical Apricot"],
  "shelf_14" : ["chinese chestnut", "passion fruit"],
  "shelf_15" : ["Raspberry", "Wax Apple"],
  "shelf_16" : ["Blueberries"],
  "shelf_17" : ["Strawberry", "Ugli fruit", "Watermelon"],
  "shelf_18" : ["Quince", "Satsuma", "quince"],
  "shelf_19" : ["Pineapple"],
  "shelf_20" : ["Peanut", "Orange", "blackcurrant", "lime", "nectarine"]
}

{
  "_id" : "vendor3",
  "shelf_1" : ["Mango", "Banana"],
  "shelf_2" : ["Jackfruit"],
  "shelf_3" : ["Lemon", "Plum", "Pineapple"],
  "shelf_4" : ["Orange", "Guava"],
  "shelf_5" : ["Apple", "Apricot"],
  "shelf_6" : ["Avocado", "Cherry"],
  "shelf_7" : ["Pomegranate", "Date", "Fig"],
  "shelf_8" : ["Watermelon"],
  "shelf_9" : ["Kiwi fruit", "Strawberry"]
}

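I have added an index on each shelf and on each fruit. Each shelf contains unique fruits, and the arrangement of fruits on the shelves differs from vendor to vendor.

Using the schema above, I want to

find a fruit on the shelves a vendor has available, given that the vendor is known
find the total number of shelves used by a particular vendor

So, with regard to running the two queries above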

Answer 1

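While over-normalization can cause serious problems, your schema is under-normalized.

It does not scale well. As of this writing, there is a 16MB size limit on BSON documents. If you have a very large vendor, you could (theoretically) run into problems. Imagine Walmart with thousands of shelves across various locations. Keep in mind that Facebook had to pay a huge amount of money because they severely underestimated the need to scale.
With your current schema, you would need an arbitrary number of indexes if you want to index every shelf. Other problems aside: building an index does not come for free, even when it is done in the background.
Typically, only one index is used per query. So we need to reduce the number of indexes.
The questions you ask do not even need this schema. In both cases, the vendor is known. So you can easily use a more conventional approach and get simple, efficient queries.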
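To make the indexing point concrete, here is a minimal sketch in plain JavaScript (no database needed). The helper name `shelfIndexSpecs` and the sample documents are hypothetical. It collects one index specification for every distinct `shelf_N` field, which shows that the number of indexes is dictated by the vendor with the most shelves.

```javascript
// Hypothetical helper: lists the index specifications the original layout
// would need if every shelf_N field is to be indexed.
function shelfIndexSpecs(vendorDocs) {
  const fields = new Set();
  for (const doc of vendorDocs) {
    for (const key of Object.keys(doc)) {
      if (/^shelf_\d+$/.test(key)) fields.add(key);
    }
  }
  // One single-field ascending index per distinct shelf field.
  return [...fields].map((field) => ({ [field]: 1 }));
}

// Hypothetical sample data in the original layout.
const sampleVendors = [
  { _id: "vendor1", shelf_1: ["Pear"], shelf_2: ["Grapes"] },
  { _id: "vendor2", shelf_1: ["Mango"], shelf_2: ["Peach"], shelf_3: ["Plum"] },
];

const specs = shelfIndexSpecs(sampleVendors);
console.log(specs.length, specs);
```

Every new shelf number added by any vendor would add another entry to this list, and therefore another index to build and maintain.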

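Here is how I would do it. I would have a vendor schema containing things like name and location. Next, I would have a shelf schema. Each shelf would hold a reference to a vendor, as in SQL. The only catch is that those references are "weak", so to speak. But since the vendor is known, so is his _id, which makes querying the shelf schema easy as well.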

Vendor schema

This is pretty simple

{
  '_id': new ObjectId(),
  'name': 'Acme Mart',
  'location': {
    type: 'Point',
    coordinates: [ -118.328333, 34.180278 ] // GeoJSON order: longitude, latitude
  }
}

Shelf schema

Actually, this is pretty simple as well

{
  _id: new ObjectId(),
  vendor: "idOfVendor",
  description: "Shelf 14",
  contents: ["Apples", "Oranges", "Kiwi" ]
}
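If you already have data in the original layout, the split into shelf documents could look roughly like the following. This is only a sketch in plain JavaScript. The helper name `toShelfDocs` and the "Shelf N" description format are assumptions, and the vendor's existing string _id is reused as the reference.

```javascript
// Hypothetical helper: splits one document in the original layout
// (shelf_1, shelf_2, ... keys) into one document per shelf.
function toShelfDocs(vendorDoc) {
  return Object.keys(vendorDoc)
    .filter((key) => /^shelf_\d+$/.test(key))
    .map((key) => ({
      vendor: vendorDoc._id, // reference to the vendor
      description: "Shelf " + key.slice("shelf_".length),
      contents: vendorDoc[key],
    }));
}

// Shortened sample document in the original layout.
const vendor1 = {
  _id: "vendor1",
  shelf_1: ["Pear", "Banana"],
  shelf_2: ["Grapes", "MuskMelon", "Apricot"],
  shelf_4: ["Orange"],
};

const shelfDocs = toShelfDocs(vendor1);
console.log(shelfDocs);
```

The resulting array could then be handed to db.shelves.insertMany(shelfDocs) in the shell. The _id field is left out on purpose so that MongoDB generates one for each shelf.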

Indexes

Setting aside the geospatial index needed for the vendor's location field, here are the indexes you need

// Only if you want to search by name
// (createIndex replaces the deprecated ensureIndex)
db.vendors.createIndex({name:1})

// we use a compound index here
db.shelves.createIndex({vendor:1,contents:1})

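You could even use a text index on contents, so that a search for "apples" finds both "apples" and "apple", but that is up to you to decide.

Your queries

Since the vendor is known, and hence his _id, we can easily find all shelves containing Kiwi: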

db.shelves.find({vendor:"idOfVendor", contents: "Kiwi"})

Counting the shelves is even simpler:

db.shelves.countDocuments({vendor:"idOfVendor"})
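To illustrate what these two queries match, here is a rough in-memory imitation in plain JavaScript. The sample shelves and the function names `findShelves` and `countShelves` are hypothetical, and this is not how MongoDB evaluates queries internally. It only mirrors the matching rule that an equality condition on an array field matches when some element equals the value exactly, so "Kiwi" does not match a shelf that only holds "Kiwi fruit".

```javascript
// In-memory stand-in for the shelves collection (sample data is hypothetical).
const shelves = [
  { vendor: "idOfVendor", description: "Shelf 1", contents: ["Apples", "Kiwi"] },
  { vendor: "idOfVendor", description: "Shelf 2", contents: ["Kiwi fruit"] },
  { vendor: "otherVendor", description: "Shelf 1", contents: ["Kiwi"] },
];

// Mirrors find({vendor: ..., contents: ...}): the array field matches when
// any of its elements equals the value exactly.
function findShelves(vendorId, fruit) {
  return shelves.filter(
    (s) => s.vendor === vendorId && s.contents.includes(fruit)
  );
}

// Mirrors counting all shelves that belong to one vendor.
function countShelves(vendorId) {
  return shelves.filter((s) => s.vendor === vendorId).length;
}

const kiwiShelves = findShelves("idOfVendor", "Kiwi");
console.log(kiwiShelves.map((s) => s.description));
console.log(countShelves("idOfVendor"));
```

If partial matches such as "Kiwi fruit" should be found as well, that is where the text index mentioned earlier would come in.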
