Can we store hundreds of thousands of records in a single MongoDB document without any performance issues?

Question

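I'm new to MongoDB. I need to understand the performance implications of fetching a single document that contains more than 5 GB of related data.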

My document structure:

{
    _id: 100,
    question_id: 200,
    analyze_data: [
       {
         date: ISODate("1920-01-20"),
         store_id: 50,
         user_id: 6
       },
       // ... hundreds of thousands of records here ...
       {
         date: ISODate("2015-01-20"),
         store_id: 6000,
         user_id: 600000
       }
       // ... (n records in total)
    ],
    graph_data: [
       {
         graph_id: 5,
         date: ISODate("1920-01-20"),
         store_id: 50,
         user_id: 6
       },
       // ... hundreds of thousands of records here ...
       {
         date: ISODate("2015-01-20"),
         store_id: 10000,
         user_id: 400000
       }
       // ... (n records in total)
    ]
}

My collection contains documents like this, and I have to filter analyze_data and graph_data by date, store_id and user_id.

After filtering, I need to do some calculations and restructure my arrays into this shape:
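The filtering and regrouping described above can be sketched in plain Python to make the target shape concrete. This is a minimal in-memory sketch under assumptions: records are dicts with `date`, `store_id` and `user_id` keys as in the question, and `filter_records` and `group_by_date` are hypothetical helper names. Inside MongoDB the same work would normally be done server-side with an aggregation pipeline (a `$match` stage followed by `$group`).

```python
from collections import defaultdict
from datetime import datetime


def filter_records(records, start, end, store_ids=None, user_ids=None):
    """Keep records with start <= date <= end and, optionally, matching store/user ids."""
    kept = []
    for rec in records:
        if not (start <= rec["date"] <= end):
            continue
        if store_ids is not None and rec["store_id"] not in store_ids:
            continue
        if user_ids is not None and rec["user_id"] not in user_ids:
            continue
        kept.append(rec)
    return kept


def group_by_date(records):
    """Reshape flat records into [{"date": d, "res": [...]}, ...], ordered by date."""
    groups = defaultdict(list)
    for rec in records:
        # Everything except the grouping key ends up inside the "res" entry.
        groups[rec["date"]].append({k: v for k, v in rec.items() if k != "date"})
    return [{"date": d, "res": groups[d]} for d in sorted(groups)]


if __name__ == "__main__":
    # Hypothetical sample rows shaped like the question's analyze_data elements.
    analyze_data = [
        {"date": datetime(1920, 1, 20), "store_id": 5, "user_id": 2},
        {"date": datetime(1920, 1, 20), "store_id": 8, "user_id": 6},
        {"date": datetime(1999, 1, 21), "store_id": 66689, "user_id": 644},
        {"date": datetime(2015, 1, 20), "store_id": 6000, "user_id": 600000},
    ]
    selected = filter_records(analyze_data, datetime(1900, 1, 1), datetime(2000, 1, 1))
    for group in group_by_date(selected):
        print(group["date"].date(), group["res"])
```

Keeping the filter and the reshaping as separate steps mirrors how a pipeline would run `$match` before `$group`, so only the matching rows are grouped.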

{
    _id: 100,
    question_id: 200,
    analyze_data: [
       {
         date: ISODate("1920-01-20"),
         res: [
            {
              user_id: 2,
              store_id: 5
              // ... more fields ...
            },
            {
              user_id: 6,
              store_id: 8
              // ... more fields ...
            }
            // ... (n entries)
         ]
       },
       {
         date: ISODate("1999-01-21"),
         res: [
            {
              user_id: 644,
              store_id: 66689
              // ... more fields ...
            },
            {
              user_id: 6455,
              store_id: 877777
              // ... more fields ...
            }
            // ... (n entries)
         ]
       }
       // ... (n date groups)
    ],
    graph_data: [
       {
         date: ISODate("1920-01-20"),
         res: [
            {
              user_id: 2,
              store_id: 5,
              graph_details: {
                x_axis: [1, 2, 3, 4, 5, 8, 955, 44, 55, 141],
                y_axis: [545, 4545, 77, 55, 88, 228, 822, 5, 22]
              }
              // ... more fields ...
            },
            {
              user_id: 6,
              store_id: 8,
              graph_details: {
                x_axis: [154, 2546, 345, 4456, 5456, 8456, 955],
                y_axis: [545, 4545, 77, 55, 88, 228, 822, 5, 22]
              }
              // ... more fields ...
            }
            // ... (n entries)
         ]
       },
       {
         date: ISODate("1999-01-21"),
         res: [
            {
              user_id: 644,
              store_id: 66689,
              graph_details: {
                x_axis: [1, 2, 3, 4, 5, 8, 955, 44, 55, 141],
                y_axis: [545, 4545, 77, 55, 88, 228, 822, 5, 22]
              }
              // ... more fields ...
            },
            {
              user_id: 6455,
              store_id: 877777,
              graph_details: {
                x_axis: [1, 2, 3, 4, 5, 8, 955, 44, 55, 141],
                y_axis: [545, 4545, 77, 55, 88, 228, 822, 5, 22]
              }
              // ... more fields ...
            }
            // ... (n entries)
         ]
       }
       // ... (n date groups)
    ]
}

There is no limit on the number of documents.

Important: how can I use aggregation and map-reduce with the MongoDB PHP driver over a single connection, working with multiple collections at once?

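Please share any useful resources or posts that would help me clear this up.

Is this the right way to store related data in MongoDB?
Will it cause performance issues?
What is the best way to reduce and restructure the output?

Thanks.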

Answer 1

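A MongoDB document has a size limit of 16 MB, so a single document cannot hold more than 5 GB of data. You can use GridFS to go beyond this limit, but GridFS stores the data as a file split into chunks (255 kB each by default) that are reassembled when the file is read. You cannot filter on fields inside such a file, so every query would have to read back and parse the whole file, which would be very slow.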
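To see why the 16 MB limit is hit so quickly, here is a rough size calculator that follows the BSON encoding rules: int32 or int64 integers, UTC datetimes stored as int64, strings with a length prefix and trailing NUL, and arrays encoded as documents keyed "0", "1", "2", .... It is a minimal sketch that covers only the value types used in the question, not a replacement for a real BSON library, and the record layout and the count of 200,000 entries per array are assumptions taken from the question's example.

```python
from datetime import datetime, timezone

MAX_BSON_SIZE = 16 * 1024 * 1024  # MongoDB's per-document limit: 16 MiB


def bson_value_size(value):
    """Bytes a value occupies in BSON, excluding its type byte and key."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return 1
    if isinstance(value, int):
        return 4 if -2**31 <= value < 2**31 else 8  # int32 if it fits, else int64
    if isinstance(value, float):
        return 8
    if isinstance(value, datetime):
        return 8  # milliseconds since the epoch as int64
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1  # int32 length + bytes + NUL
    if isinstance(value, dict):
        return bson_document_size(value)
    if isinstance(value, list):
        # Arrays are stored as embedded documents with keys "0", "1", "2", ...
        return bson_document_size({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"unsupported type: {type(value).__name__}")


def bson_document_size(doc):
    """Total encoded size of a document, including its length prefix and terminator."""
    size = 4 + 1  # int32 total length + trailing NUL byte
    for key, value in doc.items():
        size += 1 + len(key.encode("utf-8")) + 1  # type byte + key as C string
        size += bson_value_size(value)
    return size


if __name__ == "__main__":
    record = {"date": datetime(1920, 1, 20, tzinfo=timezone.utc), "store_id": 50, "user_id": 6}
    n = 200_000  # hypothetical number of entries per array
    doc = {"_id": 100, "question_id": 200,
           "analyze_data": [record] * n, "graph_data": [record] * n}
    size = bson_document_size(doc)
    print(f"one record: {bson_document_size(record)} bytes")
    print(f"document with 2 x {n} records: {size:,} bytes; fits: {size <= MAX_BSON_SIZE}")
```

Each small record costs 46 bytes plus its array key, so two arrays of 200,000 entries already come to more than 21 million bytes, well past the 16,777,216-byte limit, before any extra fields are added.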

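I think it is better to create a separate collection for each array in your document, and to add question_id and the parent's _id to each array element (storing the _id as id_ref, because _id is a reserved key whose values must be unique), so that every element can be traced back to its parent.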

Collection: analyze_data
{
  id_ref: 100,
  question_id: 200,
  date: ISODate("1920-01-20"),
  store_id: 50,
  user_id: 6
}
...
{
  id_ref: 100,
  question_id: 200,
  date: ISODate("2015-01-20"),
  store_id: 6000,
  user_id: 600000
}
etc., with other `id_ref` and `question_id` values.

An analogous collection holds graph_data.
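A one-off migration from the embedded layout to the two collections can be sketched as follows. It is a minimal in-memory sketch: `split_embedded` is a hypothetical helper, and actually writing the resulting rows (for example with a driver's bulk insert) is left out.

```python
from datetime import datetime


def split_embedded(doc):
    """Turn one embedded document into rows for the analyze_data and graph_data collections.

    Each array element becomes its own document, tagged with the parent's _id
    (stored as id_ref) and question_id so it can be traced back to its parent.
    """
    ref = {"id_ref": doc["_id"], "question_id": doc["question_id"]}
    return {
        name: [{**ref, **item} for item in doc.get(name, [])]
        for name in ("analyze_data", "graph_data")
    }


if __name__ == "__main__":
    # Hypothetical embedded document shaped like the question's example.
    embedded = {
        "_id": 100,
        "question_id": 200,
        "analyze_data": [{"date": datetime(1920, 1, 20), "store_id": 50, "user_id": 6}],
        "graph_data": [{"graph_id": 5, "date": datetime(1920, 1, 20), "store_id": 50, "user_id": 6}],
    }
    for collection, rows in split_embedded(embedded).items():
        print(collection, rows)
```

Because every row carries `id_ref` and `question_id`, an index on those fields (plus `date`) would let the per-question filters avoid scanning the whole collection.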

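You can use the aggregation framework to filter both collections by date, store_id and user_id, and then combine the results from the two collections back into one document by matching on id_ref or question_id.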
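The aggregation step could look roughly like the pipeline below, built here as plain Python data in the shape MongoDB drivers accept (for example PyMongo's `Collection.aggregate(pipeline)`). It is a sketch under assumptions: dates are stored as BSON dates, `build_pipeline` and `combine` are hypothetical helpers, the filter values are made up, and the two results are merged into one document in application code rather than on the server.

```python
from datetime import datetime


def build_pipeline(id_ref, start, end, store_ids=None, user_ids=None,
                   fields=("user_id", "store_id")):
    """$match one parent's rows in a date range, then $group them by date into "res"."""
    match = {"id_ref": id_ref, "date": {"$gte": start, "$lte": end}}
    if store_ids is not None:
        match["store_id"] = {"$in": list(store_ids)}
    if user_ids is not None:
        match["user_id"] = {"$in": list(user_ids)}
    return [
        {"$match": match},
        # Collect the requested fields of every matching row under its date.
        {"$group": {"_id": "$date", "res": {"$push": {f: "$" + f for f in fields}}}},
        {"$sort": {"_id": 1}},
        {"$project": {"_id": 0, "date": "$_id", "res": 1}},
    ]


def combine(id_ref, question_id, analyze_groups, graph_groups):
    """Reassemble both aggregation results into the single output document."""
    return {"_id": id_ref, "question_id": question_id,
            "analyze_data": analyze_groups, "graph_data": graph_groups}


if __name__ == "__main__":
    pipeline = build_pipeline(100, datetime(1920, 1, 1), datetime(2000, 1, 1), store_ids=[5, 8])
    for stage in pipeline:
        print(stage)
    # With PyMongo and a running server, for example:
    #   analyze = list(db.analyze_data.aggregate(pipeline))
```

For graph_data the same builder can be called with `fields=("user_id", "store_id", "graph_details")`, and the two result lists are then passed to `combine`. Putting `$match` first lets the server use an index on `id_ref` and `date` before any grouping happens.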
