Grouping by a field in DocumentDB

Question

Is it possible to group by a field in DocumentDB in some way, with or without a stored procedure?

Say I have the following collection:

[
    {
        name: "Item A",
        priority: 1
    },
    {
        name: "Item B",
        priority: 2
    },
    {
        name: "Item C",
        priority: 2
    },
    {
        name: "Item D",
        priority: 1
    }
]

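I want to get all the items in the highest-priority group (priority 2 in this case), without knowing in advance what the highest priority value is. That is: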

[
    {
        name: "Item B",
        priority: 2
    },
    {
        name: "Item C",
        priority: 2
    }
]

In rough LINQ, it would look something like this:

var highestPriority = 
    collection
        .GroupBy(x => x.Priority)
        .OrderByDescending(x => x.Key)
        .First();
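The rough LINQ above can be made concrete and checked in memory against the sample data. This is a minimal sketch, assuming a hypothetical `Item` record whose `Name` and `Priority` properties mirror the JSON fields. Note that `.First()` returns an `IGrouping<int, Item>`, which is itself the sequence of items in the highest-priority group:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type mirroring the JSON documents in the question.
public record Item(string Name, int Priority);

public static class HighestPriorityGroup
{
    // The sample collection from the question.
    public static List<Item> Sample() => new List<Item>
    {
        new Item("Item A", 1),
        new Item("Item B", 2),
        new Item("Item C", 2),
        new Item("Item D", 1),
    };

    // Group by Priority, sort the groups by key descending and take the first,
    // i.e. the group with the highest priority. Throws if items is empty.
    public static List<Item> Find(IEnumerable<Item> items) =>
        items
            .GroupBy(x => x.Priority)
            .OrderByDescending(g => g.Key)
            .First()      // IGrouping<int, Item> for the highest key
            .ToList();    // materialize the items in that group
}
```

Run on `Sample()`, this yields Item B and Item C, in their original order, since `GroupBy` preserves the order of elements within each group.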

Answer 1

DocumentDB does not currently support GROUP BY or any other aggregation. It is the second most requested feature on the DocumentDB UserVoice, where it is listed as "Under Review".

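In the meantime, documentdb-lumenize is an aggregation library for DocumentDB written as a stored procedure. You load cube.string as a stored procedure, then call it with an aggregation configuration. It's a bit of overkill for this example, but it is fully capable of doing what you're asking. If you pass this into the stored procedure: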

{cubeConfig: {groupBy: "name", field: "priority", f: "max"}}

That should do what you want.

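Note that Lumenize can do a lot more than this: simple group-bys with other functions (sum, count, min, max, median, p75, etc.), pivot tables, all the way up to complex n-dimensional hypercubes with multiple metrics per cell.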

I have never tried loading cube.string from .NET, because we use node.js, but it is shipped as a string rather than as JavaScript, so you should be able to load and send it easily.

Alternatively, you could write your own stored procedure to do this simple aggregation.
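Such a procedure does not need a full GROUP BY: a single pass that remembers the highest priority seen so far, plus the items that share it, is enough. DocumentDB stored procedures are written in JavaScript; the sketch below shows the same single-pass logic in C# to match the rest of this thread, so treat it as an illustration of the algorithm rather than a drop-in procedure. `Item`, `SinglePass` and `CollectHighest` are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical type mirroring the JSON documents in the question.
public record Item(string Name, int Priority);

public static class SinglePass
{
    // One pass over the documents: keep the best priority seen so far and
    // every item that has it. No grouping or sorting of the whole set is needed.
    public static List<Item> CollectHighest(IEnumerable<Item> items)
    {
        var best = new List<Item>();
        int? bestPriority = null;

        foreach (var item in items)
        {
            if (bestPriority == null || item.Priority > bestPriority)
            {
                // New highest priority: discard the previous candidates.
                bestPriority = item.Priority;
                best.Clear();
                best.Add(item);
            }
            else if (item.Priority == bestPriority)
            {
                // Same priority as the current best: keep it too.
                best.Add(item);
            }
        }

        return best; // empty if the input was empty
    }
}
```

On the four sample documents this returns Item B and Item C. Only the current candidates are kept, not every group, which is what makes this shape a good fit for a procedure that scans the collection.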

Answer 2

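DocumentDB still doesn't support GroupBy. The best options are the one described above (a stored procedure) or the Spark connector, as described in the UserVoice item. However, if the set you want to group is relatively small, there is another workaround: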

Fetch all the ungrouped results from the collection and do the grouping in memory.

So instead of:

var highestPriority =
    collection
        .GroupBy(x => x.Priority)
        .OrderByDescending(x => x.Key)
        .First();

you use:

var highestPriority =
    collection
        .Where(<filter to reduce set>)
        .AsEnumerable()
        .GroupBy(x => x.Priority)
        .OrderByDescending(x => x.Key)
        .First();

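.AsEnumerable() fetches the results from DocumentDB, and the GroupBy is then done in memory. The Where before it is still part of the DocumentDB query, so use it to cut down what gets fetched. Note, however, that this is not an optimal solution and should only be used when you are sure the result set is small.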
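To show where the query ends and in-memory work begins, here is a self-contained sketch in which a `List<Item>` wrapped with `AsQueryable()` stands in for the DocumentDB query; against a real collection the `IQueryable<Item>` would come from the DocumentDB SDK instead. `Item`, `InMemoryGrouping` and the `minPriority` filter are hypothetical, the last one standing in for `<filter to reduce set>`. Operators applied before `AsEnumerable()` stay on the `IQueryable` and go to the query provider; everything after it runs as LINQ to Objects.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type mirroring the JSON documents in the question.
public record Item(string Name, int Priority);

public static class InMemoryGrouping
{
    public static List<Item> HighestPriority(IQueryable<Item> query, int minPriority)
    {
        // Still IQueryable: with a real provider this filter becomes part of the query.
        IQueryable<Item> filtered = query.Where(x => x.Priority >= minPriority);

        // AsEnumerable switches to LINQ to Objects: results are fetched and
        // the grouping below runs in memory.
        var top = filtered
            .AsEnumerable()
            .GroupBy(x => x.Priority)
            .OrderByDescending(g => g.Key)
            .FirstOrDefault();

        // FirstOrDefault returns null when the filter left nothing to group.
        return top?.ToList() ?? new List<Item>();
    }
}
```

Using `FirstOrDefault` instead of `First` avoids an exception when the server-side filter matches no documents.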

Tags: c#, linq, azure, azure-cosmosdb