MongoDB: Get a sample of a nested array using Aggregate

$slice lets me get part of a nested array. I have used it successfully like this:

const user = await User.aggregate([
  { $match: { _id: ObjectId(user_id) } },
  {
    $lookup: {
      from: "users",
      let: { friends: "$friends" },
      pipeline: [
        { $match: { $expr: { $in: ["$_id", "$$friends"] } } },
        {
          $lookup: {
            from: "profiles",
            localField: "profile",
            foreignField: "_id",
            as: "profile",
          },
        },
        {
          $match: {
            "profile.online": true,
          },
        },

        {
          $project: {
            name: "$name",
            surname: "$surname",
            profile: { $arrayElemAt: ["$profile", 0] },
          },
        },
      ],
      as: "friends",
    },
  },
  {
    $addFields: {
      friends: {
        $slice: ["$friends", skip, limit],
      },
    },
  },
]);

Now, instead of taking a slice, I want a random sample of the friends array field.

I could not find a way to do this. However, at the stage level I can use $sample like this:

  const pipeline = [
    {
      $lookup: {
        from: "profiles",
        let: { profiles_id: "$profile" },
        pipeline: [
          {
            $match: {
              online: true,
              $expr: { $eq: ["$_id", "$$profiles_id"] },
            },
          },
        ],
        as: "profile",
      },
    },
    { $unwind: "$profile" },
    { $sample: { size: 10 } },
  ];
  const users = await User.aggregate(pipeline);

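Change the final $addFields stage to this.

Pros: it "works".

Cons: there is no guarantee that the random entries in the list are unique. Getting that takes more work. If you have many more friends than the size of the range, you will probably be fine.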

    ,{$addFields: {friends: {$reduce: { // overwrite friends array...
        // $range is the number of things you want to pick:                                            
        input: {$range:[0,4]},
        initialValue: [],
        in: {
            $let: {
                // qq will be a random # between 0 and size-1 thanks to mult                           
                // and floor, so we do not have to do qq-1 to get to zero-based                        
                // indexing on the $friends array                                                      
                vars: {qq: {$floor:{$multiply:[{$rand: {}},{$size:"$friends"}]}} },

                // $concat only works for strings, but $concatArrays can be used                       
                // (creatively) on other types. Here $slice returns an array of                        
                // 1 item which we easily pass to $concatArrays to build the                           
                // the overall result:                                                                 
                in: {$concatArrays: [ "$$value", {$slice:["$friends","$$qq",1]} ]}
            }}
        }}
    }}
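
To reuse the stage above with a different field or sample size, it can be generated by a small helper. This is only a sketch: `sampleWithReplacementStage`, `field`, and `n` are hypothetical names that do not appear in the original, and the stage it returns still picks with replacement, so duplicates remain possible.

```javascript
// Hypothetical helper (not part of the original answer): builds the same
// $addFields/$reduce stage for any array field and any number of picks.
// Picks are made WITH replacement, so the result may contain duplicates.
function sampleWithReplacementStage(field, n) {
  const path = "$" + field; // e.g. "friends" -> "$friends"
  return {
    $addFields: {
      [field]: {
        $reduce: {
          // One iteration per item we want to pick:
          input: { $range: [0, n] },
          initialValue: [],
          in: {
            $let: {
              // idx is a random zero-based index into the array:
              vars: {
                idx: { $floor: { $multiply: [{ $rand: {} }, { $size: path }] } },
              },
              // $slice yields a 1-item array that is appended to the picks:
              in: { $concatArrays: ["$$value", { $slice: [path, "$$idx", 1] }] },
            },
          },
        },
      },
    },
  };
}
```

It would be used in place of the hand-written stage, for example `pipeline.push(sampleWithReplacementStage("friends", 4))`.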

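UPDATED

This version takes advantage of keeping state in the $reduce chain and will not pick dupes. It does so by iteratively shrinking the input candidate list as each item is randomly chosen. The output is a little nested (i.e. friends is not set to the random sample itself but to an object containing the picks and the remaining aa list), but this is easy to reformat after the fact. In MongoDB 5.0 we could finish it off with: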

    {$addFields: {friends: {$getField: {field: "picks", input: {$reduce: {

But many people are not on 5.0 yet.

    {$addFields: {friends: {$reduce: {
        // $range is the number of things you want to pick:
        input: {$range:[0,6]},

        // This is classic use of $reduce to iterate over something AND
        // preserve state.  We start with picks as empty and aa being the
        // original friends array:
        initialValue: {aa: "$friends", picks: []},

        in: {
            $let: {
                // idx will be a random # between 0 and size-1 thanks to mult
                // and floor, so we do not have to do idx-1 to get to zero-based
                // indexing on the $friends array.  idx and sz will be eval'd
                // each time reduce turns the crank through the input range:
                vars: {idx: {$floor:{$multiply:[{$rand: {}},{$size:"$$value.aa"}]}},
                       // cannot set sz and then use it in same vars; oh well
                       sz: {$size:"$$value.aa"}
                      },

                in: {
                    // Add to our picks list:
                    picks: {$concatArrays: [ "$$value.picks", {$slice:["$$value.aa","$$idx",1]} ]},

                    // And now shrink up the input candidate array.                                    
                    // Sadly, we cannot do $slice:[array,pos,0] to yield an empty
                    // array and keep the $concat logic tight; thus we have to test
                    // for front and end special conditions.
                    // This whole bit is to extract the chosen item from the aa
                    // array by splicing together a new one MINUS the target.
                    // This will change the value of $$sz (-1) as we crank thru
                    // the picks.  This ensures we only pick UNPICKED items from
                    // $$value.aa!
                    
                    aa: {$cond: [{$eq:["$$idx",0]}, // if

                         // idx 0: Take from idx 1 and count size - 1:
                         {$slice:["$$value.aa",1,{$subtract:["$$sz",1]}]}, // then

                         // idx last: Take from idx 0 and ALSO count size - 1:
                         {$cond: [ // else
                             {$eq:["$$idx",{$subtract:["$$sz",1]}]}, // if
                             {$slice:["$$value.aa",0,{$subtract:["$$sz",1]}]}, // then

                             // else not 0 or last item, i.e. idx = 3
                             {$concatArrays: [
                                 // Start at 0, count idx; this will land
                                 // us BEFORE the target item (because idx
                                 // is n-1):
                                 {$slice:["$$value.aa",0,"$$idx"]},

                                 // Jump over the target (+1), and go n-2
                                 // (1 for idx/n conversion, and 1 for the
                                 // fact we jumped over):
                                 {$slice:["$$value.aa",{$add:["$$idx",1]},{$subtract:["$$sz",2]}]}
                             ]}
                         ]}
                    ]}
                }
            }}
        }}
    }}

]);
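
The state-carrying $reduce is easier to follow as a plain JavaScript model. The sketch below is not MongoDB code; `pickUnique` and the injectable `rand` parameter are hypothetical names used only for illustration. It also adds a guard for an empty candidate list, which the pipeline version does not have, so with the pipeline it is safest to ask for fewer picks than there are friends. The last constant is one possible follow-up stage, assuming a server older than 5.0, that replaces the nested object with just the picks array.

```javascript
// Plain-JavaScript model of the $reduce stage; runs in Node, not in MongoDB.
// rand is injectable so the behaviour can be checked deterministically.
function pickUnique(friends, n, rand = Math.random) {
  const range = Array.from({ length: n }, (_, i) => i); // like $range: [0, n]
  return range.reduce(
    (value) => {
      const sz = value.aa.length;
      if (sz === 0) return value; // guard added in this sketch: nothing left
      const idx = Math.floor(rand() * sz);
      return {
        // Add the chosen item to the picks list:
        picks: value.picks.concat(value.aa.slice(idx, idx + 1)),
        // Shrink the candidates: everything before idx plus everything after:
        aa: value.aa.slice(0, idx).concat(value.aa.slice(idx + 1)),
      };
    },
    { aa: friends, picks: [] } // same shape as initialValue in the pipeline
  );
}

// Possible follow-up pipeline stage (an assumption, not from the original):
// overwrite the {aa, picks} object with the picks array alone.
const flattenPicksStage = { $addFields: { friends: "$friends.picks" } };
```

Appending `flattenPicksStage` after the $reduce stage leaves friends holding only the sampled items.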

Starting in MongoDB v4.4 (released in 2020), you may opt to use the $function operator. The splice function in JavaScript does all the work of the multiple $slice operations in the previous example.

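    {$addFields: {friends: {$function: {
        body: function(candidates, npicks) {
            var picks = [];
            // Stop early if we run out of candidates so we never push undefined:
            for(var i = 0; i < npicks && candidates.length > 0; i++) {
                var idx = Math.floor(Math.random() * candidates.length);
                // splice removes the chosen item, so it cannot be picked twice:
                picks.push(candidates.splice(idx,1)[0]);
            }
            return picks;
        },
        args: [ "$friends", 4], // 4 is num to pick
        lang: "js"
    }}}}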
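
Because the $function body is ordinary JavaScript, the same logic can be tried outside MongoDB before it goes into a pipeline. The following is a minimal sketch under that assumption; `pickRandom` and its optional `rand` parameter are hypothetical names, and the copy of the input array plus the early stop when candidates run out are additions made here for safety.

```javascript
// Standalone version of the splice-based picker, runnable in Node.
// rand defaults to Math.random but can be replaced for repeatable checks.
function pickRandom(candidates, npicks, rand = Math.random) {
  const pool = candidates.slice(); // copy so the caller's array is untouched
  const picks = [];
  // Stop when enough items are picked or no candidates remain:
  while (picks.length < npicks && pool.length > 0) {
    const idx = Math.floor(rand() * pool.length);
    // splice removes the chosen item from the pool and returns it:
    picks.push(pool.splice(idx, 1)[0]);
  }
  return picks;
}
```

For example, `pickRandom(["ann", "bob", "cy"], 2)` returns two distinct names from the list, and asking for more picks than there are candidates simply returns all of them in random order.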