How to summarize an array with group and rollup from d3-array?

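I'm trying to use d3-array to generate two summaries of an array of objects: the number of edits and replies per teacher, and the IDs of the posts each teacher edited.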

Here's how I'm doing it now:

const data = [
  { post_id: 47469, action: "reply", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47468, action: "reply", teacher_username: "John" },
  { post_id: 47465, action: "reply", teacher_username: "Mary" },
  { post_id: 47465, action: "edit", teacher_username: "Mary" },
  { post_id: 47467, action: "edit", teacher_username: "Mary" },
  { post_id: 46638, action: "reply", teacher_username: "Paul" },
];

const teacherSummary = [
  ...d3.rollup(
    data,
    (x) => x.length,
    (d) => d.teacher_username,
    (d) => d.action
  ),
]
  .map((x) => {
    return {
      teacher_username: x[0],
      num_edits: x[1].get("edit") || 0,
      num_replies: x[1].get("reply") || 0,
    };
  })
  .sort((a, b) => d3.descending(a.num_edits, b.num_edits));
// [
//   { "teacher_username": "Mary", "num_edits": 2, "num_replies": 1 },
//   { "teacher_username": "John", "num_edits": 1, "num_replies": 2 },
//   { "teacher_username": "Paul", "num_edits": 0, "num_replies": 1 }
// ]

const postIdsByTeacher = d3.rollups(
  data.filter((x) => x.action === "edit"),
  (v) => [...new Set(v.map((d) => d.post_id))].join(", "), // Set() is used to get rid of duplicate post_ids
  (d) => d.teacher_username
);
// [
//  ["John","47469"],
//  ["Mary","47465, 47467"]
// ]

I'm flexible about the output format. What I want to optimize for is efficiency and clarity.

Edit: Out of curiosity, I also tried this with alasql. It almost works, except for the null values in edited_post_ids.

const sql = alasql(`
select
  teacher_username,
  count(case when action = 'reply' then 1 end) num_replies,
  count(case when action = 'edit' then 1 end) num_edits,
  array(case when action = 'edit' then post_id end) as edited_post_ids
from ?
group by teacher_username
`, [data])
// [ 
//   { teacher_username: "John", num_replies: 2, num_edits: 1, edited_post_ids: [null, 47469, null], }, 
//   { teacher_username: "Mary", num_replies: 1, num_edits: 2, edited_post_ids: [null, 47465, 47467], }, 
//   { teacher_username: "Paul", num_replies: 1, num_edits: 0, edited_post_ids: [null], },
// ];
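The nulls come from the CASE expression: with no ELSE branch it yields null for every non-edit row, and judging by the output above, alasql's array() aggregate keeps those nulls while count() skips them. If post-processing in JavaScript is acceptable, a minimal workaround is to filter them out afterwards. The sqlResult array below is simply the output shown above, copied in by hand:

```javascript
// The alasql output from above, copied in by hand for illustration.
const sqlResult = [
  { teacher_username: "John", num_replies: 2, num_edits: 1, edited_post_ids: [null, 47469, null] },
  { teacher_username: "Mary", num_replies: 1, num_edits: 2, edited_post_ids: [null, 47465, 47467] },
  { teacher_username: "Paul", num_replies: 1, num_edits: 0, edited_post_ids: [null] },
];

// Build new row objects whose edited_post_ids no longer contain the CASE-generated nulls.
const cleaned = sqlResult.map((row) => ({
  ...row,
  edited_post_ids: row.edited_post_ids.filter((id) => id !== null),
}));

console.log(cleaned.map((row) => row.edited_post_ids)); // [ [ 47469 ], [ 47465, 47467 ], [] ]
```

This leaves sqlResult untouched and gives Paul an empty array rather than [null].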

The function signature of d3.rollup is:

d3.rollup(iterable, reduce, ...keys)
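To make the signature concrete, here is a rough plain-JavaScript sketch of what rollup computes: the rows are grouped by the first key into a Map, each group is grouped again by the next key, and reduce is called once on each innermost group. The name rollupSketch is hypothetical; this is a simplified illustration, not d3's actual source:

```javascript
const data = [
  { post_id: 47469, action: "reply", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47468, action: "reply", teacher_username: "John" },
  { post_id: 47465, action: "reply", teacher_username: "Mary" },
  { post_id: 47465, action: "edit", teacher_username: "Mary" },
  { post_id: 47467, action: "edit", teacher_username: "Mary" },
  { post_id: 46638, action: "reply", teacher_username: "Paul" },
];

// Hypothetical helper: group by each key function in turn, then reduce every innermost group.
function rollupSketch(iterable, reduce, ...keys) {
  const values = Array.from(iterable);
  if (keys.length === 0) return reduce(values); // no keys left: reduce this group
  const [key, ...restKeys] = keys;
  const groups = new Map(); // key value -> rows, in first-seen order
  for (const d of values) {
    const k = key(d);
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(d);
  }
  const result = new Map();
  for (const [k, group] of groups) {
    result.set(k, rollupSketch(group, reduce, ...restKeys));
  }
  return result;
}

// Same call shape as the question's d3.rollup(data, x => x.length, teacher, action).
const counts = rollupSketch(
  data,
  (v) => v.length,
  (d) => d.teacher_username,
  (d) => d.action
);
console.log(counts.get("John").get("reply")); // 2
console.log(counts.get("Paul").get("edit")); // undefined
```

Paul has no "edit" group at all, so get("edit") returns undefined; that is why the question's code needs the || 0 fallback.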

On the face of it, you can supply one operation in reduce, such as a count, a sum, or something else, but only one.

For your output, you're looking for two different operations:

  • counting replies and edits, and
  • an array operation to collect the post_ids where action == "edit".

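Once you pass (x) => x.length as the reducer, you lose the chance to apply any other reduce operation. Arguably, d3.rollup isn't the right function when you need several operations?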

You can still add edited_post_ids to teacherSummary: just go back to the original data and apply filter and then map:

const data = [
  { post_id: 47469, action: "reply", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47468, action: "reply", teacher_username: "John" },
  { post_id: 47465, action: "reply", teacher_username: "Mary" },
  { post_id: 47465, action: "edit", teacher_username: "Mary" },
  { post_id: 47467, action: "edit", teacher_username: "Mary" },
  { post_id: 46638, action: "reply", teacher_username: "Paul" },
];

const teacherSummary = [...d3.rollup(
  data,
  v => v.length,
  d => d.teacher_username,
  d => d.action
)].map(d => {
  return {
    teacher_username: d[0],
    num_edits: d[1].get("edit") || 0,
    num_replies: d[1].get("reply") || 0,
    edited_post_ids: data
      // logical && (not bitwise &) to keep only this teacher's edit rows
      .filter(x => x.action === "edit" && x.teacher_username === d[0])
      .map(x => x.post_id)
  }
});
  
console.log(teacherSummary);
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/6.0.0/d3.min.js"></script>
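The filter inside map rescans all of data once per teacher, which is perfectly fine at this size but grows with teachers × rows. If that ever matters, one option is to index the edited post_ids by teacher in a single pass first. A plain-JavaScript sketch (editsByTeacher is a name chosen for illustration):

```javascript
const data = [
  { post_id: 47469, action: "reply", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47468, action: "reply", teacher_username: "John" },
  { post_id: 47465, action: "reply", teacher_username: "Mary" },
  { post_id: 47465, action: "edit", teacher_username: "Mary" },
  { post_id: 47467, action: "edit", teacher_username: "Mary" },
  { post_id: 46638, action: "reply", teacher_username: "Paul" },
];

// One pass over data: teacher_username -> array of edited post_ids.
const editsByTeacher = new Map();
for (const d of data) {
  if (d.action !== "edit") continue;
  if (!editsByTeacher.has(d.teacher_username)) editsByTeacher.set(d.teacher_username, []);
  editsByTeacher.get(d.teacher_username).push(d.post_id);
}

// Teachers with no edits (Paul here) are absent from the Map, so fall back to [].
console.log(editsByTeacher.get("Mary") ?? []); // [ 47465, 47467 ]
console.log(editsByTeacher.get("Paul") ?? []); // []
```

Inside the d3 snippet, the edited_post_ids expression could then become editsByTeacher.get(d[0]) ?? [].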

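Another approach is to skip d3.rollup/d3.rollups and use d3.groups instead. (Incidentally, in the d3-array source, rollup and group both just call the same internal nest function.) You lose the reduce step that rollup applies for you (here, the v => v.length count) and have to compute the counts yourself. This example reads a bit like the SQL example: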

const data = [
  { post_id: 47469, action: "reply", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47468, action: "reply", teacher_username: "John" },
  { post_id: 47465, action: "reply", teacher_username: "Mary" },
  { post_id: 47465, action: "edit", teacher_username: "Mary" },
  { post_id: 47467, action: "edit", teacher_username: "Mary" },
  { post_id: 46638, action: "reply", teacher_username: "Paul" },
];

// compare with
// select
//   teacher_username,
//   count(case when action = 'reply' then 1 end) num_replies,
//   count(case when action = 'edit' then 1 end) num_edits,
//   array(case when action = 'edit' then post_id end) as edited_post_ids
// from ?
// group by teacher_username

const teacherSummary = d3.groups(data, d => d.teacher_username)
  .map(k => {
    return {
      teacher_username: k[0],
      num_edits: k[1].filter(k2 => k2.action == "edit").length,
      num_replies: k[1].filter(k2 => k2.action == "reply").length,
      edited_post_ids: k[1].filter(k2 => k2.action == "edit").map(k3 => k3.post_id)
    }
  });
  
console.log(teacherSummary);
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/6.0.0/d3.min.js"></script>
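Since the question also asks about efficiency: the groups version walks each teacher's rows three times (two filters for "edit", one for "reply"). A single-pass alternative in plain JavaScript, using a Map keyed by teacher_username, builds the same objects while visiting each row once. This is a sketch without d3, not a d3 API:

```javascript
const data = [
  { post_id: 47469, action: "reply", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47468, action: "reply", teacher_username: "John" },
  { post_id: 47465, action: "reply", teacher_username: "Mary" },
  { post_id: 47465, action: "edit", teacher_username: "Mary" },
  { post_id: 47467, action: "edit", teacher_username: "Mary" },
  { post_id: 46638, action: "reply", teacher_username: "Paul" },
];

// teacher_username -> summary row; Map keeps teachers in first-seen order.
const byTeacher = new Map();
for (const { post_id, action, teacher_username } of data) {
  let row = byTeacher.get(teacher_username);
  if (!row) {
    row = { teacher_username, num_edits: 0, num_replies: 0, edited_post_ids: [] };
    byTeacher.set(teacher_username, row);
  }
  // Update whichever counter this row's action belongs to.
  if (action === "edit") {
    row.num_edits += 1;
    row.edited_post_ids.push(post_id);
  } else if (action === "reply") {
    row.num_replies += 1;
  }
}

const teacherSummary = [...byTeacher.values()];
console.log(teacherSummary);
```

For a handful of rows the difference is negligible; the groups version arguably reads closer to the SQL.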

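As a side note, you can boil postIdsByTeacher down to the following and avoid the new Set(...) construct. Be aware that, unlike the Set version, this keeps duplicate post_ids; it gives the same result here only because no teacher edited the same post twice: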

const data = [
  { post_id: 47469, action: "reply", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47468, action: "reply", teacher_username: "John" },
  { post_id: 47465, action: "reply", teacher_username: "Mary" },
  { post_id: 47465, action: "edit", teacher_username: "Mary" },
  { post_id: 47467, action: "edit", teacher_username: "Mary" },
  { post_id: 46638, action: "reply", teacher_username: "Paul" },
];

const postIdsByTeacher = d3.rollups(
  data.filter(d => d.action === "edit"),
  v => v.map(k => k.post_id), // keeps duplicates, unlike new Set(...)
  d => d.teacher_username
);

console.log(postIdsByTeacher);
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/6.0.0/d3.min.js"></script>
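To see the difference from the Set version, suppose (hypothetically) John had edited post 47469 twice. Mapping keeps both entries, while spreading a Set keeps only the first occurrence of each id:

```javascript
// Hypothetical edit rows for one teacher, with the same post edited twice.
const johnEdits = [
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
];

const withDuplicates = johnEdits.map((d) => d.post_id);
const unique = [...new Set(johnEdits.map((d) => d.post_id))];

console.log(withDuplicates); // [ 47469, 47469 ]
console.log(unique); // [ 47469 ]
```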

My instinct, though, is that d3.rollup earns its keep when you want standard aggregations such as sums and counts.
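For comparison, reduce is an ordinary function that receives all rows of one group, so it can return an object holding several aggregates. At that point it does essentially the same work as the groups-plus-map version, which fits the instinct above. A plain-JavaScript sketch of such a reducer (summarize is a hypothetical name), applied by hand to John's rows:

```javascript
const data = [
  { post_id: 47469, action: "reply", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 47468, action: "reply", teacher_username: "John" },
  { post_id: 47465, action: "reply", teacher_username: "Mary" },
  { post_id: 47465, action: "edit", teacher_username: "Mary" },
  { post_id: 47467, action: "edit", teacher_username: "Mary" },
  { post_id: 46638, action: "reply", teacher_username: "Paul" },
];

// Hypothetical reducer: takes every row of one group, returns several aggregates at once.
function summarize(rows) {
  const edited = rows.filter((d) => d.action === "edit").map((d) => d.post_id);
  return {
    num_edits: edited.length,
    num_replies: rows.filter((d) => d.action === "reply").length,
    edited_post_ids: [...new Set(edited)],
  };
}

// Applied manually to one teacher's rows here; rollup would call it once per group.
const johnSummary = summarize(data.filter((d) => d.teacher_username === "John"));
console.log(johnSummary); // { num_edits: 1, num_replies: 2, edited_post_ids: [ 47469 ] }
```

With d3 loaded, d3.rollups(data, summarize, d => d.teacher_username) should then return [teacher_username, summary] pairs.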

I ended up simplifying @Robin Mackenzie's last suggestion:

// assumes d3 (for d3.groups from d3-array) and data are already in scope
const uniq = require('lodash.uniq');
const teacherSummary = d3
  .groups(data, (d) => d.teacher_username)
  .map(([teacher_username, actions]) => {
    const edits = actions.filter((x) => x.action == "edit").map((x) => x.post_id);
    const replies = actions.filter((x) => x.action == "reply").map((x) => x.post_id);
    return {
      teacher_username,
      num_edits: edits.length,
      num_replies: replies.length,
      edited_post_ids: uniq(edits),
      replied_post_ids: uniq(replies),
    };
  });
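If pulling in lodash.uniq just for this feels heavy, a Set-based helper is a common substitute: like lodash's uniq, it keeps the first occurrence of each value in its original order. A minimal sketch (the helper name uniq simply mirrors the import above):

```javascript
// Stand-in for lodash.uniq on arrays of primitives such as post_ids.
const uniq = (values) => [...new Set(values)];

console.log(uniq([47465, 47467, 47465])); // [ 47465, 47467 ]
```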