How to summarize an array with group and rollup from d3-array?
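I'm trying to use d3-array to generate two summaries of an array of objects:
- What actions did each teacher perform?
- Which posts did each teacher edit?
This is my current approach: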
const data = [
{ post_id: 47469, action: "reply", teacher_username: "John" },
{ post_id: 47469, action: "edit", teacher_username: "John" },
{ post_id: 47468, action: "reply", teacher_username: "John" },
{ post_id: 47465, action: "reply", teacher_username: "Mary" },
{ post_id: 47465, action: "edit", teacher_username: "Mary" },
{ post_id: 47467, action: "edit", teacher_username: "Mary" },
{ post_id: 46638, action: "reply", teacher_username: "Paul" },
];
const teacherSummary = [
...d3.rollup(
data,
(x) => x.length,
(d) => d.teacher_username,
(d) => d.action
),
]
.map((x) => {
return {
teacher_username: x[0],
num_edits: x[1].get("edit") || 0,
num_replies: x[1].get("reply") || 0,
};
})
.sort((a, b) => d3.descending(a.num_edits, b.num_edits));
// [
// { "teacher_username": "Mary", "num_edits": 2, "num_replies": 1 },
// { "teacher_username": "John", "num_edits": 1, "num_replies": 2 },
// { "teacher_username": "Paul", "num_edits": 0, "num_replies": 1 }
// ]
const postIdsByTeacher = d3.rollups(
data.filter((x) => x.action === "edit"),
(v) => [...new Set(v.map((d) => d.post_id))].join(", "), // Set() is used to get rid of duplicate post_ids
(d) => d.teacher_username
);
// [
// ["John","47469"],
// ["Mary","47465, 47467"]
// ]
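I'm flexible on the output format. What I'd like to optimize is efficiency and clarity:
- Can I get both summaries in a single rollup call? Perhaps by adding edited_post_ids to teacherSummary.
- It seems like there should be a more elegant way to replace the [...Map/Set] calls.
Edit: Out of curiosity, I also tried this with alasql. It almost works, except for the null values in edited_post_ids.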
sql = alasql(`
select
teacher_username,
count(case when action = 'reply' then 1 end) num_replies,
count(case when action = 'edit' then 1 end) num_edits,
array(case when action = 'edit' then post_id end) as edited_post_ids
from ?
group by teacher_username
`, [data])
// [
// { teacher_username: "John", num_replies: 2, num_edits: 1, edited_post_ids: [null, 47469, null], },
// { teacher_username: "Mary", num_replies: 1, num_edits: 2, edited_post_ids: [null, 47465, 47467], },
// { teacher_username: "Paul", num_replies: 1, num_edits: 0, edited_post_ids: [null], },
// ];
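One possible way to deal with the null entries is to strip them after the query has run. The sketch below is plain JavaScript and does not call alasql; the rows array is copied from the output shown above (two rows only), and the name cleaned is made up for illustration.

```javascript
// Rows as returned by the alasql query above (copied from its output).
const rows = [
  { teacher_username: "John", num_replies: 2, num_edits: 1, edited_post_ids: [null, 47469, null] },
  { teacher_username: "Paul", num_replies: 1, num_edits: 0, edited_post_ids: [null] },
];

// Drop the null placeholders left by the non-matching CASE branches.
const cleaned = rows.map((row) => ({
  ...row,
  edited_post_ids: row.edited_post_ids.filter((id) => id != null),
}));

console.log(cleaned);
```

This costs an extra pass over the result, but it leaves the SQL untouched.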
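The function signature of d3.rollup is:
d3.rollup(iterable, reduce, ...keys)
On the face of it, you can supply one operation in reduce, such as a count, a sum, or some other operation, but only one.
For your output, you are looking for two different operations:
- a count of replies and edits,
- an array operation to get the post_ids where action == "edit"
Once you have chosen (x) => x.length, you have lost the opportunity to use another reduce operation. So arguably d3.rollup is not the function you need for multiple operations?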
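That said, the reducer receives the array of rows in each group and may return any value, so one workaround is a single reducer that returns an object holding several summaries. The sketch below is only an illustration: rollupByKey is a hypothetical single-key stand-in for d3.rollup so that it runs without the library, and sample, summarize and byTeacher are made-up names.

```javascript
// Hypothetical single-key stand-in for d3.rollup, so the sketch runs without d3.
function rollupByKey(values, reduce, key) {
  const groups = new Map();
  for (const value of values) {
    const k = key(value);
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(value);
  }
  // Apply the reducer once per group of rows.
  return new Map([...groups].map(([k, v]) => [k, reduce(v)]));
}

const sample = [
  { post_id: 47469, action: "reply", teacher_username: "John" },
  { post_id: 47469, action: "edit", teacher_username: "John" },
  { post_id: 46638, action: "reply", teacher_username: "Paul" },
];

// One reducer, several summaries: it returns an object instead of a number.
const summarize = (v) => {
  const edits = v.filter((d) => d.action === "edit");
  return {
    num_edits: edits.length,
    num_replies: v.filter((d) => d.action === "reply").length,
    edited_post_ids: [...new Set(edits.map((d) => d.post_id))],
  };
};

const byTeacher = rollupByKey(sample, summarize, (d) => d.teacher_username);
console.log(byTeacher.get("John"));
```

The same summarize function could presumably be passed as the reduce argument of d3.rollup(data, summarize, d => d.teacher_username), at the cost of doing the counting by hand.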
You can still add edited_post_ids to teacherSummary by going back to the original data and applying a filter and then a map:
const data = [
{ post_id: 47469, action: "reply", teacher_username: "John" },
{ post_id: 47469, action: "edit", teacher_username: "John" },
{ post_id: 47468, action: "reply", teacher_username: "John" },
{ post_id: 47465, action: "reply", teacher_username: "Mary" },
{ post_id: 47465, action: "edit", teacher_username: "Mary" },
{ post_id: 47467, action: "edit", teacher_username: "Mary" },
{ post_id: 46638, action: "reply", teacher_username: "Paul" },
];
const teacherSummary = [...d3.rollup(
data,
v => v.length,
d => d.teacher_username,
d => d.action
)].map(d => {
return {
teacher_username: d[0],
num_edits: d[1].get("edit") || 0,
num_replies: d[1].get("reply") || 0,
edited_post_ids: data
.filter(x => x.action === "edit" && x.teacher_username === d[0]) // logical AND, not bitwise &
.map(x => x.post_id)
}
});
console.log(teacherSummary);
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/6.0.0/d3.min.js"></script>
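Another approach is to skip d3.rollup/d3.rollups and use d3.groups instead. Incidentally, the source shows that rollup and group are both calls to nest. You lose the counting that rollup does for you and have to implement it yourself. This example reads a bit like the SQL example: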
const data = [
{ post_id: 47469, action: "reply", teacher_username: "John" },
{ post_id: 47469, action: "edit", teacher_username: "John" },
{ post_id: 47468, action: "reply", teacher_username: "John" },
{ post_id: 47465, action: "reply", teacher_username: "Mary" },
{ post_id: 47465, action: "edit", teacher_username: "Mary" },
{ post_id: 47467, action: "edit", teacher_username: "Mary" },
{ post_id: 46638, action: "reply", teacher_username: "Paul" },
];
// compare with
// select
// teacher_username,
// count(case when action = 'reply' then 1 end) num_replies,
// count(case when action = 'edit' then 1 end) num_edits,
// array(case when action = 'edit' then post_id end) as
// edited_post_ids
// from ?
// group by teacher_username
const teacherSummary = d3.groups(data, d => d.teacher_username)
.map(k => {
return {
teacher_username: k[0],
num_edits: k[1].filter(k2 => k2.action == "edit").length,
num_replies: k[1].filter(k2 => k2.action == "reply").length,
edited_post_ids: k[1].filter(k2 => k2.action == "edit").map(k3 => k3.post_id)
}
});
console.log(teacherSummary);
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/6.0.0/d3.min.js"></script>
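The d3.groups example above filters each teacher's rows three times. If efficiency matters, the same output shape can be built in a single pass over the data. The sketch below is plain JavaScript without d3; summarizeTeachers and rows are made-up names, and like the example above it does not remove duplicate post_ids.

```javascript
// Build one summary object per teacher in a single pass over the rows.
function summarizeTeachers(rows) {
  const byTeacher = new Map();
  for (const { teacher_username, action, post_id } of rows) {
    if (!byTeacher.has(teacher_username)) {
      byTeacher.set(teacher_username, {
        teacher_username,
        num_edits: 0,
        num_replies: 0,
        edited_post_ids: [],
      });
    }
    const entry = byTeacher.get(teacher_username);
    if (action === "edit") {
      entry.num_edits += 1;
      entry.edited_post_ids.push(post_id);
    } else if (action === "reply") {
      entry.num_replies += 1;
    }
  }
  // A Map keeps insertion order, so teachers appear in order of first occurrence.
  return [...byTeacher.values()];
}

const rows = [
  { post_id: 47465, action: "reply", teacher_username: "Mary" },
  { post_id: 47465, action: "edit", teacher_username: "Mary" },
  { post_id: 47467, action: "edit", teacher_username: "Mary" },
  { post_id: 46638, action: "reply", teacher_username: "Paul" },
];
const summary = summarizeTeachers(rows);
console.log(summary);
```

For an array this small the difference is negligible; the single pass mainly pays off on large inputs.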
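As a side note, you can boil postIdsByTeacher down to the following and avoid the new Set(etc) type of thing (note that without the Set, duplicate post_ids are no longer removed):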
const data = [
{ post_id: 47469, action: "reply", teacher_username: "John" },
{ post_id: 47469, action: "edit", teacher_username: "John" },
{ post_id: 47468, action: "reply", teacher_username: "John" },
{ post_id: 47465, action: "reply", teacher_username: "Mary" },
{ post_id: 47465, action: "edit", teacher_username: "Mary" },
{ post_id: 47467, action: "edit", teacher_username: "Mary" },
{ post_id: 46638, action: "reply", teacher_username: "Paul" },
];
const postIdsByTeacher = d3.rollups(
data.filter(d => d.action === "edit"),
v => [].concat(v.map(k => k.post_id)),
d => d.teacher_username
);
console.log(postIdsByTeacher);
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/6.0.0/d3.min.js"></script>
But my instinct is that the value of d3.rollup is when you want to do standard things like sums and counts.
I ended up with a simplified version of @Robin Mackenzie's last suggestion:
const uniq = require('lodash.uniq');
const teacherSummary = d3
.groups(data, (d) => d.teacher_username)
.map(([teacher_username, actions]) => {
const edits = actions.filter((x) => x.action == "edit").map((x) => x.post_id);
const replies = actions.filter((x) => x.action == "reply").map((x) => x.post_id);
return {
teacher_username,
num_edits: edits.length,
num_replies: replies.length,
edited_post_ids: uniq(edits),
replied_post_ids: uniq(replies),
};
})