What does the error mean in a data.table with the function "seq" -- "The RHS length must either be 1 or match the LHS length exactly"?

Question

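I am trying to:

Calculate the difference in call duration between police units responding to the same call
Find the longest call within each group of calls that share a call ID
Sort by duration in descending order

My steps are shown in the code snippet below.

First, I sort by ID in descending order (several calls share the same ID), and then by call duration in hours, also descending.

Then I convert my data.frame to a data.table.

Then I apply a sequence within each ID, following the descending duration order.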

call_duration_diff_by_unit[, duration_seq := seq(CALL_DURATION_HOURS), by = c("ID")]

This is where the problem occurs: I get an error message

"Error in [.data.table(call_duration_diff_by_unit, , :=(duration_seq, : Supplied 2 items to be assigned to group 1 of size 1 in column 'duration_seq'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code."

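The only explanations I have found for this error are specific to particular packages that I am not using. I now understand the concept of "recycling", but I am not sure how it applies here... there are no two vectors of different lengths.

Is R mistakenly reading the by = c("ID") part as a second input?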

call_duration_diff_by_unit <- cad_cfs_data %>% 
  arrange(desc(ID), desc(CALL_DURATION_HOURS))

call_duration_diff_by_unit <- 
  data.table(call_duration_diff_by_unit)

call_duration_diff_by_unit[, duration_seq := seq(CALL_DURATION_HOURS), by = c("ID")]

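I expected it to create a numeric rank within each call ID, assigning 1 to the longest duration. Instead I get the error, and the variable "duration_seq" is not saved for use later in the code.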

Answer 1

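I think what you are looking for is easier to do with one of data.table's special symbols. .N is very useful: it counts the number of rows in the data.table, and if you specify a group with by, it counts the number of rows in that group. So the code would look like this: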

call_duration_diff_by_unit[, duration_seq := 1:.N, by = c("ID")]
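To see what this produces, here is a minimal sketch on made-up data. The table name calls and its values are hypothetical; only the column names come from the question. It uses seq_len(.N), which gives the same result as 1:.N for any non-empty group.

```r
library(data.table)

# Hypothetical toy data: two units answered call 101, one answered call 102.
calls <- data.table(
  ID = c(101, 101, 102),
  CALL_DURATION_HOURS = c(0.5, 3.2, 2.0)
)

# Sort so that the longest call comes first within each ID.
setorder(calls, -ID, -CALL_DURATION_HOURS)

# .N is the row count of the current group, so seq_len(.N) always has
# exactly one value per row of that group.
calls[, duration_seq := seq_len(.N), by = "ID"]

print(calls)
```

Because the rows are already sorted by descending duration, the longest call in each ID should end up with duration_seq equal to 1.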

Is this what you wanted?
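As for the error itself, a likely cause is how base R's seq() treats a single number: seq(x) with a length-one numeric x counts from 1 to x rather than along x. The sketch below assumes a group with one row and a made-up duration of 2.5 hours; the actual values in your data may differ.

```r
# A group with a single row whose duration is 2.5 hours (made-up value).
one_call <- 2.5

# Given a single number, seq() counts from 1 up to that number,
# so it yields two values for a one-row group.
n_seq <- length(seq(one_call))

# seq_along() counts the elements instead, so it yields one value.
n_along <- length(seq_along(one_call))

# Given two or more values, seq() behaves like seq_along().
n_pair <- length(seq(c(2.5, 0.5)))

print(c(n_seq = n_seq, n_along = n_along, n_pair = n_pair))
```

That would match "Supplied 2 items to be assigned to group 1 of size 1": the by argument is not being read as a second input, the right-hand side is simply longer than the group. seq_along(CALL_DURATION_HOURS) or seq_len(.N) avoids the mismatch.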
