Assign new column based on unique combinations of values in other columns

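I have a dataset of bird observation records, about 300,000 rows and 7 columns. I want to create a new column based on the unique combinations of 3 other columns, all of which are factor variables: "gridref", the 1 km grid square where the record was made; "observer", the person who made the observation; and "date", the date of the observation. The new column, "visit_ID", should identify each unique "visit" to a 1 km grid square, i.e. each unique combination of gridref, observer and date.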

I tried the following code:

birds_raw$vid <- as.integer(interaction(birds_raw$gridref, birds_raw$observer, birds_raw$date))

This returns the following error message:

Error: cannot allocate vector of size 636.1 Gb
In addition: Warning message:
In ans * length(l) : NAs produced by integer overflow

I'm sure there must be a simple way to do this. Can anyone help?
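The error comes from interaction() itself: with its default drop = FALSE, it creates one factor level for every possible combination of gridref, observer and date levels, whether or not that combination occurs in the data, so the number of levels is the product of the three level counts. Once that product exceeds R's integer limit (.Machine$integer.max, i.e. 2^31 - 1), the index arithmetic overflows (hence the "ans * length(l)" warning) and R tries to allocate a correspondingly huge vector. The sketch below is a quick diagnostic, using hypothetical toy data in place of birds_raw, that compares the number of possible combinations with the number that actually occur:

```r
# Hypothetical toy data standing in for birds_raw
birds_raw <- data.frame(
  gridref  = factor(c("grid 1", "grid 2", "grid 1", "grid 1")),
  observer = factor(c("person 1", "person 2", "person 2", "person 1")),
  date     = factor(c("date 1", "date 2", "date 1", "date 1"))
)
cols <- c("gridref", "observer", "date")

# Levels interaction() would create with drop = FALSE: the product of the
# level counts (prod() returns a double, so this itself cannot overflow)
n_possible <- prod(vapply(birds_raw[cols], nlevels, integer(1)))

# Combinations that actually occur in the data
n_observed <- nrow(unique(birds_raw[cols]))

n_possible                              # 8
n_observed                              # 3
n_possible > .Machine$integer.max       # FALSE here; likely TRUE on the real data
```

On the real data the observed count can never exceed the number of rows (about 300,000), so an approach that only numbers the combinations that occur avoids the problem entirely.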

You can do this efficiently with data.table, which numbers only the combinations that actually occur:

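library(data.table)
# Toy data standing in for the real birds_raw
# (with the real data frame, convert it in place with setDT(birds_raw) instead)
birds_raw <-
  data.table(
    other_var = factor(c("other 1", "other 2", "other 3", "other 4")),
    gridref = factor(c("grid 1", "grid 2", "grid 1", "grid 1")),
    observer = factor(c("person 1", "person 2", "person 2", "person 1")),
    date = factor(c("date 1", "date 2", "date 1", "date 1"))
  )
# .GRP is the group counter: each gridref/observer/date combination gets one integer,
# numbered in order of first appearance; := adds the column by reference and [] prints it
birds_raw[, visit_id := .GRP, by = c("gridref", "observer", "date")][]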
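If adding a package is not an option, a base R alternative (not part of the original answer, just a sketch) is to paste the three columns into one key per row and match each key against the unique keys. This only ever touches the combinations present in the data, so memory grows with the number of rows rather than with the product of the level counts. The separator "\r" is an assumption: it just needs to be a string that never occurs in gridref, observer or date.

```r
# Hypothetical toy data in the shape described in the question
birds_raw <- data.frame(
  gridref  = factor(c("grid 1", "grid 2", "grid 1", "grid 1")),
  observer = factor(c("person 1", "person 2", "person 2", "person 1")),
  date     = factor(c("date 1", "date 2", "date 1", "date 1"))
)

# One character key per row; paste() converts the factors to their labels.
# "\r" is assumed not to appear anywhere in the data.
key <- do.call(paste, c(birds_raw[c("gridref", "observer", "date")], sep = "\r"))

# match() against the unique keys gives one integer per distinct
# combination, numbered in order of first appearance
birds_raw$visit_ID <- match(key, unique(key))

birds_raw$visit_ID   # 1 2 3 1
```

The numbering agrees with .GRP when grouping with by =, since both follow order of first appearance.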