Assign new column based on unique combinations of values in other columns
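I have a dataset of bird observation records with about 300,000 rows and 7 columns. I want to create a new column based on the unique combinations of 3 other columns, all of which are factors: "gridref", the 1 km grid square the record falls in; "observer", the person who made the observation; and "date", the date of the observation. I want a new column "visit_ID" that identifies each unique "visit" to a 1 km grid square, i.e. each unique combination of gridref, observer and date.
I tried the following code: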
birds_raw$vid <- as.integer(interaction(birds_raw$gridref, birds_raw$observer, birds_raw$date))
This returns the following error message:
Error: cannot allocate vector of size 636.1 Gb
In addition: Warning message:
In ans * length(l) : NAs produced by integer overflow
I'm sure there must be a simple way to do this. Can anyone help?
You can do this efficiently with data.table:
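The error comes from how interaction() builds its result: by default (drop = FALSE) it creates one level for every possible combination of the input factors' levels, so the number of levels is the product of the three level counts, whether or not a combination ever occurs in the data. With many grid squares, observers and dates that product is very large, which fits the integer-overflow warning and the 636.1 Gb allocation. A small illustration, using made-up toy factors g, o and d (not the real bird data):

```r
# Hypothetical toy factors standing in for gridref, observer and date
g <- factor(c("a", "b", "a"))   # 2 levels
o <- factor(c("x", "y", "z"))   # 3 levels
d <- factor(c("p", "q", "p"))   # 2 levels

# Default interaction() keeps every possible combination of levels: 2 * 3 * 2
nlevels(interaction(g, o, d))      # 12

# ...but only 3 combinations actually occur in the rows
nrow(unique(data.frame(g, o, d)))  # 3
```

On the real data the same multiplication runs over thousands of levels per column, so the number of possible combinations vastly exceeds the roughly 300,000 rows.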
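library(data.table)

# Toy data standing in for the real 300,000-row dataset
birds_raw <-
  data.table(
    other_var = factor(c("other 1", "other 2", "other 3", "other 4")),
    gridref = factor(c("grid 1", "grid 2", "grid 1", "grid 1")),
    observer = factor(c("person 1", "person 2", "person 2", "person 1")),
    date = factor(c("date 1", "date 2", "date 1", "date 1"))
  )

# .GRP is data.table's group counter: each unique gridref/observer/date
# combination gets its own integer, added by reference with :=
# (the trailing [] just prints the result)
birds_raw[, visit_id := .GRP, by = c("gridref", "observer", "date")][]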
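If you would rather stay in base R, a similar ID can be sketched by pasting the three columns into one key per row and numbering the distinct keys with match(). This is a swapped-in alternative, not part of the data.table answer, and it assumes the separator "\r" never appears in any gridref, observer or date value. Its memory use grows with the number of rows, not with the number of possible level combinations:

```r
# Same toy data, as a plain data.frame (no data.table needed)
birds_raw <- data.frame(
  other_var = factor(c("other 1", "other 2", "other 3", "other 4")),
  gridref = factor(c("grid 1", "grid 2", "grid 1", "grid 1")),
  observer = factor(c("person 1", "person 2", "person 2", "person 1")),
  date = factor(c("date 1", "date 2", "date 1", "date 1"))
)

# One string key per row; paste() uses the factor labels, and the "\r"
# separator (assumed absent from the data) keeps distinct combinations apart
key <- paste(birds_raw$gridref, birds_raw$observer, birds_raw$date, sep = "\r")

# match() against the unique keys numbers visits in order of first appearance
birds_raw$visit_id <- match(key, unique(key))
birds_raw$visit_id  # 1 2 3 1
```

On this toy data the numbering should agree with the .GRP result, since both number groups in the order they first appear.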