Create subgroups in data.table
Suppose I have the following simplified datasets:
dt <- data.table(id = 1:5, val = c(1, 2, 3, 2, 4))
dt2 <- data.table(id = c(2, 4), val = c(2, 3))
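I want to replace every value of 2 in dt. The replacement values are given in dt2, and the two tables can be joined on id.
If a value is not 2, it should stay unchanged. If it is 2, it should become paste0(dt$val, ".", dt2$val).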
Expected output:
row id val
1: 1 1
2: 2 2.2
3: 3 3
4: 4 2.3
5: 5 4
What I tried (it works, but does not seem elegant):
merged <- merge(x = dt, y = dt2, by= "id", all.x = TRUE)
merged[!is.na(merged$val.y), ]$val.x <- paste0(
merged[!is.na(merged$val.y), ]$val.x, ".",
merged[!is.na(merged$val.y), ]$val.y)
merged[, val.y := NULL]
setnames(x = merged, old = "val.x", new = "val")
merged
Question: how can I do this transformation more elegantly?
library(data.table)
# example data
dt <- data.table(id = 1:5, val = c(1, 2, 3, 2, 4))
dt2 <- data.table(id = c(2, 4), val = c(2, 3))
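If both datasets are sorted the same way (the rows of dt2 appear in the same order as the matching ids in dt), you can use base R like this: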
dt$val[dt$id %in% dt2$id] = paste0(dt$val[dt$id %in% dt2$id], ".", dt2$val)
dt
# id val
# 1: 1 1
# 2: 2 2.2
# 3: 3 3
# 4: 4 2.3
# 5: 5 4
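The %in% version above pairs the matched rows of dt with the rows of dt2 purely by position, so it gives wrong results if dt2 is ordered differently. As a minimal sketch (not part of the original answer), match() can look up the right dt2 row for each id instead. The names idx, hit and new_val are illustrative, and dt2 is deliberately unsorted here:

```r
library(data.table)

dt  <- data.table(id = 1:5, val = c(1, 2, 3, 2, 4))
dt2 <- data.table(id = c(4, 2), val = c(3, 2))  # deliberately not in id order

# position of each dt$id within dt2$id, NA where dt2 has no such id
idx <- match(dt$id, dt2$id)
hit <- !is.na(idx)

# build the new column as character, then replace val in one step
new_val <- as.character(dt$val)
new_val[hit] <- paste0(new_val[hit], ".", dt2$val[idx[hit]])
dt[, val := new_val]
dt$val
# "1" "2.2" "3" "2.3" "4"
```

Replacing the whole column with := (no i argument) changes its type from numeric to character without a coercion warning.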
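Otherwise you can use this:
dt_merged = merge(dt, dt2, by = "id", all.x = TRUE)
# keep val.x where dt2 has no match, otherwise append "." and val.y
dt_merged[, val := ifelse(is.na(val.y), val.x, paste0(val.x, ".", val.y))]
dt_merged = dt_merged[, c("id", "val")]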
dt_merged
# id val
# 1: 1 1
# 2: 2 2.2
# 3: 3 3
# 4: 4 2.3
# 5: 5 4
You are looking for an update join:
dt[dt2, on=.(id), val := paste0(x.val, ".", i.val)]
Output:
id val
1: 1 1
2: 2 2.2
3: 3 3
4: 4 2.3
5: 5 4
Data:
#val column needs to be of character type to suppress the warning
dt <- data.table(id = 1:5, val = as.character(c(1, 2, 3, 2, 4)))
dt2 <- data.table(id = c(2, 4), val = c(2, 3))
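The update join above rewrites every row of dt whose id appears in dt2, whatever its current value. The question only asks to change values equal to 2, so here is a hedged sketch that adds that check inside the join. It assumes a data.table version that provides fifelse() (1.12.4 or later, as far as I know), and the dt2 used here is a made-up variant in which id 3 matches but should stay untouched:

```r
library(data.table)

# val is character up front so := does not have to coerce types
dt  <- data.table(id = 1:5, val = as.character(c(1, 2, 3, 2, 4)))
dt2 <- data.table(id = c(2, 3), val = c(2, 9))  # hypothetical: id 3 matches, but its val is not 2

# x.val is the current value in dt, i.val the replacement from dt2;
# only rows whose current value is "2" get the suffix
dt[dt2, on = .(id),
   val := fifelse(x.val == "2", paste0(x.val, ".", i.val), x.val)]
dt$val
# "1" "2.2" "3" "2" "4"
```

Row 4 keeps "2" because its id is not in this dt2, and row 3 keeps "3" because its value is not 2.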