Insert a row of NAs after each group of data using data.table
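I want to add a row of NAs after each group of data in R.
A similar question has been asked before: Insert a blank row after each group of data.
The accepted answer there also works in this case, as shown below.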
group <- c("a","b","b","c","c","c","d","d","d","d")
xvalue <- c(16:25)
yvalue <- c(1:10)
df <- data.frame(cbind(group,xvalue,yvalue))
df_new <- as.data.frame(lapply(df, as.character), stringsAsFactors = FALSE)
head(do.call(rbind, by(df_new, df$group, rbind, NA)), -1 )
group xvalue yvalue
a.1 a 16 1
a.2 <NA> <NA> <NA>
b.2 b 17 2
b.3 b 18 3
b.31 <NA> <NA> <NA>
c.4 c 19 4
c.5 c 20 5
c.6 c 21 6
c.41 <NA> <NA> <NA>
d.7 d 22 7
d.8 d 23 8
d.9 d 24 9
d.10 d 25 10
How can I use data.table to speed this up for a data.frame?
You could try
library(data.table)
df$group <- as.character(df$group)
setDT(df)[, .SD[1:(.N+1)], by=group][is.na(xvalue), group:=NA][!.N]
# group xvalue yvalue
#1: a 16 1
#2: NA NA NA
#3: b 17 2
#4: b 18 3
#5: NA NA NA
#6: c 19 4
#7: c 20 5
#8: c 21 6
#9: NA NA NA
#10: d 22 7
#11: d 23 8
#12: d 24 9
#13: d 25 10
Or, as suggested by @David Arenburg
setDT(df)[, indx := group][, .SD[1:(.N+1)], indx][,indx := NULL][!.N]
Or
setDT(df)[df[,.I[1:(.N+1)], group]$V1][!.N]
Or it can be simplified further, based on @eddi's comment
setDT(df)[df[, c(.I, NA), group]$V1][!.N]
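To make the row-index variant easier to follow, here is a minimal sketch on a smaller table. The names `dt`, `idx` and `res` are made up for illustration, and `head(..., -1L)` is used in place of `[!.N]` to drop the trailing NA row; treat it as one possible reading of the idea rather than a definitive implementation.

```r
library(data.table)

# Small illustrative table (hypothetical data, not the one from the question)
dt <- data.table(
  group  = c("a", "b", "b", "c", "c", "c"),
  xvalue = 16:21,
  yvalue = 1:6
)

# .I holds the row numbers of each group in the original table;
# appending NA adds one missing row index after every group.
idx <- dt[, c(.I, NA), by = group]$V1

# Subsetting with an NA row index returns a row of NAs.
# The last group also gets a trailing NA row, which is dropped here.
res <- head(dt[idx], -1L)
print(res)
```

Because the NA rows come from indexing the whole table rather than from `.SD`, the `group` column is NA in those rows as well, so no separate `group := NA` step is needed.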
One way I can think of is to first construct a vector, as follows:
foo <- function(x) {
o = order(rep.int(seq_along(x), 2L))
c(x, rep.int(NA, length(x)))[o]
}
join_values = head(foo(unique(df_new$group)), -1L)
# [1] "a" NA "b" NA "c" NA "d"
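The interleaving done inside `foo()` may be easier to see step by step. The sketch below uses base R only; the names `x`, `keys`, `o` and `interleaved` are illustrative and not part of the original answer.

```r
x <- c("a", "b", "c")

# Each position appears twice: once for the value, once for its NA.
keys <- rep.int(seq_along(x), 2L)

# order() keeps ties in their original order, so each value's index
# is immediately followed by the index of its matching NA.
o <- order(keys)

# Append the NAs, then reorder so values and NAs alternate.
interleaved <- c(x, rep.int(NA, length(x)))[o]
print(interleaved)
```

The trailing NA is what `head(..., -1L)` removes in the answer's code, so that no empty row follows the last group.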
Then use setkey() and a join.
setkey(setDT(df_new), group)
df_new[.(join_values), allow.cartesian=TRUE]
# group xvalue yvalue
# 1: a 16 1
# 2: NA NA NA
# 3: b 17 2
# 4: b 18 3
# 5: NA NA NA
# 6: c 19 4
# 7: c 20 5
# 8: c 21 6
# 9: NA NA NA
# 10: d 22 7
# 11: d 23 8
# 12: d 24 9
# 13: d 25 10