More dynamic melting with data.table
I'm looking for the most efficient way to transform this:
  ARTNR           FILGRP
1     1             9827
2     2             9348
3     3 9335, 9827, 9339
into this:
  ARTNR FILGRP
1     1   9827
2     2   9348
3     3   9335
4     3   9827
5     3   9339
I tried the code below. It works, but it isn't elegant and has a few drawbacks:
setDT(artnrs)
artnrs[, c("P1", "P2", "P3") := tstrsplit(FILGRP, ",", fixed=TRUE)] # 1)
artnrs <- melt(artnrs, c("ARTNR"), measure = patterns("^P")) # 2)
artnrs[,variable:=NULL] # 3)
artnrs <- na.omit(artnrs, cols="value") # 4)
names(artnrs)[2] <- "FILGRP" # 5)
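- ad 1) This splits the last column into exactly three new columns. How can I make it dynamic so that it also works for five or ten parts?
- ad 2-5) These are rather clumsy operations. Can I chain them more neatly?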
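As a sketch (not the asker's code), steps 1) to 5) can be made dynamic and chained into a single expression: `tstrsplit()` returns one list element per split position, so the number of new columns follows the longest FILGRP entry, and `melt()` can drop the NA padding itself via `na.rm = TRUE`. This assumes the separator is always ", " and recreates the example data inline.

```r
library(data.table)

# Example data from the question (same values as the dput output)
artnrs <- data.table(ARTNR  = c(1, 2, 3),
                     FILGRP = c("9827", "9348", "9335, 9827, 9339"))

# Step 1, dynamic: tstrsplit() yields as many columns (V1, V2, ...) as the
# longest entry needs, so 3, 5 or 10 parts all work without hard-coding names.
# Steps 2-5, chained: melt all non-id columns, name the value column FILGRP
# directly, drop NA padding with na.rm, and keep only ARTNR and FILGRP.
long <- melt(
  artnrs[, c(list(ARTNR = ARTNR),
             tstrsplit(FILGRP, ", ", fixed = TRUE, names = TRUE))],
  id.vars    = "ARTNR",
  value.name = "FILGRP",
  na.rm      = TRUE
)[, .(ARTNR, FILGRP)]

long
```

Because `melt()` stacks the split columns one after another, the rows come out in the same order as the desired result.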
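It's based on data.table, but performance isn't that critical, so an easy-to-understand tidyverse solution would also be fine. The fewer packages, the better, though.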
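For the tidyverse route, a minimal sketch using only tidyr is `separate_rows()`, which puts each separated piece on its own row and repeats the other columns. It assumes ", " is the only separator; newer tidyr versions (1.3.0 and later) also offer `separate_longer_delim()` for the same job with a literal delimiter.

```r
library(tidyr)

# Example data from the question (same values as the dput output)
df <- data.frame(ARTNR  = c(1, 2, 3),
                 FILGRP = c("9827", "9348", "9335, 9827, 9339"),
                 stringsAsFactors = FALSE)

# Split FILGRP on ", " into separate rows; ARTNR is repeated for each piece
long <- separate_rows(df, FILGRP, sep = ", ")
long
```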
Thanks!
dput output:
structure(list(ARTNR = c(1, 2, 3), FILGRP = c("9827", "9348", "9335, 9827, 9339")),
row.names = c(NA, -3L), class = "data.frame")
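# Base R, no extra packages
df <- structure(list(ARTNR = c(1, 2, 3), FILGRP = c("9827", "9348", "9335, 9827, 9339")),
                row.names = c(NA, -3L), class = "data.frame")
# Split each FILGRP string on ", " (comma plus space, so no leading blanks remain)
df2 <- strsplit(df$FILGRP, split = ", ", fixed = TRUE)
# Repeat each ARTNR once per piece, then flatten the pieces into one column
df2 <- data.frame(ARTNR = rep(df$ARTNR, lengths(df2)), FILGRP = unlist(df2))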
Here is a data.table approach:
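library(data.table)
DT <- as.data.table(df)  # df: the data from the dput output above
# Split FILGRP into v1, v2, ... (as many columns as the longest entry needs),
# melt those columns into one, drop the NA padding, keep ARTNR and FILGRP
melt(DT[, paste0("v", seq_along(tstrsplit(DT$FILGRP, ", "))) := tstrsplit(FILGRP, ", ")],
     id.vars = "ARTNR",
     measure.vars = patterns("^v"),
     value.name = "FILGRP")[!is.na(FILGRP), .SD, .SDcols = c(1, 3)]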
#    ARTNR FILGRP
# 1:     1   9827
# 2:     2   9348
# 3:     3   9335
# 4:     3   9827
# 5:     3   9339
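A different technique that skips the wide intermediate columns and `melt()` entirely is to split inside a grouped `j` expression. This is a sketch, not part of the answer above, and it assumes ", " is the only separator.

```r
library(data.table)

# Example data from the question (same values as the dput output)
DT <- data.table(ARTNR  = c(1, 2, 3),
                 FILGRP = c("9827", "9348", "9335, 9827, 9339"))

# For each ARTNR group, split its FILGRP string(s) on ", " and return
# the pieces as one column; groups appear in order of first occurrence
long <- DT[, .(FILGRP = unlist(strsplit(FILGRP, ", ", fixed = TRUE))), by = ARTNR]
long
```

Since nothing is widened first, there is no need to know or compute the maximum number of parts per row.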