Reshape data table

I have a data.table like this (the data is not necessarily sorted by 'col1'):

    col0    col1      col2
1:  abc       1         a
2:  abc       2         b 
3:  abc       3         c 
4:  abc       4         d 
5:  abc       5         e
6:  def       1         a
7:  def       2         b 
8:  def       3         c 
9:  def       4         d 
10: def       5         e

I would like to reshape it as follows:

    col0      col1      col2      new_1   new_2   new_3   new_4
1:  abc         1         a         NA      NA       NA      NA
2:  abc         2         b         a       NA       NA      NA
3:  abc         3         c         b       a        NA      NA
4:  abc         4         d         c       b        a       NA 
5:  abc         5         e         d       c        b       a
6:  def         1         a         NA      NA       NA      NA
7:  def         2         b         a       NA       NA      NA
8:  def         3         c         b       a        NA      NA
9:  def         4         d         c       b        a       NA 
10: def         5         e         d       c        b       a

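Basically, for each row I want the col2 values from the preceding rows (within the same col0 group) laid out in the same row. If there is no such preceding value, the corresponding new column should be NA.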

I could of course do this by merging on col2 5 times, but I need to do it on a large table, where I would have to merge 20-30 times.

What is the best way to do this in R in 1 or 2 lines?

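We can use shift from the development version of data.table, i.e. v1.9.5 (instructions for installing the development version are here). By default, type in shift is "lag". We can specify n as a vector, in this case 1:4, so that one call returns all four lagged copies of col2. We then assign (:=) the output to the new columns.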

library(data.table)#v1.9.5+
DT[, paste('new', 1:4, sep="_") := shift(col2, 1:4)]
DT
#   col1 col2 new_1 new_2 new_3 new_4
#1:    1    a    NA    NA    NA    NA
#2:    2    b     a    NA    NA    NA
#3:    3    c     b     a    NA    NA
#4:    4    d     c     b     a    NA
#5:    5    e     d     c     b     a
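To see why a single call can fill four columns at once, here is a small sketch of what shift returns on its own when n is a vector: a list with one lagged copy of the input per element of n, which := then spreads across the new columns. The input vector and the name lags are only illustrative.

```r
library(data.table)

# With a vector n, shift() returns a list: one lagged copy of x per element of n.
# type = "lag" is the default; it is spelled out here for clarity.
lags <- shift(c("a", "b", "c", "d", "e"), n = 1:2, type = "lag")

# Positions with no earlier value are padded with the default fill, NA
str(lags)
```

Each list element becomes one column when the list is assigned with := to a character vector of column names of the same length.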

For the new dataset 'DT2', we need to group by 'col0' and then apply shift to 'col2':

DT2[, paste('new', 1:4, sep="_") := shift(col2, 1:4), by = col0]
DT2
#   col0 col1 col2 new_1 new_2 new_3 new_4
# 1:  abc    1    a    NA    NA    NA    NA
# 2:  abc    2    b     a    NA    NA    NA
# 3:  abc    3    c     b     a    NA    NA
# 4:  abc    4    d     c     b     a    NA
# 5:  abc    5    e     d     c     b     a
# 6:  def    1    a    NA    NA    NA    NA
# 7:  def    2    b     a    NA    NA    NA
# 8:  def    3    c     b     a    NA    NA
# 9:  def    4    d     c     b     a    NA
#10:  def    5    e     d     c     b     a
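The question notes that the data is not necessarily sorted by 'col1'. Because shift lags by row position, one option (a sketch, assuming the intended order is 'col1' within each 'col0') is to sort in place with setorder before shifting. The sketch also derives the number of lag columns from the largest group instead of hard-coding 1:4, which may help for the 20-30 column case. DT3 and k are hypothetical names used only for illustration.

```r
library(data.table)

# Hypothetical unsorted input with the same layout as DT2
DT3 <- data.table(col0 = c("def", "abc", "abc", "def", "abc"),
                  col1 = c(2L, 3L, 1L, 1L, 2L),
                  col2 = c("b", "c", "a", "a", "b"))

# Sort by reference so that, within each col0 group, rows are in col1 order
setorder(DT3, col0, col1)

# Number of lags needed: largest group size minus one (assumes k >= 1)
k <- DT3[, .N, by = col0][, max(N)] - 1L

# One lagged column per lag, computed separately within each col0 group
DT3[, paste("new", seq_len(k), sep = "_") := shift(col2, seq_len(k)), by = col0]
DT3
```

Sorting first keeps the shift call itself unchanged; the alternative of shifting without sorting would produce lags in whatever order the rows happen to be stored.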

Data

df1 <- structure(list(col1 = 1:5, col2 = c("a", "b", "c", "d", "e"), 
new_1 = c(NA, "a", "b", "c", "d"), new_2 = c(NA, NA, "a", 
"b", "c"), new_3 = c(NA, NA, NA, "a", "b"), new_4 = c(NA, 
NA, NA, NA, "a")), .Names = c("col1", "col2", "new_1", "new_2", 
"new_3", "new_4"), class = "data.frame", row.names = c(NA, -5L
))

DT <- as.data.table(df1)

df2 <- structure(list(col0 = c("abc", "abc", "abc", "abc", "abc", 
"def", 
"def", "def", "def", "def"), col1 = c(1L, 2L, 3L, 4L, 5L, 1L, 
2L, 3L, 4L, 5L), col2 = c("a", "b", "c", "d", "e", "a", "b", 
 "c", "d", "e")), .Names = c("col0", "col1", "col2"), 
class = "data.frame", row.names = c(NA, -10L))
DT2 <- as.data.table(df2)