Reshape data table
I have a data.table like this (the data is not necessarily sorted by 'col1'):
col0 col1 col2
1: abc 1 a
2: abc 2 b
3: abc 3 c
4: abc 4 d
5: abc 5 e
6: def 1 a
7: def 2 b
8: def 3 c
9: def 4 d
10: def 5 e
I want to reshape it as follows:
col0 col1 col2 new_1 new_2 new_3 new_4
1: abc 1 a NA NA NA NA
2: abc 2 b a NA NA NA
3: abc 3 c b a NA NA
4: abc 4 d c b a NA
5: abc 5 e d c b a
6: def 1 a NA NA NA NA
7: def 2 b a NA NA NA
8: def 3 c b a NA NA
9: def 4 d c b a NA
10: def 5 e d c b a
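Basically, for each row I want the previous values of col2 placed in that same row; where there is no previous value, the corresponding new column should be NA.
I can of course do this by merging on col2 5 times, but I need to do it on a big table (where I would have to merge 20-30 times).
What is the best way to achieve this in R in 1 or 2 lines?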
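We can use shift from the development version of data.table, i.e. v1.9.5 (instructions for installing the development version are here). By default, the type in shift is "lag". We can specify n as a vector, in this case 1:4, and assign (:=) the output to the new columns.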
library(data.table) # v1.9.5+
DT[, paste('new', 1:4, sep="_") := shift(col2, 1:4)]
DT
# col1 col2 new_1 new_2 new_3 new_4
#1: 1 a NA NA NA NA
#2: 2 b a NA NA NA
#3: 3 c b a NA NA
#4: 4 d c b a NA
#5: 5 e d c b a
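To make the return value concrete: when n is a vector, shift returns a list with one lagged vector per element of n, which is why its result can be assigned to several columns at once. Below is a minimal sketch on a plain vector. It also shows one way the column names could be generated for a larger number of lags. The names `x`, `lags`, `n_lags`, `cols`, and `dt` are illustrative and not part of the original answer.

```r
library(data.table)  # shift() requires v1.9.5 or later

x <- c("a", "b", "c", "d", "e")

# With a vector n, shift() returns a list: one lagged copy of x per lag.
lags <- shift(x, 1:2)
str(lags)

# Hypothetical generalisation: derive the column names from the number of
# lags, so that 20-30 lags need no extra code.
n_lags <- 4
cols <- paste0("new_", seq_len(n_lags))
dt <- data.table(col1 = 1:5, col2 = x)
dt[, (cols) := shift(col2, seq_len(n_lags))]
dt
```

Wrapping `cols` in parentheses on the left-hand side of := tells data.table to use the values stored in the variable as column names rather than creating a column literally called "cols".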
For the new dataset 'DT2', we need to group by 'col0' and then perform the shift on 'col2':
DT2[, paste('new', 1:4, sep="_") := shift(col2, 1:4), by = col0]
DT2
# col0 col1 col2 new_1 new_2 new_3 new_4
# 1: abc 1 a NA NA NA NA
# 2: abc 2 b a NA NA NA
# 3: abc 3 c b a NA NA
# 4: abc 4 d c b a NA
# 5: abc 5 e d c b a
# 6: def 1 a NA NA NA NA
# 7: def 2 b a NA NA NA
# 8: def 3 c b a NA NA
# 9: def 4 d c b a NA
#10: def 5 e d c b a
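The question mentions that the rows are not necessarily sorted by 'col1', while shift lags by row position within each group. One possible way to handle that, sketched here, is to order the table first with setorder (which sorts a data.table by reference) and only then shift. The table `unsorted` is made-up data for illustration.

```r
library(data.table)

# Made-up example: two groups with their rows in shuffled order.
unsorted <- data.table(
  col0 = c("def", "abc", "abc", "def", "abc", "def"),
  col1 = c(2L, 3L, 1L, 1L, 2L, 3L),
  col2 = c("b", "c", "a", "a", "b", "c")
)

# Sort by group, then by col1, so "previous row" means "previous col1 value".
setorder(unsorted, col0, col1)

# Lag col2 by 1 and 2 rows within each col0 group.
unsorted[, paste("new", 1:2, sep = "_") := shift(col2, 1:2), by = col0]
unsorted
```

Note that setorder changes the row order of the table in place. If the original order must be preserved, sorting a copy (for example via copy()) is an option.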
Data
df1 <- structure(list(col1 = 1:5, col2 = c("a", "b", "c", "d", "e"),
new_1 = c(NA, "a", "b", "c", "d"), new_2 = c(NA, NA, "a",
"b", "c"), new_3 = c(NA, NA, NA, "a", "b"), new_4 = c(NA,
NA, NA, NA, "a")), .Names = c("col1", "col2", "new_1", "new_2",
"new_3", "new_4"), class = "data.frame", row.names = c(NA, -5L
))
DT <- as.data.table(df1)
df2 <- structure(list(col0 = c("abc", "abc", "abc", "abc", "abc",
"def",
"def", "def", "def", "def"), col1 = c(1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L), col2 = c("a", "b", "c", "d", "e", "a", "b",
"c", "d", "e")), .Names = c("col0", "col1", "col2"),
class = "data.frame", row.names = c(NA, -10L))
DT2 <- as.data.table(df2)