Reshape data in R: change a long table into a wide table
I want to use the reshape2 package in R to turn my long table into a wide table.
I have a dataset from a database that looks like this (sample):
id1 | id2 | info | action_time |
1 | a | info1 | time1 |
1 | a | info1 | time2 |
1 | a | info1 | time3 |
2 | b | info2 | time4 |
2 | b | info2 | time5 |
Now I want it to look like this:
id1 | id2 | info |action_time 1|action_time 2|action_time 3|
1 | a | info1 | time1 | time2 | time3 |
2 | b | info2 | time4 | time5 | |
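I have tried several times and looked for examples of reshape() or dcast() on a few websites, but could not find one like this. The number of action_time values differs per id. Some ids may have more than 10 action_time values, in which case the reshaped dataset will have more than 10 action_time columns.
Can anyone think of a handy way to do this? If there is a way to do it in Excel (Pivot Table?), that would be great as well. Thanks heaps.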
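For the dcast() route mentioned in the question, the missing piece is usually a counter column that numbers the actions within each id, which can then serve as the column variable in the casting formula. The following is only a minimal sketch under that assumption: the helper column `action_no` and the `action_time_` prefix are names made up for illustration, and the sample data is re-created by hand.

```r
library(reshape2)

# Sample data re-created from the question (plain character columns)
df <- data.frame(
  id1 = c(1, 1, 1, 2, 2),
  id2 = c("a", "a", "a", "b", "b"),
  info = c("info1", "info1", "info1", "info2", "info2"),
  action_time = c("time1", "time2", "time3", "time4", "time5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: 1, 2, 3, ... within each id1
df$action_no <- ave(seq_along(df$id1), df$id1, FUN = seq_along)

# One row per id1/id2/info, one column per action number
wide <- dcast(df, id1 + id2 + info ~ action_no, value.var = "action_time")

# dcast names the new columns "1", "2", "3"; add a readable prefix
names(wide)[-(1:3)] <- paste0("action_time_", names(wide)[-(1:3)])
```

Ids with fewer actions than the longest id get NA in the trailing columns, so the number of action_time columns adapts to the data.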
Using tidyr:
require(tidyr)
# replicate data
df <- structure(list(id1 = c(1, 1, 1, 2, 2), id2 = structure(c(1L,
1L, 1L, 2L, 2L), .Label = c(" a ", " b "), class = "factor"),
info = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c(" info1 ",
" info2 "), class = "factor"), action_time = structure(1:5, .Label = c(" time1 ",
" time2 ", " time3 ", " time4 ", " time5 "
), class = "factor")), .Names = c("id1", "id2", "info", "action_time"
), class = "data.frame", row.names = c(NA, -5L))
# number each action within its run of id1 (assumes rows for the same id1 are adjacent)
action_no <- paste("action_time", sequence(rle(df$id1)$lengths))
y <- cbind(df, action_no)
# spread into final dataframe
z <- spread(y, action_no, action_time)
Final output:
> z
id1 id2 info action_time 1 action_time 2 action_time 3
1 1 a info1 time1 time2 time3
2 2 b info2 time4 time5 <NA>
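Note that an rle()-based counter only works when the rows for each id1 sit next to each other. As a hedged alternative, here is a sketch that uses only base R: ave() builds the per-id counter regardless of row order, and reshape() does the widening. The column name `action_no` and the hand-built sample data are assumptions for illustration.

```r
# Sample data re-created from the question, deliberately shuffled
df <- data.frame(
  id1 = c(1, 2, 1, 2, 1),
  id2 = c("a", "b", "a", "b", "a"),
  info = c("info1", "info2", "info1", "info2", "info1"),
  action_time = c("time1", "time4", "time2", "time5", "time3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: running count within each id1, in row order
df$action_no <- ave(seq_along(df$id1), df$id1, FUN = seq_along)

# Widen: one action_time_<n> column per counter value
wide <- reshape(
  df,
  idvar = c("id1", "id2", "info"),
  timevar = "action_no",
  direction = "wide",
  sep = "_"
)
```

All three identifier columns are passed as `idvar` so that only action_time is spread across the new columns.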
Try:
library(dplyr)
library(tidyr)
df %>%
group_by(id1) %>%
mutate(action_no = paste("action_time", row_number())) %>%
spread(action_no, action_time)
Which gives:
#Source: local data frame [2 x 6]
#
# id1 id2 info action_time 1 action_time 2 action_time 3
#1 1 a info1 time1 time2 time3
#2 2 b info2 time4 time5 NA
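In current tidyr releases spread() is superseded by pivot_wider(). A sketch of the same idea with the newer function might look like the following, assuming tidyr 1.0.0 or later. The column name `action_no` and the `action_time_` prefix are illustrative choices, and the sample data is re-created by hand.

```r
library(dplyr)
library(tidyr)

# Sample data re-created from the question (plain character columns)
df <- data.frame(
  id1 = c(1, 1, 1, 2, 2),
  id2 = c("a", "a", "a", "b", "b"),
  info = c("info1", "info1", "info1", "info2", "info2"),
  action_time = c("time1", "time2", "time3", "time4", "time5"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(id1) %>%
  mutate(action_no = row_number()) %>%  # 1, 2, 3, ... within each id1
  ungroup() %>%
  pivot_wider(
    names_from = action_no,
    values_from = action_time,
    names_prefix = "action_time_"
  )
```

Using `names_prefix` gives syntactic column names (action_time_1 and so on), which avoids the backticks needed for names containing a space.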
Data:
df <- structure(list(id1 = c(1, 1, 1, 2, 2), id2 = structure(c(1L,
1L, 1L, 2L, 2L), .Label = c("a", "b"), class = "factor"), info = structure(c(1L,
1L, 1L, 2L, 2L), .Label = c("info1", "info2"), class = "factor"),
action_time = structure(1:5, .Label = c("time1", "time2",
"time3", "time4", "time5"), class = "factor")), .Names = c("id1",
"id2", "info", "action_time"), class = "data.frame", row.names = c(NA, -5L))