Melt & dcast w/ string concatenation
Suppose I have the following data.frame:
foo <- data.frame(CONTACT_DATE = c(rep(as.Date("2015-09-15"),3), rep(as.Date("2015-09-16"),3)), ISSUE = c("abc", "def", "xyz", "abc", "xyz", "def"), ISSUE_COUNT = c(1000,750,100,1500,200,100), RANK = c(1,2,3,1,2,3))
> foo
CONTACT_DATE ISSUE ISSUE_COUNT RANK
1 2015-09-15 abc 1000 1
2 2015-09-15 def 750 2
3 2015-09-15 xyz 100 3
4 2015-09-16 abc 1500 1
5 2015-09-16 xyz 200 2
6 2015-09-16 def 100 3
How do I get from the above to:
CONTACT_DATE ISSUE_RANK_1 ISSUE_RANK_2 ISSUE_RANK_3
2015-09-15 abc (1000) def (750) xyz (100)
2015-09-16 abc (1500) xyz (200) def (100)
I believe I need to use melt and dcast from reshape2, but I haven't figured out how yet.
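By default, when value.var is not given and no column is named value, dcast uses the last column of the input table as the values in the output table. So build the combined string as a new last column and cast on RANK: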
library(reshape2)
d = read.table(text="id CONTACT_DATE ISSUE ISSUE_COUNT RANK
1 2015-09-15 abc 1000 1
2 2015-09-15 def 750 2
3 2015-09-15 xyz 100 3
4 2015-09-16 abc 1500 1
5 2015-09-16 xyz 200 2
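6 2015-09-16 def 100 3", header = TRUE)
# create a new column holding the strings that will fill the output table
d$x <- paste(d$ISSUE, paste0("(", d$ISSUE_COUNT, ")"))
# x is the last column, so dcast would guess it anyway; value.var makes the choice explicit
dcast(d, CONTACT_DATE ~ RANK, value.var = "x")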
Output:
CONTACT_DATE 1 2 3
1 2015-09-15 abc (1000) def (750) xyz (100)
2 2015-09-16 abc (1500) xyz (200) def (100)
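The output above has columns named 1, 2, 3 rather than the ISSUE_RANK_1 ... ISSUE_RANK_3 asked for in the question. A minimal sketch of one way to get those names with reshape2, working directly on foo from the question: the helper columns LABEL and RANK_COL are names made up for this illustration. No melt step is needed here because foo is already in long form.

```r
library(reshape2)

# Same data as in the question.
foo <- data.frame(
  CONTACT_DATE = as.Date(rep(c("2015-09-15", "2015-09-16"), each = 3)),
  ISSUE        = c("abc", "def", "xyz", "abc", "xyz", "def"),
  ISSUE_COUNT  = c(1000, 750, 100, 1500, 200, 100),
  RANK         = c(1, 2, 3, 1, 2, 3)
)

# Hypothetical helper columns: the cell text and the target column name.
foo$LABEL    <- paste0(foo$ISSUE, " (", foo$ISSUE_COUNT, ")")
foo$RANK_COL <- paste0("ISSUE_RANK_", foo$RANK)

# value.var names the column that fills the cells, so dcast does not have to guess.
wide <- dcast(foo, CONTACT_DATE ~ RANK_COL, value.var = "LABEL")
print(wide)
```

Casting on RANK_COL instead of RANK is what carries the prefix into the column names; the rest is the same idea as the answer above.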
You can use dplyr and tidyr:
library(dplyr)
library(tidyr)
foo %>%
mutate(ISSUE_COUNT = paste0("(", ISSUE_COUNT, ")"),
RANK = paste0("ISSUE_RANK_", RANK)) %>%
unite(VAR, ISSUE, ISSUE_COUNT, sep = " ") %>%
spread(RANK, VAR)
Which gives:
# CONTACT_DATE ISSUE_RANK_1 ISSUE_RANK_2 ISSUE_RANK_3
#1 2015-09-15 abc (1000) def (750) xyz (100)
#2 2015-09-16 abc (1500) xyz (200) def (100)
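In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A sketch of the same reshaping in that style, assuming a recent tidyr is installed: VAR is the same illustrative column name used above, and names_prefix adds the ISSUE_RANK_ prefix without a separate mutate on RANK.

```r
library(dplyr)
library(tidyr)

# Same data as in the question.
foo <- data.frame(
  CONTACT_DATE = as.Date(rep(c("2015-09-15", "2015-09-16"), each = 3)),
  ISSUE        = c("abc", "def", "xyz", "abc", "xyz", "def"),
  ISSUE_COUNT  = c(1000, 750, 100, 1500, 200, 100),
  RANK         = c(1, 2, 3, 1, 2, 3)
)

wide <- foo %>%
  # Combine issue and count into the text shown in each cell.
  mutate(VAR = paste0(ISSUE, " (", ISSUE_COUNT, ")")) %>%
  # Keep only the id column, the column that becomes names, and the values.
  select(CONTACT_DATE, RANK, VAR) %>%
  pivot_wider(names_from = RANK, values_from = VAR,
              names_prefix = "ISSUE_RANK_")

print(wide)
```

The result is a tibble rather than a plain data.frame; wrap it in as.data.frame() if the downstream code expects the latter.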