R: melt and dcast
I have a dataset like this:
CASE_ID = c("C1","C1", "C2","C2", "C2", "C3", "C4")
PERSON_ID = c(1,0,7,8,1,20,7)
PERSON_DIVISION = c("Zone 1", "NA", "Zone 1", "Zone 3", "Zone 1", "Zone 5", "Zone 1")
df <- data.frame(CASE_ID, PERSON_ID, PERSON_DIVISION)
df
This gives:
  CASE_ID PERSON_ID PERSON_DIVISION
1      C1         1          Zone 1
2      C1         0              NA
3      C2         7          Zone 1
4      C2         8          Zone 3
5      C2         1          Zone 1
6      C3        20          Zone 5
7      C4         7          Zone 1
I want to convert it to:
CASE_ID P1_ID P2_ID P3_ID P1_Division P2_Division P3_Division
      1     1     0    NA      Zone 1          NA          NA
      2     7     8     1      Zone 1      Zone 3      Zone 1
      3    20    NA    NA      Zone 5          NA          NA
      4     7    NA    NA      Zone 1          NA          NA
My approach so far has been to melt the data and then dcast it:
e <- melt(df)
dcast(e, CASE_ID ~ PERSON_DIVISION + variable)
But instead of the desired output, I get:
  CASE_ID NA_PERSON_ID Zone 1_PERSON_ID Zone 3_PERSON_ID Zone 5_PERSON_ID
1      C1            1                1                0                0
2      C2            0                2                1                0
3      C3            0                0                0                1
4      C4            0                1                0                0
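There are two issues here:
- Your data is already in long format, but you have two value columns. Recent versions of data.table support multiple value variables in dcast().
- You need a unique row ID within each group. Otherwise, dcast() will try to aggregate the duplicates (by default using length(), which explains the output you got).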
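As a small illustration of the second point, the sketch below shows the within-group row numbers that rowid() from data.table produces for the CASE_ID column in the question. It assumes only that the data.table package is installed.

```r
library(data.table)

CASE_ID <- c("C1", "C1", "C2", "C2", "C2", "C3", "C4")

# rowid() numbers the rows within each run of identical CASE_ID values
rn <- rowid(CASE_ID)
rn
# [1] 1 2 1 2 3 1 1
```

Each (CASE_ID, rn) pair is now unique, so dcast() has no duplicates to aggregate.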
Try:
library(data.table) # version 1.10.4 used here
# coerce to data.table, add unique row numbers for each group
setDT(df)[, rn := rowid(CASE_ID)]
# dcast with multiple value vars
dcast(df, CASE_ID ~ rn, value.var = list("PERSON_ID", "PERSON_DIVISION"))
#   CASE_ID PERSON_ID_1 PERSON_ID_2 PERSON_ID_3 PERSON_DIVISION_1 PERSON_DIVISION_2 PERSON_DIVISION_3
#1:      C1           1           0          NA            Zone 1                NA                NA
#2:      C2           7           8           1            Zone 1            Zone 3            Zone 1
#3:      C3          20          NA          NA            Zone 5                NA                NA
#4:      C4           7          NA          NA            Zone 1                NA                NA
This can be written more concisely as a one-liner:
dcast(setDT(df), CASE_ID ~ rowid(CASE_ID), value.var = list("PERSON_ID", "PERSON_DIVISION"))
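If the column names should follow the P1_ID / P1_Division pattern from the question, the result could be renamed afterwards. This is a minimal sketch, assuming dcast() names the new columns PERSON_ID_1, PERSON_DIVISION_1 and so on, as in the output above; the names wide, old and new are hypothetical and used only for illustration.

```r
library(data.table)

df <- data.frame(
  CASE_ID = c("C1", "C1", "C2", "C2", "C2", "C3", "C4"),
  PERSON_ID = c(1, 0, 7, 8, 1, 20, 7),
  PERSON_DIVISION = c("Zone 1", "NA", "Zone 1", "Zone 3", "Zone 1", "Zone 5", "Zone 1")
)

# Add a per-case row number, then cast both value columns to wide format
setDT(df)[, rn := rowid(CASE_ID)]
wide <- dcast(df, CASE_ID ~ rn, value.var = c("PERSON_ID", "PERSON_DIVISION"))

# Rename PERSON_ID_<n> to P<n>_ID and PERSON_DIVISION_<n> to P<n>_Division
old <- names(wide)[-1]
new <- sub("^PERSON_ID_([0-9]+)$", "P\\1_ID", old)
new <- sub("^PERSON_DIVISION_([0-9]+)$", "P\\1_Division", new)
setnames(wide, old, new)

names(wide)
```

setnames() changes the names by reference, so wide does not need to be reassigned.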