How to keep duplicated column names when melting?

I'm trying to draw a grouped boxplot in R. The data looks like this:

mydf <- structure(list(Category = c("RPP", "RR"), P10 = c(3.352174769, 
    3.539193849), P2 = c(0, 3.090577955), P10 = c(3.273878984, 3.160004973
    ), P2 = c(0, 3.159418605), P10 = c(3.182712494, 3.316038858), 
        P2 = c(0L, 0L), P10 = c(2.770653831, 3.293476876), P2 = c(2.635533787, 
        3.245297416), P10 = c(0, 3.924497418), P2 = c(0L, 0L)), .Names = c("Category", 
    "P10", "P2", "P10", "P2", "P10", "P2", "P10", "P2", "P10", "P2"
    ), row.names = 1:2, class = "data.frame")


mydf
##   Category      P10       P2      P10       P2      P10 P2      P10       P2      P10 P2
## 1      RPP 3.352175 0.000000 3.273879 0.000000 3.182712  0 2.770654 2.635534 0.000000  0
## 2       RR 3.539194 3.090578 3.160005 3.159419 3.316039  0 3.293477 3.245297 3.924497  0

From it, I want to generate a data frame like the following to plot with ggplot2:

However, after using the melt() function:

reshape2::melt(mydf, "Category")

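I found that only the first two value columns are kept in the result, which means the later duplicated columns are dropped because they share the same column names. Is there another way to do this?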

If you use melt from "data.table", you won't run into this problem:

library(data.table)
melt(as.data.table(mydf), "Category")
#     Category variable    value
#  1:      RPP      P10 3.352175
#  2:       RR      P10 3.539194
#  3:      RPP       P2 0.000000
#  4:       RR       P2 3.090578
#  5:      RPP      P10 3.273879
#  6:       RR      P10 3.160005
#  7:      RPP       P2 0.000000
#  8:       RR       P2 3.159419
#  9:      RPP      P10 3.182712
# 10:       RR      P10 3.316039
# 11:      RPP       P2 0.000000
# 12:       RR       P2 0.000000
# 13:      RPP      P10 2.770654
# 14:       RR      P10 3.293477
# 15:      RPP       P2 2.635534
# 16:       RR       P2 3.245297
# 17:      RPP      P10 0.000000
# 18:       RR      P10 3.924497
# 19:      RPP       P2 0.000000
# 20:       RR       P2 0.000000

A base R alternative is to use stack, like this:

cbind(Category = mydf[[1]], stack(mydf[-1]))
##    Category   values   ind
## 1       RPP 3.352175   P10
## 2        RR 3.539194   P10
## 3       RPP 0.000000    P2
## 4        RR 3.090578    P2
## 5       RPP 3.273879 P10.1
## 6        RR 3.160005 P10.1
## 7       RPP 0.000000  P2.1
## 8        RR 3.159419  P2.1
## 9       RPP 3.182712 P10.2
## 10       RR 3.316039 P10.2
## 11      RPP 0.000000  P2.2
## 12       RR 0.000000  P2.2
## 13      RPP 2.770654 P10.3
## 14       RR 3.293477 P10.3
## 15      RPP 2.635534  P2.3
## 16       RR 3.245297  P2.3
## 17      RPP 0.000000 P10.4
## 18       RR 3.924497 P10.4
## 19      RPP 0.000000  P2.4
## 20       RR 0.000000  P2.4

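Depending on how you intend to use the data, you may also need to clean up the "ind" column, since the repeated names come back with ".1", ".2", ... suffixes.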
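As a rough sketch of what that cleanup could look like in base R: the snippet below works on a shortened copy of the sample data (first four value columns only), strips the numeric suffix from "ind", and adds a counter for the repeated measurements. The column names `group` and `replicate` are my own choices for illustration, not something from the question.

```r
# Shortened copy of the sample data (first four value columns only);
# check.names = FALSE keeps the duplicated column names.
mydf <- data.frame(
  Category = c("RPP", "RR"),
  P10 = c(3.352174769, 3.539193849), P2 = c(0, 3.090577955),
  P10 = c(3.273878984, 3.160004973), P2 = c(0, 3.159418605),
  check.names = FALSE, stringsAsFactors = FALSE
)

long <- cbind(Category = mydf[[1]], stack(mydf[-1]))

# Drop a trailing ".1", ".2", ... so every row is labelled "P10" or "P2" again.
long$group <- sub("\\.\\d+$", "", as.character(long$ind))

# Number the repeated measurements within each Category/group pair
# (hypothetical helper column, handy if the copies must stay distinguishable).
long$replicate <- ave(seq_len(nrow(long)), long$Category, long$group,
                      FUN = seq_along)

long
```

One caveat: the pattern would also strip a genuine ".<digits>" ending from an original column name, so it is only safe when the real names do not end that way.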


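In case you also apply some transformations and need to get back to the initial representation at some point, it helps to keep that option open: make the column names unique first, and still keep the group you need in a separate column: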

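library(magrittr)  # provides %>%

mydf %>%
    setNames(nm = make.unique(names(.))) %>%   # P10, P2, P10.1, P2.1, ...
    reshape2::melt("Category") %>%
    # strip the ".1", ".2", ... suffix to recover the original group label
    transform(group = sub(x = variable, pattern = "\\.\\d+$", replacement = ""))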

But @A5C1D2H2I1M1N2O1R2T1's suggestion is of course much shorter, and I'll have to keep it in mind... I didn't know data.table could handle this.
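For the "getting back to the initial representation" part, here is a minimal base R sketch of the round trip. It uses stack() and unstack() instead of reshape2, so it is a swapped-in technique and not what the code above runs. The data is a shortened copy of the sample data, and `col_order` and `wide` are illustrative names.

```r
# Shortened copy of the sample data (first four value columns only).
mydf <- data.frame(
  Category = c("RPP", "RR"),
  P10 = c(3.352174769, 3.539193849), P2 = c(0, 3.090577955),
  P10 = c(3.273878984, 3.160004973), P2 = c(0, 3.159418605),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Long form; the labels in "ind" are unique (suffixes mark the repeats).
long <- cbind(Category = mydf[[1]], stack(mydf[-1]))

# unstack() rebuilds one column per unique label. This assumes the rows
# within each label are still in their original Category order.
col_order <- unique(as.character(long$ind))
wide <- cbind(Category = unique(long$Category),
              unstack(long, values ~ ind)[col_order])

# Restore the original (duplicated) column names by stripping the suffixes.
names(wide) <- sub("\\.\\d+$", "", names(wide))

names(wide)
```

The unique labels are what make the way back possible: once the suffixes are gone from the long table itself, the repeated columns can no longer be told apart.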