How to keep duplicated column names in the melt process?
I am trying to draw a grouped boxplot in R. The data looks like this:
mydf <- structure(list(Category = c("RPP", "RR"), P10 = c(3.352174769,
3.539193849), P2 = c(0, 3.090577955), P10 = c(3.273878984, 3.160004973
), P2 = c(0, 3.159418605), P10 = c(3.182712494, 3.316038858),
P2 = c(0L, 0L), P10 = c(2.770653831, 3.293476876), P2 = c(2.635533787,
3.245297416), P10 = c(0, 3.924497418), P2 = c(0L, 0L)), .Names = c("Category",
"P10", "P2", "P10", "P2", "P10", "P2", "P10", "P2", "P10", "P2"
), row.names = 1:2, class = "data.frame")
mydf
## Category P10 P2 P10 P2 P10 P2 P10 P2 P10 P2
## 1 RPP 3.352175 0.000000 3.273879 0.000000 3.182712 0 2.770654 2.635534 0.000000 0
## 2 RR 3.539194 3.090578 3.160005 3.159419 3.316039 0 3.293477 3.245297 3.924497 0
I want to produce a data frame like the following so I can plot it with ggplot2:
Category variable value
RPP      P10      3.35
RPP      P2       0
RR       P10      3.54
...
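However, after using the melt() function:
melt(mydf, "Category")
I found that only the first two value columns are kept in the result, which means the later duplicated columns are dropped because they share the same column names.
Is there another way to do this?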
If you use melt from "data.table", you won't run into that problem:
library(data.table)
melt(as.data.table(mydf), "Category")
# Category variable value
# 1: RPP P10 3.352175
# 2: RR P10 3.539194
# 3: RPP P2 0.000000
# 4: RR P2 3.090578
# 5: RPP P10 3.273879
# 6: RR P10 3.160005
# 7: RPP P2 0.000000
# 8: RR P2 3.159419
# 9: RPP P10 3.182712
# 10: RR P10 3.316039
# 11: RPP P2 0.000000
# 12: RR P2 0.000000
# 13: RPP P10 2.770654
# 14: RR P10 3.293477
# 15: RPP P2 2.635534
# 16: RR P2 3.245297
# 17: RPP P10 0.000000
# 18: RR P10 3.924497
# 19: RPP P2 0.000000
# 20: RR P2 0.000000
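Since the goal in the question is a grouped boxplot, the melted result can be passed to ggplot2 more or less directly. The following is only a minimal sketch: `toy` is a hypothetical, shortened stand-in for `mydf`, the values are rounded, and the choice of `Category` on the x axis with `variable` as the fill is an assumption about the desired plot, not something stated in the question.

```r
library(data.table)
library(ggplot2)

# Hypothetical stand-in for mydf; check.names = FALSE keeps the duplicated names.
toy <- data.frame(Category = c("RPP", "RR"),
                  P10 = c(3.35, 3.54), P2 = c(0, 3.09),
                  P10 = c(3.27, 3.16), P2 = c(0, 3.16),
                  check.names = FALSE)

# Melt with data.table so every duplicated column is kept.
long <- melt(as.data.table(toy), id.vars = "Category")

# Store the labels as plain character so repeated names form one group each.
long[, variable := as.character(variable)]

# One box per Category/variable combination (assumed layout).
p <- ggplot(long, aes(x = Category, y = value, fill = variable)) +
  geom_boxplot()
```

Printing `p` draws the plot. With the full data each box would be built from five values rather than the two used here.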
A base R alternative is to use stack, as follows:
cbind(Category = mydf[[1]], stack(mydf[-1]))
## Category values ind
## 1 RPP 3.352175 P10
## 2 RR 3.539194 P10
## 3 RPP 0.000000 P2
## 4 RR 3.090578 P2
## 5 RPP 3.273879 P10.1
## 6 RR 3.160005 P10.1
## 7 RPP 0.000000 P2.1
## 8 RR 3.159419 P2.1
## 9 RPP 3.182712 P10.2
## 10 RR 3.316039 P10.2
## 11 RPP 0.000000 P2.2
## 12 RR 0.000000 P2.2
## 13 RPP 2.770654 P10.3
## 14 RR 3.293477 P10.3
## 15 RPP 2.635534 P2.3
## 16 RR 3.245297 P2.3
## 17 RPP 0.000000 P10.4
## 18 RR 3.924497 P10.4
## 19 RPP 0.000000 P2.4
## 20 RR 0.000000 P2.4
Depending on how you plan to use the data, you may also need to clean up the "ind" column.
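One possible way to do that cleanup, sketched in base R only: strip the numeric suffix that subsetting added to the duplicated names. The data frame `toy` is a hypothetical, shortened stand-in for `mydf`, and the new column name `variable` is just an illustrative choice.

```r
# Hypothetical stand-in for mydf; check.names = FALSE keeps the duplicated names.
toy <- data.frame(Category = c("RPP", "RR"),
                  P10 = c(3.35, 3.54), P2 = c(0, 3.09),
                  P10 = c(3.27, 3.16), P2 = c(0, 3.16),
                  check.names = FALSE)

# Same approach as above: toy[-1] makes the names unique (P10, P2, P10.1, P2.1).
long <- cbind(Category = toy[[1]], stack(toy[-1]))

# Drop a trailing ".<digits>" so P10.1 becomes P10 again.
long$variable <- sub("\\.[0-9]+$", "", as.character(long$ind))
```

Keeping the original `ind` column next to the cleaned `variable` column leaves a record of which source column each value came from.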
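In case you also apply some transformations and need to get back to the initial representation at some point, it helps to keep that option open by making the column names unique first and then deriving the group you need from them: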
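library(magrittr)  # provides %>%
mydf %>%
  setNames(nm = make.unique(names(.))) %>%  # P10, P2, P10.1, P2.1, ...
  reshape2::melt("Category") %>%
  # strip the ".1", ".2", ... suffix to recover the original group name
  transform(group = sub(x = variable, pattern = "\\.\\d+$", replacement = ""))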
But @A5C1D2H2I1M1N2O1R2T1's suggestion is of course much shorter, and I will have to keep it in mind... I didn't know data.table could handle this.