Why does melt return an NA column in R?
I have the following data frame df in R:
structure(list(disease = structure(c(1L, 1L), .Label = "Barcelona", class = "factor"),
`<18` = structure(list(0.193103448275862,
0.0445344129554656), .Names = c(NA_character_, NA_character_
)), `19-25` = structure(list(0.0413793103448276,
0.345748987854251), .Names = c(NA_character_, NA_character_
)), `26-64` = structure(list(0.448275862068966, 0.167611336032389), .Names = c(NA_character_,
NA_character_)), `46-64` = structure(list(0.0344827586206897,
0.00647773279352227), .Names = c(NA_character_, NA_character_
)), `>65` = structure(list(0.282758620689655,
0.435627530364373), .Names = c(NA_character_, NA_character_
)), type = structure(1:2, .Label = c("Clinical Trial", "Real-World"
), class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
I want to reshape the data frame with melt so that I get each value by city, flat, and age group. However, I get an extra column in the output:
melt(df)
city type variable value NA
1 Barcelona flat <18 0.19310345 0.044534413
2 Barcelona house <18 0.19310345 0.044534413
3 Barcelona flat 19 - 25 0.04137931 0.345748988
4 Barcelona house 19 - 25 0.04137931 0.345748988
5 Barcelona flat 26 - 45 0.44827586 0.167611336
6 Barcelona house 26 - 45 0.44827586 0.167611336
7 Barcelona flat 46 - 64 0.03448276 0.006477733
8 Barcelona house 46 - 64 0.03448276 0.006477733
9 Barcelona flat > 65 0.28275862 0.435627530
10 Barcelona house > 65 0.28275862 0.435627530
Is there a way to get rid of the NA column and have a single value per row in the value column?
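The problem is that your measure columns are of class list, not class numeric. If we convert them to numeric, melt works as expected. (I show one way to do the conversion, but it would be better to do it earlier in your workflow and prevent the columns from being created as lists in the first place. That is definitely what you should do if my code works on your sample data but runs into problems on your larger data. tidyr::unnest may help in that case.)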
sapply(df, class)
# disease <18 19-25 26-64 46-64 >65 type
# "factor" "list" "list" "list" "list" "list" "factor"
list_cols = sapply(df, is.list)
df[list_cols] = lapply(df[list_cols], unlist)
reshape2::melt(df, id.vars = c("disease", "type"))
# disease type variable value
# 1 Barcelona Clinical Trial <18 0.193103448
# 2 Barcelona Real-World <18 0.044534413
# 3 Barcelona Clinical Trial 19-25 0.041379310
# 4 Barcelona Real-World 19-25 0.345748988
# 5 Barcelona Clinical Trial 26-64 0.448275862
# 6 Barcelona Real-World 26-64 0.167611336
# 7 Barcelona Clinical Trial 46-64 0.034482759
# 8 Barcelona Real-World 46-64 0.006477733
# 9 Barcelona Clinical Trial >65 0.282758621
# 10 Barcelona Real-World >65 0.435627530
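As a rough sketch of a stricter variant of the conversion above, assuming every cell of a list column should hold exactly one number: unlist would silently change the column length if a cell were empty or held several values, whereas vapply stops with an error instead. The helper name flatten_col and the small example data are illustrative assumptions, not part of the original question.

```r
# Small data frame with list columns, mirroring the shape of the question's data.
# Column names and values here are illustrative only.
df <- data.frame(type = c("Clinical Trial", "Real-World"))
df$`<18` <- list(0.19, 0.04)
df$`>65` <- list(0.28, 0.44)

# Hypothetical helper: convert one list column to a numeric vector,
# failing loudly if any cell does not hold exactly one value.
flatten_col <- function(col) {
  vapply(col, function(x) as.numeric(x), numeric(1), USE.NAMES = FALSE)
}

# Find the list columns and replace each with its numeric equivalent.
list_cols <- vapply(df, is.list, logical(1))
df[list_cols] <- lapply(df[list_cols], flatten_col)

# Inspect the resulting column classes before melting.
sapply(df, class)
```

If this errors on the real data, that points to cells that are empty or contain more than one value, which is worth fixing upstream before calling melt.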