Reshaping Data in R - Creating new columns based on values in an existing column
So I'm working on a problem in R: I have a data frame in which one column contains a series of variable names:
*Name* *id_key* *detail* *var_names* *values*
Jose 123 red foo abc
Jose 123 blue foo abc
Jose 123 green foo abc
Mel 456 red bar 555
Mel 456 green bar 555
Dom 789 yellow choo fjfj55bar
What I would like to achieve is:
*Name* *id_key* *detail* *foo* *bar* *choo*
Jose 123 red abc NA NA
Jose 123 blue abc NA NA
Jose 123 green abc NA NA
Mel 456 red NA 555 NA
Mel 456 green NA 555 NA
Dom 789 yellow NA NA fjfj55bar
I tried using dcast from the reshape2 package with the following command, but it did not produce the desired result:
toy_data_unmelt <- dcast(toy_data, formula = name~var_names, value.var = "values")
Any help would be much appreciated!
reshape2 has been superseded by tidyr. (reshape2 is still available, but I would make the switch to keep your code up to date.) Here is a tidyr solution:
library(tidyr)
library(readr)  # read_table() is provided by readr, not tidyr
toy_data <- read_table("*Name* *id_key* *detail* *var_names* *values*
Jose 123 red foo abc
Jose 123 blue foo abc
Jose 123 green foo abc
Mel 456 red bar 555
Mel 456 green bar 555
Dom 789 yellow choo fjfj55bar")
toy_data_wide <- spread(toy_data, `*var_names*`, `*values*`)
Or, using the pipe operator:
toy_data_wide <- toy_data %>%
  spread(`*var_names*`, `*values*`)
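As a side note, newer tidyr releases (1.0.0 and later) mark spread() as superseded in favour of pivot_wider(). A minimal sketch of the same reshape with pivot_wider() follows, assuming the columns are named without the asterisks (Name, id_key, detail, var_names, values) and that the data frame is built by hand rather than read from text:

```r
library(tidyr)

# Toy data from the question, with plain column names (an assumption:
# the asterisks in the question are treated as formatting, not as part of the names)
toy_data <- data.frame(
  Name      = c("Jose", "Jose", "Jose", "Mel", "Mel", "Dom"),
  id_key    = c(123, 123, 123, 456, 456, 789),
  detail    = c("red", "blue", "green", "red", "green", "yellow"),
  var_names = c("foo", "foo", "foo", "bar", "bar", "choo"),
  values    = c("abc", "abc", "abc", "555", "555", "fjfj55bar"),
  stringsAsFactors = FALSE
)

# names_from: the column whose values become the new column names
# values_from: the column whose values fill those new columns
# Combinations that do not occur are filled with NA by default
toy_data_wide <- pivot_wider(toy_data,
                             names_from  = var_names,
                             values_from = values)
```

All columns not named in names_from or values_from (here Name, id_key and detail) are kept as identifying columns, so each original row stays its own row in the result.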
For this, you need to use the spread function from the tidyr package:
library(tidyr)
toy_data = data.frame(Name = c("Jose", "Jose", "Jose", "Mel", "Mel", "Dom"),
                      id_key = c(123, 123, 123, 456, 456, 789),
                      detail = c("red", "blue", "green", "red", "green", "yellow"),
                      var_names = c("foo", "foo", "foo", "bar", "bar", "choo"),
                      values = c("abc", "abc", "abc", "555", "555", "fjfj55bar"))
toy_data %>% spread(var_names, values, fill = NA)
Output:
# Name id_key detail bar choo foo
#1 Dom 789 yellow <NA> fjfj55bar <NA>
#2 Jose 123 blue <NA> <NA> abc
#3 Jose 123 green <NA> <NA> abc
#4 Jose 123 red <NA> <NA> abc
#5 Mel 456 green 555 <NA> <NA>
#6 Mel 456 red 555 <NA> <NA>
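If you would rather stay with reshape2, the dcast call from the question can probably be made to work as well. My guess at what went wrong (not confirmed by the asker): the formula refers to name while the column is called Name, and R is case-sensitive; and with only one identifying column on the left-hand side, several rows fall into the same cell, so dcast aggregates them instead of keeping them apart. A sketch that puts all three identifying columns on the left-hand side, again assuming plain column names:

```r
library(reshape2)

# Toy data from the question, with plain column names (assumed)
toy_data <- data.frame(
  Name      = c("Jose", "Jose", "Jose", "Mel", "Mel", "Dom"),
  id_key    = c(123, 123, 123, 456, 456, 789),
  detail    = c("red", "blue", "green", "red", "green", "yellow"),
  var_names = c("foo", "foo", "foo", "bar", "bar", "choo"),
  values    = c("abc", "abc", "abc", "555", "555", "fjfj55bar"),
  stringsAsFactors = FALSE
)

# Left of ~ : every column that identifies a row in the wide result
# Right of ~: the column whose values become the new column names
toy_data_unmelt <- dcast(toy_data,
                         Name + id_key + detail ~ var_names,
                         value.var = "values")
```

Because each Name/id_key/detail combination now occurs only once per variable, no aggregation is needed and the original values are carried over unchanged.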