Cast long data format to wide format
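I need to convert long-format data (long) to wide format (wide), if possible, subject to the following conditions:
1) All data files are in long format with the same structure (id, name, value), but each data file has different variables, values, and a different number of variables: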
id = case
name = variable
value = variable value(s)
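2) Each data file contains a different mix of variables (factors, integers, numerics). Some factors can have multiple levels per case (fruit and meat), and I want to create a separate dummy (logical) variable for each level of those factors. The number of factor and numeric variables will vary between data files.
3) Since the variables differ between data files, I want to automate this so that I can apply the same code to every data file without changing any variable names.
I have tried reshape2 and tidyr, but could not find a way to do it.
Here is the long format: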
long
id name value
1 1 fruit apple
2 1 fruit banana
3 1 fruit orange
4 1 fruit pineapple
5 1 meat steak
6 1 meat chicken
7 1 fname dave
8 1 wt 185
9 1 status active
10 2 fruit apple
11 2 fruit pineapple
12 2 meat chicken
13 2 fname jeff
14 2 wt 205
15 2 status active
16 3 fruit apple
17 3 fruit banana
18 3 meat steak
19 3 fname jane
20 3 wt 125
21 3 status lapsed
Here is the wide format I would prefer:
wide
id fruit.apple fruit.banana fruit.orange fruit.pineapple meat.steak meat.chicken fname wt status
1 1 TRUE TRUE TRUE TRUE TRUE TRUE dave 185 active
2 2 TRUE FALSE FALSE TRUE FALSE TRUE jeff 205 active
3 3 TRUE TRUE FALSE FALSE TRUE FALSE jane 125 lapsed
The long-format data:
long <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), name = c("fruit",
"fruit", "fruit", "fruit", "meat", "meat", "fname", "wt", "status",
"fruit", "fruit", "meat", "fname", "wt", "status", "fruit", "fruit",
"meat", "fname", "wt", "status"), value = c("apple", "banana",
"orange", "pineapple", "steak", "chicken", "dave", "185", "active",
"apple", "pineapple", "chicken", "jeff", "205", "active", "apple",
"banana", "steak", "jane", "125", "lapsed")), .Names = c("id",
"name", "value"), class = "data.frame", row.names = c(NA, -21L
))
A solution using dplyr and tidyr:
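library(dplyr)
library(tidyr)
wide <- long %>%
  # For fruit and meat, fold the level into the column name and flag it with "1"
  mutate(value2 = ifelse(name %in% c("fruit", "meat"), "1", value),
         name2 = ifelse(name %in% c("fruit", "meat"),
                        paste(name, value, sep = "."), name)) %>%
  select(-name, -value) %>%
  # One column per name2; levels a case does not have are filled with "0"
  spread(name2, value2, fill = "0") %>%
  # Convert the "0"/"1" dummy columns to numeric, then to logical
  mutate_at(vars(matches("fruit|meat")), as.numeric) %>%
  mutate_at(vars(matches("fruit|meat")), as.logical)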
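The solution above names fruit and meat explicitly, so it would have to be edited for each data file, and it leaves wt as character. One possible way to avoid both is sketched below. It rests on an assumption that is not stated in the question: a variable is treated as multi-level whenever it occurs more than once for at least one id. A multi-level factor that happens to have exactly one level for every case in a file would therefore be kept as an ordinary column. The helper name long_to_wide and the demo data frame are hypothetical, introduced only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: widen an (id, name, value) table without naming variables.
long_to_wide <- function(long) {
  # Assumption: a variable repeated within some id is a multi-level factor.
  multi <- long %>%
    count(id, name) %>%
    filter(n > 1) %>%
    distinct(name) %>%
    pull(name)

  # One logical dummy column per level of each multi-level variable.
  dummies <- long %>%
    filter(name %in% multi) %>%
    distinct(id, name, value) %>%
    mutate(name = paste(name, value, sep = "."), value = TRUE) %>%
    pivot_wider(names_from = name, values_from = value, values_fill = FALSE)

  # Single-valued variables become ordinary columns; type.convert() guesses
  # their types (for example, "185" becomes an integer).
  singles <- long %>%
    filter(!name %in% multi) %>%
    pivot_wider(names_from = name, values_from = value) %>%
    type.convert(as.is = TRUE)

  # Start from all ids so that cases with no multi-level rows are kept,
  # and set their missing dummies to FALSE.
  long %>%
    distinct(id) %>%
    left_join(dummies, by = "id") %>%
    mutate(across(-id, ~ coalesce(.x, FALSE))) %>%
    left_join(singles, by = "id")
}

# Small made-up input with the same (id, name, value) structure.
demo <- data.frame(
  id    = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name  = c("fruit", "fruit", "fname", "wt", "fruit", "fname", "wt"),
  value = c("apple", "banana", "dave", "185", "apple", "jeff", "205"),
  stringsAsFactors = FALSE
)
demo_wide <- long_to_wide(demo)
```

Calling long_to_wide(long) on the data from the question should work the same way. If the repeated-name heuristic is too loose for a given file, the vector multi could instead be supplied by hand.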