Melting data, keeping certain columns paired
I have the following data:
DT <- structure(list(ECOST = c("Choice_01", "Choice_02", "Choice_03",
"Choice_04", "Choice_05", "Choice_06", "Choice_07", "Choice_08",
"Choice_09", "Choice_10", "Choice_11", "Choice_12"), control = c(18,
30, 47, 66, 86, 35, 31, 46, 55, 39, 55, 41), treatment = c(31,
35, 46, 68, 86, 36, 32, 42, 52, 39, 58, 43), control_p = c(0.163636363636364,
0.272727272727273, 0.427272727272727, 0.6, 0.781818181818182,
0.318181818181818, 0.281818181818182, 0.418181818181818, 0.5,
0.354545454545455, 0.5, 0.372727272727273), treatment_p = c(0.319587628865979,
0.360824742268041, 0.474226804123711, 0.701030927835051, 0.88659793814433,
0.371134020618557, 0.329896907216495, 0.43298969072165, 0.536082474226804,
0.402061855670103, 0.597938144329897, 0.443298969072165)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 12 x 5
ECOST control treatment control_p treatment_p
<chr> <dbl> <dbl> <dbl> <dbl>
1 Choice_01 18 31 0.164 0.320
2 Choice_02 30 35 0.273 0.361
3 Choice_03 47 46 0.427 0.474
4 Choice_04 66 68 0.6 0.701
5 Choice_05 86 86 0.782 0.887
6 Choice_06 35 36 0.318 0.371
7 Choice_07 31 32 0.282 0.330
8 Choice_08 46 42 0.418 0.433
9 Choice_09 55 52 0.5 0.536
10 Choice_10 39 39 0.355 0.402
11 Choice_11 55 58 0.5 0.598
12 Choice_12 41 43 0.373 0.443
I want to melt this data, but keep the columns control and control_p together and the columns treatment and treatment_p together, producing a table with 24 rows and 4 columns.
Desired result:
# A tibble: 24 x 4
ECOST count percentage group
<chr> <dbl> <dbl> <chr>
1 Choice_01 18 0.164 control
2 Choice_02 30 0.273 control
3 Choice_03 47 0.427 control
4 Choice_04 66 0.6 control
5 Choice_05 86 0.782 control
6 Choice_06 35 0.318 control
7 Choice_07 31 0.282 control
8 Choice_08 46 0.418 control
9 Choice_09 55 0.5 control
10 Choice_10 39 0.355 control
11 Choice_11 55 0.5 control
12 Choice_12 41 0.373 control
13 Choice_01 31 0.320 treatment
14 Choice_02 35 0.361 treatment
15 Choice_03 46 0.474 treatment
16 Choice_04 68 0.701 treatment
17 Choice_05 86 0.887 treatment
18 Choice_06 36 0.371 treatment
19 Choice_07 32 0.330 treatment
20 Choice_08 42 0.433 treatment
21 Choice_09 52 0.536 treatment
22 Choice_10 39 0.402 treatment
23 Choice_11 58 0.598 treatment
24 Choice_12 43 0.443 treatment
Using pivot_longer, some data wrangling, and then pivot_wider, you can achieve your desired result like so:
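library(tidyr)
library(dplyr)

DT %>%
  # one row per ECOST and original column
  pivot_longer(-ECOST) %>%
  # "control_p" becomes group = "control", what = "p"; names without a suffix get what = NA
  # (fill = "right" pads with NA without the "missing pieces" warning)
  separate(name, into = c("group", "what"), fill = "right") %>%
  mutate(what = ifelse(is.na(what), "count", "percentage")) %>%
  pivot_wider(names_from = "what", values_from = "value")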
#> # A tibble: 24 x 4
#> ECOST group count percentage
#> <chr> <chr> <dbl> <dbl>
#> 1 Choice_01 control 18 0.164
#> 2 Choice_01 treatment 31 0.320
#> 3 Choice_02 control 30 0.273
#> 4 Choice_02 treatment 35 0.361
#> 5 Choice_03 control 47 0.427
#> 6 Choice_03 treatment 46 0.474
#> 7 Choice_04 control 66 0.6
#> 8 Choice_04 treatment 68 0.701
#> 9 Choice_05 control 86 0.782
#> 10 Choice_05 treatment 86 0.887
#> # … with 14 more rows
Created on 2021-02-21 by the reprex package (v1.0.0)
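To see why the mutate step tests for NA, it can help to look at the intermediate long table before pivot_wider. The following is only a sketch on a two-row stand-in for DT (the name toy and the rounded values are assumptions for illustration, copied from the printed tibble in the question):

```r
library(tidyr)
library(dplyr)

# Hypothetical two-row stand-in for DT, values taken from the question's printout
toy <- tibble(
  ECOST       = c("Choice_01", "Choice_02"),
  control     = c(18, 30),
  treatment   = c(31, 35),
  control_p   = c(0.164, 0.273),
  treatment_p = c(0.320, 0.361)
)

long <- toy %>%
  pivot_longer(-ECOST) %>%
  # Columns without a "_p" suffix have no second piece, so `what` is NA for them
  separate(name, into = c("group", "what"), sep = "_", fill = "right")

long
```

Each ECOST value now has four rows: two with what = NA (the counts) and two with what = "p" (the percentages), which is exactly what the ifelse() in the answer relabels before widening.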
You can rename the columns so that the count and percentage columns are clearly distinguished, and then use pivot_longer:
library(dplyr)
library(tidyr)

DT %>%
  rename_with(~ paste(sub('_.*', '', .),
                      rep(c('count', 'percentage'), each = 2), sep = '_'), -1) %>%
  pivot_longer(cols = -ECOST,
               names_to = c('group', '.value'),
               names_sep = '_')
# A tibble: 24 x 4
# ECOST group count percentage
# <chr> <chr> <dbl> <dbl>
# 1 Choice_01 control 18 0.164
# 2 Choice_01 treatment 31 0.320
# 3 Choice_02 control 30 0.273
# 4 Choice_02 treatment 35 0.361
# 5 Choice_03 control 47 0.427
# 6 Choice_03 treatment 46 0.474
# 7 Choice_04 control 66 0.6
# 8 Choice_04 treatment 68 0.701
# 9 Choice_05 control 86 0.782
#10 Choice_05 treatment 86 0.887
# … with 14 more rows
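Note that the rep(..., each = 2) in the rename step relies on the column order in DT (the two count columns first, then the two _p columns). If the order might differ, one possible variant, sketched here in base R with the hypothetical names old and new, derives the suffix from each name instead:

```r
# Column names as they appear in DT (excluding ECOST)
old <- c("control", "treatment", "control_p", "treatment_p")

# Strip any "_p" suffix, then append "count" or "percentage" based on the
# original name rather than on its position
new <- paste(sub("_.*", "", old),
             ifelse(grepl("_p$", old), "percentage", "count"),
             sep = "_")
new
```

The resulting names have the form group_measure, which is what names_to = c('group', '.value') with names_sep = '_' expects.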
Here is a data.table approach, with a workaround for a limitation/feature of melt.data.table():
library(data.table)
setDT(DT)
# get the group names by stripping the "_p" suffix
suffix <- unique(sub("(^.*)(_[a-z])", "\\1", names(DT[, -1])))
# melt the count and percentage columns in parallel
DT2 <- melt(DT, id.vars = "ECOST", measure.vars = patterns(count = "[a-oq-z]$", percentage = "_p$"))
# replace the factor levels with the group names
setattr(DT2$variable, "levels", suffix)
ECOST variable count percentage
1: Choice_01 control 18 0.1636364
2: Choice_02 control 30 0.2727273
3: Choice_03 control 47 0.4272727
4: Choice_04 control 66 0.6000000
5: Choice_05 control 86 0.7818182
6: Choice_06 control 35 0.3181818
7: Choice_07 control 31 0.2818182
8: Choice_08 control 46 0.4181818
9: Choice_09 control 55 0.5000000
10: Choice_10 control 39 0.3545455
11: Choice_11 control 55 0.5000000
12: Choice_12 control 41 0.3727273
13: Choice_01 treatment 31 0.3195876
14: Choice_02 treatment 35 0.3608247
15: Choice_03 treatment 46 0.4742268
16: Choice_04 treatment 68 0.7010309
17: Choice_05 treatment 86 0.8865979
18: Choice_06 treatment 36 0.3711340
19: Choice_07 treatment 32 0.3298969
20: Choice_08 treatment 42 0.4329897
21: Choice_09 treatment 52 0.5360825
22: Choice_10 treatment 39 0.4020619
23: Choice_11 treatment 58 0.5979381
24: Choice_12 treatment 43 0.4432990
ECOST variable count percentage
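The workaround is needed because, when several columns are melted in parallel, melt() records only the position of each column within its group in variable, not the original column name. A minimal sketch of the same idea on a two-row stand-in (toy, long and the explicit measure.vars list are assumptions for illustration, not part of the answer above):

```r
library(data.table)

# Hypothetical two-row stand-in for DT
toy <- data.table(
  ECOST       = c("Choice_01", "Choice_02"),
  control     = c(18, 30),
  treatment   = c(31, 35),
  control_p   = c(0.164, 0.273),
  treatment_p = c(0.320, 0.361)
)

# Each list element is one output column; the i-th entries of the two
# vectors are kept on the same row
long <- melt(toy, id.vars = "ECOST",
             measure.vars = list(c("control", "treatment"),
                                 c("control_p", "treatment_p")),
             value.name = c("count", "percentage"))

# `variable` only holds the position (1 or 2), so map it back to a group name
long[, group := c("control", "treatment")[as.integer(variable)]]
long
```

Spelling out the pairs in measure.vars makes the pairing explicit, at the cost of listing the column names by hand; the patterns() version scales better when there are many groups.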