Reshaping from long to wide format in R: problem with variable renaming
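I have a dataset in long format that I want to reshape to wide format. I generally know how to do this, but the problem is the variable names. In long format the variables look like this:
ID | time | WSAS_01 |
---|---|---|
1 | 1 | 4 |
1 | 2 | 3 |
2 | 1 | 6 |
2 | 2 | 8 |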
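After reshaping, I want the time index embedded in the middle of each variable name, so that time 1 becomes `_r1_` and time 2 becomes `_r2_`:
ID | WSAS_r1_01 | WSAS_r2_01 |
---|---|---|
1 | 4 | 3 |
2 | 6 | 8 |
Does anyone know how to do this?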
With `pivot_wider()`, you can pass a glue specification to `names_glue` that builds custom column names from the `names_from` columns (and the special `.value`).
library(tidyr)
library(stringr)

df %>%
  pivot_wider(
    names_from = time,
    names_glue = "{str_replace(.value, '(?=_)', str_c('_r', time))}",
    values_from = WSAS_01)
# # A tibble: 2 × 3
#      ID WSAS_r1_01 WSAS_r2_01
#   <int>      <int>      <int>
# 1     1          4          3
# 2     2          6          8
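The key step in the glue specification above is the zero-width lookahead `(?=_)`: it matches the empty position just before the first underscore, so `str_replace()` inserts text there instead of replacing any characters. Here is a minimal sketch of that step in isolation, assuming the hypothetical vectors `value` and `time` stand in for what `names_glue` supplies as `.value` and `time`:

```r
library(stringr)

# Hypothetical stand-ins for the .value and time vectors seen by names_glue
value <- c("WSAS_01", "WSAS_01")
time <- c(1, 2)

# "(?=_)" matches the empty string right before the first "_";
# str_replace() replaces that empty match, i.e. inserts "_r<time>" there.
new_names <- str_replace(value, "(?=_)", str_c("_r", time))
print(new_names)
# "WSAS_r1_01" "WSAS_r2_01"
```

Because the pattern looks for the first underscore rather than a fixed position, the same expression should carry over to other prefixes, provided the time marker belongs before the first underscore.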
This approach also works in the extended case where `values_from` contains multiple columns:
df <- data.frame(
  ID = rep(1:2, each = 2),
  time = rep(1:2, 2),
  WSAS_01 = c(4, 3, 6, 8),
  WSAS_02 = c(1, 3, 5, 7)
)

df %>%
  pivot_wider(
    names_from = time,
    names_glue = "{str_replace(.value, '(?=_)', str_c('_r', time))}",
    values_from = starts_with("WSAS"))
# # A tibble: 2 × 5
#      ID WSAS_r1_01 WSAS_r2_01 WSAS_r1_02 WSAS_r2_02
#   <int>      <dbl>      <dbl>      <dbl>      <dbl>
# 1     1          4          3          1          3
# 2     2          6          8          5          7
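You can try this:
library(tidyr)
library(stringr)

df %>%
  pivot_wider(
    id_cols = ID,
    names_from = time,
    values_from = WSAS_01,
    # Rebuild the name from the prefix (characters 1-4) and suffix (characters 6-7) of .value
    names_glue = "{paste0(str_sub(.value, 1, 4), '_r', time, '_', str_sub(.value, 6, 7))}"
  )
Output:
# A tibble: 2 × 3
     ID WSAS_r1_01 WSAS_r2_01
  <dbl>      <dbl>      <dbl>
1     1          4          3
2     2          6          8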
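The `str_sub()` calls above slice the original name by fixed character positions, which works here because every name has a four-character prefix and a two-digit suffix. Here is a small sketch of that slicing in isolation, assuming hypothetical `value` and `time` vectors in place of what `names_glue` provides; names with a different layout would need different positions:

```r
library(stringr)

# Hypothetical stand-ins for the .value and time vectors seen by names_glue
value <- c("WSAS_01", "WSAS_01")
time <- c(1, 2)

prefix <- str_sub(value, 1, 4)  # characters 1 to 4: "WSAS"
suffix <- str_sub(value, 6, 7)  # characters 6 to 7: "01" (skips the underscore)

# Reassemble the name with the time marker in the middle
new_names <- paste0(prefix, "_r", time, "_", suffix)
print(new_names)
# "WSAS_r1_01" "WSAS_r2_01"
```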
Another way with `tidyr::pivot_wider()`, but with a more concise `names_glue` that writes the column name out literally:
library(tidyr)

df <- read.table(text = "
ID time WSAS_01
1 1 4
1 2 3
2 1 6
2 2 8", header = TRUE)

df %>%
  pivot_wider(names_from = time, values_from = WSAS_01, names_glue = "WSAS_r{time}_01")
#> # A tibble: 2 × 3
#>      ID WSAS_r1_01 WSAS_r2_01
#>   <int>      <int>      <int>
#> 1     1          4          3
#> 2     2          6          8