Reshape wide to long with a key specifying levels of the grouping variable
I have a dataset:
structure(list(group = c("A", "B", "A", "B", "B", "A", "A"),
technique = c("attack", "defenese", "attack ", "defense ",
"defense ", "attack", "defense "), outcome1.part1 = c(24L,
1234L, 15L, 234L, 23L, 3L, 3L), outcome1.part.2 = c(52L,
321L, 23L, 234L, 234L, 145L, 145L), outcome1.part.3 = c(14L,
23L, 3L, 2L, 234L, 234L, 234L), outcome2.part.1 = c(14L,
234L, 145L, 4L, 234L, 145L, 145L), outcome2.part.2 = c(234L,
234L, 234L, 234L, 234L, 234L, 234L), outcome2.part.3 = c(234L,
234L, 234L, 234L, 145L, 145L, 145L)), class = "data.frame", row.names = c(NA,
-7L))
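The dataset needs to be converted to long format, but I want to supply a vector of keys, look for any of those phrases in the column names, and use the match to decide which level of the grouping variable to assign.
So for the column outcome1.part.2, say, I want to convert it to long format and create a column called strata that takes whichever key value is found in the column name. The key is c("part.1", "part.2", "part.3").
It would convert a row like this
to this
I don't want a regex solution, because I want the flexibility to change the values in the key without having to work out a new regex for each level of the grouping variable.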
We can use pivot_longer:
library(dplyr)
library(tidyr)
library(stringr)
v1 <- c("part.1", "part.2", "part.3")
# build the pattern from the keys; the backslash must be doubled inside an R string
pat <- sprintf("^(outcome\\d*).*(%s).*$", str_c(v1, collapse="|"))
df1 %>%
pivot_longer(cols = starts_with('outcome'),
names_to = c(".value", "strata"),
names_pattern = pat)
-output
# A tibble: 21 × 5
group technique strata outcome1 outcome2
<chr> <chr> <chr> <int> <int>
1 A "attack" part.1 24 14
2 A "attack" part.2 52 234
3 A "attack" part.3 14 234
4 B "defenese" part.1 1234 234
5 B "defenese" part.2 321 234
6 B "defenese" part.3 23 234
7 A "attack " part.1 15 145
8 A "attack " part.2 23 234
9 A "attack " part.3 3 234
10 B "defense " part.1 234 4
# … with 11 more rows
Note: one column name has a typo (outcome1.part1); rename the third column before running the code above:
names(df1)[3] <- 'outcome1.part.1'
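One thing to watch with the pattern above: the keys are spliced in as regex, so the "." in part.1 matches any character. The sketch below, in base R, shows one way to have the keys matched literally so they can be changed freely. quote_key is a hypothetical helper name, and the sketch assumes the engine behind names_pattern accepts \Q...\E quoting (PCRE and ICU both do, but it is worth checking against your tidyr version).

```r
# Hypothetical helper: wrap a key in \Q...\E so every character is literal.
# Assumes the key itself never contains the sequence \E.
quote_key <- function(key) paste0("\\Q", key, "\\E")

keys <- c("part.1", "part.2", "part.3")
pat <- sprintf("^(outcome\\d*).*(%s).*$",
               paste(quote_key(keys), collapse = "|"))

# Check the pattern with base R's PCRE engine (perl = TRUE)
cols <- c("outcome1.part.1", "outcome2.part.3", "outcome1.partX1")
grepl(pat, cols, perl = TRUE)            # TRUE TRUE FALSE
sub(pat, "\\2", cols[1:2], perl = TRUE)  # "part.1" "part.3"
```

Without the quoting, the third name would also match, because "." would accept the "X".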
A solution without regex, using the extra = "merge" argument of separate:
library(dplyr)
library(tidyr)
df %>%
mutate(id = row_number()) %>%
pivot_longer(
cols = -c(id, group, technique)
) %>%
separate(name, into=c('outcome', 'strata'), extra = "merge") %>%
pivot_wider(
names_from = outcome,
values_from = value,
) %>%
select(-id)
group technique strata outcome1 outcome2
<chr> <chr> <chr> <int> <int>
1 A "attack" part.1 24 14
2 A "attack" part.2 52 234
3 A "attack" part.3 14 234
4 B "defenese" part.1 1234 234
5 B "defenese" part.2 321 234
6 B "defenese" part.3 23 234
7 A "attack " part.1 15 145
8 A "attack " part.2 23 234
9 A "attack " part.3 3 234
10 B "defense " part.1 234 4
# ... with 11 more rows
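If the aim is to look keys up in column names with no regex at all, a fixed-string match is one option. This is only a minimal base R sketch: find_key is a hypothetical helper, not part of tidyr or dplyr, and it assumes that the first key found in a name is the one wanted.

```r
# Hypothetical helper: for each column name, return the first key that
# occurs in it as a literal substring (fixed = TRUE disables regex).
find_key <- function(col_names, keys) {
  vapply(col_names, function(nm) {
    found <- vapply(keys, function(k) grepl(k, nm, fixed = TRUE), logical(1))
    hit <- keys[found]
    if (length(hit) == 0) NA_character_ else hit[[1]]
  }, character(1), USE.NAMES = FALSE)
}

keys <- c("part.1", "part.2", "part.3")
cols <- c("outcome1.part.1", "outcome1.part.2", "outcome2.part.3", "group")
find_key(cols, keys)  # "part.1" "part.2" "part.3" NA
```

After a plain pivot_longer, a helper like this could fill the strata column from the name column, for example with mutate(strata = find_key(name, keys)); changing the keys then needs no new pattern.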