pivot_longer into several pairs of columns
I need to pivot_longer across multiple groups of columns, creating multiple names-values pairs.
For instance, I need to go from something like this:
df_raw <- tribble(
~id, ~belief_dog, ~belief_bull_frog, ~belief_fish, ~age, ~norm_bull_frog, ~norm_fish, ~norm_dog, ~gender,
"b2x8", 1, 4, 3, 41, 4, 2, 10, 2,
"m89w", 3, 6, 2, 19, 1, 2, 3, 1,
"32x8", 1, 5, 2, 38, 9, 1, 8, 3
)
to something like this:
df_final <- tribble(
~id, ~belief_animal, ~belief_rating, ~norm_animal, ~norm_rating, ~age, ~gender,
"b2x8", "dog", 1, "bull_frog", 4, 41, 2,
"b2x8", "bull_frog", 4, "fish", 2, 41, 2,
"b2x8", "fish", 3, "dog", 10, 41, 2,
"m89w", "dog", 3, "bull_frog", 1, 19, 1,
"m89w", "bull_frog", 6, "fish", 2, 19, 1,
"m89w", "fish", 2, "dog", 3, 19, 1,
"32x8", "dog", 1, "bull_frog", 9, 38, 3,
"32x8", "bull_frog", 5, "fish", 1, 38, 3,
"32x8", "fish", 2, "dog", 8, 38, 3
)
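In other words, anything starting with "belief_" should be pivoted into one names-values pair, and anything starting with "norm_" should be pivoted into another names-values pair.
I tried looking at several other Stack Overflow pages with somewhat related content, but could not translate those solutions to this situation.
Any help would be appreciated, with a strong preference for dplyr solutions.
Thanks!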
Solved it with some more experimentation!
The key lies in the names_to and names_pattern arguments.
df_raw %>% pivot_longer(
cols = c(belief_dog:belief_fish, norm_bull_frog:norm_dog),
names_to = c(".value", "rating"),
names_pattern = "([a-z]+)_*(.+)"
)
I don't fully understand how ".value" or the regular expression "([a-z]+)_*(.+)" works, but the solution works nonetheless.
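One way to read ".value", offered as a rough sketch rather than a description of tidyr's internals: the capture group labelled ".value" becomes the name of an output column instead of being stored as data. The example below compares the one-step call with a two-step equivalent (pivot fully long, then pivot_wider on the prefix). The names measure, animal, long, two_step and one_step are illustrative and not from the original post.

```r
library(tidyr)

df_raw <- tibble::tribble(
  ~id, ~belief_dog, ~belief_bull_frog, ~belief_fish, ~age, ~norm_bull_frog, ~norm_fish, ~norm_dog, ~gender,
  "b2x8", 1, 4, 3, 41, 4, 2, 10, 2,
  "m89w", 3, 6, 2, 19, 1, 2, 3, 1,
  "32x8", 1, 5, 2, 38, 9, 1, 8, 3
)

# Two steps: first a fully long table that keeps the prefix as data
# ("measure" is a hypothetical column name) ...
long <- pivot_longer(
  df_raw,
  cols = -c(id, age, gender),
  names_to = c("measure", "animal"),
  names_pattern = "([a-z]+)_*(.+)",
  values_to = "rating"
)
# ... then spread the prefix back out into one column per measure.
two_step <- pivot_wider(long, names_from = measure, values_from = rating)

# One step: ".value" asks pivot_longer to use the first capture group
# ("belief" or "norm") as the name of the output column.
one_step <- pivot_longer(
  df_raw,
  cols = -c(id, age, gender),
  names_to = c(".value", "animal"),
  names_pattern = "([a-z]+)_*(.+)"
)
```

Both one_step and two_step should hold one row per id and animal, with the ratings in columns named belief and norm.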
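With tidyverse, we can pivot on the two sets of columns that start with belief and norm. We can then use a regular expression to split each name at the first underscore (since some column names have multiple underscores). Essentially, we are saying to put belief or norm (the first group in the column name) into their own columns (i.e., .value), and then the second part of the name (i.e., the animal name) is put into a column named animal.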
library(tidyverse)
df_raw %>%
pivot_longer(cols = c(starts_with("belief"), starts_with("norm")),
names_to = c('.value', 'animal'),
names_pattern = '(.*?)_(.*)') %>%
rename(belief_rating = belief, norm_rating = norm)
Output
id age gender animal belief_rating norm_rating
<chr> <dbl> <dbl> <chr> <dbl> <dbl>
1 b2x8 41 2 dog 1 10
2 b2x8 41 2 bull_frog 4 4
3 b2x8 41 2 fish 3 2
4 m89w 19 1 dog 3 3
5 m89w 19 1 bull_frog 6 1
6 m89w 19 1 fish 2 2
7 32x8 38 3 dog 1 8
8 32x8 38 3 bull_frog 5 9
9 32x8 38 3 fish 2 1
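To see how the pattern '(.*?)_(.*)' from the answer above splits a name at its first underscore, it can be applied to a few column names with base R. This is only an illustration of the regular expression, not of how tidyr applies it internally; the variable names cols, matches and parts are made up for the example.

```r
cols <- c("belief_dog", "belief_bull_frog", "norm_fish")

# regexec() returns the full match plus each capture group;
# the lazy (.*?) stops at the first underscore, so the greedy (.*)
# keeps the rest of the name, including any further underscores.
matches <- regmatches(cols, regexec("(.*?)_(.*)", cols, perl = TRUE))

# Drop the full match (column 1) and keep the two capture groups.
parts <- do.call(rbind, matches)[, 2:3, drop = FALSE]
colnames(parts) <- c(".value", "animal")
parts
```

The first column holds the prefix that becomes the output column name (belief or norm), and the second holds the animal, so "belief_bull_frog" yields "belief" and "bull_frog".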
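For this data, the following produces the requested layout, with separate belief_animal and norm_animal columns: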
library(dplyr)
library(tidyr)
df_raw %>%
pivot_longer(
cols = -c(id, age, gender),
names_to = "name1",
values_to = "belief_rating"
) %>%
separate(name1, c("A", "B"), sep = '_', extra = 'merge') %>%
group_by(id) %>%
mutate(helper = rep(row_number(), each=3, length.out = n())) %>%
pivot_wider(
names_from = A,
values_from = B,
names_glue = "{A}_animal"
) %>%
mutate(norm_rating = ifelse(helper == 1, lead(belief_rating, 3), NA),
norm_animal = ifelse(helper == 1, lead(norm_animal, 3), NA)) %>%
slice(1:3) %>%
select(id, belief_animal, belief_rating, norm_animal, norm_rating, age, gender)
id belief_animal belief_rating norm_animal norm_rating age gender
<chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
1 32x8 dog 1 bull_frog 9 38 3
2 32x8 bull_frog 5 fish 1 38 3
3 32x8 fish 2 dog 8 38 3
4 b2x8 dog 1 bull_frog 4 41 2
5 b2x8 bull_frog 4 fish 2 41 2
6 b2x8 fish 3 dog 10 41 2
7 m89w dog 3 bull_frog 1 19 1
8 m89w bull_frog 6 fish 2 19 1
9 m89w fish 2 dog 3 19 1