How to use tidyr pivot_longer when cols = any column that starts with a certain prefix
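Every week I receive a raw dataset from which I need to produce a report. I would like to write an R script that I can run each week. Unfortunately, the set of columns in the raw data differs slightly from week to week, depending on the specimens collected. Here is an example of how it varies between weeks.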
library(readr)
w1 <- read_csv("species, males, females - fed, females - unfed
a,2,0,3
b,5,7,2
c,8,4,9")
w2 <- read_csv("species, males, females - mixed
a,2,0
b,5,7
c,8,4")
> w1
# A tibble: 3 x 4
  species males `females - fed` `females - unfed`
  <chr>   <dbl>           <dbl>             <dbl>
1 a           2               0                 3
2 b           5               7                 2
3 c           8               4                 9
> w2
# A tibble: 3 x 3
  species males `females - mixed`
  <chr>   <dbl>             <dbl>
1 a           2                 0
2 b           5                 7
3 c           8                 4
This is how I normally use pivot_longer:
library(tidyr)
w1 %>% pivot_longer(cols = c(males, `females - fed`, `females - unfed`),
                    names_to = c("sex", "feeding_status"),
                    names_sep = " - ",
                    values_to = "quantity")
# A tibble: 9 x 4
  species sex     feeding_status quantity
  <chr>   <chr>   <chr>             <dbl>
1 a       males   NA                    2
2 a       females fed                   0
3 a       females unfed                 3
4 b       males   NA                    5
5 b       females fed                   7
6 b       females unfed                 2
7 c       males   NA                    8
8 c       females fed                   4
9 c       females unfed                 9
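How can I write pivot_longer code that works for w1, w2, and w3? (Edited to include w3; see below.)
I tried select(., starts_with("females")) but couldn't work out the correct syntax. The pivot_longer documentation mentions names_pattern and extract(), which look promising, but I can't figure out how to use them. Thanks!
Edit: In response to akrun's answer, I realized I need to provide slightly more complex sample data. The code also needs to handle a column named "unknown sex" that occasionally appears in the dataset, like this: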
w3 <- read_csv("species, males, females - mixed, unknown sex
a,2,0,4
b,5,7,0
c,8,4,23")
> w3
# A tibble: 3 x 4
  species males `females - mixed` `unknown sex`
  <chr>   <dbl>             <dbl>         <dbl>
1 a           2                 0             4
2 b           5                 7             0
3 c           8                 4            23
The code akrun suggested below, which solves w1 and w2, leaves "unknown sex" as an unpivoted column in w3, so its values are duplicated across rows:
w3 %>%
  pivot_longer(cols = c(males, starts_with('females')),
               names_to = c("sex", "feeding_status"), names_sep = " - ")
# A tibble: 6 x 5
  species `unknown sex` sex     feeding_status value
  <chr>           <dbl> <chr>   <chr>          <dbl>
1 a                   4 males   NA                 2
2 a                   4 females mixed              0
3 b                   0 males   NA                 5
4 b                   0 females mixed              7
5 c                  23 males   NA                 8
6 c                  23 females mixed              4
The column common to all the datasets is 'species', so we can exclude it with `-` and pivot everything else:
library(dplyr)
library(tidyr)
w1 %>%
  pivot_longer(cols = -species,
               names_to = c("sex", "feeding_status"),
               names_sep = " - ",
               values_to = "quantity")
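Because `cols = -species` pivots every column except the identifier, the same call should work unchanged for all three weekly layouts, including the occasional "unknown sex" column. Here is a minimal sketch, assuming 'species' is always the only identifier column. The helper name pivot_weekly is hypothetical, and the tibbles are rebuilt inline only so that the block runs on its own:

```r
library(tidyr)
library(tibble)

# Sample data rebuilt inline (same values as w1, w2 and w3 in the question).
w1 <- tibble(species = c("a", "b", "c"), males = c(2, 5, 8),
             `females - fed` = c(0, 7, 4), `females - unfed` = c(3, 2, 9))
w2 <- tibble(species = c("a", "b", "c"), males = c(2, 5, 8),
             `females - mixed` = c(0, 7, 4))
w3 <- tibble(species = c("a", "b", "c"), males = c(2, 5, 8),
             `females - mixed` = c(0, 7, 4), `unknown sex` = c(4, 0, 23))

# Hypothetical helper: pivot every column except the identifier column.
pivot_weekly <- function(df) {
  pivot_longer(df,
               cols = -species,
               names_to = c("sex", "feeding_status"),
               names_sep = " - ",
               values_to = "quantity")
}

# Apply the same helper to each week's data.
results <- lapply(list(w1 = w1, w2 = w2, w3 = w3), pivot_weekly)
```

Column names without a " - " separator (males, unknown sex) get NA in feeding_status, and tidyr typically warns that the missing piece was filled with NA.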
Or use one of the select_helpers:
w1 %>%
  pivot_longer(cols = c(males, starts_with('females')),
               names_to = c("sex", "feeding_status"), names_sep = " - ")
# A tibble: 9 x 4
#   species sex     feeding_status value
#   <chr>   <chr>   <chr>          <dbl>
# 1 a       males   <NA>               2
# 2 a       females fed                0
# 3 a       females unfed              3
# 4 b       males   <NA>               5
# 5 b       females fed                7
# 6 b       females unfed              2
# 7 c       males   <NA>               8
# 8 c       females fed                4
# 9 c       females unfed              9
If we want to include multiple cases, matches is another option:
w3 %>%
  pivot_longer(cols = c(males, matches('^(females|unknown)')),
               names_to = c("sex", "feeding_status"), names_sep = " - ")
# A tibble: 9 x 4
#   species sex         feeding_status value
#   <chr>   <chr>       <chr>          <dbl>
# 1 a       males       <NA>               2
# 2 a       females     mixed              0
# 3 a       unknown sex <NA>               4
# 4 b       males       <NA>               5
# 5 b       females     mixed              7
# 6 b       unknown sex <NA>               0
# 7 c       males       <NA>               8
# 8 c       females     mixed              4
# 9 c       unknown sex <NA>              23
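The question also asked about names_pattern. As a rough sketch (not something tested against every tidyr version), the split can be written as a regular expression with two capture groups in place of names_sep. The regex and the na_if() cleanup step are assumptions added here for illustration: depending on the tidyr version, an unmatched second group may come back as "" or as NA, so the sketch normalizes both to NA.

```r
library(dplyr)
library(tidyr)
library(tibble)

# w3 rebuilt inline (same values as in the question).
w3 <- tibble(species = c("a", "b", "c"), males = c(2, 5, 8),
             `females - mixed` = c(0, 7, 4), `unknown sex` = c(4, 0, 23))

long3 <- w3 %>%
  pivot_longer(cols = -species,
               names_to = c("sex", "feeding_status"),
               # Group 1: text before an optional " - "; group 2: text after it.
               names_pattern = "^(.*?)(?: - (.*))?$",
               values_to = "quantity") %>%
  # Normalize an unmatched second group to NA, however tidyr reports it.
  mutate(feeding_status = na_if(feeding_status, ""))
```

Compared with names_sep, this should avoid the "missing pieces" warning for names such as males, at the cost of a less readable pattern.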