How can I use str_remove() within a list of dataframes using a map function?
I have a list of data frames that all contain a matching ID column.
For example:
dat1 = tribble(
~id, ~response,
"id_1", 10,
"id_2", 15
)
dat2 = tribble(
~id, ~response,
"id_3", 20,
"id_4", 25
)
example_list <- list(dat1, dat2)
> list(dat1, dat2)
[[1]]
# A tibble: 2 × 2
id response
<chr> <dbl>
1 id_1 10
2 id_2 15
[[2]]
# A tibble: 2 × 2
id response
<chr> <dbl>
1 id_3 20
2 id_4 25
How can I map str_remove() across the data frames to remove the "id_" prefix from every row of the id column?
Use purrr::map() together with str_remove() (or gsub(), or readr::parse_number()).
library(tidyverse)
example_list %>%
map(~ mutate(.x, id = str_remove(id, "id_")))
#map(~ .x %>% mutate(id = gsub("id_", "", id)))
#map(~ mutate(.x, id = parse_number(id)))
Output:
[[1]]
# A tibble: 2 × 2
id response
<chr> <dbl>
1 1 10
2 2 15
[[2]]
# A tibble: 2 × 2
id response
<chr> <dbl>
1 3 20
2 4 25
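As a small variation on the answer above, here is a sketch that anchors the pattern and shows how the parse_number() alternative differs. The variable names anchored and numeric_ids are made up for illustration, and the example list is rebuilt with tibble() so the snippet runs on its own. str_remove(id, "id_") removes the first match anywhere in the string, while "^id_" only matches at the start; parse_number() returns a numeric column instead of character.

```r
library(purrr)
library(dplyr)
library(stringr)
library(readr)

# Same data as in the question, rebuilt so this snippet is self-contained
example_list <- list(
  tibble(id = c("id_1", "id_2"), response = c(10, 15)),
  tibble(id = c("id_3", "id_4"), response = c(20, 25))
)

# "^id_" anchors the match to the start of each string; id stays character
anchored <- map(example_list, ~ mutate(.x, id = str_remove(id, "^id_")))

# parse_number() extracts the number, so id becomes a double column
numeric_ids <- map(example_list, ~ mutate(.x, id = parse_number(id)))
```

Which one fits depends on whether you want id to stay a string or become a number for later joins or sorting.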
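You can nest modify_at() inside map() for more speed. In addition, substring() should be faster than pattern matching, since you already know the length of the prefix.
Of course, you may want as.integer() to convert the result back to a number, but that is incidental to the solution.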
library(purrr)
example_list %>%
map(modify_at, "id", substring, 4)
# [[1]]
# # A tibble: 2 x 2
# id response
# <chr> <dbl>
# 1 1 10
# 2 2 15
#
# [[2]]
# # A tibble: 2 x 2
# id response
# <chr> <dbl>
# 1 3 20
# 2 4 25
# to convert to integer
example_list %>%
map(modify_at, "id", ~ as.integer(substring(.x, 4)))
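The one-liner above is compact because map() forwards its extra arguments to modify_at(), which in turn forwards 4 to substring(). A sketch of the same call written out in full may make that easier to follow; the names short and long are only for illustration, and the data is rebuilt so the snippet is self-contained. Note that substring(x, 4) assumes every id starts with a prefix of exactly three characters.

```r
library(purrr)
library(dplyr)

# Same data as in the question
example_list <- list(
  tibble(id = c("id_1", "id_2"), response = c(10, 15)),
  tibble(id = c("id_3", "id_4"), response = c(20, 25))
)

# Short form: "id", substring and 4 are passed through map() to modify_at()
short <- map(example_list, modify_at, "id", substring, 4)

# Expanded form: the same calls with every function written out
long <- map(example_list, function(df) {
  # modify_at() applies the function only to the "id" column of df
  modify_at(df, "id", function(x) substring(x, 4))
})
```

Both forms should give the same id values; the expanded one is just easier to adapt, for example if the prefix length differs between columns.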
Benchmarking a few options:
library(purrr)
library(dplyr)
library(stringr)
microbenchmark::microbenchmark(
modify_substring = example_list %>%
map(modify_at, "id", substring, 4),
mutate_substring = example_list %>%
map(~ mutate(.x, id = substring(id, 4))),
mutate_str_remove = example_list %>%
map(~ mutate(.x, id = str_remove(id, "id_")))
)
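You can see that the modify_at() approach runs faster, by roughly 8 to 12 times in median time compared with the two mutate() variants: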
Unit: microseconds
expr min lq mean median uq max neval
modify_substring 302.301 359.9005 442.340 419.6505 459.901 1597.401 100
mutate_substring 3019.502 3308.6015 4916.405 3540.5505 3847.801 116220.501 100
mutate_str_remove 4064.801 4568.4010 5355.351 4839.1010 5232.452 10521.701 100
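Before reading much into timings like these, it can be worth confirming that the three expressions really return the same id values. This is a minimal sketch of such a check, not part of the original benchmark; results, ids and all_agree are illustrative names.

```r
library(purrr)
library(dplyr)
library(stringr)

# Same data as in the question
example_list <- list(
  tibble(id = c("id_1", "id_2"), response = c(10, 15)),
  tibble(id = c("id_3", "id_4"), response = c(20, 25))
)

# Run each benchmarked expression once and keep the results by name
results <- list(
  modify_substring  = map(example_list, modify_at, "id", substring, 4),
  mutate_substring  = map(example_list, ~ mutate(.x, id = substring(id, 4))),
  mutate_str_remove = map(example_list, ~ mutate(.x, id = str_remove(id, "id_")))
)

# Collect the id column of every data frame into one character vector per approach
ids <- map(results, ~ unlist(map(.x, "id")))

# TRUE when every approach produced the same ids as the first one
all_agree <- all(map_lgl(ids, identical, ids[[1]]))
```

With two-row tibbles the timings are probably dominated by the fixed overhead of each call rather than the string work itself, so the gap may look different on larger data.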