How can I use map* and mutate to convert a list into a set of additional columns?
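Over the course of days, I have tried probably hundreds of permutations of this code to get a function that does what I want, and I have finally given up. It feels like it should be absolutely doable, and I am so close!
I have tried to get back to the heart of the problem with the reprex below.
Basically I have a single-row data frame, with one column containing a list of strings ("concepts"). I want to create an additional column for each of those strings using `mutate`, ideally with the column taking its name from the string, and then populate the column with the result of a function call. (It doesn't matter right now which function; I just need the infrastructure around it to work.)
I feel, as usual, that I must be missing something obvious... maybe just a syntax error.
I also wonder whether I need `purrr::map` at all; perhaps a simpler vectorised mapping would do.
I suspect the fact that the new columns are named `..1` rather than after the concept is a clue to what is going wrong.
I can create the data frame I want by calling each concept manually (see the end of the reprex), but since the list of concepts differs between data frames, I want to do this with pipes and tidyverse techniques rather than by hand.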
I have read the following questions looking for help:
- How to mutate multiple columns with dynamic variable using purrr:map function?
- (R) Cleaner way to use map() with list-columns
- Creating new variables with purrr (how does one go about that?)
But none of them has helped me solve the problem I am having. [Edit: added the last question to that list, which may be the technique I need.]
<!-- language-all: lang-r -->
# load packages -----------------------------------------------------------
library(rlang)
library(dplyr)
library(tidyr)
library(magrittr)
library(purrr)
library(nomisr)
# set up initial list of tibbles ------------------------------------------
df <- list(
  district_population = tibble(
    dataset_title = "Population estimates - local authority based by single year",
    dataset_id = "NM_2002_1"
  ),
  jsa_claimants = tibble(
    dataset_title = "Jobseeker's Allowance with rates and proportions",
    dataset_id = "NM_1_1"
  )
)
# just use the first tibble for now, for testing --------------------------
# ideally I want to map across dfs through a list -------------------------
df <- df[[1]]
# nitty gritty functions --------------------------------------------------
get_concept_list <- function(df) {
  dataset_id <- pluck(df, "dataset_id")
  nomis_overview(id = dataset_id,
                 select = c("dimensions", "codes")) %>%
    pluck("value", 1, "dimension") %>%
    filter(concept != "geography") %>%
    pull("concept")
}
# get_concept_list() returns the strings I need:
get_concept_list(df)
#> [1] "time" "gender" "c_age" "measures"
# Here is a list of examples of types of map* that do various things,
# none of which is what I need it to do
# I'm using toupper() here for simplicity - ultimately I will use
# get_concept_info() to populate the new columns
# this creates four new tibbles
get_concept_list(df) %>%
  map(~ mutate(df, {{.x}} := toupper(.x)))
#> [[1]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 TIME
#>
#> [[2]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 GENDER
#>
#> [[3]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 C_AGE
#>
#> [[4]]
#> # A tibble: 1 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 MEASUR~
# this throws an error
get_concept_list(df) %>%
  map_chr(~ mutate(df, {{.x}} := toupper(.x)))
#> Error: Result 1 must be a single string, not a vector of class `tbl_df/tbl/data.frame` and of length 3
# this creates three extra rows in the tibble
get_concept_list(df) %>%
  map_df(~ mutate(df, {{.x}} := toupper(.x)))
#> # A tibble: 4 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 TIME
#> 2 Population estimates - local authority based by single year NM_2002_1 GENDER
#> 3 Population estimates - local authority based by single year NM_2002_1 C_AGE
#> 4 Population estimates - local authority based by single year NM_2002_1 MEASUR~
# this does the same as map_df
get_concept_list(df) %>%
  map_dfr(~ mutate(df, {{.x}} := toupper(.x)))
#> # A tibble: 4 x 3
#> dataset_title dataset_id ..1
#> <chr> <chr> <chr>
#> 1 Population estimates - local authority based by single year NM_2002_1 TIME
#> 2 Population estimates - local authority based by single year NM_2002_1 GENDER
#> 3 Population estimates - local authority based by single year NM_2002_1 C_AGE
#> 4 Population estimates - local authority based by single year NM_2002_1 MEASUR~
# this creates a single tibble 12 columns wide
get_concept_list(df) %>%
  map_dfc(~ mutate(df, {{.x}} := toupper(.x)))
#> # A tibble: 1 x 12
#> dataset_title dataset_id ..1 dataset_title1 dataset_id1 ..11 dataset_title2
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Population e~ NM_2002_1 TIME Population es~ NM_2002_1 GEND~ Population es~
#> # ... with 5 more variables: dataset_id2 <chr>, ..12 <chr>,
#> # dataset_title3 <chr>, dataset_id3 <chr>, ..13 <chr>
# function to get info on each concept (except geography) -----------------
# this is the function I want to use eventually to populate my new columns
get_concept_info <- function(df, concept_name) {
  dataset_id <- pluck(df, "dataset_id")
  nomis_overview(id = dataset_id) %>%
    filter(name == "dimensions") %>%
    pluck("value", 1, "dimension") %>%
    filter(concept == concept_name) %>%
    pluck("codes.code", 1) %>%
    select(name, value) %>%
    nest(data = everything()) %>%
    as.list() %>%
    pluck("data")
}
# individual mutate works, for comparison ---------------------------------
# I can create the kind of table I want manually using a line like the one below
# df %>% map(~ mutate(., measures = get_concept_info(., concept_name = "measures")))
df %>% mutate(., measures = get_concept_info(df, "measures"))
#> # A tibble: 1 x 3
#> dataset_title dataset_id measures
#> <chr> <chr> <list>
#> 1 Population estimates - local authority based by sin~ NM_2002_1 <tibble [2 x ~
<sup>Created on 2020-02-10 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
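On the `..1` column names seen in the reprex: a likely explanation is that `{{ }}` captures the expression behind an argument rather than its value, and inside a purrr formula lambda `.x` stands for `..1`. The sketch below does not reproduce that; it only shows, offline, two ways of naming a column from the value of a string. The variable `nm` and the one-column tibble are made up for illustration, and the glue-style form assumes dplyr >= 1.0.0.

```r
library(dplyr)

# Hypothetical one-row tibble and concept name, for illustration only
df <- tibble(dataset_id = "NM_2002_1")
nm <- "gender"

# `!!` unquotes the string, so the new column is named after its value
with_bang <- mutate(df, !!nm := toupper(nm))

# Glue-style name on the left-hand side of `:=` (assumes dplyr >= 1.0.0)
with_glue <- mutate(df, "{nm}" := toupper(nm))

names(with_bang)
names(with_glue)
```

Both calls should give a column called `gender`, because the name is taken from the string stored in `nm` rather than from the symbol `nm` itself.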
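Using `!!` together with `:=` lets you name columns dynamically. We can then use `reduce()` to collapse the list output of `map()`, applying `left_join()` on the dataset title and id columns to combine all the data frames in the list.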
df_2 <-
  map(get_concept_list(df),
      ~ mutate(df,
               !!.x := get_concept_info(df, .x))) %>%
  reduce(left_join, by = c("dataset_title", "dataset_id"))
df_2
# A tibble: 1 x 6
dataset_title dataset_id time gender c_age measures
<chr> <chr> <list<df[,2]>> <list<df[,2]>> <list<df[,2]>> <list<df[,2]>>
1 Population estimates - local authority based by single year NM_2002_1 [28 x 2] [3 x 2] [121 x 2] [2 x 2]
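For anyone who wants to try the pattern without calling the Nomis API, here is a minimal offline sketch of the same `map()` + `reduce()` idea. It assumes a hard-coded `concepts` vector (copied from the output of `get_concept_list()` in the question) and uses `toupper()` as a stand-in for `get_concept_info()`, so the new columns are plain character columns rather than list-columns.

```r
library(dplyr)
library(purrr)

# One-row tibble copied from the question
df <- tibble(
  dataset_title = "Population estimates - local authority based by single year",
  dataset_id = "NM_2002_1"
)

# Assumption: hard-coded stand-in for get_concept_list(df)
concepts <- c("time", "gender", "c_age", "measures")

df_2 <- map(concepts,
            # one tibble per concept, with the new column named via `!!`
            ~ mutate(df, !!.x := toupper(.x))) %>%
  # join the per-concept tibbles back together on the shared key columns
  reduce(left_join, by = c("dataset_title", "dataset_id"))

names(df_2)
```

The join keys are the two columns every intermediate tibble shares, so each `left_join()` step adds exactly one new column and the result stays a single row.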