tidyr unnest: prefix column names with the nested name during unnesting
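When running unnest on a data.frame, is there a way to add the name of the nested item to each of the columns it contains (as a prefix or a suffix), or does the renaming have to be done manually via rename?
This is particularly relevant when unnesting several groups that contain columns with the same names.
In the example below, the base aggregate command does this nicely (e.g. Petal.Length.mn), but I could not find a way to make unnest do the same.
I am using nest together with purrr::map because I want the flexibility to mix functions, for example computing the mean and sd of several variables and also running a t-test to look at the differences between them.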
library(dplyr, warn.conflicts = FALSE)
msd_c <- function(x) c(mn = mean(x), sd = sd(x))
msd_df <- function(x) bind_rows(c(mn = mean(x), sd = sd(x)))
aggregate(cbind(Petal.Length, Petal.Width) ~ Species,
          data = iris, FUN = msd_c)
#> Species Petal.Length.mn Petal.Length.sd Petal.Width.mn Petal.Width.sd
#> 1 setosa 1.4620000 0.1736640 0.2460000 0.1053856
#> 2 versicolor 4.2600000 0.4699110 1.3260000 0.1977527
#> 3 virginica 5.5520000 0.5518947 2.0260000 0.2746501
iris %>%
  select(Petal.Length:Species) %>%
  group_by(Species) %>%
  tidyr::nest() %>%
  mutate(
    Petal.Length = purrr::map(data, ~ msd_df(.$Petal.Length)),
    Petal.Width = purrr::map(data, ~ msd_df(.$Petal.Width)),
    Correlation = purrr::map(data, ~ broom::tidy(cor.test(.$Petal.Length, .$Petal.Width))),
  ) %>%
  select(-data) %>%
  tidyr::unnest(c(Petal.Length, Petal.Width, Correlation), names_repair = tidyr::tidyr_legacy)
#> # A tibble: 3 x 13
#> # Groups: Species [3]
#> Species mn sd mn1 sd1 estimate statistic p.value parameter conf.low
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 setosa 1.46 0.174 0.246 0.105 0.332 2.44 1.86e- 2 48 0.0587
#> 2 versic~ 4.26 0.470 1.33 0.198 0.787 8.83 1.27e-11 48 0.651
#> 3 virgin~ 5.55 0.552 2.03 0.275 0.322 2.36 2.25e- 2 48 0.0481
#> # ... with 3 more variables: conf.high <dbl>, method <chr>, alternative <chr>
Created on 2020-05-20 by the reprex package (v0.3.0)
To apply multiple functions to multiple columns, I would use summarise_at/mutate_at rather than nesting and unnesting the data.
For example, in this case we can do:
library(dplyr)
iris %>%
  group_by(Species) %>%
  summarise_at(vars(Petal.Length:Petal.Width), list(mn = mean, sd = sd))
# Species Petal.Length_mn Petal.Width_mn Petal.Length_sd Petal.Width_sd
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 setosa 1.46 0.246 0.174 0.105
#2 versicolor 4.26 1.33 0.470 0.198
#3 virginica 5.55 2.03 0.552 0.275
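This automatically names each result column after the original column, with the name of the applied function appended as a suffix (e.g. Petal.Length_mn). It is also the dplyr equivalent of the aggregate call you tried.
Also note that as of dplyr 1.0.0, summarise_at is superseded by across.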
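As a rough sketch of what the across version could look like (this assumes dplyr 1.0.0 or later; the .names pattern is my own addition so that the columns use a dot separator like the aggregate output, and res is only an illustrative variable name):

```r
library(dplyr, warn.conflicts = FALSE)

# Apply mean and sd to both petal columns within each species.
# .names controls the output names: "{.col}" is the input column,
# "{.fn}" is the name given to the function in the list.
res <- iris %>%
  group_by(Species) %>%
  summarise(across(Petal.Length:Petal.Width,
                   list(mn = mean, sd = sd),
                   .names = "{.col}.{.fn}"))

names(res)
```

Leaving .names out falls back to the default pattern, which joins the column and function names with an underscore, as in the summarise_at output.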
You can use setNames as shown below. It is a bit verbose, but since you seem to want to specify each function for each column, it may be of interest.
iris %>%
  select(Petal.Length:Species) %>%
  group_by(Species) %>%
  tidyr::nest() %>%
  mutate(
    Petal.Length = purrr::map(data, ~ msd_df(.x$Petal.Length) %>%
                                setNames(paste0("Petal.Length.", names(.)))),
    Petal.Width = purrr::map(data, ~ msd_df(.$Petal.Width) %>%
                               setNames(paste0("Petal.Width.", names(.)))),
    Ratio = purrr::map(data, ~ msd_df(.$Petal.Length/.$Petal.Width) %>%
                         setNames(paste0("Ratio.", names(.))))
  ) %>%
  select(-data) %>%
  tidyr::unnest(c(Petal.Length, Petal.Width, Ratio))
#> # A tibble: 3 x 7
#> # Groups: Species [3]
#>   Species    Petal.Length.mn Petal.Length.sd Petal.Width.mn Petal.Width.sd Ratio.mn Ratio.sd
#>   <fct>                <dbl>           <dbl>          <dbl>          <dbl>    <dbl>    <dbl>
#> 1 setosa                1.46           0.174          0.246          0.105     6.91    2.85
#> 2 versicolor            4.26           0.470          1.33           0.198     3.24    0.312
#> 3 virginica             5.55           0.552          2.03           0.275     2.78    0.407
Alternatively, modify your function so that it renames the columns itself, like this.
msd_df_name <- function(x, name){
  bind_rows(c(mn = mean(x), sd = sd(x))) %>%
    setNames(paste0(name, ".", names(.)))
}
iris %>%
  select(Petal.Length:Species) %>%
  group_by(Species) %>%
  tidyr::nest() %>%
  mutate(
    Petal.Length = purrr::map(data, ~ msd_df_name(.x$Petal.Length, "Petal.Length")),
    Petal.Width = purrr::map(data, ~ msd_df_name(.$Petal.Width, "Petal.Width")),
    Ratio = purrr::map(data, ~ msd_df_name(.$Petal.Length/.$Petal.Width, "Ratio"))
  ) %>%
  select(-data) %>%
  tidyr::unnest(c(Petal.Length, Petal.Width, Ratio))
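A possible variation on the same idea, sketched here as an assumption rather than something from the original answer: keep the summary function unchanged and put the renaming in a small separate helper, so the same helper can be reused on any data frame returned inside map. prefix_names and res are hypothetical names introduced only for this illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Summary function from the question: a one-row tibble with mean and sd.
msd_df <- function(x) bind_rows(c(mn = mean(x), sd = sd(x)))

# Hypothetical helper: prepend `prefix` and `sep` to every column name.
prefix_names <- function(df, prefix, sep = ".") {
  stats::setNames(df, paste(prefix, names(df), sep = sep))
}

res <- iris %>%
  select(Petal.Length:Species) %>%
  group_by(Species) %>%
  tidyr::nest() %>%
  mutate(
    Petal.Length = purrr::map(data, ~ prefix_names(msd_df(.x$Petal.Length), "Petal.Length")),
    Petal.Width  = purrr::map(data, ~ prefix_names(msd_df(.x$Petal.Width), "Petal.Width"))
  ) %>%
  select(-data) %>%
  tidyr::unnest(c(Petal.Length, Petal.Width))

names(res)
```

Separating the two concerns means the summary function does not need to know which column it was called on.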
The answer turned out to be simple: use the names_sep option instead of the names_repair option. Quoting the names_sep entry on the nest help page:
If a string, the inner and outer names will be used together. In
nest(), the names of the new outer columns will be formed by pasting
together the outer and the inner column names, separated by names_sep.
In unnest(), the new inner names will have the outer names (+
names_sep) automatically stripped. This makes names_sep roughly
symmetric between nesting and unnesting.
library(dplyr, warn.conflicts = FALSE)
msd_c <- function(x) c(mn = mean(x), sd = sd(x))
msd_df <- function(x) bind_rows(c(mn = mean(x), sd = sd(x)))
iris %>%
  select(Petal.Length:Species) %>%
  group_by(Species) %>%
  tidyr::nest() %>%
  mutate(
    Petal.Length = purrr::map(data, ~ msd_df(.$Petal.Length)),
    Petal.Width = purrr::map(data, ~ msd_df(.$Petal.Width)),
    Correlation = purrr::map(data, ~ broom::tidy(cor.test(.$Petal.Length, .$Petal.Width))),
  ) %>%
  select(-data) %>%
  tidyr::unnest(c(Petal.Length, Petal.Width, Correlation), names_sep = ".")
#> # A tibble: 3 x 13
#> # Groups: Species [3]
#> Species Petal.Length.mn Petal.Length.sd Petal.Width.mn Petal.Width.sd
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 1.46 0.174 0.246 0.105
#> 2 versic~ 4.26 0.470 1.33 0.198
#> 3 virgin~ 5.55 0.552 2.03 0.275
#> # ... with 8 more variables: Correlation.estimate <dbl>,
#> # Correlation.statistic <dbl>, Correlation.p.value <dbl>,
#> # Correlation.parameter <int>, Correlation.conf.low <dbl>,
#> # Correlation.conf.high <dbl>, Correlation.method <chr>,
#> # Correlation.alternative <chr>
Created on 2020-06-10 by the reprex package (v0.3.0)
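To make the symmetry mentioned in the help text concrete, here is a small round trip on a made-up tibble (df, nested and restored are illustrative names, not from the question). It assumes tidyr 1.0.0 or later. Judging by the unnest output above, unnest is the side that pastes the outer name onto the inner names, so this sketch assumes that nest with .names_sep is the side that strips it:

```r
library(tidyr)

# Toy data: two columns that share the prefix "m."
df <- tibble::tibble(id = 1:2, m.a = c(1, 2), m.b = c(3, 4))

# Nest both "m." columns into a list-column called m.
# With .names_sep = ".", the "m." prefix is dropped from the inner names.
nested <- nest(df, m = c(m.a, m.b), .names_sep = ".")
names(nested$m[[1]])

# Unnest with the same separator: the outer name m is pasted back on.
restored <- unnest(nested, m, names_sep = ".")
names(restored)
```

If the assumption holds, the inner tibbles carry the short names a and b, and the restored tibble has the original column names again.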