Is it possible to create two data frames in a single workflow with magrittr?
I have started using the magrittr pipe operators and was curious whether two data frames could be created in a single flow. For example, it would be helpful to produce a non-aggregated data frame for plotting and an aggregated data frame to order factors (aggregate ordering example).
Here is a fairly contrived example that illustrates the question:
library(dplyr)
library(tidyr)
library(magrittr)
library(ggplot2) # msleep
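# Non-aggregated counts: one row per vore/order pair
vore_count <-
  na.exclude(msleep) %>%
  group_by(vore, order) %>%
  summarise(count = n()) %>%
  ungroup()
# Aggregated (wide) table: one column per vore
agg <- vore_count %>%
  spread(vore, count)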
Can vore_count and agg be produced in the same flow?
I tried the following (as well as using %T>%), but it obviously does not work.
vore_count <-
  na.exclude(msleep) %>%
  group_by(vore, order) %>%
  summarise(count = n()) %>%
  ungroup() %>%
  agg <- spread(vore, count)
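You can use list() inside the pipe and then concatenate agg once the first data.frame has been computed. Here I just use mtcars. The result is a named list of two data frames.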
library(dplyr)
library(tidyr)
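na.exclude(mtcars) %>%
  group_by(cyl, disp) %>%
  summarise(count = n()) %>%
  ungroup() %>%
  # wrap the first data frame in a named list
  list(cyl_count = .) %>%
  # append the aggregated data frame, computed from the first one
  c(list(agg = spread(.$cyl_count, cyl, count)))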
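The two list steps above rely on how magrittr treats the `.` placeholder, which is easy to miss. Here is a minimal sketch with toy values (the names `direct` and `nested` are made up for illustration). When `.` is itself an argument of the right-hand call, the piped value is not also inserted as the first argument. When `.` appears only inside a nested call, it still is.

```r
library(magrittr)

# `.` is a direct (named) argument, so the piped value is used only there.
direct <- 1:3 %>% list(x = .)

# `.` appears only inside the nested list() call, so the piped list is
# also passed as the first argument: c(list(a = 1), list(b = 2)).
nested <- list(a = 1) %>% c(list(b = .$a + 1))

str(direct)
str(nested)
```

That is why `list(cyl_count = .)` yields a one-element list, while the following `c(...)` step appends to it instead of replacing it.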
If you want to assign these to the global environment, you can add the following line at the end of the pipe:
... %>%
  list2env(globalenv())
ls(pattern = "agg|cyl_count")
# [1] "agg" "cyl_count"
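For completeness, here is a sketch of the whole pipeline in one piece. It makes one assumption that differs from the snippet above: the two data frames are written into a separate environment (the hypothetical name `results`) rather than `globalenv()`, so the example does not clutter the workspace. Swapping `results` for `globalenv()` should behave like the original.

```r
library(dplyr)
library(tidyr)
library(magrittr)

# Hypothetical target environment, used here instead of globalenv()
results <- new.env()

mtcars %>%
  group_by(cyl, disp) %>%
  summarise(count = n()) %>%
  ungroup() %>%
  list(cyl_count = .) %>%
  c(list(agg = spread(.$cyl_count, cyl, count))) %>%
  list2env(envir = results)   # returns the environment it wrote to

# Both objects now live in `results`
ls(results)
head(results$agg)
```

Note that list2env() returns the environment, so the pipeline's own value is that environment rather than either data frame.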
Side assignment is easier with pipeR.
library(pipeR)
library(dplyr)
library(ggplot2)
library(tidyr)
na.exclude(msleep) %>>%
  group_by(vore, order) %>>%
  summarise(count = n()) %>>%
  ungroup() %>>%
  (~ vore_count) %>>%   # side assignment: save the piped value as vore_count
  spread(vore, count) %>>%
  (~ agg)               # save the final result as agg
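For readers who would rather stay with magrittr, a similar side assignment can probably be approximated with the tee operator `%T>%` and base R's `assign()`. This is only a sketch and not part of the pipeR answer above. It reuses mtcars and the names from the list() example, and writing into `globalenv()` is an assumption about where the intermediate result should live.

```r
library(dplyr)
library(tidyr)
library(magrittr)

agg <- mtcars %>%
  group_by(cyl, disp) %>%
  summarise(count = n()) %>%
  ungroup() %T>%
  # %T>% runs assign() for its side effect and passes the
  # un-aggregated data frame on unchanged
  assign("cyl_count", ., envir = globalenv()) %>%
  spread(cyl, count)

head(cyl_count)
head(agg)
```

Because `.` is a direct argument of assign(), magrittr does not also insert the piped value as the first argument, so the name "cyl_count" stays in the first position.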
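While I can understand the temptation, IMO only one assignment should come out of a single workflow/pipeline. It is cleaner, easier to read, and better practice. Ideally, each pipeline should have a single purpose: one input, one output.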