Access result later in pipe

Question

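I am trying to write functions that print, at each step of a pipe, the number of rows that were excluded from the dataset.

Something like this: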

iris %>% 
    function_which_save_nrows_and_return_the_data() %>% 
    filter(exclude some rows) %>% 
    function_which_prints_difference_in_rows_before_after_exclusion_and_returns_data() %>% 
    function_which_save_nrows_and_return_the_data() %>% 
    function_which_prints_difference_in_rows_before_after_exclusion_and_returns_data()  # ...etc

These are the functions I have tried:

n_before = function(x) {assign("rows", nrow(x), .GlobalEnv); return(x)}

n_excluded = function(x) { 
    print(rows - nrow(x))
    return(x)
}

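This successfully saves the object rows:

But if I add two more links to the chain, the object is not saved:

So how can I create the rows object and access it later in the pipe?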

Answer 1

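This is due to R's lazy evaluation. It happens even when pipes are not used; see the code below. In that code the argument to n_excluded is filter(n_before(iris), Species != 'setosa'). At the point where rows is used in the print statement, that argument has not yet been referenced from within n_excluded, so the argument as a whole (including the call to n_before) has not been evaluated and rows does not exist yet.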

if (exists("rows")) rm(rows)  # ensure rows does not exist
n_excluded(filter(n_before(iris), Species != 'setosa'))
## Error in h(simpleError(msg, call)) : 
##   error in evaluating the argument 'x' in selecting a method for function 
##   'print': object 'rows' not found
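The same effect can be reproduced in base R without dplyr or pipes. The sketch below is only an illustration: make_value, lazy_check and side_effect_done are hypothetical names that do not appear in the question. It shows that the side effect inside an argument does not happen until the function first touches that argument.

```r
# Hypothetical helper: its side effect plays the role of n_before()
make_value <- function() {
  assign("side_effect_done", TRUE, envir = .GlobalEnv)
  42
}

# Hypothetical helper: checks for the side effect before and after touching x
lazy_check <- function(x) {
  seen_before <- exists("side_effect_done", envir = .GlobalEnv, inherits = FALSE)
  force(x)  # the argument (a promise) is evaluated here, not earlier
  seen_after <- exists("side_effect_done", envir = .GlobalEnv, inherits = FALSE)
  c(before = seen_before, after = seen_after)
}

# Start from a clean state
if (exists("side_effect_done", envir = .GlobalEnv, inherits = FALSE)) {
  rm("side_effect_done", envir = .GlobalEnv)
}

result <- lazy_check(make_value())
print(result)
## before  after 
##  FALSE   TRUE 
```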

To fix this:

1) We can force x before the print statement.

n_excluded = function(x) { 
  force(x)
  print(rows - nrow(x))
  return(x)
}
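For completeness, here is a sketch of the whole pipeline using this forced version together with n_before from the question. It assumes dplyr is installed (it supplies filter and re-exports %>%) and that rows is written to the global environment as in the question.

```r
library(dplyr)  # assumption: dplyr is installed; it re-exports %>%

# n_before as in the question: store the row count, pass the data on
n_before <- function(x) {
  assign("rows", nrow(x), envir = .GlobalEnv)
  x
}

# n_excluded with force(x), so the upstream steps run before rows is read
n_excluded <- function(x) {
  force(x)
  print(rows - nrow(x))
  x
}

# ensure rows does not exist beforehand
if (exists("rows", envir = .GlobalEnv, inherits = FALSE)) {
  rm("rows", envir = .GlobalEnv)
}

out <- iris %>%
  n_before() %>%
  filter(Species != "setosa") %>%
  n_excluded()
## [1] 50
```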

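2) Alternatively, we can use the magrittr sequential (eager) pipe, which guarantees that the legs are run in order. magrittr makes it available as a function but does not provide an operator for it, so we can assign it to an operator ourselves like this.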

`%s>%` <- magrittr::pipe_eager_lexical
iris %>%
  n_before() %>%
  filter(Species != 'setosa') %s>%  # note use of %s>% on this line
  n_excluded()

The magrittr developer has stated that he will add it as an operator if there is sufficient demand, so you may wish to add such a request to magrittr issue #247 on GitHub.
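As far as I can tell, released versions of magrittr from 2.0 onward also export an eager pipe operator, %!>%, so defining a custom operator may no longer be needed. The sketch below assumes magrittr >= 2.0 and dplyr are installed, and it uses the original n_excluded without force to show that the eager pipe alone is enough.

```r
library(magrittr)  # assumption: magrittr >= 2.0, which exports %!>%
library(dplyr)

n_before <- function(x) {
  assign("rows", nrow(x), envir = .GlobalEnv)
  x
}

# original version from the question, without force(x)
n_excluded <- function(x) {
  print(rows - nrow(x))
  x
}

# ensure rows does not exist beforehand
if (exists("rows", envir = .GlobalEnv, inherits = FALSE)) {
  rm("rows", envir = .GlobalEnv)
}

# each left-hand side is evaluated before the next step is called
out <- iris %!>%
  n_before() %!>%
  filter(Species != "setosa") %!>%
  n_excluded()
```

Using %!>% on every step keeps the code uniform; strictly, only the step feeding n_excluded needs to be eager.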

Answer 2

You can also use the extended features of pipeR.

library(dplyr)
library(pipeR)
  
n_excluded = function(x) { 
  print(rows - nrow(x))
  return(x)
}

p <- iris %>>%
   (~rows=nrow(.)) %>>%
   filter(Species != "setosa") %>>%
   n_excluded()
