dplyr::filter rows of character data using lag/lead, matching on several columns

I'm trying to filter rows from a data frame (df) using the contents of specific columns (col1, col2, col3) in the next row.

I found something close, but it applies lag to only one column.

Most posts that show how to filter with lag/lead use numeric columns; in my case the columns are all text.

df <- tibble::tribble(
  ~col1,  ~col2,     ~col3,          ~Effect,
  "Jim", "Walk", "optionA",      "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim",  "Run", "optionB",           "col1",
  "Jim",  "Run", "optionB",      "col1×col2",
  "Jim",  "Run", "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA",           "col1",
  "Joe", "Walk", "optionA",      "col1×col2",
  "Joe",  "Run", "optionB", "col1×col2×col2"
  )

I want to drop a row if the next row is identical to it in every column except Effect.

The final data frame should look like this:

df_result <- tibble::tribble(
  ~col1,  ~col2,     ~col3,          ~Effect,
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim",  "Run", "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA",      "col1×col2",
  "Joe",  "Run", "optionB", "col1×col2×col2"
  )

Does anyone have any suggestions? If possible, I'd like a tidyverse solution.
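Taken literally, the rule "drop a row if the next row matches on everything except Effect" is a comparison of each row with its successor, which is what dplyr::lead() does, and it works on character columns just as well as on numeric ones. The following is a minimal sketch, assuming dplyr >= 1.0.4 (for if_all()): a row is dropped when col1, col2 and col3 all equal their values in the following row. The last row has no successor, so lead() returns NA there and coalesce() turns that comparison into FALSE, which keeps the row.

```r
library(dplyr)

df <- tibble::tribble(
  ~col1,  ~col2,     ~col3,          ~Effect,
  "Jim", "Walk", "optionA",      "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim",  "Run", "optionB",           "col1",
  "Jim",  "Run", "optionB",      "col1×col2",
  "Jim",  "Run", "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA",           "col1",
  "Joe", "Walk", "optionA",      "col1×col2",
  "Joe",  "Run", "optionB", "col1×col2×col2"
)

# Drop a row only if every key column equals its value in the next row.
# lead(.x) is NA on the last row; coalesce() treats that as "not equal",
# so the last row is always kept.
result <- df %>%
  filter(!if_all(col1:col3, ~ coalesce(.x == lead(.x), FALSE)))

result
```

On the sample data this keeps rows 2, 5, 7 and 8, the same rows as df_result. Because it only looks one row ahead, a key combination that reappears later after a different one would be kept again; that is where it differs from the duplicated()/group_by()/distinct() answers, which collapse all rows sharing the same key regardless of position.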

You could try duplicated() with the option fromLast = TRUE, like this:

df[!duplicated(df[-4], fromLast = TRUE), ]
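A small illustration of how this works: df[-4] drops the fourth column (Effect), so rows are compared on col1:col3 only, and duplicated() flags a row when an identical row occurs elsewhere. With fromLast = TRUE the scan runs from the end, so every occurrence except the last is flagged, and negating the flags keeps the last one. The toy vector x is only for illustration; note that the second "a" is flagged even though the element right after it is "b", because the match is against the whole vector, not just the adjacent element.

```r
# Toy vector: with fromLast = TRUE, every occurrence except the last is
# flagged, wherever the later copy is (it need not be the next element).
x <- c("a", "a", "b", "a")
dup_x <- duplicated(x, fromLast = TRUE)
dup_x
#> [1]  TRUE  TRUE FALSE FALSE

df <- tibble::tribble(
  ~col1,  ~col2,     ~col3,          ~Effect,
  "Jim", "Walk", "optionA",      "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim",  "Run", "optionB",           "col1",
  "Jim",  "Run", "optionB",      "col1×col2",
  "Jim",  "Run", "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA",           "col1",
  "Joe", "Walk", "optionA",      "col1×col2",
  "Joe",  "Run", "optionB", "col1×col2×col2"
)

# Compare rows on col1:col3 only; TRUE marks rows that will be dropped.
flags <- duplicated(df[-4], fromLast = TRUE)
which(!flags)
#> [1] 2 5 7 8
```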

A tidyverse solution could be:

library(dplyr)

# Group by every column except Effect and keep the last row of each group
df %>% 
  group_by(across(-Effect)) %>% 
  slice_tail(n = 1) %>%
  ungroup()

This returns:

# A tibble: 4 x 4
  col1  col2  col3    Effect        
  <chr> <chr> <chr>   <chr>         
1 Jim   Run   optionB col1×col2×col2
2 Jim   Walk  optionA col1×col2×col2
3 Joe   Run   optionB col1×col2×col2
4 Joe   Walk  optionA col1×col2 
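The rows above come back sorted by group (Run before Walk) rather than in their original order, because slice_tail() returns rows group by group. If the original order matters, one option (a sketch, not part of the answer above) is a grouped filter() that keeps the last row of each group: filter() returns the surviving rows in their original positions, so on the sample data the result should line up row for row with df_result.

```r
library(dplyr)

df <- tibble::tribble(
  ~col1,  ~col2,     ~col3,          ~Effect,
  "Jim", "Walk", "optionA",      "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim",  "Run", "optionB",           "col1",
  "Jim",  "Run", "optionB",      "col1×col2",
  "Jim",  "Run", "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA",           "col1",
  "Joe", "Walk", "optionA",      "col1×col2",
  "Joe",  "Run", "optionB", "col1×col2×col2"
)

# Within each col1/col2/col3 group, keep only the last row;
# filter() preserves the original row order of the kept rows.
result <- df %>%
  group_by(across(-Effect)) %>%
  filter(row_number() == n()) %>%
  ungroup()

result
```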

We can use distinct():

library(dplyr)
# Reverse the rows so that distinct() keeps the last occurrence of each combination
df %>%
  slice(rev(row_number())) %>%
  distinct(across(col1:col3), .keep_all = TRUE)

-output

# A tibble: 4 x 4
  col1  col2  col3    Effect        
  <chr> <chr> <chr>   <chr>         
1 Joe   Run   optionB col1×col2×col2
2 Joe   Walk  optionA col1×col2     
3 Jim   Run   optionB col1×col2×col2
4 Jim   Walk  optionA col1×col2×col2

Or, using nchar():

# Keep the row with the longest Effect string in each group
df %>%
  group_by(across(col1:col3)) %>%
  slice(which.max(nchar(Effect))) %>%
  ungroup()