dplyr::filter rows of character columns using lag/lead, matching on several columns
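I am trying to filter rows from a data frame (df) based on the contents of the next row in specific columns (col1, col2 and col3).
This comes close, but it only uses lag on a single column.
Most posts that show how to filter with lag/lead deal with numeric columns; in my case all the columns are text.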
df <- tibble::tribble(
  ~col1, ~col2,  ~col3,     ~Effect,
  "Jim", "Walk", "optionA", "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim", "Run",  "optionB", "col1",
  "Jim", "Run",  "optionB", "col1×col2",
  "Jim", "Run",  "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA", "col1",
  "Joe", "Walk", "optionA", "col1×col2",
  "Joe", "Run",  "optionB", "col1×col2×col2"
)
I want to drop a row if the next row is identical to it (apart from the Effect column).
The final data frame would look like this:
df_result <- tibble::tribble(
  ~col1, ~col2,  ~col3,     ~Effect,
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim", "Run",  "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA", "col1×col2",
  "Joe", "Run",  "optionB", "col1×col2×col2"
)
Does anyone have a suggestion? If possible, I would like a tidyverse solution.
You can try duplicated with the option fromLast = TRUE, as shown below:
df[!duplicated(df[-4], fromLast = TRUE), ]
A tidyverse solution could be
library(dplyr)
df %>%
  group_by(across(-Effect)) %>%
  slice_tail(n = 1) %>%
  ungroup()
This returns
# A tibble: 4 x 4
  col1  col2  col3    Effect
  <chr> <chr> <chr>   <chr>
1 Jim   Run   optionB col1×col2×col2
2 Jim   Walk  optionA col1×col2×col2
3 Joe   Run   optionB col1×col2×col2
4 Joe   Walk  optionA col1×col2
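Note that the output above comes back ordered by group rather than in the original row order. If the original order matters, one possible sketch is to filter on the position within each group instead, since filter() leaves the surviving rows where they were. The data frame is rebuilt here only so the snippet runs on its own, and res_ordered is just an illustrative name.

```r
library(dplyr)

# Same data as in the question, repeated so this snippet is self-contained
df <- tibble::tribble(
  ~col1, ~col2,  ~col3,     ~Effect,
  "Jim", "Walk", "optionA", "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim", "Run",  "optionB", "col1",
  "Jim", "Run",  "optionB", "col1×col2",
  "Jim", "Run",  "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA", "col1",
  "Joe", "Walk", "optionA", "col1×col2",
  "Joe", "Run",  "optionB", "col1×col2×col2"
)

# Keep only the last row of each (col1, col2, col3) group;
# filter() does not reorder rows, so the original order is kept
res_ordered <- df %>%
  group_by(across(col1:col3)) %>%
  filter(row_number() == n()) %>%
  ungroup()

res_ordered
```

On the example data this should give the rows in the same order as df_result from the question.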
We can use distinct
library(dplyr)
df %>%
  slice(rev(row_number())) %>%
  distinct(across(col1:col3), .keep_all = TRUE)
-output
# A tibble: 4 x 4
  col1  col2  col3    Effect
  <chr> <chr> <chr>   <chr>
1 Joe   Run   optionB col1×col2×col2
2 Joe   Walk  optionA col1×col2
3 Jim   Run   optionB col1×col2×col2
4 Jim   Walk  optionA col1×col2×col2
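Or use nchar, which keeps the row with the longest Effect string in each group (this assumes the last row of a group always has the longest Effect)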
df %>%
  group_by(across(col1:col3)) %>%
  slice(which.max(nchar(Effect))) %>%
  ungroup()
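Since the question asks about lag/lead specifically, here is a rough sketch using lead() together with if_any() (which, as far as I know, needs dplyr 1.0.4 or later). It keeps a row when at least one of col1:col3 differs from the next row, or when there is no next row. It assumes that matching rows sit next to each other and that the key columns contain no NA values. The data frame is rebuilt so the snippet runs on its own, and res_lead is only an illustrative name.

```r
library(dplyr)

# Same data as in the question, repeated so this snippet is self-contained
df <- tibble::tribble(
  ~col1, ~col2,  ~col3,     ~Effect,
  "Jim", "Walk", "optionA", "col1×col2",
  "Jim", "Walk", "optionA", "col1×col2×col2",
  "Jim", "Run",  "optionB", "col1",
  "Jim", "Run",  "optionB", "col1×col2",
  "Jim", "Run",  "optionB", "col1×col2×col2",
  "Joe", "Walk", "optionA", "col1",
  "Joe", "Walk", "optionA", "col1×col2",
  "Joe", "Run",  "optionB", "col1×col2×col2"
)

# lead(.x) is NA on the last row, so is.na(lead(.x)) keeps that row;
# otherwise a row is kept if any key column changes on the next row
res_lead <- df %>%
  filter(if_any(col1:col3, ~ is.na(lead(.x)) | .x != lead(.x)))

res_lead
```

Unlike the duplicated() and group_by() approaches, this only compares each row with its immediate neighbour, so it would behave differently if rows with the same col1:col3 values were not adjacent.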