Group tibble rows by unordered combination of columns

Question

Given the following tibble

dat <- tibble(sample = c(1:6),
              string = c("ABC","ABC","CBA","FED","DEF","DEF"),
              x = c("a","a","b","e","d","d"),
              y = c("b","b","a","d","e","e"))

# A tibble: 6 × 4
  sample string x     y    
   <int> <chr>  <chr> <chr>
1      1 ABC    a     b    
2      2 ABC    a     b    
3      3 CBA    b     a    
4      4 FED    e     d    
5      5 DEF    d     e    
6      6 DEF    d     e

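I would like to group the rows by the unordered combination of columns x and y. Then, in every row where x and y are swapped relative to the first row of its group, I want to flip x ⇔ y and reverse string. Desired output: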

# A tibble: 6 × 5
  sample string x     y     group
   <int> <chr>  <chr> <chr> <dbl>
1      1 ABC    a     b         1
2      2 ABC    a     b         1
3      3 ABC    a     b         1
4      4 FED    e     d         2
5      5 FED    e     d         2
6      6 FED    e     d         2

Answer 1

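library(dplyr)

# Sort the characters within each string so that "ABC" and "CBA" share one key
strSort <- function(x) sapply(lapply(strsplit(x, NULL), sort), paste, collapse="")

# rleid() numbers consecutive runs of the key; each row then takes its group's first values
dat %>% 
  group_by(group = data.table::rleid(strSort(string))) %>% 
  mutate(across(string:y, first))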

# A tibble: 6 x 5
# Groups:   group [2]
  sample string x     y     group
   <int> <chr>  <chr> <chr> <int>
1      1 ABC    a     b         1
2      2 ABC    a     b         1
3      3 ABC    a     b         1
4      4 FED    e     d         2
5      5 FED    e     d         2
6      6 FED    e     d         2
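To see what the grouping key looks like, here is a minimal base R sketch using the strings from the question. The `match(key, unique(key))` line is my own substitution, not part of the answer: rleid() numbers consecutive runs, so it relies on matching rows being adjacent, while match() numbers keys by first appearance wherever they occur.

```r
# Same helper as in the answer: sort the characters inside each string
strSort <- function(x) sapply(lapply(strsplit(x, NULL), sort), paste, collapse = "")

string <- c("ABC", "ABC", "CBA", "FED", "DEF", "DEF")

# "ABC" and "CBA" collapse to one key, as do "FED" and "DEF"
key <- strSort(string)

# Assumption: number groups by first appearance instead of by consecutive runs
group <- match(key, unique(key))

print(data.frame(string, key, group))
```

For the example data both approaches give the same ids, since matching rows are already next to each other.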

Previous answer

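Here is an approach that combines tidyverse and apply. First sort the x and y columns within each row, then group_by x and y, number the groups with cur_group_id, and apply stri_reverse to string where needed. Note that this normalizes every row to the sorted orientation, so group 2 comes out as DEF, d, e rather than following the group's first row (FED, e, d).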

library(tidyverse)
library(stringi)

#Sort by row
dat[, c("x", "y")] <- t(apply(dat[, c("x", "y")], 1, sort))

dat %>% 
  group_by(x, y) %>% 
  mutate(group = cur_group_id(),
         string = ifelse(str_sub(string, 1, 1) == toupper(x), string, stri_reverse(string)))

# A tibble: 6 x 5
# Groups:   x, y [2]
  sample string x     y     group
   <int> <chr>  <chr> <chr> <int>
1      1 ABC    a     b         1
2      2 ABC    a     b         1
3      3 ABC    a     b         1
4      4 DEF    d     e         2
5      5 DEF    d     e         2
6      6 DEF    d     e         2
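If the output should follow the first row of each group, as in the question, one possible variant is to build the unordered key with pmin() and pmax() and then copy the first row's values. This is only a sketch under my own assumptions: the helper columns `lo` and `hi` are hypothetical names, and it assumes that rows sharing an unordered (x, y) pair should all take the string of the group's first row.

```r
library(dplyr)

dat <- tibble(sample = 1:6,
              string = c("ABC", "ABC", "CBA", "FED", "DEF", "DEF"),
              x = c("a", "a", "b", "e", "d", "d"),
              y = c("b", "b", "a", "d", "e", "e"))

res <- dat %>%
  # lo/hi form an order-independent key: (a, b) and (b, a) both become (a, b)
  mutate(lo = pmin(x, y), hi = pmax(x, y)) %>%
  group_by(lo, hi) %>%
  # Number the groups and overwrite each row with the group's first row
  mutate(group = cur_group_id(),
         string = first(string),
         x = first(x),
         y = first(y)) %>%
  ungroup() %>%
  select(-lo, -hi)

print(res)
```

Unlike the row-wise sort above, this leaves group 2 as FED, e, d because the first row of that group is never reordered.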

Tags: r, dplyr, tidyverse, tibble