How can I automate this simple conditional column operation in R?

Question

I have a data frame like the following:

tibble(term = c(
  rep("a:b", 2),
  rep("b:a", 2),
  rep("c:d", 2),
  rep("d:c", 2),
  rep("g:h", 2),
  rep("h:g", 2)
))

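I want to add an extra column to this data frame that takes the same value for any two terms made of the same characters in reversed order around the ":" separator (that is, a:b and b:a would be coded identically, as would c:d and d:c, and every other such pair).

I came up with something like this: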

df %>%
  mutate(term_adjusted = case_when(grepl("a:b|b:a", term) ~ "a:b"))

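But my dataset contains a large number of these pairs and I would like an automated approach, so my question is:

How can I do this automatically, without hard-coding each pair separately?

Thanks!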

Answer 1

How about:

library(dplyr)

your_data %>%
  mutate(term_adjusted = term %>%
                           strsplit(":") %>%
                           purrr::map_chr(~ .x %>%
                                           sort() %>%
                                           paste(collapse = ":")))

Or a base R option (the native pipe |> requires R 4.1 or later):

your_data$term_adjusted <- your_data$term |>
                             strsplit(":") |>
                             lapply(sort) |>
                             lapply(paste, collapse = ":") |>
                             unlist()

Both return:

# A tibble: 12 x 2
   term  term_adjusted
   <chr> <chr>
 1 a:b   a:b
 2 a:b   a:b
 3 b:a   a:b
 4 b:a   a:b
 5 c:d   c:d
 6 c:d   c:d
 7 d:c   c:d
 8 d:c   c:d
 9 g:h   g:h
10 g:h   g:h
11 h:g   g:h
12 h:g   g:h
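
If you need this in several places, the split/sort/paste idea above could be wrapped in a small function. This is only a sketch: the name normalize_pair and its sep argument are made up for illustration and are not part of any package. It assumes the separator is a literal string, which is why fixed = TRUE is used.

```r
# Hypothetical helper (the name is an assumption, not an existing function):
# split each term on the separator, sort the pieces, and paste them back.
normalize_pair <- function(term, sep = ":") {
  pieces <- strsplit(term, sep, fixed = TRUE)
  vapply(pieces,
         function(p) paste(sort(p), collapse = sep),
         character(1))
}

term <- c("a:b", "b:a", "c:d", "d:c", "g:h", "h:g")
normalize_pair(term)
# [1] "a:b" "a:b" "c:d" "c:d" "g:h" "g:h"
```

Because the pieces are sorted rather than swapped, the same function should also handle terms with more than two parts, such as "c:a:b".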

Answer 2

A tidyverse option:

library(dplyr)
library(tidyr)

df %>%
  separate(term, c('term1', 'term2'), sep = ':', remove = FALSE) %>%
  mutate(col1 = pmin(term1, term2), col2 = pmax(term1, term2)) %>%
  unite(result, col1, col2, sep = ':') %>%
  select(term, result)

#  term  result
#   <chr> <chr> 
# 1 a:b   a:b   
# 2 a:b   a:b   
# 3 b:a   a:b   
# 4 b:a   a:b   
# 5 c:d   c:d   
# 6 c:d   c:d   
# 7 d:c   c:d   
# 8 d:c   c:d   
# 9 g:h   g:h   
#10 g:h   g:h   
#11 h:g   g:h   
#12 h:g   g:h
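
The same pmin()/pmax() idea can also be sketched in base R without tidyr. This assumes every term has exactly two parts separated by ":"; the variable names parts and result are just for illustration.

```r
term <- c("a:b", "b:a", "c:d", "d:c", "g:h", "h:g")

# Assumes exactly two parts per term, so rbind() yields a two-column matrix.
parts <- do.call(rbind, strsplit(term, ":", fixed = TRUE))

# pmin()/pmax() compare the two columns element-wise, so the
# alphabetically smaller part always comes first.
result <- paste(pmin(parts[, 1], parts[, 2]),
                pmax(parts[, 1], parts[, 2]),
                sep = ":")
result
# [1] "a:b" "a:b" "c:d" "c:d" "g:h" "g:h"
```

This stays fully vectorised, which may matter on a large data frame, but unlike the sort-based approach it does not extend to terms with more than two parts.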
