How to sort within a row of a data frame with categorical variables?

I have this code:

test <- data.frame("ClaimType1" = "Derivative", "ClaimType2" = "Derivative","ClaimType3" = "Class", "ClaimType4" = "Class", "Time1" = c(2,5), "Time2" = c(8,4), "Time3" = c(1,3), "Time4" = c(10,9))
claim1 claim2 claim3 claim4 time1 time2 time3 time4
Derivative Derivative Class Class 2 8 1 10
Derivative Derivative Class Class 5 4 3 9

I want to sort each row by time, keeping every claim type attached to its time, and get the following output:

claim1 claim2 claim3 claim4 time1 time2 time3 time4
Class Derivative Derivative Class 1 2 8 10
Class Derivative Derivative Class 3 4 5 9

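I am trying to sort within a row, but I am not sure how to link the claims and the times together. I guess a dictionary would not work here, since it is an array.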

This is definitely easier with long data, so, at least in dplyr, you have to pivot_longer and then pivot_wider back:

library(dplyr)
library(tidyr)

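test %>% 
  # Long form: one row per (original row, position), with ClaimType and Time columns
  pivot_longer(cols = everything(), names_to = c(".value","col"), names_pattern = "(ClaimType|Time)(.*)") %>% 
  # Number the original rows: a new group starts whenever the position resets to "1"
  mutate(group = cumsum(col == "1")) %>% 
  # Sort by time within each original row
  arrange(group, Time, .by_group = TRUE) %>% 
  # Renumber the positions within each group after sorting
  mutate(col = sequence(rle(group)$lengths)) %>% 
  # Back to wide form, then drop the helper column
  pivot_wider(id_cols = group, names_from = col, values_from = c("ClaimType","Time"), names_sep = "") %>% 
  select(-group)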

  ClaimType1 ClaimType2 ClaimType3 ClaimType4 Time1 Time2 Time3 Time4
  <chr>      <chr>      <chr>      <chr>      <dbl> <dbl> <dbl> <dbl>
1 Class      Derivative Derivative Class          1     2     8    10
2 Class      Derivative Derivative Class          3     4     5     9

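Since you want to break the column-based relationship, I would suggest a split-apply-combine type of workflow. The idea is to split the data frame into smaller pieces, operate on each piece the way you want, and then glue the pieces back together.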

Using base R and some very inelegant code to show the idea:

helper_function <- function(x){
  time_rank <- order(as.numeric(x[5:8]))
  c(x[time_rank], x[time_rank + 4])
}

as.data.frame(t(apply(test, 1, helper_function)))

##      V1         V2         V3    V4 V5 V6 V7 V8
## 1 Class Derivative Derivative Class  1  2  8 10
## 2 Class Derivative Derivative Class  3  4  5  9

The key idea is to use order() to write down the permutation you want for each row; you can then apply that permutation to multiple parts of the row.
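As a minimal illustration of that idea outside the data frame, here is a sketch using two stand-alone vectors. The names `times`, `claims` and `perm` are made up for this example; the values are those of the first row of `test`.

```r
# Hypothetical stand-alone vectors holding the values of the first row of `test`
times  <- c(2, 8, 1, 10)
claims <- c("Derivative", "Derivative", "Class", "Class")

# order() returns the positions that would sort `times` in increasing order
perm <- order(times)
perm
# [1] 3 1 2 4

# Indexing both vectors with the same permutation keeps each claim with its time
times[perm]
# [1]  1  2  8 10
claims[perm]
# [1] "Class"      "Derivative" "Derivative" "Class"
```

Because the same `perm` is used for both vectors, the claims follow their times without any explicit lookup structure.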

Now we should clean this up, since we have destroyed the column names and types:

test_output <- as.data.frame(t(apply(test, 1, helper_function)))
# Restore readable column names (apply() left us with V1..V8)
colnames(test_output) <- c("claim1", "claim2", "claim3", "claim4",
                           "time1", "time2", "time3", "time4")
# apply() coerced every value to character, so convert the times back to numeric
test_output[5:8] <- apply(test_output[, 5:8], 2, as.numeric)

test_output

##   claim1     claim2     claim3 claim4 time1 time2 time3 time4
## 1  Class Derivative Derivative  Class     1     2     8    10
## 2  Class Derivative Derivative  Class     3     4     5     9

str(test_output)

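I will mention that referencing static column numbers (e.g. 5:8), as I did several times, is not great practice, but hopefully this conveys one possible approach.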
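One possible way to avoid the hard-coded positions is to select the two column groups by name prefix. The sketch below is a variation on the idea rather than the code above: it assumes the prefixes `ClaimType` and `Time` from the question's data frame, and the names `claim_cols`, `time_cols`, `claims`, `times` and `result` are made up for illustration. Keeping the claims and the times in two separate matrices also means the times are never coerced to character.

```r
# Rebuild the example data from the question
test <- data.frame(ClaimType1 = "Derivative", ClaimType2 = "Derivative",
                   ClaimType3 = "Class", ClaimType4 = "Class",
                   Time1 = c(2, 5), Time2 = c(8, 4),
                   Time3 = c(1, 3), Time4 = c(10, 9))

# Locate the two column groups by name prefix instead of by position
claim_cols <- grep("^ClaimType", names(test), value = TRUE)
time_cols  <- grep("^Time", names(test), value = TRUE)

# Separate matrices: claims stay character, times stay numeric
claims <- as.matrix(test[claim_cols])
times  <- as.matrix(test[time_cols])

# For each row, compute the permutation from the times and apply it to both parts
for (i in seq_len(nrow(test))) {
  perm <- order(times[i, ])
  claims[i, ] <- claims[i, perm]
  times[i, ]  <- times[i, perm]
}

# Glue the two parts back together; column names come from the matrices
result <- data.frame(claims, times)
result
```

This assumes the claim and time columns appear in matching order (ClaimType1 with Time1, and so on), as they do in the question.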