How to sort within a row of a data frame with categorical variables?

Question

I have this code:

test <- data.frame("ClaimType1" = "Derivative", "ClaimType2" = "Derivative","ClaimType3" = "Class", "ClaimType4" = "Class", "Time1" = c(2,5), "Time2" = c(8,4), "Time3" = c(1,3), "Time4" = c(10,9))

claim1	claim2	claim3	claim4	time1	time2	time3	time4
Derivative	Derivative	Class	Class	2	8	1	10
Derivative	Derivative	Class	Class	5	4	3	9

I want to sort each row by time and get the following output:

claim1	claim2	claim3	claim4	time1	time2	time3	time4
Class	Derivative	Derivative	Class	1	2	8	10
Class	Derivative	Derivative	Class	3	4	5	9

I'm trying to sort within a row, but I'm not sure how to keep each claim linked to its time. I guess a dictionary won't work here because it's an array.

Answer 1

This is much easier with long data, so (at least with dplyr and tidyr) you have to pivot_longer and then pivot_wider back:

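library(dplyr)
library(tidyr)

test %>% 
  # long format: one row per (original row, position), ClaimType and Time side by side
  pivot_longer(cols = everything(), names_to = c(".value", "col"),
               names_pattern = "(ClaimType|Time)(.*)") %>% 
  # a new group starts each time the position counter restarts at "1"
  mutate(group = cumsum(col == "1")) %>% 
  # sort by time within each original row (the data is not grouped, so no .by_group needed)
  arrange(group, Time) %>% 
  # renumber positions 1..4 within each group after sorting
  mutate(col = sequence(rle(group)$lengths)) %>% 
  pivot_wider(id_cols = group, names_from = col,
              values_from = c("ClaimType", "Time"), names_sep = "") %>% 
  select(-group)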

  ClaimType1 ClaimType2 ClaimType3 ClaimType4 Time1 Time2 Time3 Time4
  <chr>      <chr>      <chr>      <chr>      <dbl> <dbl> <dbl> <dbl>
1 Class      Derivative Derivative Class          1     2     8    10
2 Class      Derivative Derivative Class          3     4     5     9
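For comparison, the same long-format idea can be sketched in base R without tidyr: stack each row's claim/time pairs into a long table, order by original row and then by time, and fold the result back into wide shape. This is a minimal sketch that assumes every row has exactly four ClaimType/Time pairs in matching positions; the names `claim_cols`, `time_cols`, `long` and `sorted` are illustrative only.

```r
# Same example data as in the question
test <- data.frame("ClaimType1" = "Derivative", "ClaimType2" = "Derivative",
                   "ClaimType3" = "Class", "ClaimType4" = "Class",
                   "Time1" = c(2, 5), "Time2" = c(8, 4),
                   "Time3" = c(1, 3), "Time4" = c(10, 9))

# Assumption: claims and times are paired by position (ClaimType1 <-> Time1, ...)
claim_cols <- paste0("ClaimType", 1:4)
time_cols  <- paste0("Time", 1:4)
n <- nrow(test)

# Long format: unlist() concatenates column by column, so the row index repeats 1..n
long <- data.frame(
  row   = rep(seq_len(n), times = length(time_cols)),
  claim = as.character(unlist(test[claim_cols], use.names = FALSE)),
  time  = unlist(test[time_cols], use.names = FALSE)
)

# Sort by original row first, then by time within each row
long <- long[order(long$row, long$time), ]

# Back to wide: each block of four sorted records fills one output row
sorted <- data.frame(
  matrix(long$claim, nrow = n, byrow = TRUE, dimnames = list(NULL, claim_cols)),
  matrix(long$time,  nrow = n, byrow = TRUE, dimnames = list(NULL, time_cols))
)

sorted
```

Because order() is stable, tied times keep their original column order.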

Answer 2

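Since you want to break the column-based relationships, I'd suggest a split-apply-combine workflow. The idea is to split the data frame into smaller pieces, operate on each piece the way you want, and then glue the pieces back together.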
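Taken literally, that workflow might be sketched in base R as follows: split() breaks the data frame into one-row pieces, lapply() sorts each piece, and do.call(rbind, ...) glues them back together. The helper sort_one_row() is hypothetical, and the sketch assumes the ClaimType*/Time* column naming from the question. Because every piece stays a data frame, the Time columns remain numeric throughout.

```r
test <- data.frame("ClaimType1" = "Derivative", "ClaimType2" = "Derivative",
                   "ClaimType3" = "Class", "ClaimType4" = "Class",
                   "Time1" = c(2, 5), "Time2" = c(8, 4),
                   "Time3" = c(1, 3), "Time4" = c(10, 9))

# Hypothetical helper: sort one single-row data frame by its Time columns,
# carrying the matching ClaimType values along
sort_one_row <- function(row) {
  claim_cols <- grep("^ClaimType", names(row), value = TRUE)
  time_cols  <- grep("^Time", names(row), value = TRUE)
  perm <- order(unlist(row[time_cols]))        # permutation that sorts the times
  row[claim_cols] <- row[claim_cols][perm]     # values assigned by position
  row[time_cols]  <- row[time_cols][perm]
  row
}

pieces <- split(test, seq_len(nrow(test)))     # split: one data frame per row
pieces <- lapply(pieces, sort_one_row)         # apply: sort each piece
result <- do.call(rbind, pieces)               # combine: stack the pieces
rownames(result) <- NULL

result
```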

Here is the idea in base R, with some admittedly inelegant code:

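helper_function <- function(x){
  # x is one row, already coerced to a character vector by apply()
  time_rank <- order(as.numeric(x[5:8]))   # permutation that sorts the four times
  # reorder claims (positions 1:4) and times (5:8) with the same permutation;
  # unname() drops the original names, which would otherwise come back scrambled
  unname(c(x[time_rank], x[time_rank + 4]))
}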

as.data.frame(t(apply(test, 1, helper_function)))

##      V1         V2         V3    V4 V5 V6 V7 V8
## 1 Class Derivative Derivative Class  1  2  8 10
## 2 Class Derivative Derivative Class  3  4  5  9

The key idea is to use order() to record the permutation you want for each row; you can then apply that same permutation to several parts of the row.
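On a single row the idea looks like this; the two vectors are simply the first row of the question's data. sort() would return the sorted times but lose track of where they came from, whereas order() returns positions, which can index both vectors.

```r
claims <- c("Derivative", "Derivative", "Class", "Class")
times  <- c(2, 8, 1, 10)

perm <- order(times)   # positions of the times in ascending order: 3 1 2 4
claims[perm]           # "Class" "Derivative" "Derivative" "Class"
times[perm]            # 1 2 8 10
```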

Now we should clean it up, because we've lost the column names and types:

test_output <- as.data.frame(t(apply(test, 1, helper_function)))
colnames(test_output) <- c("claim1", "claim2", "claim3", "claim4",
                           "time1", "time2", "time3", "time4")
# apply() returned strings, so convert the time columns back to numeric
test_output[5:8] <- lapply(test_output[5:8], as.numeric)

test_output

##   claim1     claim2     claim3 claim4 time1 time2 time3 time4
## 1  Class Derivative Derivative  Class     1     2     8    10
## 2  Class Derivative Derivative  Class     3     4     5     9

str(test_output)
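The names and types get lost because apply() coerces a data frame to a matrix with as.matrix() before iterating, and a matrix holds a single type: with character and numeric columns mixed, everything becomes character. A quick check (the names `m` and `row_types` are illustrative):

```r
test <- data.frame("ClaimType1" = "Derivative", "ClaimType2" = "Derivative",
                   "ClaimType3" = "Class", "ClaimType4" = "Class",
                   "Time1" = c(2, 5), "Time2" = c(8, 4),
                   "Time3" = c(1, 3), "Time4" = c(10, 9))

typeof(test$Time1)               # "double" in the data frame itself

m <- as.matrix(test)             # the coercion apply() performs first
typeof(m)                        # "character": the Time columns became strings

row_types <- apply(test, 1, typeof)
row_types                        # "character" "character": what each row looks like inside apply()
```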

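Note that referring to hard-coded column numbers (for example 5:8), as I did several times, isn't good practice, but hopefully this conveys one possible approach.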
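One way to avoid the hard-coded positions, sketched here as an alternative rather than the approach above, is to look the columns up by name and keep claims and times in separate matrices. apply() then only sees the numeric times, and indexing with (row, column) pairs reorders both matrices with the same per-row permutation, so no type conversion is needed. The sketch assumes the ClaimType*/Time* columns appear in matching order; `perm`, `idx` and `sorted` are illustrative names.

```r
test <- data.frame("ClaimType1" = "Derivative", "ClaimType2" = "Derivative",
                   "ClaimType3" = "Class", "ClaimType4" = "Class",
                   "Time1" = c(2, 5), "Time2" = c(8, 4),
                   "Time3" = c(1, 3), "Time4" = c(10, 9))

# Locate the two column families by name instead of by position
claim_cols <- grep("^ClaimType", names(test), value = TRUE)
time_cols  <- grep("^Time", names(test), value = TRUE)

claims <- as.matrix(test[claim_cols])  # character matrix
times  <- as.matrix(test[time_cols])   # numeric matrix, stays numeric

# One permutation per row; t() because apply() returns each result as a column
perm <- t(apply(times, 1, order))

# (row, column) index pairs: row i is reordered by its own permutation perm[i, ]
idx <- cbind(as.vector(row(perm)), as.vector(perm))

sorted <- data.frame(
  matrix(claims[idx], nrow = nrow(perm), dimnames = list(NULL, claim_cols)),
  matrix(times[idx],  nrow = nrow(perm), dimnames = list(NULL, time_cols))
)

sorted
```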


Tags: arrays, sorting, tablesorter, r, dependent-type