How to sort within a row of a data frame with categorical variables?
I have this code:
test <- data.frame(
  "ClaimType1" = "Derivative", "ClaimType2" = "Derivative",
  "ClaimType3" = "Class", "ClaimType4" = "Class",
  "Time1" = c(2, 5), "Time2" = c(8, 4), "Time3" = c(1, 3), "Time4" = c(10, 9)
)
claim1 | claim2 | claim3 | claim4 | time1 | time2 | time3 | time4 |
---|---|---|---|---|---|---|---|
Derivative | Derivative | Class | Class | 2 | 8 | 1 | 10 |
Derivative | Derivative | Class | Class | 5 | 4 | 3 | 9 |
I am looking to sort each row by time and get the following output:
claim1 | claim2 | claim3 | claim4 | time1 | time2 | time3 | time4 |
---|---|---|---|---|---|---|---|
Class | Derivative | Derivative | Class | 1 | 2 | 8 | 10 |
Class | Derivative | Derivative | Class | 3 | 4 | 5 | 9 |
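I am trying to sort within a row, but I am not sure how to link the claims and the times together so that they move as pairs. I am guessing a dictionary would not work here, since this is an array.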
This is certainly much easier with long data, so, at least in dplyr, you have to pivot_longer and then pivot_wider back:
library(dplyr)
library(tidyr)

test %>%
  pivot_longer(cols = everything(),
               names_to = c(".value", "col"),
               names_pattern = "(ClaimType|Time)(.*)") %>%
  # col restarts at "1" for each original row, so cumsum() yields a row id
  mutate(group = cumsum(col == "1")) %>%
  arrange(group, Time) %>%
  # renumber the positions 1..4 within each group after sorting
  mutate(col = sequence(rle(group)$lengths)) %>%
  pivot_wider(id_cols = group, names_from = col,
              values_from = c("ClaimType", "Time"), names_sep = "") %>%
  select(-group)

  ClaimType1 ClaimType2 ClaimType3 ClaimType4 Time1 Time2 Time3 Time4
  <chr>      <chr>      <chr>      <chr>      <dbl> <dbl> <dbl> <dbl>
1 Class      Derivative Derivative Class          1     2     8    10
2 Class      Derivative Derivative Class          3     4     5     9
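As a variation on the pivot approach, here is a sketch that tags each original row with an explicit id before pivoting, instead of inferring the row from where col restarts at 1. The column name row_id and the result name sorted are my own choices for illustration and are not part of the answer above; it assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

test <- data.frame(
  ClaimType1 = "Derivative", ClaimType2 = "Derivative",
  ClaimType3 = "Class",      ClaimType4 = "Class",
  Time1 = c(2, 5), Time2 = c(8, 4), Time3 = c(1, 3), Time4 = c(10, 9)
)

sorted <- test %>%
  mutate(row_id = row_number()) %>%        # hypothetical id for the original row
  pivot_longer(
    cols = -row_id,
    names_to = c(".value", "col"),
    names_pattern = "(ClaimType|Time)(\\d+)"
  ) %>%
  group_by(row_id) %>%
  arrange(Time, .by_group = TRUE) %>%      # sort the (claim, time) pairs per row
  mutate(col = row_number()) %>%           # renumber positions within each row
  ungroup() %>%
  pivot_wider(
    names_from = col,
    values_from = c(ClaimType, Time),
    names_sep = ""
  ) %>%
  select(-row_id)

sorted
```

The explicit id makes the grouping independent of the order in which pivot_longer emits the rows, at the cost of one extra column that is dropped at the end.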
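Since you want to break the column-based relationships, I would suggest a split-apply-combine type of workflow. The idea is to split the data frame into smaller pieces, operate on each piece the way you want, and then glue the pieces back together.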
Using base R and some very inelegant code to demonstrate the idea:
helper_function <- function(x) {
  # x is one row as a character vector: claims in 1:4, times in 5:8
  time_rank <- order(as.numeric(x[5:8]))
  # apply the same permutation to the claims and to the times
  unname(c(x[time_rank], x[time_rank + 4]))
}

as.data.frame(t(apply(test, 1, helper_function)))
##      V1         V2         V3    V4 V5 V6 V7 V8
## 1 Class Derivative Derivative Class  1  2  8 10
## 2 Class Derivative Derivative Class  3  4  5  9
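The key idea is to use order() to write down the permutation you want for each row; you can then apply that same permutation to several parts of the row.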
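To make the permutation idea concrete, here is a small sketch on a single row, written as two parallel vectors. The vector names claims, times and perm are just illustrative; the values are the first row of the question's data.

```r
# One row's worth of data, held as two parallel vectors
claims <- c("Derivative", "Derivative", "Class", "Class")
times  <- c(2, 8, 1, 10)

# order() returns the positions that would sort `times` ascending
perm <- order(times)
perm
## [1] 3 1 2 4

# Indexing both vectors with the same permutation keeps the pairs together
sorted_times  <- times[perm]
sorted_claims <- claims[perm]
```

Note that sort(times) would give the sorted times but throw away the permutation, which is exactly what is needed to reorder the claims in step.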
Now we should clean this up, since we have clobbered the column names and types:
test_output <- as.data.frame(t(apply(test, 1, helper_function)))
colnames(test_output) <- c("claim1", "claim2", "claim3", "claim4",
                           "time1", "time2", "time3", "time4")
# apply() turned everything into character; convert the times back to numeric
test_output[5:8] <- lapply(test_output[5:8], as.numeric)
test_output
##   claim1     claim2     claim3 claim4 time1 time2 time3 time4
## 1  Class Derivative Derivative  Class     1     2     8    10
## 2  Class Derivative Derivative  Class     3     4     5     9
str(test_output)
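I will mention that referencing static column numbers (e.g. 5:8), as I did several times, is not great practice, but hopefully this conveys one possible approach.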
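One way to avoid the hard-coded positions, sketched here under the assumption that the column names always start with ClaimType and Time and that both groups have the same number of columns, is to look the columns up by name and work row by row on the data frame itself, so the types survive. The names claim_cols, time_cols, sort_one_row and result are mine, and the sketch assumes R 4.0 or later, where data.frame() keeps strings as character.

```r
test <- data.frame(
  ClaimType1 = "Derivative", ClaimType2 = "Derivative",
  ClaimType3 = "Class",      ClaimType4 = "Class",
  Time1 = c(2, 5), Time2 = c(8, 4), Time3 = c(1, 3), Time4 = c(10, 9)
)

# Locate the two column groups by name rather than by position
claim_cols <- grep("^ClaimType", names(test), value = TRUE)
time_cols  <- grep("^Time", names(test), value = TRUE)

# Apply step: reorder the claim/time pairs of row i by ascending time
sort_one_row <- function(i) {
  perm <- order(unlist(test[i, time_cols]))
  row <- test[i, ]
  # assignment is positional, so the column names of `row` are kept
  row[claim_cols] <- test[i, claim_cols[perm]]
  row[time_cols]  <- test[i, time_cols[perm]]
  row
}

# Split by row index, apply, then combine with rbind()
result <- do.call(rbind, lapply(seq_len(nrow(test)), sort_one_row))
result
```

Because nothing passes through apply(), the time columns stay numeric and the original column names are preserved, so no cleanup step is needed. For large data frames a row-by-row loop like this will be slow, and the pivot approach is likely the better fit.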