How to sort mixed values in R
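I have a data frame that I want to sort by one column and then by the next (using the tidyverse if possible).
I checked the question linked below, but its solutions did not seem to work.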
Order a "mixed" vector (numbers with letters)
Example code:
library(tidyverse)  # provides %>%, as_tibble(), and arrange()
variable <- c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded")
level <- c("DIR", "EA", "IA", "500", "750", "1000")
df <- as_tibble(cbind(variable, level))
This does not give me what I want:
df <- df %>% arrange(variable, level)
The level column comes out in this order:
variable  level
channel   DIR
channel   EA
channel   IA
comp_ded  1000
comp_ded  500
comp_ded  750
I need the rows in this order:
variable  level
channel   DIR
channel   EA
channel   IA
comp_ded  500
comp_ded  750
comp_ded  1000
The real dataset has several different "variables"; half of them need to be sorted numerically and half alphabetically. Does anyone know how to do this?
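It's a bit ugly, but you can split the data frame into two parts with filter statements, arrange each part separately, and then bind them back together:
df <- bind_rows(
  df %>%
    filter(!is.na(as.numeric(level))) %>%
    arrange(variable, as.numeric(level)),
  df %>%
    filter(is.na(as.numeric(level))) %>%
    arrange(variable, level)
)
Which gives you: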
# A tibble: 6 x 2
variable level
<chr> <chr>
1 comp_ded 500
2 comp_ded 750
3 comp_ded 1000
4 channel DIR
5 channel EA
6 channel IA
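You can create a temporary variable to sort on. Once the rows are in the desired order, you can also make that order permanent by converting the column to a factor (as in @Vio's answer). Perhaps something like this:
df = df %>%
  mutate(tmp = as.numeric(level)) %>%
  arrange(variable, tmp, level) %>%
  select(-tmp) %>%
  mutate(level = factor(level, levels=unique(level)))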
variable level
<chr> <fct>
1 channel DIR
2 channel EA
3 channel IA
4 comp_ded 500
5 comp_ded 750
6 comp_ded 1000
I think you can also shorten this by not creating the temporary variable explicitly and instead using an "anonymous" variable inside arrange:
df = df %>%
  arrange(variable, as.numeric(level), level) %>%
  mutate(level = factor(level, levels=unique(level)))
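A side note on the as.numeric(level) key used above: coercing text such as "DIR" to a number yields NA and emits a coercion warning. The sketch below wraps the coercion in a small helper so the pipeline stays quiet. The name num_key is hypothetical (it is not part of dplyr), and the sample rows are deliberately shuffled so the sort has something to do.

```r
library(dplyr)

# Same sample values as in the question, shuffled on purpose
df <- data.frame(
  variable = c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded"),
  level    = c("IA", "DIR", "EA", "750", "1000", "500"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: numeric sort key that is NA for non-numeric text,
# with the "NAs introduced by coercion" warning suppressed
num_key <- function(x) suppressWarnings(as.numeric(x))

# arrange() places NA keys last, so non-numeric levels fall through
# to the final tie-breaker, which is the text of level itself
sorted <- df %>%
  arrange(variable, num_key(level), level)

print(sorted)
```

Suppressing the warning only hides the message; it assumes every value that fails to parse as a number is meant to be sorted as text.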
Convert to a factor and change the order of the levels. This is easier with forcats::fct_relevel().
# Convert to factor
df <- as_tibble(cbind(variable, level)) %>%
  mutate(level = as.factor(level))
# Change the order of the levels. Rebuild the factor with factor(); assigning to
# levels(df$level) directly would relabel the values instead of reordering them.
df$level <- factor(df$level, levels = c("DIR", "EA", "IA", "500", "750", "1000"))
df %>% arrange(level)
# A tibble: 6 x 2
  variable level
  <chr>    <fctr>
1 channel  DIR
2 channel  EA
3 channel  IA
4 comp_ded 500
5 comp_ded 750
6 comp_ded 1000
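The forcats::fct_relevel() route mentioned above is not spelled out, so here is a minimal sketch of how it might look. It assumes the forcats package is installed and that the desired level order is known in advance; fct_relevel() moves the named levels to the front while leaving the underlying values untouched.

```r
library(forcats)

level <- c("DIR", "EA", "IA", "500", "750", "1000")

# By default factor() sorts its levels as text
f <- factor(level)

# Reorder the levels explicitly; the values themselves do not change
f2 <- fct_relevel(f, "DIR", "EA", "IA", "500", "750", "1000")

print(levels(f2))
```

Because only the level order changes, sorting by f2 (for example with arrange()) follows the order given here instead of the alphabetical one.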
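A slightly shorter solution using mixedorder from gtools:
library(gtools)
# mixedorder() returns a permutation; order() of that permutation gives the ranks needed as a tie-breaker
sorteddf <- df[with(df, order(variable, order(mixedorder(level)))),]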
Output:
variable level
1 channel DIR
2 channel EA
3 channel IA
4 comp_ded 500
5 comp_ded 750
6 comp_ded 1000
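If adding a package dependency is not desirable, roughly the same ordering can be sketched in base R alone. This is an alternative to the gtools call, not a reimplementation of mixedorder(): it simply sorts by variable, then by a numeric key, then by the text itself. The rows are shuffled here for illustration.

```r
# Same sample values as in the question, shuffled on purpose
df <- data.frame(
  variable = c("comp_ded", "channel", "comp_ded", "channel", "comp_ded", "channel"),
  level    = c("750", "IA", "1000", "DIR", "500", "EA"),
  stringsAsFactors = FALSE
)

# Numeric key: NA wherever level is not a number (coercion warning suppressed)
num_key <- suppressWarnings(as.numeric(df$level))

# order() puts NA last by default, so text levels are ordered by the final key
sorted <- df[order(df$variable, num_key, df$level), ]
rownames(sorted) <- NULL

print(sorted)
```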
Another option is dplyr::group_by. Note that arrange() needs explicit sort keys: called with no arguments, it leaves the rows in their existing order.
library(dplyr)
variable <- c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded")
level <- c("DIR", "EA", "IA", "500", "750", "1000")
df <- as_tibble(cbind(variable, level))
df %>%
  group_by(variable) %>%
  arrange(as.numeric(level), level, .by_group = TRUE) %>%
  ungroup()
# A tibble: 6 x 2
  variable level
  <chr>    <chr>
1 channel  DIR
2 channel  EA
3 channel  IA
4 comp_ded 500
5 comp_ded 750
6 comp_ded 1000
I think it is much easier to sort first by as.numeric(level) and then by level:
df %>% arrange(variable, as.numeric(level), level)
Which gives:
# A tibble: 6 x 2
variable level
<chr> <chr>
1 channel DIR
2 channel EA
3 channel IA
4 comp_ded 500
5 comp_ded 750
6 comp_ded 1000
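To see why this two-key sort works, it may help to look at the keys on their own. The base R sketch below shows that text comparison puts "1000" before "500", and that the numeric key is NA for the non-numeric levels, which both order() and arrange() place last.

```r
level <- c("DIR", "EA", "IA", "500", "750", "1000")

# Compared as text, "1000" sorts before "500" because "1" < "5"
text_sorted <- sort(c("500", "750", "1000"))
print(text_sorted)

# Numeric key: NA for text, the number otherwise
key <- suppressWarnings(as.numeric(level))
print(key)
# NA NA NA 500 750 1000

# Sorting by (key, level): numbers in numeric order first, then the NA rows,
# whose ties are broken alphabetically by level
idx <- order(key, level)
print(level[idx])
```

Within a single variable group this puts numeric levels ahead of text ones; in the question's data each group is purely numeric or purely text, so that detail never shows up.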