How to order vectors with priority layout?
Let's consider the following character vector:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
As you can see, some strings in this vector start the same way, e.g. "B" and "B_big".
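What I ultimately want is a vector laid out so that all strings with the same beginning sit next to each other. However, the order of the letters should stay the same ("B" should come first, "C" second, and so on). Let me illustrate with an example: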
Put simply, I want to end up with this vector:
"B", "B_big", "B_tremendous", "C_small", "C", "A", "A_huge", "A_big", "D"
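Here is how I arrived at this vector: reading from the left, I see "B", so I look for all the other strings with the same beginning and put them to its right. Next comes "C", so I look through all the remaining strings and move every one that starts with "C", e.g. "C_small", to its right, and so on.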
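I don't know how to do this. I'm almost certain the gsub function could be used to get this result, but I'm not sure how to combine it with this kind of search and replace. Could you help me?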
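The left-to-right procedure described above can be written out almost literally. This is a minimal sketch, assuming the "beginning" of a string is everything before its first underscore (the question does not define it precisely). It strips the suffix with sub(), the single-replacement sibling of gsub; the helper name prefix_of is hypothetical.

```r
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

# Assumption: the "beginning" of a string is everything before the first "_".
# sub() is enough here because only one replacement per string is needed.
prefix_of <- function(s) sub("_.*$", "", s)

remaining <- x
result <- character(0)
while (length(remaining) > 0) {
  # prefix of the left-most string that has not been placed yet
  p <- prefix_of(remaining[1])
  same <- prefix_of(remaining) == p
  # move every string with that prefix to the result, keeping their order
  result <- c(result, remaining[same])
  remaining <- remaining[!same]
}
result
```

On the example vector, result comes out as "B", "B_big", "B_tremendous", "C_small", "C", "A", "A_huge", "A_big", "D". Unlike substr(x, 1, 1), this prefix rule also handles multi-character prefixes such as "AB_big".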
Here is one option:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
# distinct first letters, in order of first appearance
xorder <- unique(substr(x, 1, 1))
xnew <- c()
for (letter in xorder) {
  # append every string that starts with this letter, keeping their original order
  xnew <- c(xnew, x[substr(x, 1, 1) == letter])
}
xnew
[1] "B" "B_big" "B_tremendous" "C_small" "C"
[6] "A" "A_huge" "A_big" "D"
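The same grouping can be done without growing a vector inside a loop. A minimal sketch using base R's split() and unlist(), keeping this answer's assumption that the prefix is the first character; xnew_split is just an illustrative name.

```r
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

first <- substr(x, 1, 1)
# split() returns one list element per factor level, in level order, and
# keeps the original order of the strings inside each element
groups <- split(x, factor(first, levels = unique(first)))
# flatten the groups back into a single character vector
xnew_split <- unlist(groups, use.names = FALSE)
xnew_split
```

Because the factor levels are unique(first), the groups come out in first-appearance order (B, C, A, D), giving the same result as the loop.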
Use the "prefix" as factor levels, then order:
sx = substr(x, 1, 1)
x[order(factor(sx, levels = unique(sx)))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
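Why this works: ordering a factor orders its integer codes, and with levels = unique(sx) those codes number the prefixes in order of first appearance. The same key can be built directly with match(), as sketched below; the names key and x_sorted are illustrative. Within a prefix the original order survives because the sort used by order() is stable, so unresolved ties keep their original positions.

```r
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

sx <- substr(x, 1, 1)
# integer code of each prefix by first appearance (B = 1, C = 2, A = 3, D = 4)
key <- match(sx, unique(sx))
# order() leaves ties in their original order, so strings sharing a prefix
# keep their relative positions
x_sorted <- x[order(key)]
x_sorted
```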
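If you are open to non-base alternatives, you can use data.table::chgroup, which "groups together duplicated values but retains the group order (according to the first appearance order of each group), efficiently":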
library(data.table)
x[chgroup(substr(x, 1, 1))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
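I'd suggest splitting the two parts of each string into separate dimensions. Then use a named numeric vector to define an explicit sort order for the descriptive part of the name. From there you can reorder the input vector dynamically. Bundled as a function: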
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
sorter <- function(x) {
  # separate the two parts
  prefix <- sub("_.*$", "", x)
  suffix <- sub("^.*_", "", x)
  # identify inputs with no suffix (sub() leaves them unchanged, so test for "_")
  suffix <- ifelse(grepl("_", x), suffix, "none")
  # map each suffix to a rank ordering
  suffix_order <- c(
    "small" = -1,
    "none" = 0,
    "big" = 1,
    "huge" = 2,
    "tremendous" = 3
  )
  # return input vector,
  # ordered by the prefix and the mapping of suffix to rank
  x[order(prefix, suffix_order[suffix])]
}
sorter(x)
Result:
[1] "A" "A_big" "A_huge" "B" "B_big" "B_tremendous" "C_small" "C"
[9] "D"
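Note that sorter() orders the prefixes alphabetically ("A" first), whereas the question asks for first-appearance order ("B" first). A possible variant, sketched here as a hypothetical function sorter_first_seen() that takes the suffix ranks as an argument, replaces the alphabetical prefix key with match(prefix, unique(prefix)) and labels unsuffixed strings by testing for "_" with grepl().

```r
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

# Hypothetical variant of sorter(): prefixes keep their first-appearance order
# (as the question asks) and suffixes are ranked within each prefix.
sorter_first_seen <- function(x,
                              suffix_order = c(small = -1, none = 0, big = 1,
                                               huge = 2, tremendous = 3)) {
  prefix <- sub("_.*$", "", x)
  # strings without "_" get the label "none"
  suffix <- ifelse(grepl("_", x), sub("^.*_", "", x), "none")
  # rank prefixes by first appearance instead of alphabetically
  prefix_rank <- match(prefix, unique(prefix))
  # suffixes missing from suffix_order map to NA and sort last within their prefix
  x[order(prefix_rank, suffix_order[suffix])]
}

sorter_first_seen(x)
```

On the example this gives "B", "B_big", "B_tremendous", "C_small", "C", "A", "A_big", "A_huge", "D". "A_big" lands before "A_huge" because the suffix ranks are applied, so the result differs from the question's example, which keeps the original order within each group.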