How to order vectors with priority layout?

Let's consider the following character vector:

x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

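As you can see, some strings in this vector share the same beginning, e.g. "B" and "B_big".

What I ultimately want is a vector laid out so that all strings with the same beginning are next to each other. However, the order in which the letters first appear should be preserved ("B" should come first, "C" second, and so on). Let me give an example to clarify:

Simply put, I want to end up with this vector: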

"B", "B_big", "B_tremendous", "C_small", "C", "A", "A_huge", "A_big", "D"

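How I arrived at this vector: reading from the left, I see "B", so I look for all other strings that start the same way and put them to the right of "B". Then comes "C", so I look through all remaining strings and put every string starting with "C", e.g. "C_small", to its right, and so on.

I don't know how to do this. I'm almost sure the gsub function can be used to get this result, but I'm not sure how to combine it with this kind of search and rearrangement. Could you help me?

Here is one option: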

x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

# first letters, in order of first appearance
xorder <- unique(substr(x, 1, 1))
xnew <- character(0)

# append the strings sharing each first letter, one group at a time
for (letter in xorder) {
  xnew <- c(xnew, x[substr(x, 1, 1) == letter])
}

xnew

[1] "B"            "B_big"        "B_tremendous" "C_small"      "C"           
[6] "A"            "A_huge"       "A_big"        "D"   
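
The same grouping can also be written without an explicit loop. The following is only a sketch of one way to do it in base R, using `split` on a factor whose levels follow first appearance; the names `first`, `groups` and `xnew_split` are illustrative and not part of the answer above.

```r
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

# first letter of every string
first <- substr(x, 1, 1)

# split() returns one list element per factor level, in level order;
# the levels are set to the order of first appearance
groups <- split(x, factor(first, levels = unique(first)))

# flatten the list back into a plain character vector
xnew_split <- unlist(groups, use.names = FALSE)
xnew_split
```

Within each group the strings keep their original relative order, as in the loop version.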

Use the "prefix" as factor levels and then order:

sx = substr(x, 1, 1)
x[order(factor(sx, levels = unique(sx)))]
# [1] "B"   "B_big"   "B_tremendous"  "C_small"   "C"   "A"   "A_huge"   "A_big"   "D"
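
If the prefixes can be longer than one character, taking only the first letter would merge groups such as "A" and "AB". A possible variation, assuming the prefix is everything before the first underscore, is sketched below; the vector `y` is a made-up example, not data from the question.

```r
# hypothetical input with multi-character prefixes
y <- c("AB", "C_small", "A", "AB_big", "C", "A_huge")

# prefix = everything before the first underscore (whole string if none)
prefix <- sub("_.*$", "", y)

# same idea as above: factor levels in order of first appearance, then order()
y_grouped <- y[order(factor(prefix, levels = unique(prefix)))]
y_grouped
```

Here "AB" and "AB_big" stay together and are kept apart from the "A" group.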

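If you are open to non-base alternatives, you can use data.table::chgroup, which "groups together duplicated values but retains the group order (according the first appearance order of each group), efficiently":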

x[data.table::chgroup(substr(x, 1, 1))]
# [1] "B"   "B_big"   "B_tremendous"  "C_small"   "C"   "A"   "A_huge"   "A_big"   "D"

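I would suggest separating the two parts of the text into distinct dimensions. Then use a named numeric vector to define an explicit sort order for the descriptive part of the name. From there you can reorder the input vector dynamically. Bundled as a function: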

x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

sorter <- function(x) {
    # separate the two parts
    prefix <- sub("_.*$", "", x)
    # inputs without an underscore have no suffix; label them "none"
    suffix <- ifelse(grepl("_", x, fixed = TRUE), sub("^.*_", "", x), "none")
    
    # map each suffix to a rank ordering 
    suffix_order <- c(
        "small"      = -1,
        "none"       =  0, 
        "big"        =  1,
        "huge"       =  2,
        "tremendous" =  3
    )
    
    # return input vector, 
    # ordered by the prefix and the mapping of suffix to rank
    x[order(prefix, suffix_order[suffix])]
    
}

sorter(x)

Result

[1] "A"            "A_big"        "A_huge"       "B"            "B_big"       
[6] "B_tremendous" "C_small"      "C"            "D"
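
Note that ordering by `prefix` sorts the groups alphabetically, whereas the question asks for the groups to stay in order of first appearance. A possible adaptation is sketched below, assuming the same suffix ranking; `sorter_keep_order` and `group_rank` are hypothetical names introduced only for this illustration.

```r
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

# hypothetical helper: keep prefix groups in first-appearance order,
# then rank the suffixes inside each group
sorter_keep_order <- function(x, suffix_order) {
    prefix <- sub("_.*$", "", x)
    # strings without an underscore get the placeholder suffix "none"
    suffix <- ifelse(grepl("_", x, fixed = TRUE), sub("^.*_", "", x), "none")
    # position of each prefix among the distinct prefixes, by first appearance
    group_rank <- match(prefix, unique(prefix))
    x[order(group_rank, suffix_order[suffix])]
}

suffix_order <- c(small = -1, none = 0, big = 1, huge = 2, tremendous = 3)
res <- sorter_keep_order(x, suffix_order)
res
```

Unlike the output requested in the question, this also reorders the strings inside each group by suffix rank, so "A_big" comes before "A_huge".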