How can I sort a vector of bigz integers in R?

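I am working in R, using arbitrary-precision arithmetic from the gmp package. This package creates and stores large integers as bigz objects. For example, you can create a vector of arbitrarily large integers as follows: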

library(gmp)
X <- as.bigz(c("734876349856913169345", "610034193791098", "82348779011105371828395319",
               "810367198176345917234", "92573840155289", "729811850143511981", "51385",
               "358934723", "751938475", "72265018270590", "12838756105612376401932875"))

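I would like to sort this vector of large integers (from smallest to largest). Although the documentation for bigz objects states that they can be compared with inequality operators, the standard sort function unfortunately does not work on them: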

sort(X)
Error in rank(x, ties.method = "min", na.last = "keep") : 
  raw vectors cannot be sorted

Question: How can I sort the bigz vector above in ascending order?

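This involves coercing to character strings and back, but you can use str_sort() from the stringr package. The argument numeric = TRUE sorts runs of digits by their numeric value (natural sort order) rather than character by character.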

library(stringr)
library(gmp)

as.bigz(str_sort(X, numeric = TRUE))
Big Integer ('bigz') object of length 11:
 [1] 51385                      358934723                  751938475                  72265018270590             92573840155289             610034193791098           
 [7] 729811850143511981         734876349856913169345      810367198176345917234      12838756105612376401932875 82348779011105371828395319
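To see why numeric = TRUE matters, here is a small sketch on short digit strings. The vector name digits is only illustrative; the comparison relies on str_sort()'s default locale.

```r
library(stringr)

# Illustrative input: three digit strings of different lengths.
digits <- c("10", "9", "100")

# Default collation compares the strings character by character,
# so "9" sorts after "10" and "100".
default_order <- str_sort(digits)

# numeric = TRUE compares runs of digits by their numeric value,
# so "9" comes first.
natural_order <- str_sort(digits, numeric = TRUE)

print(default_order)
print(natural_order)
```

The same distinction is what lets the call above put 51385 ahead of the longer values instead of ordering by leading digit.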

Another option is mixedsort() from the gtools package, applied after converting to character:

as.bigz(gtools::mixedsort(as.character(X)))
#Big Integer ('bigz') object of length 11:
# [1] 51385                      358934723                  751938475                 
# [4] 72265018270590             92573840155289             610034193791098           
# [7] 729811850143511981         734876349856913169345      810367198176345917234     
#[10] 12838756105612376401932875 82348779011105371828395319

This works because the methods for class 'bigz' include as.character:

grep('as.character', methods(class = 'bigz'), fixed = TRUE, value = TRUE)
#[1] "as.character.bigz"
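Since the question notes that bigz values support the comparison operators, a string-free alternative is also possible. The following is a minimal sketch, not a gmp function: the helper name order_bigz is hypothetical, it assumes the vector contains no NA values, and it performs a quadratic number of comparisons, so it is only suited to short vectors.

```r
library(gmp)

# Hypothetical helper (not part of gmp): for each element, count how many
# elements are strictly smaller, then order the vector by those counts.
order_bigz <- function(x) {
  stopifnot(is.bigz(x), !any(is.na(x)))
  n_smaller <- vapply(seq_along(x),
                      function(i) sum(x < x[i]),  # elementwise bigz comparison
                      integer(1))
  order(n_smaller)
}

# Illustrative input, including a negative value and zero.
x <- as.bigz(c("734876349856913169345", "-51385", "358934723", "0"))
sorted_x <- x[order_bigz(x)]
print(sorted_x)
```

Because it compares values rather than strings, this sketch does not need special handling for negative numbers.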

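I wrote a function to do this. It first groups the big integers by their number of digits, then sorts each group as a character vector. It is not particularly elegant, but it does work: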

library(gmp)
X <- as.bigz(c("734876349856913169345", "610034193791098", "82348779011105371828395319",
               "810367198176345917234", "92573840155289", "729811850143511981", "51385",
               "358934723", "751938475", "72265018270590", "12838756105612376401932875"))

sortbigz <- function(N, decreasing = FALSE) { 
  stopifnot(is.bigz(N))
  # returns a list with the following:
  #  [[1]] a bigz vector, sorted as if NA represented infinity
  #  [[2]] the original argument, converted to a character vector, unsorted
  #  [[3]] integer vector showing the rank of each element of the original vector, in the sorted vector

  z <- is.na(N)
  Ch <- as.character(N)
  is.na(Ch) <- z
  negnumbers <- N < 0
  negnumbers[z] <- FALSE
  str.length <- nchar(Ch)
  n.digits <- ifelse(negnumbers, -(str.length - 1L), str.length)  # number of digits in each element, where for example -582 is deemed to have -3 digits
  r <- rank(n.digits, ties.method = "min")
  upr <- unique(r[!negnumbers]) # unique ranks of positive numbers in N
  unr <- unique(r[negnumbers])  # unique ranks of negative numbers in N
  for(s in upr) r[r == s] <- (s - 1L) + rank(Ch[r == s], ties.method = "min")
  for(s in unr) r[r == s] <- (s + sum(r == s)) - rank(Ch[r == s], ties.method = "random") 
  if(decreasing) r <- (1L + length(N)) - r
  list(sorted.bigz   = N[order(r)], 
       unsorted.char = Ch,
       ranking       = r)
}

sortbigz(X)
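For non-negative values without NAs, the same idea (order by digit count, then by the decimal string) can be sketched more compactly. The helper name sort_bigz_nonneg is hypothetical, and this simplified version deliberately does not handle negative numbers or NA the way sortbigz() attempts to.

```r
library(gmp)

# Hypothetical helper (not part of gmp): sort non-negative bigz values by
# number of digits first, then by their decimal strings within each length.
sort_bigz_nonneg <- function(x) {
  stopifnot(is.bigz(x), !any(is.na(x)), all(x >= 0))
  s <- as.character(x)
  # Strings of equal length compare like the numbers they represent;
  # method = "radix" compares strings in the C locale.
  x[order(nchar(s), s, method = "radix")]
}

# Illustrative input.
y <- sort_bigz_nonneg(as.bigz(c("100", "9", "12838756105612376401932875", "51385")))
print(y)
```

This works because a non-negative integer with fewer digits is always smaller, and within one digit count the lexicographic order of the digit strings matches the numeric order.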