How can I sort a vector of bigz integers in R?
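I am working in R, using arbitrary-precision arithmetic from the gmp package. This package creates and stores large integers in bigz form. For example, you can create a vector of arbitrarily large integers as follows: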
library(gmp)
X <- as.bigz(c("734876349856913169345", "610034193791098", "82348779011105371828395319",
"810367198176345917234", "92573840155289", "729811850143511981", "51385",
"358934723", "751938475", "72265018270590", "12838756105612376401932875"))
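I want to sort this vector of big integers in ascending order. Although the documentation for bigz objects says they can be compared with inequality operators, unfortunately the standard sort function does not work on them: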
sort(X)
Error in rank(x, ties.method = "min", na.last = "keep") :
raw vectors cannot be sorted
Question: How can I sort the bigz vector above in ascending order?
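Since the question notes that bigz values support inequality comparisons, a comparison-based sort that never calls rank() should be enough in principle. The following is a minimal sketch under that assumption: merge_order() and merge_sort() are hypothetical helpers, not part of gmp, and they touch x only through length(), x[i] < x[j] and x[idx]. It is demonstrated on a plain numeric vector so it runs without gmp.

```r
# Hypothetical helpers (not part of gmp): a merge sort that touches x only
# through length(), x[i] < x[j] and x[idx], so it never calls rank(),
# which is where sort() fails on bigz
merge_order <- function(x) {
  msort <- function(ix) {
    n <- length(ix)
    if (n < 2L) return(ix)
    mid <- n %/% 2L
    left <- msort(ix[seq_len(mid)])
    right <- msort(ix[(mid + 1L):n])
    out <- integer(n)
    i <- 1L; j <- 1L; k <- 1L
    while (i <= length(left) && j <= length(right)) {
      # Take from the right half only when strictly smaller: keeps ties stable
      if (isTRUE(x[right[j]] < x[left[i]])) {
        out[k] <- right[j]; j <- j + 1L
      } else {
        out[k] <- left[i]; i <- i + 1L
      }
      k <- k + 1L
    }
    # Copy whichever half still has elements left
    if (i <= length(left)) out[k:n] <- left[i:length(left)]
    if (j <= length(right)) out[k:n] <- right[j:length(right)]
    out
  }
  # seq_len(length(x)) rather than seq_along(x), so a length() method is used
  msort(seq_len(length(x)))
}

merge_sort <- function(x) x[merge_order(x)]

merge_sort(c(5, 3, 9, 1, 3))  # returns 1 3 3 5 9
```

Assuming gmp's `<` and `[` methods for bigz behave as documented, merge_sort(X) should return the question's vector in ascending order. Every comparison is an R-level call, so this is slow for long vectors; it only shows that comparisons alone suffice.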
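It involves coercing to character and back, but you can use str_sort() from stringr. The argument numeric = TRUE gives a natural sort order instead of an alphanumeric one.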
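The difference between the two orders can be seen with a few of the question's values in base R alone. This sketches the idea, not how str_sort() works internally (stringr delegates to stringi's collation): for non-negative integers written without leading zeros, ordering by number of digits and then by the digit string gives numeric order. method = "radix" makes the string comparison byte-wise, independent of the session locale.

```r
# Digit strings taken from the question's example vector
s <- c("51385", "358934723", "751938475", "72265018270590")

# Plain string order compares character by character, so "358934723"
# sorts before "51385" even though it is the larger number
sort(s, method = "radix")

# Non-negative integers without leading zeros: fewer digits means smaller,
# and equal-length digit strings compare like the numbers they represent
s[order(nchar(s), s, method = "radix")]
```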
library(stringr)
library(gmp)
as.bigz(str_sort(as.character(X), numeric = TRUE))
Big Integer ('bigz') object of length 11:
[1] 51385 358934723 751938475 72265018270590 92573840155289 610034193791098
[7] 729811850143511981 734876349856913169345 810367198176345917234 12838756105612376401932875 82348779011105371828395319
Another option is mixedsort() from gtools, after converting to character:
as.bigz(gtools::mixedsort(as.character(X)))
#Big Integer ('bigz') object of length 11:
# [1] 51385 358934723 751938475
# [4] 72265018270590 92573840155289 610034193791098
# [7] 729811850143511981 734876349856913169345 810367198176345917234
#[10] 12838756105612376401932875 82348779011105371828395319
as.character is among the methods defined for class 'bigz':
grep('as.character', methods(class = 'bigz'), fixed = TRUE, value = TRUE)
#[1] "as.character.bigz"
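These answers, and the function in the next answer, all go through character strings rather than converting to numeric. A quick base-R check shows why that matters: beyond 2^53 a double can no longer hold every integer, so as.numeric() cannot reliably tell such values apart. The specific numbers below are only illustrations.

```r
# Doubles represent every integer exactly only up to 2^53 = 9007199254740992
a <- as.numeric("9007199254740992")
b <- as.numeric("9007199254740993")
a == b  # TRUE: two distinct integers collapse to the same double

# Above 2^53 every representable double is even, so an odd value such as the
# third element of X cannot survive the round trip through numeric;
# the printed digits differ from the original string
sprintf("%.0f", as.numeric("82348779011105371828395319"))
```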
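I wrote a function that does this by first grouping the big integers by number of digits and then sorting each group as a character vector. It is not very elegant, but it works: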
library(gmp)
X <- as.bigz(c("734876349856913169345", "610034193791098", "82348779011105371828395319",
"810367198176345917234", "92573840155289", "729811850143511981", "51385",
"358934723", "751938475", "72265018270590", "12838756105612376401932875"))
sortbigz <- function(N, decreasing = FALSE) {
  stopifnot(is.bigz(N))
  # returns a list with the following:
  # [[1]] a bigz vector, sorted as if NA represented infinity
  # [[2]] the original argument, converted to a character vector, unsorted
  # [[3]] integer vector showing the rank of each element of the original vector, in the sorted vector
  z <- is.na(N)
  Ch <- as.character(N)
  is.na(Ch) <- z
  negnumbers <- N < 0
  negnumbers[z] <- FALSE
  str.length <- nchar(Ch)
  # number of digits in each element, where for example -582 is deemed to have -3 digits
  n.digits <- ifelse(negnumbers, -(str.length - 1L), str.length)
  r <- rank(n.digits, ties.method = "min")
  upr <- unique(r[!negnumbers]) # unique ranks of positive numbers in N
  unr <- unique(r[negnumbers])  # unique ranks of negative numbers in N
  for (s in upr) r[r == s] <- (s - 1L) + rank(Ch[r == s], ties.method = "min")
  for (s in unr) r[r == s] <- (s + sum(r == s)) - rank(Ch[r == s], ties.method = "random")
  if (decreasing) r <- (1L + length(N)) - r
  list(sorted.bigz = N[order(r)],
       unsorted.char = Ch,
       ranking = r)
}
sortbigz(X)
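The same "group by digit count, then compare strings" idea can be written more compactly with base R's order(). The sketch below is a hypothetical helper, not part of gmp, and assumes canonical decimal strings (optional leading "-", no leading zeros, no "+"). Negative values are handled by reversing both keys, since a larger magnitude means a smaller number.

```r
# Hypothetical helper: sorts canonical decimal integer strings (optional
# leading "-", no leading zeros, no "+") in ascending numeric order.
# NA entries are dropped, as sort() does by default.
sort_int_strings <- function(s) {
  neg <- startsWith(s, "-")
  mag <- sub("^-", "", s)   # magnitude as a digit string
  len <- nchar(mag)

  pos_i <- which(!neg)      # non-negative entries (which() skips NA)
  neg_i <- which(neg)       # negative entries

  # Non-negative: fewer digits first, then byte-wise digit comparison
  pos_ord <- pos_i[order(len[pos_i], mag[pos_i], method = "radix")]
  # Negative: a larger magnitude is a smaller number, so reverse both keys
  neg_ord <- neg_i[order(len[neg_i], mag[neg_i], decreasing = TRUE,
                         method = "radix")]

  s[c(neg_ord, pos_ord)]
}

sort_int_strings(c("10", "-3", "2", "-15", "0"))
# returns "-15" "-3" "0" "2" "10"
```

With gmp loaded, as.bigz(sort_int_strings(as.character(X))) should reproduce the ascending order shown in the answers, assuming as.character() on bigz yields canonical strings like those in the printed output. One design difference: sortbigz() ranks strings with rank(), whose string comparison may depend on the session's collation, whereas order(..., method = "radix") compares bytes as in the C locale.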