Radix sort implementation in R

How can I implement radix sort in base R for, e.g., the following vector?

vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)

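In short, radix sort does the following: it sorts the numbers by their units digit, then by their tens digit (keeping the previous order for ties), then by their hundreds digit, and so on. Of course, vec is just an example; the solution should handle data of any length, containing numbers of any length.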

The output should be vec sorted in ascending (or descending) order, that is:

6   22   25   34  478  504 9021 9947
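
To make the per-digit passes concrete, here is a minimal sketch of how the digit at each position can be pulled out arithmetically in base R. The helper name digit_at is hypothetical (it is not part of base R), and the sketch assumes non-negative whole numbers with at most four digits, as in vec.

```r
vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)

# digit_at() is a hypothetical helper: digit i of x, counting from the right
digit_at <- function(x, i) (x %/% 10^(i - 1)) %% 10

# one row per number, one column per digit position
digits <- sapply(1:4, function(i) digit_at(vec, i))
dimnames(digits) <- list(as.character(vec),
                         c("units", "tens", "hundreds", "thousands"))
digits
```

Each column of digits is the key for one pass of the sort, starting with the units column.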

I know data.table implements radix sort out of the box, so you could use that package, e.g., sort the data simply by setting a key:

library(data.table)

vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)

f1 <- function(vec) {
  DT <- data.table(vec)
  setkey(DT, vec)  # sorts DT by vec, by reference
  DT
}

f1(vec)

    vec
1:    6
2:   22
3:   25
4:   34
5:  478
6:  504
7: 9021
8: 9947
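
If a plain sorted vector is enough and no keyed table is needed, base R's sort() also accepts method = "radix" (available since R 3.3.0). This is only a sketch of that option: the sorting happens in compiled code, so it does not count as implementing the algorithm yourself.

```r
vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)

# base R's sort() accepts method = "radix" (R >= 3.3.0)
sorted_up   <- sort(vec, method = "radix")
sorted_down <- sort(vec, decreasing = TRUE, method = "radix")

sorted_up
sorted_down
```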

I suppose you could implement the algorithm yourself, but it would probably be slow in R. A function could look like this:

library(stringr)
library(dplyr)
library(tidyr)

radix <- function(numbers) {
  # number of digits in the largest value
  digits <- nchar(max(numbers))
  # left-pad with zeros so every number has the same width
  numbers <- str_pad(numbers, digits, pad = "0")
  rad <- data.frame(matrix(0, ncol = digits, nrow = length(numbers)))

  # one column per digit position, most significant digit first
  for (i in 1:digits) {
    rad[, i] <- str_sub(numbers, i, i)
  }

  # one pass per digit, from least to most significant
  for (z in rev(1:ncol(rad))) {
    # row indices grouped into buckets 0..9; order within a bucket is kept
    k <- unlist(lapply(0:9, function(d) which(rad[, z] == d)))
    # drop = FALSE keeps a data frame even when there is a single digit column
    rad <- rad[k, , drop = FALSE]
  }

  # glue the digit columns back together (unite_() is deprecated in tidyr)
  rad <- rad %>% unite(col = "num", everything(), sep = "")
  return(as.numeric(rad$num))
}

It could probably be cleaned up and sped up, but as I understand it, this is radix sort:

radix(vec)
[1]    6   22   25   34  478  504 9021 9947

Speed comparison:

library(microbenchmark)
microbenchmark(f1(vec), radix(vec))

Unit: microseconds
      expr    min     lq mean median     uq     max neval
   f1(vec)  290.6  314.8  335    327  349.1   524.1   100
radix(vec) 1062.8 1121.7 1458   1163 1250.5 24407.9   100

Speed comparison on a larger input:

set.seed(200)
more<-sample(10000,5000)
microbenchmark(f1(more), radix(more))

       expr     min      lq  mean  median      uq     max neval
   f1(more)   539.3   565.5   623   622.2   664.8   769.7   100
radix(more) 10457.8 10668.0 11683 11133.7 12298.3 25010.6   100

Here is my own solution:

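f_radixSort <- function(x) {
    # number of digits in the largest value
    mx <- nchar(max(x))
    for (i in seq_len(mx)) {
        # reorder by the value formed by the last i digits
        x <- x[order(x %% (10^i))]
    }
    return(x)
}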

A sample call, with the intermediate order printed after each pass:

f_radixSort(vec)

# units
# [1] 9021   22   34  504   25    6 9947  478

# tens
# [1]  504    6 9021   22   25   34 9947  478

# hundreds
# [1]    6 9021   22   25   34  478  504 9947

# thousands
# [1]    6   22   25   34  478  504 9021 9947

# ten thousands
# [1]    6   22   25   34  478  504 9021 9947
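
The passes above sort on x %% 10^i, that is, on the last i digits taken together. A variant that sorts on the i-th digit alone, and relies on a stable sort to carry over the order from earlier passes, can be sketched as follows. The name radix_sort_digits and the decreasing argument are hypothetical, and the sketch assumes non-negative whole numbers.

```r
radix_sort_digits <- function(x, decreasing = FALSE) {
  # assumption: non-negative whole numbers only
  stopifnot(is.numeric(x), all(x >= 0), all(x == floor(x)))
  if (length(x) < 2) return(x)

  # format() avoids scientific notation when counting digits (e.g. 1e+05)
  n_digits <- nchar(format(max(x), scientific = FALSE))

  for (i in seq_len(n_digits)) {
    # i-th digit only, counting from the right
    digit <- (x %/% 10^(i - 1)) %% 10
    # order() with method = "radix" is stable, so ties keep the previous order
    x <- x[order(digit, method = "radix")]
  }

  if (decreasing) rev(x) else x
}

radix_sort_digits(c(25, 478, 34, 9021, 6, 9947, 504, 22))
radix_sort_digits(c(25, 478, 34, 9021, 6, 9947, 504, 22), decreasing = TRUE)
```

Sorting on a single digit per pass is closer to the textbook description, at the cost of depending on the stability of each pass.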

And a short benchmark (I did not include the data.table sort because I don't know how it works internally, and I asked for an answer in base R):

library(microbenchmark)
vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)

all(radix(vec)==f_radixSort(vec))
# [1] TRUE

microbenchmark(radix(vec), f_radixSort(vec))

# Unit: microseconds
#              expr     min      lq      mean   median       uq      max neval
#        radix(vec) 857.239 915.230 980.39907 943.4745 1005.071 2081.051   100
#  f_radixSort(vec)  39.061  42.216  52.28206  51.0810   54.686  111.775   100

# ========================================================
set.seed(200)
vec<-sample(10000,5000)

all(radix(vec)==f_radixSort(vec))
# [1] TRUE

microbenchmark(radix(vec), f_radixSort(vec))

# Unit: milliseconds
#              expr      min       lq     mean   median       uq       max neval
#        radix(vec) 6.724506 7.003191 8.135387 7.877256 8.195904 52.786763   100
#  f_radixSort(vec) 2.132132 2.167436 2.302167 2.200337 2.268544  4.009464   100

My solution looks like this. Please bear with me, I am a beginner ;-) but the result is correct:

radixSort <- function(sortvec) {
  mx <- nchar(max(sortvec))
  ## for all digits up to the number of digits in the longest number:  
  for (i in 1:mx){
    ## empty the 10 buckets
    bucket <- list()
    ## for all 10 buckets:
    for (bucketnumber in 1:10){
      ## fill each bucket with the appropriate numbers
      bucket[[bucketnumber]] <- sortvec[dig(sortvec, i)==(bucketnumber-1)]
    }
    ## empty the sorted vector
    sortvec <- c()
    ## fill the sorted vector with the contents of buckets 1-10
    for (k in 1:10){
      sortvec <- c(sortvec, bucket[[k]])
    }
  }
  return(sortvec)
}

dig <- function(x, st) {
  ## returns digit #st of x, counting from the right, e.g. dig(3456, 2) returns 5
  remainder <- x%%(10^st)
  divisor <- 10^(st-1)
  return(trunc(remainder/divisor))
}
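
As a possible simplification, the explicit bucket loops can be replaced by split(), which groups the values by digit while keeping their current order within each group. This is only a sketch: radixSortSplit is a hypothetical name, dig() is repeated so the block runs on its own, and the same assumption applies as above (non-negative whole numbers).

```r
# dig() as defined above, repeated so this sketch is self-contained
dig <- function(x, st) trunc((x %% 10^st) / 10^(st - 1))

radixSortSplit <- function(sortvec) {
  if (length(sortvec) < 2) return(sortvec)
  mx <- nchar(max(sortvec))
  for (i in seq_len(mx)) {
    # ten buckets, one per digit 0..9; empty buckets stay as empty vectors
    buckets <- split(sortvec, factor(dig(sortvec, i), levels = 0:9))
    # concatenate buckets 0..9 back into a single vector
    sortvec <- unlist(buckets, use.names = FALSE)
  }
  sortvec
}

radixSortSplit(c(25, 478, 34, 9021, 6, 9947, 504, 22))
```

Fixing the factor levels to 0:9 keeps the bucket order the same on every pass, even when some digits do not occur.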