转换列名，以便它们可以按数字顺序排列

Question

我正在尝试通过创建适用于 new_dat 和 old_dat 的解决方案来扩展。

新数据

new_dat <- structure(list(`[0,25) east` = c(1269L, 85L), `[0,25) north` = c(364L, 
21L), `[0,25) south` = c(1172L, 97L), `[0,25) west` = c(549L, 
49L), `[100,250) east` = c(441L, 149L), `[100,250) north` = c(224L, 
45L), `[100,250) south` = c(521L, 247L), `[100,250) west` = c(770L, 
124L), `[100,500) east` = c(0L, 0L), `[100,500) north` = c(0L, 
0L), `[100,500) south` = c(0L, 0L), `[100,500) west` = c(0L, 
0L), `[1000,1000000] east` = c(53L, 0L), `[1000,1000000] north` = c(82L, 
0L), `[1000,1000000] south` = c(23L, 0L), `[1000,1000000] west` = c(63L, 
0L), `[1000,1500) east` = c(0L, 0L), `[1000,1500) north` = c(0L, 
0L), `[1000,1500) south` = c(0L, 0L), `[1000,1500) west` = c(0L, 
0L), `[1500,3000) east` = c(0L, 0L), `[1500,3000) north` = c(0L, 
0L), `[1500,3000) south` = c(0L, 0L), `[1500,3000) west` = c(0L, 
0L), `[25,100) east` = c(579L, 220L), `[25,100) north` = c(406L, 
58L), `[25,100) south` = c(1048L, 316L), `[25,100) west` = c(764L, 
131L), `[25,50) east` = c(0L, 0L), `[25,50) north` = c(0L, 0L
), `[25,50) south` = c(0L, 0L), `[25,50) west` = c(0L, 0L), `[250,500) east` = c(232L, 
172L), `[250,500) north` = c(207L, 40L), `[250,500) south` = c(202L, 
148L), `[250,500) west` = c(457L, 153L), `[3000,1000000] east` = c(0L, 
0L), `[3000,1000000] north` = c(0L, 0L), `[3000,1000000] south` = c(0L, 
0L), `[3000,1000000] west` = c(0L, 0L), `[50,100) east` = c(0L, 
0L), `[50,100) north` = c(0L, 0L), `[50,100) south` = c(0L, 0L
), `[50,100) west` = c(0L, 0L), `[500,1000) east` = c(103L, 0L
), `[500,1000) north` = c(185L, 0L), `[500,1000) south` = c(66L, 
0L), `[500,1000) west` = c(200L, 0L), `[500,1000000] east` = c(0L, 
288L), `[500,1000000] north` = c(0L, 120L), `[500,1000000] south` = c(0L, 
229L), `[500,1000000] west` = c(0L, 175L)), row.names = c("Andere akkerbouwbedrijven", 
"Andere combinatiebedrijven"), class = "data.frame")

旧数据和原始解决方案

old_dat <- structure(list(`[0,25)` = 5L, `[100,250)` = 43L, `[100,500)` = 0L, 
    `[1000,1000000]` = 20L, `[1000,1500)` = 0L, `[1500,3000)` = 0L, 
    `[25,100)` = 38L, `[25,50)` = 0L, `[250,500)` = 27L, `[3000,1000000]` = 0L, 
    `[50,100)` = 0L, `[500,1000)` = 44L, `[500,1000000]` = 0L), row.names = "Type_A", class = "data.frame")

该解决方案利用了这样一个事实，即添加的每个列名称中的两个数字之和提供了正确的顺序。

ord <- gsub("\[|\]|\)", "", colnames(new_dat)) %>% 
         strsplit(",") %>% 
         lapply(as.numeric) %>% 
         lapply(sum) %>% 
         unlist %>% 
         order()

colnames(dat)[ord]

新方法

新数据不仅要有数值，还要有字符串值(east, north, south, west)。我意识到，如果我给 east 一个值 1、north 或 2 等等，我可以使用相同的解决方案。三个数字的总和仍然提供正确的顺序。

虽然我在调整代码时遇到了一些问题。

ord <- gsub("\[|\]|\)", "", colnames(new_dat)) %>% 
         # provides "0,25 east", "0,25 north" etc

         strsplit(",") %>% 
         # provides "0" and "25 east", "0" and "25 north" etc

         lapply(as.numeric) %>% 
         lapply(sum) %>% 
         # SHOULD provide 0+25+1 (east), 0+25+2 (north) etc

         unlist %>% 
         order()

问题在于将字符串拆分为 3 个部分，并将方向转换为数字、IF 和 ONLY IF，共有三个部分。否则它应该只使用两者。我应该怎么做？

Answer 1

也许有点矫枉过正，但有了这个，您不需要找到“东”、“南”等模式。

library(magrittr)
order_cols <- function(dat) {
  
  # look for words to order by
  s_ordered <- stringi::stri_extract_all_regex(colnames(dat), "[[:alpha:]]+") %>% 
    unlist() %>% 
    unique() %>% 
    sort()
  
  if (length(s_ordered) > 1) {
    # replace words with their alphabetical index
    cnames <- stringi::stri_replace_all_fixed(colnames(dat), s_ordered, seq_along(s_ordered), vectorise_all = FALSE)
  } else {
    cnames <- colnames(dat)
  }
  
  cnames %>% 
    stringi::stri_extract_all_regex("\d+") %>% # extract all numbers (including the alphabetical index numbers)
    lapply(as.numeric) %>% 
    lapply(sum) %>% 
    unlist() %>% 
    order()
  
}

在函数的第一部分，我从列名中提取字符串并对其进行排序。然后使用它们的顺序用它们的索引替换 colnames 中的单词。之后，我提取数值并几乎遵循您最初的方法。我把它放在一个函数中以使其更易于使用：

colnames(new_dat)[order_cols(new_dat)]
#>  [1] "[0,25) east"          "[0,25) north"         "[0,25) south"        
#>  [4] "[0,25) west"          "[25,50) east"         "[25,50) north"       
#>  [7] "[25,50) south"        "[25,50) west"         "[25,100) east"       
#> [10] "[25,100) north"       "[25,100) south"       "[25,100) west"       
#> [13] "[50,100) east"        "[50,100) north"       "[50,100) south"      
#> [16] "[50,100) west"        "[100,250) east"       "[100,250) north"     
#> [19] "[100,250) south"      "[100,250) west"       "[100,500) east"      
#> [22] "[100,500) north"      "[100,500) south"      "[100,500) west"      
#> [25] "[250,500) east"       "[250,500) north"      "[250,500) south"     
#> [28] "[250,500) west"       "[500,1000) east"      "[500,1000) north"    
#> [31] "[500,1000) south"     "[500,1000) west"      "[1000,1500) east"    
#> [34] "[1000,1500) north"    "[1000,1500) south"    "[1000,1500) west"    
#> [37] "[1500,3000) east"     "[1500,3000) north"    "[1500,3000) south"   
#> [40] "[1500,3000) west"     "[500,1000000] east"   "[500,1000000] north" 
#> [43] "[500,1000000] south"  "[500,1000000] west"   "[1000,1000000] east" 
#> [46] "[1000,1000000] north" "[1000,1000000] south" "[1000,1000000] west" 
#> [49] "[3000,1000000] east"  "[3000,1000000] north" "[3000,1000000] south"
#> [52] "[3000,1000000] west"


colnames(dat)[order_cols(dat)]
#>  [1] "[0,25)"         "[25,50)"        "[25,100)"       "[50,100)"      
#>  [5] "[100,250)"      "[100,500)"      "[250,500)"      "[500,1000)"    
#>  [9] "[1000,1500)"    "[1500,3000)"    "[500,1000000]"  "[1000,1000000]"
#> [13] "[3000,1000000]"

^{由 reprex package (v2.0.1)}

创建于 2022-05-06

P.S.: 如果您使用的是较新版本的 R (>= 4.10)，则可以使用本地管道 (|>) 而不是 magrittr的%>%.

Answer 2

要构建您可以做的解决方案，

ord <- gsub("\D+", ",", stri_replace_all_regex(names(new_dat), '[A-Za-z]', 1:4)) %>% 
     strsplit(",") %>% 
     lapply(as.numeric) %>% 
     lapply(sum, na.rm = TRUE) %>% 
     unlist() %>% 
     order()

> names(new_dat)[ord]
 [1] "[0,25) east"          "[0,25) south"         "[0,25) north"         "[0,25) west"          "[25,50) east"         "[25,50) south"        "[25,50) north"        "[25,50) west"         "[25,100) east"        "[25,100) south"      
[11] "[25,100) north"       "[25,100) west"        "[50,100) east"        "[50,100) south"       "[50,100) north"       "[50,100) west"        "[100,250) east"       "[100,250) south"      "[100,250) north"      "[100,250) west"      
[21] "[100,500) east"       "[100,500) south"      "[100,500) north"      "[100,500) west"       "[250,500) east"       "[250,500) south"      "[250,500) north"      "[250,500) west"       "[500,1000) east"      "[500,1000) south"    
[31] "[500,1000) north"     "[500,1000) west"      "[1000,1500) east"     "[1000,1500) south"    "[1000,1500) north"    "[1000,1500) west"     "[1500,3000) east"     "[1500,3000) south"    "[1500,3000) north"    "[1500,3000) west"    
[41] "[500,1000000] east"   "[500,1000000] south"  "[500,1000000] north"  "[500,1000000] west"   "[1000,1000000] east"  "[1000,1000000] south" "[1000,1000000] north" "[1000,1000000] west"  "[3000,1000000] east"  "[3000,1000000] south"
[51] "[3000,1000000] north" "[3000,1000000] west"

转换列名，以便它们可以按数字顺序排列

Converting column names so they can be put in an numerical order

r

strsplit

dplyr

新数据

旧数据和原始解决方案

新方法