Converting column names so they can be put in a numerical order
I am trying to extend an existing solution so that it works for both new_dat and old_dat.
New data
new_dat <- structure(list(`[0,25) east` = c(1269L, 85L), `[0,25) north` = c(364L,
21L), `[0,25) south` = c(1172L, 97L), `[0,25) west` = c(549L,
49L), `[100,250) east` = c(441L, 149L), `[100,250) north` = c(224L,
45L), `[100,250) south` = c(521L, 247L), `[100,250) west` = c(770L,
124L), `[100,500) east` = c(0L, 0L), `[100,500) north` = c(0L,
0L), `[100,500) south` = c(0L, 0L), `[100,500) west` = c(0L,
0L), `[1000,1000000] east` = c(53L, 0L), `[1000,1000000] north` = c(82L,
0L), `[1000,1000000] south` = c(23L, 0L), `[1000,1000000] west` = c(63L,
0L), `[1000,1500) east` = c(0L, 0L), `[1000,1500) north` = c(0L,
0L), `[1000,1500) south` = c(0L, 0L), `[1000,1500) west` = c(0L,
0L), `[1500,3000) east` = c(0L, 0L), `[1500,3000) north` = c(0L,
0L), `[1500,3000) south` = c(0L, 0L), `[1500,3000) west` = c(0L,
0L), `[25,100) east` = c(579L, 220L), `[25,100) north` = c(406L,
58L), `[25,100) south` = c(1048L, 316L), `[25,100) west` = c(764L,
131L), `[25,50) east` = c(0L, 0L), `[25,50) north` = c(0L, 0L
), `[25,50) south` = c(0L, 0L), `[25,50) west` = c(0L, 0L), `[250,500) east` = c(232L,
172L), `[250,500) north` = c(207L, 40L), `[250,500) south` = c(202L,
148L), `[250,500) west` = c(457L, 153L), `[3000,1000000] east` = c(0L,
0L), `[3000,1000000] north` = c(0L, 0L), `[3000,1000000] south` = c(0L,
0L), `[3000,1000000] west` = c(0L, 0L), `[50,100) east` = c(0L,
0L), `[50,100) north` = c(0L, 0L), `[50,100) south` = c(0L, 0L
), `[50,100) west` = c(0L, 0L), `[500,1000) east` = c(103L, 0L
), `[500,1000) north` = c(185L, 0L), `[500,1000) south` = c(66L,
0L), `[500,1000) west` = c(200L, 0L), `[500,1000000] east` = c(0L,
288L), `[500,1000000] north` = c(0L, 120L), `[500,1000000] south` = c(0L,
229L), `[500,1000000] west` = c(0L, 175L)), row.names = c("Andere akkerbouwbedrijven",
"Andere combinatiebedrijven"), class = "data.frame")
Old data and original solution
old_dat <- structure(list(`[0,25)` = 5L, `[100,250)` = 43L, `[100,500)` = 0L,
`[1000,1000000]` = 20L, `[1000,1500)` = 0L, `[1500,3000)` = 0L,
`[25,100)` = 38L, `[25,50)` = 0L, `[250,500)` = 27L, `[3000,1000000]` = 0L,
`[50,100)` = 0L, `[500,1000)` = 44L, `[500,1000000]` = 0L), row.names = "Type_A", class = "data.frame")
The solution relies on the fact that, for these intervals, the sum of the two numbers in each column name gives the correct order.
library(magrittr)

ord <- gsub("\\[|\\]|\\)", "", colnames(old_dat)) %>%  # "[0,25)" -> "0,25"
  strsplit(",") %>%                                     # "0,25" -> c("0", "25")
  lapply(as.numeric) %>%
  lapply(sum) %>%                                       # 0 + 25 = 25
  unlist() %>%
  order()
colnames(old_dat)[ord]
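For reference, here is a base-R sketch of the same sum-of-bounds key (no magrittr), applied to the column names of old_dat copied into a plain character vector. The names old_names and key are illustrative only:

```r
# Column names of old_dat, copied from the structure() call above
old_names <- c("[0,25)", "[100,250)", "[100,500)", "[1000,1000000]",
               "[1000,1500)", "[1500,3000)", "[25,100)", "[25,50)",
               "[250,500)", "[3000,1000000]", "[50,100)", "[500,1000)",
               "[500,1000000]")

# Drop the brackets, split on the comma and add the two bounds
key <- vapply(strsplit(gsub("\\[|\\]|\\)", "", old_names), ","),
              function(parts) sum(as.numeric(parts)),
              numeric(1))

key                      # e.g. 25 for "[0,25)", 1003000 for "[3000,1000000]"
old_names[order(key)]
```

Printing key makes it easy to see why "[500,1000000]" lands after "[1500,3000)": its key (1000500) is far larger than 4500.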
New approach
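The new data contains not only numbers but also strings (east, north, south, west). I realised that if I give east the value 1, north the value 2, and so on, I can use the same solution: the sum of the three numbers still gives the correct order.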
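Why the extra number does not disturb the interval order: the direction codes act only as a tie-breaker as long as their spread (4 - 1 = 3) is smaller than the smallest gap between two interval sums. A quick check, with the interval sums for the 13 intervals in this data written out by hand (dir_code and interval_key are illustrative names):

```r
# Direction codes suggested in the question
dir_code <- c(east = 1, north = 2, south = 3, west = 4)

# Lower + upper bound for each of the 13 intervals in old_dat/new_dat
interval_key <- c(25, 75, 125, 150, 350, 600, 750, 1500, 2500, 4500,
                  1000500, 1001000, 1003000)

min_gap <- min(diff(sort(interval_key)))   # smallest distance between two interval keys
spread  <- max(dir_code) - min(dir_code)   # largest difference a direction code can make
min_gap > spread                           # TRUE: directions only break ties within an interval

# Example: the last column of [0,25) still sorts before the first column of [25,50)
0 + 25 + dir_code[["west"]]    # 29
25 + 50 + dir_code[["east"]]   # 76
```

If intervals with closer sums were added later (a gap of 3 or less), the direction codes would need to be scaled down, e.g. to 0.1, 0.2, and so on.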
However, I am having some trouble adapting the code.
ord <- gsub("\\[|\\]|\\)", "", colnames(new_dat)) %>%
  # provides "0,25 east", "0,25 north" etc
  strsplit(",") %>%
  # provides "0" and "25 east", "0" and "25 north" etc
  lapply(as.numeric) %>%
  # but as.numeric("25 east") is NA, so every sum becomes NA
  lapply(sum) %>%
  # SHOULD provide 0+25+1 (east), 0+25+2 (north) etc
  unlist() %>%
  order()
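The problem is splitting the string into three parts and converting the direction to a number if, and only if, there are three parts. Otherwise it should just use the two numbers. How can I do this?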
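A minimal base-R sketch of the "use the third part only if it exists" logic, assuming each name is an interval optionally followed by a single space and one of the four direction words. direction_key and dir_code are hypothetical names, and an unknown direction word raises an error:

```r
# Hypothetical helper: sum of the interval bounds plus an optional direction code
direction_key <- function(col_names,
                          dir_code = c(east = 1, north = 2, south = 3, west = 4)) {
  # "[0,25) east" -> c("[0,25)", "east"); "[0,25)" -> "[0,25)" (one part only)
  parts <- strsplit(col_names, " ", fixed = TRUE)
  vapply(parts, function(p) {
    # Interval part: strip brackets, split on the comma, convert to numbers
    bounds <- as.numeric(strsplit(gsub("\\[|\\]|\\)", "", p[1]), ",")[[1]])
    # Add a direction code only when the direction part is present
    extra <- if (length(p) > 1) dir_code[[p[2]]] else 0
    sum(bounds) + extra
  }, numeric(1))
}

new_names <- c("[25,50) east", "[0,25) west", "[0,25) east", "[25,50) north")
new_names[order(direction_key(new_names))]

old_names <- c("[25,50)", "[0,25)", "[3000,1000000]")
old_names[order(direction_key(old_names))]
```

The same function can then be used as colnames(new_dat)[order(direction_key(colnames(new_dat)))] and likewise for old_dat.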
Maybe a bit of overkill, but with this you don't need to search for patterns like "east", "south", etc.
library(magrittr)

order_cols <- function(dat) {
  # look for words to order by
  s_ordered <- stringi::stri_extract_all_regex(colnames(dat), "[[:alpha:]]+") %>%
    unlist() %>%
    unique() %>%
    sort()
  if (length(s_ordered) > 1) {
    # replace words with their alphabetical index
    cnames <- stringi::stri_replace_all_fixed(colnames(dat), s_ordered, seq_along(s_ordered), vectorise_all = FALSE)
  } else {
    cnames <- colnames(dat)
  }
  cnames %>%
    stringi::stri_extract_all_regex("\\d+") %>% # extract all numbers (including the alphabetical index numbers)
    lapply(as.numeric) %>%
    lapply(sum) %>%
    unlist() %>%
    order()
}
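In the first part of the function, I extract the words from the column names and sort them. Each word in the column names is then replaced by its position in that sorted list. After that, I extract the numbers and follow pretty much your original approach. I wrapped it in a function to make it easier to use: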
colnames(new_dat)[order_cols(new_dat)]
#> [1] "[0,25) east" "[0,25) north" "[0,25) south"
#> [4] "[0,25) west" "[25,50) east" "[25,50) north"
#> [7] "[25,50) south" "[25,50) west" "[25,100) east"
#> [10] "[25,100) north" "[25,100) south" "[25,100) west"
#> [13] "[50,100) east" "[50,100) north" "[50,100) south"
#> [16] "[50,100) west" "[100,250) east" "[100,250) north"
#> [19] "[100,250) south" "[100,250) west" "[100,500) east"
#> [22] "[100,500) north" "[100,500) south" "[100,500) west"
#> [25] "[250,500) east" "[250,500) north" "[250,500) south"
#> [28] "[250,500) west" "[500,1000) east" "[500,1000) north"
#> [31] "[500,1000) south" "[500,1000) west" "[1000,1500) east"
#> [34] "[1000,1500) north" "[1000,1500) south" "[1000,1500) west"
#> [37] "[1500,3000) east" "[1500,3000) north" "[1500,3000) south"
#> [40] "[1500,3000) west" "[500,1000000] east" "[500,1000000] north"
#> [43] "[500,1000000] south" "[500,1000000] west" "[1000,1000000] east"
#> [46] "[1000,1000000] north" "[1000,1000000] south" "[1000,1000000] west"
#> [49] "[3000,1000000] east" "[3000,1000000] north" "[3000,1000000] south"
#> [52] "[3000,1000000] west"
colnames(old_dat)[order_cols(old_dat)]
#> [1] "[0,25)" "[25,50)" "[25,100)" "[50,100)"
#> [5] "[100,250)" "[100,500)" "[250,500)" "[500,1000)"
#> [9] "[1000,1500)" "[1500,3000)" "[500,1000000]" "[1000,1000000]"
#> [13] "[3000,1000000]"
Created on 2022-05-06 by the reprex package (v2.0.1)
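For readers without stringi, a base-R sketch of the same steps (collect the words, sort them, swap each word for its index, then add up all numbers) using regmatches() and gregexpr(). order_cols_base is a hypothetical name, and unlike order_cols() it takes a character vector of names rather than a data frame:

```r
# Hypothetical base-R counterpart of order_cols()
order_cols_base <- function(col_names) {
  # Distinct words in the names, sorted alphabetically
  words <- sort(unique(unlist(
    regmatches(col_names, gregexpr("[[:alpha:]]+", col_names))
  )))
  cnames <- col_names
  # As in order_cols(), only replace words when there is more than one
  if (length(words) > 1) {
    for (i in seq_along(words)) {
      # Replace each word by its alphabetical index (plain text match)
      cnames <- gsub(words[i], as.character(i), cnames, fixed = TRUE)
    }
  }
  # Pull out every number (interval bounds plus word index) and add them up
  key <- vapply(regmatches(cnames, gregexpr("[0-9]+", cnames)),
                function(num) sum(as.numeric(num)),
                numeric(1))
  order(key)
}

nm <- c("[25,50) west", "[0,25) north", "[0,25) east", "[25,50) east")
nm[order_cols_base(nm)]

old_nm <- c("[25,50)", "[0,25)")
old_nm[order_cols_base(old_nm)]
```

With real data it would be called as colnames(new_dat)[order_cols_base(colnames(new_dat))].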
P.S.: If you are using a newer version of R (>= 4.1.0), you can use the native pipe (|>) instead of magrittr's %>%.
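As a sketch of what that looks like, here is the original sum-of-bounds pipeline rewritten with the native pipe (requires R >= 4.1.0; old_names is an illustrative subset of old_dat's column names):

```r
# Native pipe version of the original pipeline (R >= 4.1.0, no magrittr needed)
old_names <- c("[100,250)", "[0,25)", "[25,50)")

ord <- gsub("\\[|\\]|\\)", "", old_names) |>
  strsplit(",") |>
  lapply(as.numeric) |>
  lapply(sum) |>
  unlist() |>   # |> needs a call, so unlist() keeps its parentheses
  order()

old_names[ord]
```

One difference to keep in mind: magrittr accepts a bare function name such as unlist on the right-hand side, whereas the native pipe requires a function call, so unlist must be written unlist().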
To build on your own solution, you can do:
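library(magrittr)
library(stringi)

# 1:4 is recycled over the column names, so each direction word gets the code of its
# position within every block of four columns (relies on a fixed direction order)
ord <- gsub("\\D+", ",", stri_replace_all_regex(names(new_dat), "[A-Za-z]+", 1:4)) %>%
  strsplit(",") %>%
  lapply(as.numeric) %>%
  lapply(sum, na.rm = TRUE) %>%
  unlist() %>%
  order()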
> names(new_dat)[ord]
[1] "[0,25) east" "[0,25) south" "[0,25) north" "[0,25) west" "[25,50) east" "[25,50) south" "[25,50) north" "[25,50) west" "[25,100) east" "[25,100) south"
[11] "[25,100) north" "[25,100) west" "[50,100) east" "[50,100) south" "[50,100) north" "[50,100) west" "[100,250) east" "[100,250) south" "[100,250) north" "[100,250) west"
[21] "[100,500) east" "[100,500) south" "[100,500) north" "[100,500) west" "[250,500) east" "[250,500) south" "[250,500) north" "[250,500) west" "[500,1000) east" "[500,1000) south"
[31] "[500,1000) north" "[500,1000) west" "[1000,1500) east" "[1000,1500) south" "[1000,1500) north" "[1000,1500) west" "[1500,3000) east" "[1500,3000) south" "[1500,3000) north" "[1500,3000) west"
[41] "[500,1000000] east" "[500,1000000] south" "[500,1000000] north" "[500,1000000] west" "[1000,1000000] east" "[1000,1000000] south" "[1000,1000000] north" "[1000,1000000] west" "[3000,1000000] east" "[3000,1000000] south"
[51] "[3000,1000000] north" "[3000,1000000] west"
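Why this answer passes na.rm = TRUE to sum(): turning every run of non-digits into a comma leaves a leading comma, so strsplit() returns an empty first element, which as.numeric() turns into NA. Tracing a single name by hand, where "[0,25) 1" stands for "[0,25) east" after the direction word has been replaced by its code:

```r
# One column name after the direction word has been replaced by its code
x <- "[0,25) 1"

x_commas <- gsub("\\D+", ",", x)          # ",0,25,1": every non-digit run -> ","
parts <- strsplit(x_commas, ",")[[1]]     # "" "0" "25" "1"
nums <- as.numeric(parts)                 # NA 0 25 1 (the empty string becomes NA)

sum(nums)                  # NA
sum(nums, na.rm = TRUE)    # 26
```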