Sort strings based on the last word in R
sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.4.3 plyr_1.8.3 tidyr_0.3.1 gridExtra_2.0.0 scales_0.3.0
[6] ggplot2_1.0.1 RPostgreSQL_0.4 DBI_0.3.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.1 lubridate_1.3.3 assertthat_0.1 digest_0.6.8 MASS_7.3-44
[6] R6_2.1.1 grid_3.2.2 gtable_0.1.2 magrittr_1.5 stringi_0.5-5
[11] reshape2_1.4.1 proto_0.3-10 tools_3.2.2 stringr_1.0.0 munsell_0.4.2
[16] parallel_3.2.2 colorspace_1.2-6 memoise_0.2.1
For example, I have n strings in a column, as shown below. I want to sort the strings based on the last word.
dput(dsp)
c("handlingstation / cropping/ forward / Linie 1", "handlingstation / cropping/ forward / Linie 2",
"conveyorstation / Linie 1", "conveyorstation / Linie 2", "soft / handling / cleaning / backward / Linie 3",
"jumper / doublejumper / Linie 1", "jumper / doublejumper / Linie 2"
)
dsp
[1] "handlingstation / cropping/ forward / Linie 1"
[2] "handlingstation / cropping/ forward / Linie 2"
[3] "conveyorstation / Linie 1"
[4] "conveyorstation / Linie 2"
[5] "soft / handling / cleaning / backward / Linie 3"
[6] "jumper / doublejumper / Linie 1"
[7] "jumper / doublejumper / Linie 2"
Expected output
dsp_sorted
[1] "handlingstation / cropping/ forward / Linie 1"
[2] "conveyorstation / Linie 1"
[3] "jumper / doublejumper / Linie 1"
[4] "handlingstation / cropping/ forward / Linie 2"
[5] "conveyorstation / Linie 2"
[6] "jumper / doublejumper / Linie 2"
[7] "soft / handling / cleaning / backward / Linie 3"
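I want all the strings in a particular column to be sorted based on the last word. Here the sort key should be Linie 1, Linie 2, etc.
Can anyone tell me how to do this?
You could try the following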
dsp[order(sub(".*/ ", "", dsp))]
# [1] "handlingstation / cropping/ forward / Linie 1" "conveyorstation / Linie 1"
# [3] "jumper / doublejumper / Linie 1" "handlingstation / cropping/ forward / Linie 2"
# [5] "conveyorstation / Linie 2" "jumper / doublejumper / Linie 2"
# [7] "soft / handling / cleaning / backward / Linie 3"
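This basically uses a regex to remove everything up to and including the last occurrence of "/ ", and then orders the vector by the word that remains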
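To make the regex step concrete, here is a minimal sketch of what the sort key looks like before it is passed to order(). The two-element vector dsp_demo and the name keys are illustrative assumptions, not part of the original data.

```r
# Hypothetical two-element sample in the same format as dsp
dsp_demo <- c("handlingstation / cropping/ forward / Linie 1",
              "conveyorstation / Linie 2")

# ".*" is greedy, so ".*/ " consumes everything up to the LAST "/ "
keys <- sub(".*/ ", "", dsp_demo)
keys
# [1] "Linie 1" "Linie 2"
```

order() then only sees these short keys, and the resulting permutation is used to index the original vector.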
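Though in your case it could be safer to resort to the mixedorder operation from gtools, because you have both numbers and characters in a single value (plain character ordering would place "Linie 10" before "Linie 2")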
library(gtools)
dsp[mixedorder(sub(".*/ ", "", dsp))]
# [1] "handlingstation / cropping/ forward / Linie 1" "conveyorstation / Linie 1"
# [3] "jumper / doublejumper / Linie 1" "handlingstation / cropping/ forward / Linie 2"
# [5] "conveyorstation / Linie 2" "jumper / doublejumper / Linie 2"
# [7] "soft / handling / cleaning / backward / Linie 3"
Another option (depending on your real data) is to extract the number from the end of the string and order by it numerically
dsp[order(as.numeric(sub(".*\\D(\\d+)$", "\\1", dsp)))]
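The difference between character and numeric ordering only shows up once a number has two digits, which the sample data never reaches. The following is a small base R sketch under that assumption; dsp_demo and the helper names are made up for illustration, and method = "radix" is used only so that the character ordering does not depend on the locale.

```r
# Hypothetical sample containing a two-digit line number
dsp_demo <- c("a / Linie 10", "b / Linie 2", "c / Linie 1")
last_word <- sub(".*/ ", "", dsp_demo)

# Plain character ordering compares digit by digit, so "10" sorts before "2"
plain_sorted <- dsp_demo[order(last_word, method = "radix")]

# Split the key into its text part and its trailing number,
# then order by text first and by the number (as numeric) second
prefix <- sub("\\s*\\d+$", "", last_word)
num <- as.numeric(sub(".*\\D(\\d+)$", "\\1", last_word))
numeric_sorted <- dsp_demo[order(prefix, num)]

plain_sorted
# [1] "c / Linie 1"  "a / Linie 10" "b / Linie 2"
numeric_sorted
# [1] "c / Linie 1"  "b / Linie 2"  "a / Linie 10"
```

This assumes every string ends in digits preceded by at least one non-digit character; strings without a trailing number would yield NA from as.numeric().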
Apparently the stringi package also has a mixed-order option: extract the last word of each string and specify opts_collator = list(numeric = TRUE) when ordering, so you could also do
library(stringi)
dsp[stri_order(stri_extract_last_words(dsp), opts_collator = list(numeric = TRUE))]
# [1] "handlingstation / cropping/ forward / Linie 1" "conveyorstation / Linie 1"
# [3] "jumper / doublejumper / Linie 1" "handlingstation / cropping/ forward / Linie 2"
# [5] "conveyorstation / Linie 2" "jumper / doublejumper / Linie 2"
# [7] "soft / handling / cleaning / backward / Linie 3"
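Since the question mentions that the strings live in a column, the same idea can be applied to whole rows of a data frame. This is only a sketch: the data frame df and its columns machine_id and dsp are hypothetical names chosen for illustration.

```r
# Hypothetical data frame; only the dsp column mirrors the question's data
df <- data.frame(
  machine_id = c(101, 102, 103),
  dsp = c("conveyorstation / Linie 2",
          "jumper / doublejumper / Linie 1",
          "soft / handling / cleaning / backward / Linie 3"),
  stringsAsFactors = FALSE
)

# Order the rows by the last word of dsp, keeping the other columns aligned
df_sorted <- df[order(sub(".*/ ", "", df$dsp)), ]
df_sorted$dsp
# [1] "jumper / doublejumper / Linie 1"
# [2] "conveyorstation / Linie 2"
# [3] "soft / handling / cleaning / backward / Linie 3"
```

Any of the ordering variants above can be swapped in for order() here, as long as it returns a permutation of the row indices.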