R中的字符串重排
String rearrangment in R
我有一长串城市名称及其省份名称。这是我的部分数据列表
data <- c('Ranchi_Capital_State_Jharkhand', 'Bokaro_State_Jharkhand', 'Tata Nagar_State_Jharkhand', 'Ramgarh_State_Jharkhand',
'Pune_State_Maharashtra', 'Mumbai_Capital_State_Maharashtra', 'Nagpur_State_Maharashtra')
我想把状态排在第一位,像这样State_Jharkhand_Bokaro
。如果城市是首都,则 State_Jharkhand_Capital_Ranchi
。另请注意,城市名称或州名称可能只有一个字符串或多个字符串(例如 Tata Nagar)。
最有效的方法是什么(不使用任何循环)?
这并没有真正使用太多正则表达式,而是主要基于信息的预期位置。用“_”拆分字符串,然后根据需要重新排序:
data
# [1] "Ranchi_Capital_State_Jharkhand" "Bokaro_State_Jharkhand"
# [3] "Tata Nagar_State_Jharkhand" "Ramgarh_State_Jharkhand"
# [5] "Pune_State_Maharashtra" "Mumbai_Capital_State_Maharashtra"
# [7] "Nagpur_State_Maharashtra"
A <- strsplit(data, "_", TRUE)
sapply(A, function(x) {
if (length(x) == 3) {
paste(x[c(2, 3, 1)], collapse = "_")
} else if (length(x) == 4) {
paste(x[c(3, 4, 2, 1)], collapse = "_")
} else {
stop("unexpected length")
}
})
# [1] "State_Jharkhand_Capital_Ranchi" "State_Jharkhand_Bokaro"
# [3] "State_Jharkhand_Tata Nagar" "State_Jharkhand_Ramgarh"
# [5] "State_Maharashtra_Pune" "State_Maharashtra_Capital_Mumbai"
# [7] "State_Maharashtra_Nagpur"
我不知道使用 sapply
是否违反了您对 "without using any loop" 的要求。
您可以使用下面的 gsub
函数。
> data <- c('Ranchi_Capital_State_Jharkhand', 'Bokaro_State_Jharkhand', 'Tata Nagar_State_Jharkhand', 'Ramgarh_State_Jharkhand',
+ 'Pune_State_Maharashtra', 'Mumbai_Capital_State_Maharashtra', 'Nagpur_State_Maharashtra')
> gsub("^(?:(.*?)(_Capital))?(.*?)_(State.*)", "\4\2_\1\3", data)
[1] "State_Jharkhand_Capital_Ranchi" "State_Jharkhand_Bokaro"
[3] "State_Jharkhand_Tata Nagar" "State_Jharkhand_Ramgarh"
[5] "State_Maharashtra_Pune" "State_Maharashtra_Capital_Mumbai"
[7] "State_Maharashtra_Nagpur"
我有一长串城市名称及其省份名称。这是我的部分数据列表
data <- c('Ranchi_Capital_State_Jharkhand', 'Bokaro_State_Jharkhand', 'Tata Nagar_State_Jharkhand', 'Ramgarh_State_Jharkhand',
'Pune_State_Maharashtra', 'Mumbai_Capital_State_Maharashtra', 'Nagpur_State_Maharashtra')
我想把状态排在第一位,像这样State_Jharkhand_Bokaro
。如果城市是首都,则 State_Jharkhand_Capital_Ranchi
。另请注意,城市名称或州名称可能只有一个字符串或多个字符串(例如 Tata Nagar)。
最有效的方法是什么(不使用任何循环)?
这并没有真正使用太多正则表达式,而是主要基于信息的预期位置。用“_”拆分字符串,然后根据需要重新排序:
data
# [1] "Ranchi_Capital_State_Jharkhand" "Bokaro_State_Jharkhand"
# [3] "Tata Nagar_State_Jharkhand" "Ramgarh_State_Jharkhand"
# [5] "Pune_State_Maharashtra" "Mumbai_Capital_State_Maharashtra"
# [7] "Nagpur_State_Maharashtra"
A <- strsplit(data, "_", TRUE)
sapply(A, function(x) {
if (length(x) == 3) {
paste(x[c(2, 3, 1)], collapse = "_")
} else if (length(x) == 4) {
paste(x[c(3, 4, 2, 1)], collapse = "_")
} else {
stop("unexpected length")
}
})
# [1] "State_Jharkhand_Capital_Ranchi" "State_Jharkhand_Bokaro"
# [3] "State_Jharkhand_Tata Nagar" "State_Jharkhand_Ramgarh"
# [5] "State_Maharashtra_Pune" "State_Maharashtra_Capital_Mumbai"
# [7] "State_Maharashtra_Nagpur"
我不知道使用 sapply
是否违反了您对 "without using any loop" 的要求。
您可以使用下面的 gsub
函数。
> data <- c('Ranchi_Capital_State_Jharkhand', 'Bokaro_State_Jharkhand', 'Tata Nagar_State_Jharkhand', 'Ramgarh_State_Jharkhand',
+ 'Pune_State_Maharashtra', 'Mumbai_Capital_State_Maharashtra', 'Nagpur_State_Maharashtra')
> gsub("^(?:(.*?)(_Capital))?(.*?)_(State.*)", "\4\2_\1\3", data)
[1] "State_Jharkhand_Capital_Ranchi" "State_Jharkhand_Bokaro"
[3] "State_Jharkhand_Tata Nagar" "State_Jharkhand_Ramgarh"
[5] "State_Maharashtra_Pune" "State_Maharashtra_Capital_Mumbai"
[7] "State_Maharashtra_Nagpur"