How to detect substrings from multiple lists within a string in R
I am trying to find out whether the strings in a vector named "value" contain substrings from two different lists. Here is my current code:
for (i in 1:length(value)){
  for (j in 1:length(city)){
    if (str_detect(value[i], city[j]) == TRUE){
      for (k in 1:length(school)){
        if (str_detect(value[i], school[k]) == TRUE){
          ...........................................................
        }
      }
    }
  }
}
city and school are independent vectors of different lengths, each containing string elements.
city <- c("Madrid", "London", "Paris", "Sofia", "Cairo", "Detroit", "New York")
school <- c("Law", "Mathematics", "PoliSci", "Economics")
value <- c("Rey Juan Carlos Law Dept, Madrid", "New York University, Center of PoliSci Studies", ..........)
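What I want to do is check whether value contains some combination of elements from both lists, so that I can use it later. Can this be done in one step, something like this: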
for (i in 1:length(value)){
  if (str_detect(value[i], city[j]) == TRUE && str_detect(value[i], school[j]) == TRUE){
    .............................................
  }
}
Try this:
library("stringr")
city <- c("Madrid", "London", "Paris", "Sofia", "Cairo", "Detroit", "New York")
school <- c("Law", "Mathematics", "PoliSci", "Economics")
value <- c(
  "Rey Juan Carlos Law Dept, Madrid",
  "New York University, Center of PoliSci Studies",
  "Los Angeles, CALTECH",
  "London, Physics",
  "London, Mathematics"
)
# Print every value that mentions at least one city and at least one school
for (v in value)
{
  # str_detect() is vectorised over the patterns; any() collapses the result
  if (any(str_detect(v, city)) && any(str_detect(v, school)))
  {
    print(v)
  }
}
When run, this prints the values that contain both a city and a school:
[1] "Rey Juan Carlos Law Dept, Madrid"
[1] "New York University, Center of PoliSci Studies"
[1] "London, Mathematics"
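The same check can probably be written without an explicit loop, since str_detect() is vectorised over its first argument. The following is a minimal sketch, not part of the original answer: the names city_pattern, school_pattern, has_both and matched are made up for illustration, and it assumes that none of the city or school names contain regex metacharacters, because they are joined into a single regular expression.

```r
library(stringr)

city   <- c("Madrid", "London", "Paris", "Sofia", "Cairo", "Detroit", "New York")
school <- c("Law", "Mathematics", "PoliSci", "Economics")
value  <- c(
  "Rey Juan Carlos Law Dept, Madrid",
  "New York University, Center of PoliSci Studies",
  "Los Angeles, CALTECH",
  "London, Physics",
  "London, Mathematics"
)

# Collapse each vector into one alternation pattern, e.g. "Madrid|London|..."
# (assumption: the names contain no regex metacharacters)
city_pattern   <- str_c(city, collapse = "|")
school_pattern <- str_c(school, collapse = "|")

# One logical per element of value: TRUE when both a city and a school occur
has_both <- str_detect(value, city_pattern) & str_detect(value, school_pattern)

# Keep only the matching strings
matched <- value[has_both]
print(matched)
```

Here the element-wise & is the right operator, because has_both is a logical vector with one entry per string rather than a single condition for if().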
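This question is similar to a problem I have been working on. For my purposes, I needed to return a data frame that preserves the structure of the original input.
That may be true for you as well, so I adapted @rbm's excellent solution as follows: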
library("stringr")
cityList <- c("Madrid", "London", "Paris", "Sofia", "Cairo", "Detroit", "New York")
schoolList <- c("Law", "Mathematics", "PoliSci", "Economics")
valueList <- c(
  "Rey Juan Carlos Law Dept, Madrid",
  "New York University, Center of PoliSci Studies",
  "Los Angeles, CALTECH",
  "London, Physics",
  "London, Mathematics"
)
# One row per input string; city and school stay empty unless both are found
df <- data.frame(value = valueList, city = "", school = "", stringsAsFactors = FALSE)
for (i in seq_along(valueList))
{
  cityHits <- which(str_detect(valueList[i], cityList))
  schoolHits <- which(str_detect(valueList[i], schoolList))
  if (length(cityHits) > 0 && length(schoolHits) > 0)
  {
    # Keep the first match if several cities or schools occur in one string
    df$city[i] <- cityList[cityHits[1]]
    df$school[i] <- schoolList[schoolHits[1]]
  }
}
print(df)
The result is as follows:
                                           value     city      school
1               Rey Juan Carlos Law Dept, Madrid   Madrid         Law
2 New York University, Center of PoliSci Studies New York     PoliSci
3                           Los Angeles, CALTECH
4                                London, Physics
5                            London, Mathematics   London Mathematics
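If the loop is not needed for anything else, a data frame of this kind can likely be built in a vectorised way with str_extract(), which returns the first match in each string or NA when nothing matches. This is only a sketch under assumptions: city_hit, school_hit, both and df2 are hypothetical names, and the list entries are again assumed to be free of regex metacharacters.

```r
library(stringr)

cityList   <- c("Madrid", "London", "Paris", "Sofia", "Cairo", "Detroit", "New York")
schoolList <- c("Law", "Mathematics", "PoliSci", "Economics")
valueList  <- c(
  "Rey Juan Carlos Law Dept, Madrid",
  "New York University, Center of PoliSci Studies",
  "Los Angeles, CALTECH",
  "London, Physics",
  "London, Mathematics"
)

# First city / school found in each string, NA where there is none
city_hit   <- str_extract(valueList, str_c(cityList, collapse = "|"))
school_hit <- str_extract(valueList, str_c(schoolList, collapse = "|"))

# Only report a row when both a city and a school were found
both <- !is.na(city_hit) & !is.na(school_hit)

df2 <- data.frame(
  value  = valueList,
  city   = ifelse(both, city_hit, ""),
  school = ifelse(both, school_hit, ""),
  stringsAsFactors = FALSE
)
print(df2)
```

Rows with only a city or only a school (such as "London, Physics") are left blank in both columns, which mirrors the both-or-nothing condition used in the loop.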