Trying to create a value of strings using the | (or) operator
I am trying to scrape a website link. So far I have downloaded the text and set it up as a data frame. I have the following:
keywords <- c(credit | model)
text_df <- as.data.frame.table(text_df)
text_df %>%
filter(str_detect(text, keywords))
where credit and model are the two values I want to search the website for, i.e. return the rows that contain the word credit or model.
I get the following error:
Error in filter_impl(.data, dots) : object 'credit' not found
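The code only returns the results that contain the word "model" and ignores the word "credit".
How can I return all the results that contain either the word "credit" or the word "model"?
My plan is to have keywords <- c(credit | model | more_key_words | something_else | many values)
Thanks in advance.
EDIT: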
text_df:
Var 1 text
1 Here is some credit information
2 Some text which does not expalin any keywords but messy <li> text9182edj </i>
3 This line may contain the keyword model
4 another line which contains nothing of use
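So I am trying to extract only rows 1 and 3.
OK, I have checked it, and I think it does not work for you because you have to use the or operator | inside filter(), not inside str_detect().
So it would work this way: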
keywords <- c("virg", "tos")
library(dplyr)
library(stringr)
iris %>%
filter(str_detect(Species, keywords[1]) | str_detect(Species, keywords[2]))
With keywords[1] and so on, you have to specify each "keyword" from this variable individually.
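Writing out keywords[1] | keywords[2] by hand does not scale to the longer keyword list the question plans for. One possible generalization is sketched below. It uses base R (grepl and Reduce) instead of dplyr/stringr so that it runs without extra packages, and the sample data frame and the names hits_per_keyword and keep are illustrative assumptions, not part of the original answer.

```r
# Sample data modelled on the question's text_df (illustrative only).
text_df <- data.frame(
  id = 1:4,
  text = c(
    "Here is some credit information",
    "Some text which does not explain any keywords",
    "This line may contain the keyword model",
    "another line which contains nothing of use"
  ),
  stringsAsFactors = FALSE
)

keywords <- c("credit", "model")

# One logical vector per keyword; fixed = TRUE treats each keyword
# as a literal string rather than a regular expression.
hits_per_keyword <- lapply(
  keywords,
  function(k) grepl(k, text_df$text, fixed = TRUE)
)

# Combine the vectors element-wise with |, i.e. the same OR that the
# answer writes out by hand for two keywords.
keep <- Reduce(`|`, hits_per_keyword)

result <- text_df[keep, ]
print(result)
```

Adding more entries to keywords needs no other change, because Reduce folds however many logical vectors there are.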
I think the problem is that you need to pass a string as the argument to str_detect. To check for "credit" or "model", you can paste them into a single string separated by |.
library(tidyverse)
library(stringr)
text_df <- read_table("Var 1 text
1 Here is some credit information
2 Some text which does not expalin any keywords but messy <li> text9182edj </i>
3 This line may contain the keyword model
4 another line which contains nothing of use")
keywords <- c("credit", "model")
any_word <- paste(keywords, collapse = "|")
text_df %>% filter(str_detect(text, any_word))
#> # A tibble: 2 x 3
#> Var `1` text
#> <int> <chr> <chr>
#> 1 1 Here is some credit information
#> 2 3 This line may contain the keyword model
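One thing to keep in mind with the collapsed pattern is that "credit|model" is matched as a regular expression anywhere in the text, so "model" would also match inside a longer word such as "remodel". If whole-word matches are wanted, a variation along the following lines might help. It is only a sketch in base R with perl = TRUE; the example sentences and the variable names quoted, pattern and hits are assumptions made for illustration.

```r
keywords <- c("credit", "model")

# Wrap each keyword in \Q...\E so that any regex metacharacters it may
# contain (such as . or +) are matched literally under perl = TRUE.
quoted <- paste0("\\Q", keywords, "\\E")

# Join the alternatives with | and require a word boundary (\b) on both sides.
pattern <- paste0("\\b(?:", paste(quoted, collapse = "|"), ")\\b")

# Illustrative input lines; the second one contains "model" only as
# part of the longer word "remodel".
text <- c(
  "Here is some credit information",
  "We will remodel the kitchen",
  "This line may contain the keyword model"
)

hits <- grepl(pattern, text, perl = TRUE)
print(text[hits])
```

The same pattern string could presumably be passed to str_detect as well, since it accepts regular expressions by default, but that is not verified here.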
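I would recommend staying away from regular expressions when you are working with words. There are packages tailored to this specific task that you can use. For example, try the following: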
library(corpus)
text <- readLines("http://norvig.com/big.txt") # sherlock holmes
terms <- c("watson", "sherlock holmes", "elementary")
text_locate(text, terms)
## text before instance after
## 1 1 …Book of The Adventures of Sherlock Holmes
## 2 27 Title: The Adventures of Sherlock Holmes
## 3 40 … EBOOK, THE ADVENTURES OF SHERLOCK HOLMES ***
## 4 50 SHERLOCK HOLMES
## 5 77 To Sherlock Holmes she is always the woman. I…
## 6 85 …," he remarked. "I think, Watson , that you have put on seve…
## 7 89 …t a trifle more, I fancy, Watson . And in practice again, I …
## 8 145 …ere's money in this case, Watson , if there is nothing else.…
## 9 163 …friend and colleague, Dr. Watson , who is occasionally good …
## 10 315 … for you. And good-night, Watson ," he added, as the wheels …
## 11 352 …s quite too good to lose, Watson . I was just balancing whet…
## 12 422 …as I had pictured it from Sherlock Holmes ' succinct description, but…
## 13 504 "Good-night, Mister Sherlock Holmes ."
## 14 515 …t it!" he cried, grasping Sherlock Holmes by either shoulder and loo…
## 15 553 "Mr. Sherlock Holmes , I believe?" said she.
## 16 559 "What!" Sherlock Holmes staggered back, white with…
## 17 565 …tter was superscribed to " Sherlock Holmes , Esq. To be left till call…
## 18 567 "MY DEAR MR. SHERLOCK HOLMES ,--You really did it very w…
## 19 569 …est to the celebrated Mr. Sherlock Holmes . Then I, rather imprudentl…
## 20 571 …s; and I remain, dear Mr. Sherlock Holmes ,
## ⋮ (189 rows total)
Note that this matches the terms regardless of case.
For your specific use case, use
ix <- text_detect(text, terms)
or
matches <- text_subset(text, terms)
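For comparison only, if installing corpus is not an option, a rough base R approximation of the case-insensitive detection could look like the sketch below. It does fall back on a regular expression and does not reproduce corpus's tokenization, so it should not be read as an equivalent of text_detect or text_subset. The four input lines are made up for illustration.

```r
# Illustrative input; not the actual contents of big.txt.
text <- c(
  "To Sherlock Holmes she is always the woman.",
  "I think, Watson, that you have put on weight.",
  "It was a cold morning in early spring.",
  "THE ADVENTURES OF SHERLOCK HOLMES"
)

terms <- c("watson", "sherlock holmes", "elementary")

# Collapse the terms into one alternation and ignore case when matching.
pattern <- paste(terms, collapse = "|")
ix <- grepl(pattern, text, ignore.case = TRUE)

# Keep only the lines in which at least one term was found.
matches <- text[ix]
print(matches)
```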