Finding names in a txt file
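I have a long text in a txt file (T1.txt).
I want to find all the names (English names) in the txt file, together with the 2 words before each name and the 2 words after it.
For example, I have the following text: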
"Hello world!, my name is Mr. A.B. Morgan (in short) and it is nice to meet you."
Orange Silver paid 100$ for his gift.
I'll call Dina H. in two hours.
I would like to get the following data frame:
> df1
  Before     Name           After
1 name is    A. B. Morgan   in short
2            Orange Silver  paid 100$
3 I'll call  Dina H.        in two
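This isn't perfect, and it isn't pretty, but it's a start:
text1 <- c("Hello world!, my name is Mr. A.B. Morgan (in short) and it is nice to meet you.")
text2 <- c("Orange Silver paid 100$ for his gift.")
text3 <- c("I'll call Dina H. in two hours.")
library(stringr)
find_names_and_BA <- function(x) {
  # Capitalised tokens; the first character is skipped so that a
  # sentence-initial word is not mistaken for part of a name
  matches <- str_extract_all(str_sub(x, 2), "[A-Z]\\S+")[[1]]
  # Fewer than two hits: the name probably starts the sentence, so rescan everything
  if (length(matches) < 2) { matches <- str_extract_all(x, "[A-Z]\\S+")[[1]] }
  name_match <- paste(matches, collapse = " ")
  # Locate the name in x; fixed() keeps the "." in the name literal
  beg_of_match <- str_locate(x, fixed(name_match))[1]
  end_of_match <- str_locate(x, fixed(name_match))[2]
  # Words up to the first letter of the name, and from its last letter onwards
  start_words <- str_extract_all(str_sub(x, 1, beg_of_match), "\\w+")[[1]]
  end_words <- str_extract_all(str_sub(x, end_of_match), "\\w+")[[1]]
  # Drop the stray fragment of the name itself and keep two words on each side
  before <- paste(tail(start_words, 3)[1:2], collapse = " ")
  after <- paste(head(end_words, 3)[2:3], collapse = " ")
  return( data.frame(Before = before, Name = name_match, After = after) )
}
dplyr::bind_rows(find_names_and_BA(text1),
                 find_names_and_BA(text2),
                 find_names_and_BA(text3))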
# Source: local data frame [3 x 3]
#
#    Before            Name     After
#     (chr)           (chr)     (chr)
# 1 name is Mr. A.B. Morgan  in short
# 2    O NA   Orange Silver  paid 100
# 3 ll call         Dina H. two hours
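As the output shows, slicing by character position leaves a few rough edges ("O NA", "ll call", "two hours"). One possible alternative is to work on whitespace-separated tokens instead. The following is only a sketch in base R, not the function above: `find_name_context`, its `n_ctx` and `titles` arguments, and the rule "a name is a run of at least two capitalised tokens" are all assumptions made for illustration, and the temporary file stands in for T1.txt.

```r
# Hypothetical helper (an assumption, not part of the answer above):
# a "name" is taken to be a run of two or more tokens starting with A-Z.
find_name_context <- function(line, n_ctx = 2,
                              titles = c("Mr.", "Mrs.", "Ms.", "Dr.")) {
  empty <- data.frame(Before = character(0), Name = character(0),
                      After = character(0), stringsAsFactors = FALSE)
  tokens <- strsplit(trimws(line), "\\s+")[[1]]
  if (length(tokens) == 0) return(empty)

  # Run-length encode the "starts with a capital" flag to find candidate runs
  runs   <- rle(grepl("^[A-Z]", tokens))
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  keep   <- which(runs$values & runs$lengths >= 2)
  if (length(keep) == 0) return(empty)

  # Strip brackets, quotes and sentence punctuation from context words,
  # but leave characters such as "$" and "'" alone
  tidy <- function(w) gsub("^[(\"]+|[)\".,!?;:]+$", "", w)

  rows <- lapply(keep, function(k) {
    name   <- tokens[starts[k]:ends[k]]
    before <- tokens[seq_len(starts[k] - 1)]
    after  <- tokens[seq_len(length(tokens) - ends[k]) + ends[k]]
    data.frame(
      Before = paste(tidy(tail(before, n_ctx)), collapse = " "),
      Name   = paste(name[!name %in% titles], collapse = " "),
      After  = paste(tidy(head(after, n_ctx)), collapse = " "),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# The three sample sentences, written to a temporary file that stands in
# for T1.txt; with the real file, use readLines("T1.txt") instead.
sample_lines <- c(
  "Hello world!, my name is Mr. A.B. Morgan (in short) and it is nice to meet you.",
  "Orange Silver paid 100$ for his gift.",
  "I'll call Dina H. in two hours."
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

df1 <- do.call(rbind, lapply(readLines(path), find_name_context))
print(df1)
```

On the three sample sentences this keeps "I'll call" and "paid 100$" intact and drops "Mr." from the name. It is still a heuristic: single-word names are missed, and a capitalised sentence-initial word directly in front of a name would be merged into it.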