Counting occurrences of a word in a text file using R

Question

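I am trying to create a function that returns the number of occurrences of a word in a text file. To do this, I created a list containing all the words in the text (a, c, c, d, e, f are example words):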

[[1]]
 [1] a
 [2] f
 [3] e
 [4] a

[[2]]
 [1] f
 [2] f
 [3] e

I then created a table to store the number of occurrences of each word:

table(unlist(list))

  a b c d e
  3 3 2 1 1

My problem now is how to extract the count for the word passed as an argument. The function will have this structure:

GetOccurence <- function(word, table)
{
   return(occurence)
}

Any ideas would be appreciated. Thanks in advance.

Answer 1

To answer the question about your function, you could take the following approach.

Data preparation

For reproducibility, I used a publicly available dataset and cleaned it up a little:

library(tm)
data(acq)

# Basic cleaning
acq <- tm_map(acq, removePunctuation)
acq <- tm_map(acq, removeNumbers)
# Wrap base functions in content_transformer() so the documents keep their class
acq <- tm_map(acq, content_transformer(tolower))
acq <- tm_map(acq, removeWords, stopwords("english"))
acq <- tm_map(acq, stripWhitespace)

# Split the document text into words, dropping empty strings
wrds <- strsplit(paste(unlist(lapply(acq, as.character)), collapse = " "), " ")[[1]]
wrds <- wrds[nzchar(wrds)]
# Table
tblWrds <- table(wrds)
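
If tm is not available, the same split-and-tabulate step can be tried on a small in-memory sample. This is only a sketch: `txt`, `wrds_demo`, and `tbl_demo` are made-up names, and the two strings are illustrative stand-ins for the cleaned corpus.

```r
# Hypothetical sample text standing in for the cleaned corpus
txt <- c("a f e a", "f f e")

# Split on runs of whitespace and flatten into one character vector
wrds_demo <- unlist(strsplit(txt, "\\s+"))

# Count each distinct word
tbl_demo <- table(wrds_demo)
print(tbl_demo)
```

The names of the resulting table are the distinct words, which is what the lookup function relies on.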

Function

GetOccurence <- function(word, table) {
    occurence <- as.data.frame(table)
    occurence <- occurence[grep(word, occurence[,1]), ]
    return(occurence)
}
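
As a quick check of how this behaves, here is a small sketch on a made-up word list (the words and the names `tbl` and `res` are assumptions for illustration). Because `grep` treats `word` as a regular expression and matches anywhere in the string, words that merely contain the pattern are returned too.

```r
# Same function as above, repeated so the snippet runs on its own
GetOccurence <- function(word, table) {
    occurence <- as.data.frame(table)
    occurence <- occurence[grep(word, occurence[, 1]), ]
    return(occurence)
}

# Hypothetical word list for illustration
tbl <- table(c("cat", "cats", "concat", "dog", "cat"))

# Returns the rows for "cat", "cats" and "concat", since all contain "cat"
res <- GetOccurence("cat", tbl)
print(res)
```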

Edit (whole words only)

This version of the function matches whole words only. The solution below makes use of this answer.

GetOccurence <- function(word, table) {
    occurence <- as.data.frame(table)
    # "\\b" is the regex word boundary; a single "\b" in an R string is a backspace
    word <- paste0("\\b", word, "\\b")
    occurence <- occurence[grep(word, occurence[,1]), ]
    return(occurence)
}
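
If only a single number is needed for an exact word, one alternative (not the approach above) is to index the table by name. A minimal sketch, assuming a one-dimensional table of words; `GetCount` is a hypothetical helper name and the word list is made up:

```r
# Hypothetical helper: exact-match lookup by name, 0 when the word is absent
GetCount <- function(word, tbl) {
    if (word %in% names(tbl)) as.integer(tbl[word]) else 0L
}

# Hypothetical word list for illustration
tbl <- table(c("cat", "cats", "concat", "dog", "cat"))

GetCount("cat", tbl)   # count for exactly "cat"
GetCount("bird", tbl)  # word not in the table
```

This avoids regular expressions entirely, so characters such as `.` or `(` in `word` need no escaping.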
