Remove special characters from a corpus
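I built a data frame that lists every punctuation character found in the terms, along with its frequency. I then need to remove the punctuation and check whether any is left.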
library(tm)
library(stringr)

newpapers1 <- tm_map(newpapers, removePunctuation)
# Try to drop the two leftover characters explicitly
punremove <- function(x){gsub("[¡¯]","",x)}
punremove1 <- lapply(newpapers1, punremove)
# Tabulate every punctuation character that is still present
my.check.func <- function(x){str_extract_all(x, "[[:punct:]]")}
my.check1 <- lapply(newpapers1, my.check.func)
p <- as.data.frame(table(unlist(my.check1)))
p
But I still get this special character:
Var1 Freq
1 ¡ 25
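Is there a way to write a function that removes all punctuation at once, or one that removes just this character?
Edit:
After checking the documents, the punctuation is still there: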
> newpapers1[[24]]$content
"This study employs a crosscultural perspective to examine how local
audiences perceive and enjoy foreign dramas and how this psychological
process differs depending on the cultural distance between the media
and the viewing audience Using a convenience sample of young Korean
college students this study as predicted by cultural discount theory
shows that cultural distance decreases Korean audiences¡¯ perceived
identification with dramatic characters which erodes their enjoyment
of foreign dramas Unlike cultural discount theory however cultural
distance arouses Korean audiences¡¯ perception of novelty which
heightens their enjoyment of foreign dramas This study discusses the
theoretical and practical implications of these findings as well as
their potential limitations"
You can use gsub to remove the punctuation, like this:
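newpapers1 <- tm_map(newpapers, removePunctuation)
# Strip any punctuation that removePunctuation left behind
strip.punct <- function(x){gsub('[[:punct:]]+','',x)}
newpapers2 <- tm_map(newpapers1, content_transformer(strip.punct))
# Tabulate whatever punctuation is still present; an empty table means none is left
my.check.func <- function(x){str_extract_all(x$content, '[[:punct:]]')}
my.check1 <- lapply(newpapers2, my.check.func)
p <- as.data.frame(table(unlist(my.check1)))
p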
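If `[[:punct:]]` still leaves the two characters behind, naming them explicitly may be more reliable. How base R interprets character classes depends on the locale, and `¯` (U+00AF) is classified by Unicode as a symbol rather than punctuation, so a punctuation class is not guaranteed to catch it. The following is only a sketch in base R: `strip_stray`, `strip_all` and the sample string are made up for illustration and are not part of tm.

```r
# Hypothetical helper: remove the two stray characters seen in the question.
# They are written as Unicode escapes so the script does not depend on the
# encoding of the source file.
strip_stray <- function(x) {
  gsub("[\u00A1\u00AF]", "", x)
}

# Made-up sample modelled on the abstract shown in the question
sample_text <- "Korean audiences\u00A1\u00AF perceived identification"
cleaned <- strip_stray(sample_text)
print(cleaned)

# Broader hypothetical variant: drop every character in the Unicode
# punctuation (P) and symbol (S) categories. This also removes characters
# such as "$" or "+", which may or may not be wanted.
strip_all <- function(x) {
  gsub("[\\p{P}\\p{S}]+", "", x, perl = TRUE)
}
print(strip_all(sample_text))
```

On a tm corpus, either helper could presumably be applied with `tm_map(newpapers, content_transformer(strip_stray))`, in the same way as the gsub call above.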
Hope this helps.