R: empty "" rows cannot be removed

I am scraping comments from Reddit and trying to remove the empty rows/comments.

Many rows appear to be empty, but I can't seem to remove them. When I use is_empty, they are not reported as empty.

> Reddit[25,]
[1] "​"

> is_empty(Reddit$text[25])
[1] FALSE

> Reddit <- subset(Reddit, text != "")
> Reddit[25,]
[1] "​"

Am I missing something? I have tried several other ways to remove these rows, but none of them worked.

Edit: including a dput example in response to a comment:

RedditSample <- data.frame(text=
c("I liked coinbase, used it before. But the fees are simply too much. If they were to take 1% instead 2.5% I would understand. It's much simpler and long term it doesn't matter as much.", 
"But Binance only charges 0.1% so making the switch is worth it fairly quickly. They also have many more coins. Approval process took me less than 10 minutes, but always depends on how many register at the same time.", 
"​", "Here's a 10%/10% referal code if you chose to register: KHELMJ94", 
"What is a spot wallet?"))

The data you shared does not actually contain an empty string; it contains a Unicode zero-width space character (U+200B). You can see this with

charToRaw(RedditSample$text[3])
# [1] e2 80 8b
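
To make the point concrete, here is a minimal sketch that builds the zero-width space directly instead of reading it from the scraped data. The variable names are illustrative only and are not part of the original question.

```r
# "\u200b" is the zero-width space; it renders as nothing but is a real character.
zwsp <- "\u200b"

is_blank_string <- identical(zwsp, "")   # compare against a truly empty string
code_point <- utf8ToInt(zwsp)            # Unicode code point as an integer
utf8_bytes <- charToRaw(zwsp)            # the UTF-8 encoding of that code point

print(is_blank_string)                   # FALSE
print(sprintf("U+%04X", code_point))     # "U+200B"
print(utf8_bytes)                        # e2 80 8b
```

Because the string is not identical to "", a filter such as text != "" keeps the row.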

You can use a regular expression that matches a "word" character to make sure each row contains at least one non-space character:

subset(RedditSample, grepl("\\w", text))
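
One possible caveat: a "word" character test would also drop comments made up only of punctuation or emoji. If that matters, an alternative sketch is to strip a chosen set of invisible characters first and then test for blankness. The character list below (zero-width space, zero-width non-joiner, zero-width joiner, byte-order mark) and the names comments, invisible_chars, cleaned and keep are assumptions for illustration, not something taken from the original data.

```r
# Small stand-in for the scraped data (hypothetical rows).
comments <- data.frame(
  text = c("What is a spot wallet?", "\u200b", "", "  ", "!!!"),
  stringsAsFactors = FALSE
)

# Assumed set of invisible characters to strip; extend it as needed.
invisible_chars <- "[\u200b\u200c\u200d\ufeff]"

# Remove the invisible characters, trim ordinary whitespace,
# then keep only rows that still contain something.
cleaned <- gsub(invisible_chars, "", comments$text, perl = TRUE)
keep <- nzchar(trimws(cleaned))

comments_clean <- comments[keep, , drop = FALSE]
print(comments_clean)
```

With this approach the punctuation-only row "!!!" survives, while the zero-width, empty and whitespace-only rows are dropped.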

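You can use a string-length function, although this only removes rows that are truly empty (a zero-width space still has length 1). For example, in the tidyverse, which includes the stringr package: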

library(tidyverse)

Reddit %>%
    filter(str_length(text) > 0)

Or in base R:

Reddit[nchar(Reddit$text) > 0, ]
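
As a quick check of how a length-based filter behaves on this kind of data, the sketch below compares it with the word-character filter on a small made-up data frame. sample_text, by_length and by_word are hypothetical names used only here.

```r
# Three rows: real text, a truly empty string, and a zero-width space.
sample_text <- data.frame(
  text = c("What is a spot wallet?", "", "\u200b"),
  stringsAsFactors = FALSE
)

# Length-based filter: drops "" but keeps the zero-width space (length 1).
by_length <- sample_text[nchar(sample_text$text) > 0, , drop = FALSE]

# Word-character filter: keeps only rows with at least one letter, digit or underscore.
by_word <- sample_text[grepl("\\w", sample_text$text), , drop = FALSE]

print(nrow(by_length))  # 2
print(nrow(by_word))    # 1
```

So a length check is enough when the rows are genuinely empty, but for invisible characters like the one in the sample, a content-based test appears to be the safer choice.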