Can I use R to highlight the background color of certain words in a paragraph?

Question

I have some paragraphs, and each paragraph has its own set of keywords. For example:

I am a student. I like machine learning...

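Here my keywords are "student" and "machine learning". I want to give them different background colors, for example red for "student" and yellow for "machine learning". So the result should look like this: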

Can I do this with R? If so, how?

Also, I know that Python can do something similar. For example:

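import spacy
from spacy import displacy

# Assumption: the small English model is installed (python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
doc = nlp('I just bought 2 shares at 9 a.m. because the stock went up 30% in just 2 days according to the WSJ')
displacy.render(doc, style='ent', jupyter=True)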

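Here, the result is:

But this seems to highlight named entities only. In my case, I extracted the keywords myself, so they may differ from what the entity recognizer finds.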

Answer 1

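As mentioned in the comments, I created a small package for this a while ago. It is still experimental and for now only works in RMarkdown; when used interactively, it instead opens a browser window (the Viewer pane in RStudio) to display the text.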

# devtools::install_github("JBGruber/highlightr")
library(highlightr)
text <- "I am a student. I like machine learning..."
df <- data.frame(
  feature = c("student", "machine learning"),
  bg_colour = c("red", "yellow"),
  stringsAsFactors = FALSE
)
dict <- as_dict(df)
highlight(text, dict)

In an RMarkdown document with HTML output, the same code goes in a chunk with results='asis':

---
output: html_document
---

```{r , results='asis'}
library(highlightr)
text <- "I am a student. I like machine learning..."
df <- data.frame(
  feature = c("student", "machine learning"),
  bg_colour = c("red", "yellow"),
  stringsAsFactors = FALSE
)
dict <- as_dict(df)
highlight(text, dict)
```
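
For interactive use, the "open a browser window or the Viewer pane" behaviour described above can be approximated in a few lines of base R. This is only a sketch of one way to do it, not the package's actual implementation: the helper names `write_html_preview` and `show_html_preview` are hypothetical, and the code assumes that RStudio exposes its Viewer pane through `getOption("viewer")`.

```r
# Hypothetical helper (not part of highlightr): write an HTML fragment into
# a minimal page stored in a temporary file and return the file path.
write_html_preview <- function(html_body) {
  path <- tempfile(fileext = ".html")
  page <- c(
    "<!DOCTYPE html>",
    "<html><body><p>",
    html_body,
    "</p></body></html>"
  )
  writeLines(page, path)
  path
}

# Hypothetical helper: show the file in the RStudio Viewer pane when one is
# available, otherwise fall back to the system browser.
show_html_preview <- function(path) {
  viewer <- getOption("viewer")
  if (is.null(viewer)) {
    utils::browseURL(path)
  } else {
    viewer(path)
  }
  invisible(path)
}

path <- write_html_preview(
  "I am a <span style='background-color: red'>student</span>."
)

# Only try to open a window in an interactive session.
if (interactive()) show_html_preview(path)
```

Keeping the file-writing step separate from the display step means the HTML can be inspected or reused even in a non-interactive session.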

The package is built on some very straightforward manipulation of the HTML output:

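# bg_colour (excerpt from the package internals)
# In the package, `i` indexes the current element of `text` and
# `case_insensitive` is a logical flag; both are set up here so that the
# excerpt runs on its own.
case_insensitive <- TRUE
for (i in seq_along(text)) {
  for (j in seq_along(dict$feature)) {
    # Wrap every literal occurrence of the keyword in a coloured <span>
    text[i] <- stringi::stri_replace_all_fixed(
      str = text[i],
      pattern = dict$feature[j],
      replacement = paste0("<span style='background-color: ",
                           dict$bg_colour[j], "'>",
                           dict$feature[j], "</span>"),
      opts_fixed = stringi::stri_opts_fixed(case_insensitive = case_insensitive)
    )
  }
}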

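All I do here is add <span style='background-color: yellow'> before the highlighted word and </span> after it. When I find the time, I will do the same for LaTeX output, and maybe more. The reason for using stringi for a simple replacement job is that its fixed matching can be case-insensitive while still treating the keyword as literal text, so any regular-expression characters in a keyword are ignored.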
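
To make the replacement step concrete, here is a minimal base-R sketch of the same idea that does not need the package. The function name `highlight_keywords` and its arguments are hypothetical and not part of highlightr. It uses `gsub()` with `fixed = TRUE`, so keywords are matched literally, but unlike the stringi call above the matching is case-sensitive.

```r
# Hypothetical helper (not part of highlightr): wrap each keyword in a
# <span> with an inline background colour, using literal matching.
highlight_keywords <- function(text, keywords, colours) {
  stopifnot(length(keywords) == length(colours))
  for (j in seq_along(keywords)) {
    replacement <- paste0(
      "<span style='background-color: ", colours[j], "'>",
      keywords[j], "</span>"
    )
    # fixed = TRUE treats the keyword as plain text, not as a regex
    text <- gsub(keywords[j], replacement, text, fixed = TRUE)
  }
  text
}

text <- "I am a student. I like machine learning..."
html <- highlight_keywords(
  text,
  keywords = c("student", "machine learning"),
  colours  = c("red", "yellow")
)
cat(html, "\n")
```

In an RMarkdown chunk with results='asis', the `cat()` call emits the markup directly into the HTML output. One caveat of this simple approach: keywords are replaced one after another, so a later keyword that happens to occur inside markup inserted earlier (for example "span" or a colour name) would also be replaced.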
