Extracting full article text via the newsanchor package [in R]

Question

I'm using the newsanchor package in R to try to extract the full content of articles via NewsAPI. So far I have done the following:

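# Load newsanchor, an R client for NewsAPI
library(newsanchor)
# Search all articles matching the query
results <- get_everything(query = "Trump +Trade", language = "en")
# Data frame with one row per returned article
test <- results$results_df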

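This gives me a data frame with information on (up to) 100 articles. However, it does not include the full text of the articles. Instead, the content looks like this: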

[1] "Tensions between China and the U.S. ratcheted up several notches over the weekend as Washington sent a warship into the disputed waters of the South China Sea. Meanwhile, Google dealt Huaweis smartphone business a crippling blow and an escalating trade war co… [+5173 chars]"
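The trailing "[+5173 chars]" marker is how the API signals that the text was cut off. As a minimal base-R sketch, assuming the marker always has the form "[+<digits> chars]" at the very end of the string, the hypothetical helper split_truncated() below separates the visible snippet from the reported number of missing characters:

```r
# Hypothetical helper: split NewsAPI-style content strings into the visible
# snippet and the number of characters reported as cut off.
# Assumption: truncated strings end with "[+<digits> chars]".
split_truncated <- function(content) {
  pattern <- "[[:space:]]*\\[\\+([0-9]+) chars\\]$"
  has_marker <- grepl(pattern, content)
  # Untruncated strings report 0 missing characters
  missing <- rep(0L, length(content))
  missing[has_marker] <- as.integer(
    sub(paste0("^.*", pattern), "\\1", content[has_marker])
  )
  # Remove the marker itself, keeping the snippet text
  snippet <- sub(pattern, "", content)
  data.frame(snippet = snippet, missing_chars = missing,
             stringsAsFactors = FALSE)
}
```

Assuming the text sits in a column named `content`, `split_truncated(test$content)` would show how much of each article is missing; it only reads the marker and cannot recover the cut-off text.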

Is there a way to extract the remaining 5173 characters? I tried reading the documentation, but I'm not sure.

Answer 1

I don't think this is possible, at least not on the free plan. The Response object section of the documentation at https://newsapi.org/docs/endpoints/everything says:

content - string

The unformatted content of the article, where available. This is truncated to 260 chars for Developer plan users.

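So on the Developer plan, content is always truncated to 260 characters. However, test$url holds the link to the source article, which you could use to scrape the full content. Because the articles are aggregated from many different sources, each with its own page layout, I don't think there is a single automated way to do this.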
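A rough heuristic for the scraping route is to fetch each URL and join the text of all <p> elements. The sketch below assumes the rvest package (version 1.0 or later, for html_elements() and html_text2()) is installed; rvest is not part of newsanchor, and scrape_paragraphs() and scrape_all() are hypothetical helper names. Expect incomplete or noisy text on paywalled pages, JavaScript-rendered pages, and sites that put navigation or ads in <p> tags.

```r
library(rvest)

# Hypothetical heuristic extractor: join the text of all <p> elements.
# `page` may be a URL/path or a document already parsed with read_html().
scrape_paragraphs <- function(page) {
  doc <- if (inherits(page, "xml_document")) page else read_html(page)
  paras <- html_text2(html_elements(doc, "p"))
  paras <- paras[nzchar(trimws(paras))]  # drop empty paragraphs
  paste(paras, collapse = "\n\n")
}

# Apply to every URL; any fetch or parse error becomes NA
scrape_all <- function(urls) {
  vapply(urls, function(u) {
    tryCatch(scrape_paragraphs(u), error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)
}
```

Usage would be something like `test$full_text <- scrape_all(test$url)`; adding a short `Sys.sleep()` between requests is gentler on the source sites.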
