Tweets returned by twitteR are shortened

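I am using the twitteR package for R to collect some tweets. However, I noticed that the tweet text returned by the searchTwitter function is not the complete tweet text. It is cut to exactly 140 characters, and the rest of the text is replaced by a link to the tweet on the web.

Using a tweet I found as an example: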

require(twitteR)
require(ROAuth)

# authorize twitter with consumer and access key/secret
setup_twitter_oauth(AAA, BBB, CCC, DDD)   # actual secret codes go here...

# get sample tweet
tweet <- searchTwitter("When I was driving around earlier this afternoon I only saw two Hunters",
                       n=500,
                       since = "2017-11-04",
                       until = "2017-11-05",
                       retryOnRateLimit=5000)

# print tweet
tweet[[1]]
[1] "_TooCrazyFox_: When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn'… *SHORTENEDURL*"
# the *SHORTENEDURL* is actually a link that brings you to the tweet; Stack Overflow didn't want me to put shortened URLs in here

# convert to data frame
df <- twListToDF(tweet)

# output text and ID
df$text
[1] "When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn'… *SHORTENEDURL*"

df$id
[1] "926943636641763328"

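If I go to this tweet via my web browser, it is clear that twitteR has shortened the text to 140 characters and appended a link to the tweet containing the whole text.

I don't see any mention of this in the twitteR documentation. Is there any way to retain the entire tweet text during a search?

My assumption is that this is related to the change in Twitter character length referenced here: https://developer.twitter.com/en/docs/tweets/tweet-updates (in 'Compatibility mode JSON rendering'). This implies that I need to retrieve the full_text field rather than the text field. However, twitteR does not seem to provide it.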

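The twitteR package is in the process of being deprecated. You should use rtweet instead.

You can download rtweet from CRAN, but for now I recommend downloading the development version from GitHub. By default, the development version returns the full text of tweets. It also returns the full original text of retweeted or quoted statuses.

To install the most recent version of rtweet from GitHub, use the devtools package.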
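To make the text versus full_text distinction from the question concrete, here is a minimal sketch that runs without any Twitter credentials. The two payloads are hypothetical and heavily simplified; only the field names (text, full_text, truncated) follow the Twitter docs linked in the question, and status_text() is a made-up helper, not part of twitteR or rtweet.

```r
# Hypothetical, simplified status payloads as R lists; the values are made up.
compat_status <- list(
  text      = "A long tweet that gets cut off\u2026 <link>",
  truncated = TRUE
)
extended_status <- list(
  full_text = "A long tweet that gets cut off here, in full.",
  truncated = FALSE
)

# Hypothetical helper: prefer full_text when the payload carries it,
# otherwise fall back to the (possibly truncated) text field.
status_text <- function(status) {
  if (!is.null(status[["full_text"]])) status[["full_text"]] else status[["text"]]
}

status_text(compat_status)    # falls back to the truncated text
status_text(extended_status)  # returns the full text
```

A client that only ever reads text ends up with the shortened version, which would be consistent with what the question describes.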

## install newest version of rtweet
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("mkearney/rtweet")

Once it has finished installing, load the rtweet package.

## load rtweet
library(rtweet)

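rtweet has a dedicated package documentation website. It includes a vignette on obtaining and using Twitter API access tokens. If you follow the steps in the vignette, you only have to go through the authorization process once [per machine].

To search for tweets, use the search_tweets() function.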

# get sample tweet
rt <- search_tweets(
  "When I was driving around earlier this afternoon I only saw two Hunters",
  n = 500
)

Print the output (a tbl data frame).

> rt
# A tibble: 1 x 42
           status_id          created_at    user_id   screen_name
               <chr>              <dttm>      <chr>         <chr>
1 926943636641763328 2017-11-04 22:45:59 3652909394 _TooCrazyFox_
# ... with 38 more variables: text <chr>, source <chr>,
#   reply_to_status_id <chr>, reply_to_user_id <chr>,
#   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
#   favorite_count <int>, retweet_count <int>, hashtags <list>, symbols <list>,
#   urls_url <list>, urls_t.co <list>, urls_expanded_url <list>,
#   media_url <list>, media_t.co <list>, media_expanded_url <list>,
#   media_type <list>, ext_media_url <list>, ext_media_t.co <list>,
#   ext_media_expanded_url <list>, ext_media_type <lgl>,
#   mentions_user_id <list>, mentions_screen_name <list>, lang <chr>,
#   quoted_status_id <chr>, quoted_text <chr>, retweet_status_id <chr>,
#   retweet_text <chr>, place_url <chr>, place_name <chr>,
#   place_full_name <chr>, place_type <chr>, country <chr>, country_code <chr>,
#   geo_coords <list>, coords_coords <list>, bbox_coords <list>

Print the tweet text (the full text).

> rt$text
[1] "When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn't have my camera otherwise I would have taken some photos of the standing corn fields in the snow. I'll do it later., maybe tomorrow.\n#harvest17"
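The tibble printed above also lists is_retweet and retweet_text columns, so the original text of a retweeted status could be picked up in the same pass. A minimal sketch, assuming those columns behave as their names suggest; rt_demo is a made-up stand-in so the code runs without an API token, and best_text is a hypothetical column name.

```r
# Stand-in for a search_tweets() result. The column names come from the
# tibble printed above; the rows are made up for illustration.
rt_demo <- data.frame(
  status_id    = c("1", "2"),
  text         = c("RT @someone: start of a long tweet", "an original tweet"),
  is_retweet   = c(TRUE, FALSE),
  retweet_text = c("start of a long tweet and the rest of it", NA),
  stringsAsFactors = FALSE
)

# Use the original status text for retweets and the tweet's own text otherwise.
rt_demo$best_text <- ifelse(rt_demo$is_retweet, rt_demo$retweet_text, rt_demo$text)

rt_demo[, c("status_id", "best_text")]
```

The same ifelse() call should work on the real rt object, provided the installed rtweet version returns those columns.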

To look up Twitter statuses by ID, use the lookup_statuses() function.

## lookup tweet
tweet <- lookup_statuses("926943636641763328")

Print the tweet text.

> tweet$text
[1] "When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn't have my camera otherwise I would have taken some photos of the standing corn fields in the snow. I'll do it later., maybe tomorrow.\n#harvest17"
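If you already have a data frame from twListToDF(), you do not need to redo the search: the IDs of the shortened tweets can be passed to lookup_statuses(). Below is a rough sketch of that idea. truncated_ids() is a hypothetical helper, the "ellipsis followed by a trailing link" test is only a heuristic, and the data frame is a made-up stand-in (with a placeholder URL in place of the shortened link) so the code runs without credentials.

```r
# Hypothetical helper: return the IDs of tweets whose text looks truncated,
# i.e. ends in an ellipsis followed by a link. Assumes `id` and `text`
# columns as in the twListToDF() output shown in the question.
truncated_ids <- function(df) {
  looks_cut <- grepl("\u2026\\s*https?://\\S+$", df$text)
  df$id[looks_cut]
}

# Stand-in data; the second row and the URL are placeholders.
df_demo <- data.frame(
  id   = c("926943636641763328", "1"),
  text = c("When I was driving around earlier this afternoon I didn'\u2026 https://example.com/placeholder",
           "A short tweet"),
  stringsAsFactors = FALSE
)

ids <- truncated_ids(df_demo)
ids

# With rtweet loaded and a token configured, the full text could then be
# fetched by ID (not run here):
# full <- lookup_statuses(ids)
# full$text
```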