How to normalize tweets in a histogram using R?

Question

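I retrieved Twitter tweets for several hashtags, each tracked over a different period. For example, hashtag1 was tracked for 6 days, Hashtag2 for 4 days, and Hashtag3 for 2 days. How can I normalize each hashtag? How can I divide them into equal quarters? Thanks in advance. Here is the code: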

    library(streamR)
    library(rjson)

    setwd("/Users/Desktop")
    Tweets = parseTweets("Hashtag1.json")
    table(Tweets$created_at)

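    # created_at looks like "Mon Jan 02 15:04:05 +0000 2017"; the +0000 offset means UTC
    dated_Tweets <- as.POSIXct(Tweets$created_at,
                               format = "%a %b %d %H:%M:%S +0000 %Y", tz = "UTC")

    hist(dated_Tweets, breaks = "hours", freq = TRUE, xlab = "dated_Tweets",
         main = "Distribution of tweets", col = "blue")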

Answer 1

You can use the chron package and work with the hours only, by converting them to bins.

Answer 2

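I think your main hurdle is converting the date-times into 6-hour bins. You can do that with format.POSIXct and cut. Here is a suggestion with a histogram. There are many ways to make a histogram; maybe you would prefer table.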

    library(magrittr)
    library(ggplot2)
    ## create some tweet times (simulated; tracking periods of 10, 31 and 5 days)
    hash1 <- lubridate::ymd("20170101") + lubridate::seconds(runif(100, 0, 10*86400))
    hash2 <- lubridate::ymd("20170101") + lubridate::seconds(runif(100, 0, 31*86400))
    hash3 <- lubridate::ymd("20170101") + lubridate::seconds(runif(300, 0, 5*86400))
    ## bin the hour of day into 6h intervals
    ## right = FALSE gives [0,6), [6,12), [12,18), [18,24], so every bin covers 6 hours
    bins1 <- format(hash1, "%H") %>%
        as.numeric() %>%
        cut(breaks = c(0, 6, 12, 18, 24), right = FALSE, include.lowest = TRUE)
    hTags <- data.frame(tag = "#1", bins = bins1)
    bins2 <- format(hash2, "%H") %>%
        as.numeric() %>%
        cut(breaks = c(0, 6, 12, 18, 24), right = FALSE, include.lowest = TRUE)
    hTags <- rbind(hTags,
                   data.frame(tag = "#2", bins = bins2))
    bins3 <- format(hash3, "%H") %>%
        as.numeric() %>%
        cut(breaks = c(0, 6, 12, 18, 24), right = FALSE, include.lowest = TRUE)
    hTags <- rbind(hTags,
                   data.frame(tag = "#3", bins = bins3))
    ## plot the within-tag proportion per bin, so tags with different totals are comparable
    ## (after_stat() needs ggplot2 >= 3.3.0; older versions use y = ..prop..)
    ggplot(data = hTags, aes(x = bins, fill = tag)) +
        geom_bar(position = "dodge", aes(y = after_stat(prop), group = tag))
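
For the table route mentioned in this answer, here is a minimal base-R sketch. The data frame `tweets`, the simulated timestamps, and the `days_tracked` vector are hypothetical stand-ins, with tracking periods of 6, 4 and 2 days taken from the question. It shows two possible normalizations: the share of each tag's tweets falling into each 6-hour quarter, and the number of tweets per tracked day.

```r
## Hypothetical data: simulated tweet times for three tags tracked 6, 4 and 2 days
set.seed(1)
start <- as.POSIXct("2017-01-01 00:00:00", tz = "UTC")
tweets <- data.frame(
  tag  = rep(c("#1", "#2", "#3"), times = c(100, 100, 300)),
  time = start + c(runif(100, 0, 6 * 86400),
                   runif(100, 0, 4 * 86400),
                   runif(300, 0, 2 * 86400))
)

## Hour of day, binned into four 6-hour quarters: [0,6), [6,12), [12,18), [18,24]
hour <- as.numeric(format(tweets$time, "%H", tz = "UTC"))
tweets$quarter <- cut(hour, breaks = c(0, 6, 12, 18, 24),
                      right = FALSE, include.lowest = TRUE)

## Counts per tag and quarter, then row-wise proportions (each row sums to 1)
counts <- table(tweets$tag, tweets$quarter)
props  <- prop.table(counts, margin = 1)

## Alternative: tweets per tracked day (tracking lengths assumed known)
days_tracked <- c("#1" = 6, "#2" = 4, "#3" = 2)
per_day <- counts / days_tracked[rownames(counts)]

print(round(props, 2))
print(round(per_day, 1))
```

Row-wise proportions remove the effect of both tracking length and overall tweet volume, while the per-day rates keep the difference in volume between tags visible. Which one fits depends on what the histogram is meant to compare.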

twitter

r

normalization

histogram

tweets