R programming: ggvis histogram versus hist - How to size the buckets and define X axis spacing (ticks)

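I'm learning ggvis and want to understand how to create a histogram equivalent to the one produced by hist. Specifically, how do I set the bin width and the lower and upper bounds of x in a ggvis histogram? What am I missing?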

Question: how do I make the ggvis histogram output match the hist output?

For example:

library(psych)  # describe()
library(RCurl)  # getURL()
library(ggvis)

# Download the data only once per session
if (!exists("impact")) {
  url <- "https://dl.dropboxusercontent.com/u/8272421/stat/stat_one.txt"
  myCsv <- getURL(url, ssl.verifypeer = FALSE)
  impact <- read.csv(textConnection(myCsv), sep = "\t")  # tab-separated file
  impact$subject <- factor(impact$subject)
}

describe(impact)

# Base R histogram; by default hist() picks its breaks with Sturges' rule
hist(impact$verbal_memory_baseline, 
     main = "Distribution of verbal memory baseline scores", 
     xlab = "score", ylab = "frequency")
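
Before reaching for ggvis, it helps to know which bins hist() actually picked. hist(..., plot = FALSE) returns the breaks and counts without drawing anything. The helper below is a minimal sketch: hist_bin_settings is a hypothetical name, and the scores are made-up stand-ins for impact$verbal_memory_baseline. It turns hist()'s breaks into a width and a boundary that could then be tried in layer_histograms():

```r
# Hypothetical helper (not part of any package): report the bins hist() would use,
# so the same width and boundary can be tried in ggvis::layer_histograms().
hist_bin_settings <- function(x) {
  h <- hist(x, plot = FALSE)  # default breaks = "Sturges", rounded via pretty()
  widths <- diff(h$breaks)
  # pretty() yields equally spaced breaks, so one width describes every bin
  list(width = widths[1], boundary = h$breaks[1],
       breaks = h$breaks, counts = h$counts)
}

# Made-up example data; with the real data use impact$verbal_memory_baseline
scores <- c(62, 71, 75, 78, 83, 88, 91, 95, 99)
str(hist_bin_settings(scores))
```

The returned width and boundary are the two numbers a ggvis histogram needs in order to line its bins up with hist()'s.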

OK, let's try to reproduce it with ggvis... the output doesn't match...

impact %>%
  ggvis(x = ~verbal_memory_baseline, fill := "white") %>%
  layer_histograms(width = 5) %>%
  add_axis("x", title = "score") %>%
  add_axis("y", title = "frequency")

How do I make the ggvis output match the hist output?


> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] psych_1.5.6      knitr_1.11       ggvis_0.4.2.9000 setwidth_1.0-4  colorout_1.1-1   vimcom_1.2-3    

loaded via a namespace (and not attached):
[1] Rcpp_0.12.0          digest_0.6.8         dplyr_0.4.3.9000     assertthat_0.1       mime_0.3            
[6] R6_2.1.1             jsonlite_0.9.16      xtable_1.7-4         DBI_0.3.1            magrittr_1.5        
[11] lazyeval_0.1.10.9000 rstudioapi_0.3.1     rmarkdown_0.7        tools_3.2.2          shiny_0.12.2        
[16] httpuv_1.3.3         yaml_2.1.13          parallel_3.2.2       rsconnect_0.4.1.4    mnormt_1.5-3        
[21] htmltools_0.2.6

Try:

impact %>%
  ggvis(x = ~verbal_memory_baseline, fill := "white") %>%
  layer_histograms(width = 5, boundary = 5) %>% 
  add_axis("y", title = "frequency") %>%
  add_axis("x", title = "score", ticks = 5)

This gives 5-point-wide bins whose edges fall on multiples of 5.


The official documentation is a bit vague about how boundary and center work. Have a look at DataCamp's How to Make a Histogram with ggvis in R:

The width argument already set the bin width to 5, but where do bins start and where do they end? You can use the center or boundary argument for this. center should refer to one of the bins’ center value, which automatically determines the other bins location. The boundary argument specifies the boundary value of one of the bins. Here again, specifying a single value fixes the location of all bins. As these two arguments specify the same thing in a different way, you should set at most one of center or boundary.
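
The rule in that quote can be sketched in a few lines of base R. This only illustrates the arithmetic and is not ggvis's internal code; bin_edges() is a hypothetical helper and the scores are made up. It shows that, with width = 5, boundary = 5 and center = 77.5 pin down the same set of edges:

```r
# Illustrative sketch (not ggvis's implementation): given a bin width and either
# a boundary or a center, compute all bin edges needed to cover x.
bin_edges <- function(x, width, boundary = NULL, center = NULL) {
  if (!is.null(boundary) && !is.null(center)) {
    stop("specify at most one of 'center' or 'boundary'")
  }
  if (is.null(boundary) && is.null(center)) {
    stop("this sketch needs either 'center' or 'boundary'")
  }
  # A bin centred at 'center' has an edge half a width below its centre
  if (is.null(boundary)) boundary <- center - width / 2
  # Move the reference edge by whole widths to at or below min(x) ...
  first <- boundary + floor((min(x) - boundary) / width) * width
  # ... and to at or above max(x)
  last <- boundary + ceiling((max(x) - boundary) / width) * width
  seq(first, last, by = width)
}

scores <- c(62, 71, 75, 78, 83, 88, 91, 95, 99)  # made-up example data
bin_edges(scores, width = 5, boundary = 5)   # 60 65 70 75 80 85 90 95 100
bin_edges(scores, width = 5, center = 77.5)  # same edges
```

Shifting the reference value by any whole number of widths (boundary = 0, 5, 75, ...) gives the same edges, which is why a single value fixes the location of all bins.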


If you want the same result using center instead of boundary, try:

impact %>%
  ggvis(x = ~verbal_memory_baseline, fill := "white") %>%
  layer_histograms(width = 5, center = 77.5) %>% 
  add_axis("y", title = "frequency") %>%
  add_axis("x", title = "score", ticks = 5)

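Here you specify the center of one bin (77.5), which automatically determines all the others: with width = 5, that bin spans 75 to 80, so every bin edge again falls on a multiple of 5, just as with boundary = 5.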
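
Going the other way, hist() accepts an explicit breaks vector, so base R can be made to use the same 5-point bins with edges on multiples of 5. A sketch with made-up scores; note that hist() puts a value lying exactly on an edge into the bin on its left (right = TRUE by default), so counts for such values can differ from a function that closes its bins on the other side:

```r
scores <- c(62, 71, 75, 78, 83, 88, 91, 95, 99)  # made-up example data
width <- 5
# Edges on multiples of 5 that cover the data (the same edges as boundary = 5)
breaks <- seq(floor(min(scores) / width) * width,
              ceiling(max(scores) / width) * width, by = width)
h <- hist(scores, breaks = breaks, plot = FALSE)
h$breaks  # 60 65 70 75 80 85 90 95 100
h$counts  # 1 0 2 1 1 1 2 1
```

Calling hist(scores, breaks = breaks) without plot = FALSE draws the corresponding plot.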

Steven's answer is correct.

With his pointers, I was able to dig deeper into the documentation:

layer_histograms():

http://www.rdocumentation.org/packages/ggvis/functions/layer_histograms

boundary

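  • A boundary between two bins. As with center, things are shifted when boundary is outside the range of the data. For example, to center on integers, use width = 1 and boundary = 0.5, even if 1 is outside the range of the data. At most one of center and boundary may be specified.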
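
The integer-centering example from that entry can be checked in base R with hist(): bins of width 1 whose edges sit at the half-integers (the boundary = 0.5 idea) have integer midpoints. A small sketch with made-up data:

```r
# Width-1 bins with edges at 0.5, 1.5, 2.5, 3.5 put each integer at a bin's center
x <- c(1, 2, 2, 3, 3, 3)  # made-up example data
h <- hist(x, breaks = seq(0.5, 3.5, by = 1), plot = FALSE)
h$mids    # 1 2 3
h$counts  # 1 2 3
```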

add_axis()

http://www.rdocumentation.org/packages/ggvis/functions/add_axis

ticks

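  • A desired number of ticks. The resulting number may be different so that values are "nice" (multiples of 2, 5, 10) and lie within the underlying scale's range.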
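
ggvis hands tick placement to the Vega renderer, so the following is only an R-side analogy. Base R's pretty(), which hist() uses to round its breaks, treats its n argument in the same spirit: as a target number of intervals rather than a promise, with values rounded to 1, 2 or 5 times a power of 10:

```r
# Ask for about 5 intervals between 60 and 100; pretty() picks "nice" values instead
ticks <- pretty(c(60, 100), n = 5)
ticks               # 60 70 80 90 100
length(ticks) - 1   # 4 intervals, not the 5 requested
```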