Get index of density bin values of histogram in hist() R

Question

I want to get the index values of the bins in a histogram generated with hist().

Example and details below:

testhist <- hist(rnorm(1000, 1000, 100), n = 5000, xlim = c(0,5000), probability = TRUE)

This gives testhist$density, which are my 'y' values. Since I set n = 5000 in the code, x from 0 to 5000 has 5000 bins. I want to get the index of the histogram bin that each 'y' value corresponds to.

That is:

Bin Index  |  'y' value
1           0
1           0.000005
1           0
1           0
1           0.0000001
2           0.00002
3           0
3           0.0002
...5000

Any help is appreciated.

Edit: As commenters pointed out, n= is only an approximation. So let's do this instead:

testhist <- hist(rnorm(1000, 1000, 100), breaks = seq(0,5000, by = 5), xlim = c(0,5000), probability = TRUE)

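Now there are exactly 1000 bins. How do I get the index of the bin that each 'y' value corresponds to? That is, for bin 1, which covers 0 to 5, which y values does it contain?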

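Edit 2: Each bin corresponds to one density value, and the more bins there are, the more representative the data. Thanks for pointing me in the right direction.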

Answer 1

There is some confusion about what hist is and isn't doing here.

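hist has no n= argument, only breaks=. I think it gives the same result by accident: n= is partially matched to hist.default's legacy nclass= argument, which for a single number is equivalent to breaks=, and the actual breakpoints are then chosen with pretty(), which itself takes an n= argument.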
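Setting breaks=5000 does not guarantee 5000 bins, as @Onyambu pointed out, because the breakpoints are pretty()-ified. From ?hist: "...the number is a suggestion only; as the breakpoints will be set to pretty values."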
testhist$density gives the density in each bin. You can verify this with:

set.seed(1)
x <- rnorm(1000, 1000, 100)
testhist <- hist(x, n=5000, xlim = c(0,5000), probability = TRUE)
length(testhist$mids)
#[1] 6820
length(testhist$density)
#[1] 6820
length(testhist$breaks)
#[1] 6821
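A minimal sketch, reusing the same simulated data, that checks the two points about breaks= above. The assumption (not stated in the original answer) is that n= reaches hist.default through partial matching to nclass=, so it should produce exactly the same breakpoints as breaks= with the same number; an explicit vector of breakpoints, by contrast, is used as given.

```r
set.seed(1)
x <- rnorm(1000, 1000, 100)

# A single number is only a suggestion; pretty() chooses the actual breakpoints.
h_suggested <- hist(x, breaks = 5000, plot = FALSE)

# Assumption: n= is partially matched to the legacy nclass= argument,
# so this should yield identical breakpoints to breaks = 5000.
h_n <- hist(x, n = 5000, plot = FALSE)
identical(h_suggested$breaks, h_n$breaks)

# An explicit break vector is used as given: 1001 breaks give 1000 bins.
h_exact <- hist(x, breaks = seq(0, 5000, by = 5), plot = FALSE)
length(h_exact$breaks) - 1
length(h_exact$density)
```

Relying on partial matching is fragile, so spelling out breaks= (ideally as an explicit vector, as in the question's edit) is the safer choice.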

That is 6820 bin midpoints, 6820 corresponding densities, and 6821 breaks, because you need n+1 breaks to define n bins.
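To make the link between counts, breaks and densities concrete, here is a small sketch using the question's fixed 5-unit breaks. It recomputes each bin's density as count / (total count × bin width), which is how hist() defines density, and checks that the bar areas sum to 1.

```r
set.seed(1)
x <- rnorm(1000, 1000, 100)
h <- hist(x, breaks = seq(0, 5000, by = 5), plot = FALSE)

widths <- diff(h$breaks)  # every bin is 5 units wide here

# density = count / (number of observations * bin width)
manual_density <- h$counts / (sum(h$counts) * widths)

all.equal(manual_density, h$density)  # TRUE
sum(h$density * widths)               # total area, 1 up to rounding
```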

The original 1000 data points are represented in these 6820 bins, many of which have a count, and therefore a density, of zero.

sum(testhist$counts)
#[1] 1000
sum(testhist$counts == 0)
#[1] 5954
sum(testhist$density == 0)
#[1] 5954
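If the goal is the "bin index | y value" table from the question, one way to build it is to line up the bin index with each bin's edges and density. This is a sketch assuming the exact breaks from the question's edit; the column names are illustrative choices, not anything hist() returns.

```r
set.seed(1)
x <- rnorm(1000, 1000, 100)
h <- hist(x, breaks = seq(0, 5000, by = 5), plot = FALSE)

# One row per bin: index, edges, midpoint and density
bins <- data.frame(
  bin     = seq_along(h$density),  # bin index 1..1000
  lower   = head(h$breaks, -1),    # left edge of each bin
  upper   = tail(h$breaks, -1),    # right edge of each bin
  mid     = h$mids,
  density = h$density
)

head(bins)                          # first bins, all empty for this data
head(bins[bins$density > 0, ], 3)   # first few non-empty bins
```

Each bin has exactly one density, so there is one row per bin rather than several 'y' values per bin.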

If you want to know which bin each original value of x falls into, you can do:

cut(x, testhist$breaks, labels=FALSE)
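Extending that idea, a sketch that attaches to every observation its bin index and that bin's density, using the question's exact breaks. Passing include.lowest = TRUE is an addition here so that cut() follows hist()'s default convention (right-closed bins, first bin also closed on the left); hist() additionally applies a tiny fuzz at the breaks, so a value lying within that fuzz of a break could in principle be assigned differently.

```r
set.seed(1)
x <- rnorm(1000, 1000, 100)
h <- hist(x, breaks = seq(0, 5000, by = 5), plot = FALSE)

# Bin index for each observation, using hist()'s right-closed convention
bin_index <- cut(x, h$breaks, labels = FALSE, include.lowest = TRUE)

# Density of the bin that each observation falls into
x_density <- h$density[bin_index]

# Sanity check: tabulating the indices should reproduce hist()'s counts
all(tabulate(bin_index, nbins = length(h$counts)) == h$counts)
```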
