R: dividing a dataset into ranged bins?

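I am having trouble sorting my dataset into bins based on the numeric value of each data point. I tried the shingle function from the lattice package, and it seems to split the data correctly.

However, I cannot extract the output I need, namely the information about how the data were divided into the predefined bins. All I can do is print it.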

bin_interval = matrix(c(0.38,0.42,0.46,0.50,0.54,0.58,0.62,0.66,0.70,0.74,0.78,0.82,0.86,0.90,0.94,0.98,
                        0.40,0.44,0.48,0.52,0.56,0.60,0.64,0.68,0.72,0.76,0.80,0.84,0.88,0.92,0.96,1.0),
                        ncol = 2, nrow =  16)
bin_1 = shingle(data_1,intervals = bin_interval)

How can I extract the intervals that shingle produces, rather than just printing them?

The intervals are printed as:

Intervals:
    min  max count
1  0.38 0.40     0
2  0.42 0.44     6
3  0.46 0.48    46
4  0.50 0.52   251
5  0.54 0.56   697
6  0.58 0.60  1062
7  0.62 0.64  1215
8  0.66 0.68  1227
9  0.70 0.72  1231
10 0.74 0.76  1293
11 0.78 0.80  1330
12 0.82 0.84  1739
13 0.86 0.88  2454
14 0.90 0.92  3048
15 0.94 0.96  8936
16 0.98 1.00 71446

I would like to have them as a variable that I can pass to another function.

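The shingle() function stores this information as attributes on the object it returns; you can inspect them with attributes().

The levels (that is, the intervals) are given specifically by attr(bin_1,"levels").

So: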

library(lattice)  # provides shingle()

set.seed(1337)
data_1 = runif(100)

bin_interval = matrix(c(0.38,0.42,0.46,0.50,0.54,0.58,0.62,0.66,0.70,0.74,0.78,0.82,0.86,0.90,0.94,0.98,
                        0.40,0.44,0.48,0.52,0.56,0.60,0.64,0.68,0.72,0.76,0.80,0.84,0.88,0.92,0.96,1.0),
                        ncol = 2, nrow =  16)
bin_1 = shingle(data_1,intervals = bin_interval)

attr(bin_1,"levels")

This gives:

      [,1] [,2]
 [1,] 0.38 0.40
 [2,] 0.42 0.44
 [3,] 0.46 0.48
 [4,] 0.50 0.52
 [5,] 0.54 0.56
 [6,] 0.58 0.60
 [7,] 0.62 0.64
 [8,] 0.66 0.68
 [9,] 0.70 0.72
[10,] 0.74 0.76
[11,] 0.78 0.80
[12,] 0.82 0.84
[13,] 0.86 0.88
[14,] 0.90 0.92
[15,] 0.94 0.96
[16,] 0.98 1.00
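
If another function needs the bounds as a plain matrix, the levels can be converted explicitly. The following is a minimal sketch, assuming (as in the lattice versions I have seen) that the levels attribute is a list holding one min/max pair per interval. The names bounds and widths are only illustrative, and the interval matrix is rebuilt arithmetically here instead of being typed out.

```r
library(lattice)

set.seed(1337)
data_1 <- runif(100)

# The same 16 intervals as above (up to floating-point rounding):
# lower bounds 0.38, 0.42, ..., 0.98, each interval 0.02 wide
lower <- 0.38 + 0.04 * (0:15)
bin_interval <- cbind(lower, lower + 0.02, deparse.level = 0)
bin_1 <- shingle(data_1, intervals = bin_interval)

# Bind the per-interval (min, max) pairs row-wise into an ordinary numeric matrix
bounds <- do.call(rbind, unclass(levels(bin_1)))
colnames(bounds) <- c("min", "max")

# The matrix can now be handed to any other function, e.g. to compute bin widths
widths <- bounds[, "max"] - bounds[, "min"]
```

Wrapping the result in as.data.frame(bounds) would give named columns that can be accessed with $ if that is more convenient.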

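Edit

The count for each interval is only computed inside the print.shingle method; it is not stored on the object. You therefore need to run the following code: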

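count.shingle = function(x){
  l <- levels(x)   # list of (min, max) pairs, one per interval
  n <- nlevels(x)  # number of intervals
  int <- data.frame(min = numeric(n), max = numeric(n), 
                    count = numeric(n))
  # seq_len(n) avoids the 1:0 pitfall when there are no intervals
  for (i in seq_len(n)) {
    int$min[i] <- l[[i]][1]
    int$max[i] <- l[[i]][2]
    # number of values inside the closed interval [min, max]
    int$count[i] <- length(x[x >= l[[i]][1] & x <= l[[i]][2]])
  }

  int
}

a = count.shingle(bin_1)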

This gives:

> a 
   min  max count
1  0.38 0.40     0
2  0.42 0.44     1
3  0.46 0.48     3
4  0.50 0.52     1
5  0.54 0.56     2
6  0.58 0.60     2
7  0.62 0.64     2
8  0.66 0.68     4
9  0.70 0.72     1
10 0.74 0.76     3
11 0.78 0.80     2
12 0.82 0.84     2
13 0.86 0.88     5
14 0.90 0.92     1
15 0.94 0.96     1
16 0.98 1.00     2

where a$min is the lower bound, a$max is the upper bound, and a$count is the number of values that fall within the bin.
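
If what is needed is the bin each individual value belongs to, rather than only the per-bin counts, that can be computed from the bounds with base R alone. A sketch, where bin_index is a hypothetical helper (not part of lattice) that applies the same closed-interval test as count.shingle and returns NA for values that fall in the gaps between bins:

```r
# Hypothetical helper: index of the first bin containing each value, NA if none
bin_index <- function(x, bounds) {
  # inside[i, j] is TRUE when x[i] lies in the closed interval of bin j
  inside <- outer(x, bounds[, 1], ">=") & outer(x, bounds[, 2], "<=")
  # which(r)[1] is NA when no bin matches
  apply(inside, 1, function(r) which(r)[1])
}

# The 16 bins from the question: 0.38-0.40, 0.42-0.44, ..., 0.98-1.00
lower <- 0.38 + 0.04 * (0:15)
bounds <- cbind(min = lower, max = lower + 0.02)

x <- c(0.39, 0.43, 0.41, 0.99, 0.43)
idx <- bin_index(x, bounds)
idx
# 1 2 NA 16 2   (0.41 lies in the gap between bins 1 and 2)

# Per-bin counts from the assignment; tabulate() ignores the NA entries
counts <- tabulate(idx, nbins = nrow(bounds))
```

The outer() calls build a length(x) by nrow(bounds) logical matrix, so for very large inputs a loop over the bins may be lighter on memory.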