Get 2D table (6x6) for dataframe containing two continuous variables, by binning
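I am trying to sort the observations in a data frame into 36 groups based on two continuous variables. More specifically, I am trying to split each of the two variables into six bins and then assign each observation to one of the 36 possible combinations.
My attempt is below, and it works. But is there a faster way that avoids the double for loop?
Also, this is not essential, but how could I visualize the number of observations in each group as a 6 x 6 grid? I know table() would produce a list of the 36 possible groups and their totals, but not in grid format.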
set.seed(123)
x1 <- rnorm(1000)
x2 <- rnorm(1000)
data <- data.frame(x1, x2)

# Parse the numeric bounds back out of the interval labels produced by cut().
# Backslashes must be doubled inside R string literals ("\\(" and "\\1").
labs1 <- levels(cut(x1, 6))
ints1 <- cbind(lower = as.numeric(sub("\\((.+),.*", "\\1", labs1)),
               upper = as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", labs1)))
labs2 <- levels(cut(x2, 6))
ints2 <- cbind(lower = as.numeric(sub("\\((.+),.*", "\\1", labs2)),
               upper = as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", labs2)))

# All 36 combinations of an x1 bin with an x2 bin, one row of bounds per group
tmp <- expand.grid(labs1, labs2)
groups <- cbind(lower1 = as.numeric(sub("\\((.+),.*", "\\1", tmp[,1])),
                upper1 = as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", tmp[,1])),
                lower2 = as.numeric(sub("\\((.+),.*", "\\1", tmp[,2])),
                upper2 = as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", tmp[,2])))

# Create the column up front so that unmatched rows stay NA
data$group <- NA_integer_
for (i in 1:1000){
  for (j in 1:36){
    if (x1[i] >= groups[j,1] & x1[i] <= groups[j,2] &
        x2[i] >= groups[j,3] & x2[i] <= groups[j,4]){
      data$group[i] <- j
    }
  }
}
You can use a mix of apply() to iterate over the rows of your data.frame and which() to find the matching row of your groups array:
data$group <- apply(data, 1, FUN=function(dataRow)
which(
dataRow[1] >= groups[,1] &
dataRow[1] <= groups[,2] &
dataRow[2] >= groups[,3] &
dataRow[2] <= groups[,4]))
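To make the which() lookup concrete, here is a minimal sketch on a hand-made 2 x 2 grid. The bounds matrix and the helper name match_group are made up for illustration and are not part of the answer above. It also shows one thing to watch for: because both ends of each interval are inclusive, a point sitting exactly on a shared edge matches two groups, and a point outside every interval matches none. In either case apply() would return a list rather than a plain vector.

```r
# Hypothetical bounds for a 2 x 2 grid, laid out like the 'groups' matrix:
# one row per group, columns lower1, upper1, lower2, upper2.
groups <- cbind(lower1 = c(0, 1, 0, 1), upper1 = c(1, 2, 1, 2),
                lower2 = c(0, 0, 1, 1), upper2 = c(1, 1, 2, 2))

# Illustrative helper: indices of all groups whose bounds contain the point.
match_group <- function(pt, groups) {
  which(pt[1] >= groups[, 1] & pt[1] <= groups[, 2] &
        pt[2] >= groups[, 3] & pt[2] <= groups[, 4])
}

inside  <- match_group(c(1.5, 0.5), groups)  # strictly inside one cell
on_edge <- match_group(c(1.0, 0.5), groups)  # on the edge shared by two cells
outside <- match_group(c(5.0, 5.0), groups)  # outside the whole grid

print(inside)   # 2
print(on_edge)  # 1 2
print(outside)  # integer(0)
```

If every row matches exactly one group, apply() simplifies the result to an integer vector, which is the case the answer relies on.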
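You're overthinking it. Getting the 6x6 table is a one-liner with table(). (Use the factor variables that cut(..., 6) creates directly; don't throw the factor away only to parse its levels back out and bin the variable by hand):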
with(data, table(cut(x1, 6), cut(x2, 6)))
                 (-3.05,-1.97] (-1.97,-0.902] (-0.902,0.171] (0.171,1.24] (1.24,2.32] (2.32,3.4]
  (-2.82,-1.8]               2             10             11            7           3          0
  (-1.8,-0.793]              1             26             67           49          19          3
  (-0.793,0.216]            12             57            140          146          31          3
  (0.216,1.22]              11             49            109           95          36          6
  (1.22,2.23]                0             10             31           34          15          0
  (2.23,3.25]                0              3              5            6           2          1
# and to get the wide lines, you may need...
options('width'=199)
# or if you want more compact labels to keep it all narrow, use `cut(..., dig.lab)`
with(data, table(cut(x1, 6, dig.lab=2), cut(x2, 6, dig.lab=2)))
               (-3.1,-2] (-2,-0.9] (-0.9,0.17] (0.17,1.2] (1.2,2.3] (2.3,3.4]
  (-2.8,-1.8]          2        10          11          7         3         0
  (-1.8,-0.79]         1        26          67         49        19         3
  (-0.79,0.22]        12        57         140        146        31         3
  (0.22,1.2]          11        49         109         95        36         6
  (1.2,2.2]            0        10          31         34        15         0
  (2.2,3.2]            0         3           5          6         2         1
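Admittedly, neither the documentation for table() nor the documentation for cut() says outright that they can be combined like this for a two-dimensional case. => Doc/Enhance-bug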
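If you also want each row labelled with its group number (1 to 36) without any loop, one option is to reuse the integer codes of the two factors. This is only a sketch, assuming the simulated data from the question; the names b1, b2 and grid are illustrative. The numbering is chosen so that the x1 bin varies fastest, which is the same ordering expand.grid(labs1, labs2) produces. Note that bin membership here follows cut()'s own (a, b] intervals, so rows very close to a bin edge may not get the same group as the >= / <= comparisons against the rounded labels.

```r
set.seed(123)
x1 <- rnorm(1000)
x2 <- rnorm(1000)
data <- data.frame(x1, x2)

# Bin each variable once and keep the factors.
b1 <- cut(data$x1, 6)
b2 <- cut(data$x2, 6)

# Group index 1..36: x1 bin varies fastest, x2 bin slowest.
data$group <- (as.integer(b2) - 1L) * nlevels(b1) + as.integer(b1)

# The 6 x 6 grid of counts, as in the table() one-liner.
counts <- table(b1, b2)

# Rebuild the same grid from the group index as a consistency check.
grid <- matrix(tabulate(data$group, nbins = 36), nrow = 6)
print(all(grid == as.vector(counts)))  # TRUE
```

interaction(b1, b2) gives an equivalent single factor with 36 levels if you would rather keep the interval labels than a bare integer.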