Discrete bins using cut()

I want to plot the data in my data frame age.model [using lattice's xyplot()], conditioned on discrete bins of the StartAge column.

I am using the following code:

# set up boundaries for intervals/bins
breaks <- c(0,3,4,5,6,8,13,15,17,18,19,20,22)
# specify interval/bin labels
labels <- c("<3", "3-4)", "4-5)","5-6)", "6-8)","8-13)", "13-15)","15-17)","17-18)","18-19)","19-20)",">=20")
# bucketing data points into bins
bins <- cut(age.model$StartAge, breaks, include.lowest = TRUE, right = FALSE, labels = labels)
# inspect bins
summary(bins)
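With right = FALSE each bin is closed on the left and open on the right ([a, b)), and include.lowest = TRUE then closes the top of the last bin, making it [20, 22]. The sketch below reuses the same breaks and labels on a few hypothetical boundary values to show where edge cases land. Note that a value above the last break (22) becomes NA even though the top label reads ">=20".

```r
# Same breaks and labels as in the question
breaks <- c(0, 3, 4, 5, 6, 8, 13, 15, 17, 18, 19, 20, 22)
labels <- c("<3", "3-4)", "4-5)", "5-6)", "6-8)", "8-13)", "13-15)",
            "15-17)", "17-18)", "18-19)", "19-20)", ">=20")

# Hypothetical probe values chosen to sit on or near bin boundaries
probe <- c(0, 2.9, 3, 20, 22, 23)

# right = FALSE gives [a, b) intervals; include.lowest = TRUE closes the
# upper end of the last interval, so 22 still falls into ">=20"
probe_bins <- cut(probe, breaks, include.lowest = TRUE, right = FALSE,
                  labels = labels)

print(data.frame(value = probe, bin = probe_bins))
# 23 lies beyond the last break, so its bin is NA rather than ">=20"
```

If ">=20" is meant literally, replacing the final break 22 with Inf would keep such values in the top bin.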

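In the first argument of cut() I specify the column to discretize. However, the returned factor is a standalone vector and does not include the rest of the DF. How can I get the bins together with the whole data frame?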

Reproducible with dput:

structure(list(Height = c(0.207224416925809, -1.19429150954007, 
0.0247585682642494, 0.023546515879641, 1.51423735121426, -1.09376538778425, 
-0.125209484617016, -0.63639210765747, 0.305071992864995, -0.422021082477656
), Weight = c(-0.366133564723644, -1.06969961340686, -0.0793604259237282, 
-0.708230200986797, 1.71593234004357, -0.685215310472794, -1.20353653394014, 
-0.490399232488568, 0.742874184424376, -0.331519044995803), Training = c(19, 
27, 27, 24, 35, 23, 15, 14, 47, 7), StartAge = c(13, 19, 20, 
20, 14, 2, 8, 4, 17, 18)), row.names = c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10"), class = "data.frame")

To add the bins to the data frame, simply assign them to a new column:

age.model$bins <- bins
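Because cut() returns a factor with exactly one element per row, it can be stored directly as a column and then used as the conditioning variable in xyplot(), which was the original goal. The sketch below rebuilds the data from the question's dput() (the Training column is omitted for brevity) and draws one panel per bin.

```r
library(lattice)

# Data from the question's dput(); Training column omitted for brevity
age.model <- data.frame(
  Height = c(0.207224416925809, -1.19429150954007, 0.0247585682642494,
             0.023546515879641, 1.51423735121426, -1.09376538778425,
             -0.125209484617016, -0.63639210765747, 0.305071992864995,
             -0.422021082477656),
  Weight = c(-0.366133564723644, -1.06969961340686, -0.0793604259237282,
             -0.708230200986797, 1.71593234004357, -0.685215310472794,
             -1.20353653394014, -0.490399232488568, 0.742874184424376,
             -0.331519044995803),
  StartAge = c(13, 19, 20, 20, 14, 2, 8, 4, 17, 18)
)

breaks <- c(0, 3, 4, 5, 6, 8, 13, 15, 17, 18, 19, 20, 22)
labels <- c("<3", "3-4)", "4-5)", "5-6)", "6-8)", "8-13)", "13-15)",
            "15-17)", "17-18)", "18-19)", "19-20)", ">=20")

# cut() returns one factor value per row, so it fits as a new column
age.model$bins <- cut(age.model$StartAge, breaks, include.lowest = TRUE,
                      right = FALSE, labels = labels)

# Condition the scatterplot on the new factor column
p <- xyplot(Weight ~ Height | bins, data = age.model)
print(p)  # draw the plot
```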

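If you are using xyplot to explore your data, consider using equal.count() or shingle() in your code. Out of curiosity about your data (I have no idea what it represents): the roughly linear relationship between Weight and Height does not seem to hold for the lower StartAge bins, as the first example shows.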

# Starting with data in age.model
  library(lattice)
  xyplot(Weight ~ Height | equal.count(StartAge), age.model, type = c("p", "r"))

The default number of bins for equal.count is 6. This is easy to change in order to explore other groupings:

# Create four groups of equal counts to explore
  xyplot(Weight ~ Height | equal.count(StartAge, 4), age.model, type = c("p", "r"))
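Before plotting, it can help to look at the intervals equal.count() actually chose. It passes extra arguments on to co.intervals(), so number sets the count of intervals and overlap (0.5 by default) sets how much neighbouring intervals share. A minimal sketch using the StartAge values from the question's dput():

```r
library(lattice)

# StartAge values from the question's dput()
start_age <- c(13, 19, 20, 20, 14, 2, 8, 4, 17, 18)

# number = 4 requests four intervals; overlap keeps its default of 0.5
sh4 <- equal.count(start_age, number = 4)
print(sh4)  # data values, interval boundaries and per-interval counts

# overlap = 0 asks for adjacent rather than overlapping intervals
sh4_adjacent <- equal.count(start_age, number = 4, overlap = 0)
print(levels(sh4_adjacent))
```

Either shingle can then be used after the | in the xyplot() formula, exactly as in the example above.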

The shingle() function lets you specify overlapping bins explicitly, as shown here:

# Create three groups that overlap with each other
  # (named intervals so the earlier bins factor is not overwritten)
  intervals <- cbind(lower = c(0,8,16), upper = c(13,18,24))
  xyplot(Weight ~ Height | shingle(StartAge, intervals), age.model, type = c("p", "r"))
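With overlapping intervals, some rows appear in more than one panel. The sketch below counts the members of each interval for the question's StartAge values. It treats the interval bounds as inclusive, which is an assumption made here for illustration.

```r
library(lattice)

# StartAge values from the question's dput()
start_age <- c(13, 19, 20, 20, 14, 2, 8, 4, 17, 18)

# Same three overlapping intervals: one row per interval
intervals <- cbind(lower = c(0, 8, 16), upper = c(13, 18, 24))
sh <- shingle(start_age, intervals)

# Membership matrix: rows are observations, columns are intervals
# (bounds treated as inclusive, an assumption for this illustration)
member <- outer(start_age, intervals[, "lower"], ">=") &
          outer(start_age, intervals[, "upper"], "<=")

counts <- colSums(member)
print(counts)  # 4 5 5

# Observations that fall in more than one interval
multi <- start_age[rowSums(member) > 1]
print(multi)   # 13 8 17 18
```

The three counts sum to 14 for only 10 rows, which reflects the four observations that are shared between neighbouring panels.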