How to factorize a numeric variable?

I want to factorize the numeric variable household income into three categories: low, middle, and high.

The cutoffs for all three income groups depend on whether the household is a single or a non-single household:

                             low            middle             high
  1. Single household     860             861 – 1844           >1845 
  2. Non-single household 1900            979 – 4242           >4242

The variables of interest are the person ID (pid) and the household ID (hid). For example:

         year    pid                hid               household income
         1990     201                 1                 1000
         1991     201                 1                 1000
         1992     201                 1                 2000
         1990     202                 1                 2000
         1991     202                 1                 3000
         1992     202                 1                 4000  
         1990     3000                2                 5000
         1991     3000                2                  ..
         1992     3000                2
         1990     1000                3
         1991     1000                3
         1992     1000                3

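I want to determine whether each household is a single household and then assign the corresponding income group. I tried starting from an empty vector "Income":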

            data_s1<- within(data,{
                           Income<-NA
                             Income[income <900 & single household ]<-low
                             Income[income<1900 & nonsingle household]<-low
                             Income[income %in%  861:1844  & single household]<-middle
                             Income[income %in%  979:4242 & nonsingle household ]<-middle
                             Income[income >1845 & single household  ]<-high
                             Income[income >4242 & nonsingle household  ]<-high
})

However, I am having trouble implementing this logic.

You could try the following approach:

# define the cutoffs per group
single <- c(0, 860, 1844, Inf) 
nonsingle <- c(0, 1900, 4242, Inf)
# define the group labels 
l <- c("low", "middle", "high") 
# flag households with exactly one distinct pid (TRUE = single household);
# the comparison sits outside ave() so the column is logical rather than 0/1
df$singlehousehold <- with(df, ave(pid, hid, FUN = function(x) length(unique(x))) == 1L)
# split the data by singlehousehold, cut the income using the matching cutoffs, then rbind back together
df <- do.call(rbind, lapply(split(df, df$singlehousehold), function(x) {
  # pick the cutoffs for this group; the flag is constant within each split
  breaks <- if (x$singlehousehold[1]) single else nonsingle
  x$incomeclass <- cut(x[, "household income"], breaks, labels = l)
  x
}))
rownames(df) <- NULL   # to reset the row names
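
If you would rather keep the original row order, a vectorised variant that skips split() and rbind() might look like the sketch below. It is only an illustration under some assumptions: the income column is called `income` here (not "household income"), pid is numeric, and the toy data frame is made up, with the incomes for hid 2 invented because the question elides them.

```r
# Toy data loosely modelled on the question; the column name `income`
# and the last two income values are assumptions for illustration only
df <- data.frame(
  year   = rep(1990:1992, times = 3),
  pid    = rep(c(201, 202, 3000), each = 3),
  hid    = rep(c(1, 1, 2), each = 3),
  income = c(1000, 1000, 2000, 2000, 3000, 4000, 5000, 800, 1500)
)

# cutoffs per household type and the shared group labels
single_breaks    <- c(0, 860, 1844, Inf)
nonsingle_breaks <- c(0, 1900, 4242, Inf)
labels <- c("low", "middle", "high")

# TRUE for rows whose household contains exactly one distinct pid
is_single <- ave(df$pid, df$hid, FUN = function(x) length(unique(x))) == 1

# classify every row under both schemes, then pick the applicable one per row
cls_single    <- as.character(cut(df$income, single_breaks, labels = labels))
cls_nonsingle <- as.character(cut(df$income, nonsingle_breaks, labels = labels))
df$incomeclass <- factor(ifelse(is_single, cls_single, cls_nonsingle),
                         levels = labels)
```

Note that cut() uses right-closed intervals by default, so an income of exactly 860 counts as "low" for a single household, and an income of exactly 0 becomes NA unless you add include.lowest = TRUE.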