How to factorize a numeric variable?
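I would like to factorize the numeric variable household income into 3 different categories: low, middle, and high.
The cutoffs for all 3 income groups depend on whether the household is a single or a non-single household: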
                          low    middle        high
1. Single household       860    861 – 1844    >1845
2. Non-single household   1900   979 – 4242    >4242
The variables of interest are the person ID (pid) and the household ID (hid). For example:
year  pid   hid  household income
1990  201   1    1000
1991  201   1    1000
1992  201   1    2000
1990  202   1    2000
1991  202   1    3000
1992  202   1    4000
1990  3000  2    5000
1991  3000  2    ..
1992  3000  2
1990  1000  3
1991  1000  3
1992  1000  3
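I want to determine whether a household is a single household and assign the corresponding income group. I want to create an empty vector "Income":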
data_s1<- within(data,{
Income<-NA
Income[income <900 & single household ]<-low
Income[income<1900 & nonsingle household]<-low
Income[income %in% 861:1844 & single household]<-middle
Income[income %in% 979:4242 & nonsingle household ]<-middle
Income[income >1845 & single household ]<-high
Income[income >4242 & nonsingle household ]<-high
})
So I am having some trouble implementing this logical structure.
You could try the following approach:
# define the cutoffs per group
single <- c(0, 860, 1844, Inf)
nonsingle <- c(0, 1900, 4242, Inf)
# define the group labels
l <- c("low", "middle", "high")
# check if household has exactly 1 pid (==singlehousehold)
df$singlehousehold <- with(df, ave(pid, hid, FUN = function(x) length(unique(x)) == 1L))
# split the data according to singlehousehold and cut the income into groups. Then rbind back together
df <- do.call(rbind, lapply(split(df, df$singlehousehold), function(x) {
  if (x$singlehousehold[1]) {
    x$incomeclass <- cut(x[, "household income"], single, labels = l)
    x
  } else {
    x$incomeclass <- cut(x[, "household income"], nonsingle, labels = l)
    x
  }
}
))
rownames(df) <- NULL # to reset the row names
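As a variation on the approach above, here is a minimal sketch that classifies every row under both sets of cutoffs and then picks the right one per row, so the data never has to be split and bound back together and the original row order is kept. It assumes the income column is named `income` rather than "household income", that missing incomes are stored as `NA`, and it uses a small data frame modeled on the first rows of the example table. The names `breaks_single`, `breaks_nonsingle`, `n_persons`, and `single` are illustrative only.

```r
# Example data modeled on the question's table. The column name `income`
# and the NA entries for missing incomes are assumptions.
df <- data.frame(
  year   = rep(1990:1992, times = 3),
  pid    = rep(c(201, 202, 3000), each = 3),
  hid    = rep(c(1, 1, 2), each = 3),
  income = c(1000, 1000, 2000, 2000, 3000, 4000, 5000, NA, NA)
)

# Cutoffs per household type and the shared group labels
breaks_single    <- c(0, 860, 1844, Inf)
breaks_nonsingle <- c(0, 1900, 4242, Inf)
labels <- c("low", "middle", "high")

# Number of distinct persons per household, repeated on every row
n_persons <- ave(df$pid, df$hid, FUN = function(x) length(unique(x)))
df$single <- n_persons == 1

# Classify each income under both schemes, then choose per row
cls_single    <- cut(df$income, breaks_single, labels = labels)
cls_nonsingle <- cut(df$income, breaks_nonsingle, labels = labels)
df$incomeclass <- factor(
  ifelse(df$single, as.character(cls_single), as.character(cls_nonsingle)),
  levels = labels
)
```

Note that `cut()` uses right-closed intervals by default, so an income of exactly 860 falls into "low" for a single household, while an income of exactly 0 lies outside the first interval and becomes `NA` unless `include.lowest = TRUE` is passed.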