R - creating a variable that records categorical info of other member of a group

Question

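I have a household dataset, data, in which each household is identified by the variable id and each person by id + num (household ID + household member number). For each person I have various demographic characteristics, for example: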

id  num  age  wage     edu            marital_status
1   1    33   1200    Secondary       Married/Cohabitating
1   2    35   1100    College         Married/Cohabitating
1   3    12   -1      Not applicable  Not applicable
2   1    27   1600    College         Single
3   1    59   2000    Secondary       Married/Cohabitating
3   2    51   1800    Other           Married/Cohabitating

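I am creating a set of variables that record the characteristics of the other household member. For example, for households with two married or cohabiting adults, I want a "wage of partner" variable, wage_p, which I obtained with: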

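# Total wage per household, named by household id
sums = tapply(data$wage, data$id, sum)
# Copy each household total onto every member's row
data$wage_tot = sums[match(data$id,names(sums))]
# Households with a single row have no partner, so blank the total
data$wage_tot[!(data$id %in% data$id[duplicated(data$id)])] = NA
# Partner's wage is the household total minus the person's own wage
data$wage_p = data$wage_tot - data$wage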

Basically, I sum wage within each household to get wage_tot, then subtract each person's own wage to get wage_p.

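This works because I first restrict the dataset to married or cohabiting individuals, so each household has 1 or 2 people. (I know this is probably more complicated than necessary.)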
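The sum-and-subtract step described above could also be written with ave(), which returns a vector already aligned with the rows of the data frame, so no match() lookup is needed. This is only a sketch: the couples data frame below is a hypothetical stand-in for the restricted dataset (one or two people per household), not the asker's actual object.

```r
# Hypothetical restricted dataset: at most two people per household
couples <- data.frame(
  id   = c(1, 1, 2, 3, 3),
  num  = c(1, 2, 1, 1, 2),
  wage = c(1200, 1100, 1600, 2000, 1800)
)

# Household size and household wage total, one value per row of couples
n_members <- ave(couples$wage, couples$id, FUN = length)
wage_tot  <- ave(couples$wage, couples$id, FUN = sum)

# Partner's wage = household total minus own wage; NA when there is no partner
couples$wage_p <- ifelse(n_members == 2, wage_tot - couples$wage, NA)
couples
```

Like the original code, this relies on the subtraction trick, so it only applies to numeric variables and to households with exactly two rows.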

My result:

id  num  age  wage     edu            marital_status        wage_tot   wage_p
1   1    33   1200    Secondary       Married/Cohabitating   2300      1100
1   2    35   1100    College         Married/Cohabitating   2300      1200
2   1    27   1600    College         Single                 NA         NA
3   1    59   2000    Secondary       Married/Cohabitating   3800      1800
3   2    51   1800    Other           Married/Cohabitating   3800      2000

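The problem comes when I want to do the same for categorical variables, because I cannot take a total and then subtract as I did for the continuous variable. For example, suppose I want to create a variable that records the spouse's education level, edu_p: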

id  num  age  wage     edu            marital_status         edu_p
1   1    33   1200    Secondary       Married/Cohabitating   College
1   2    35   1100    College         Married/Cohabitating   Secondary
2   1    27   1600    College         Single                 NA
3   1    59   2000    Secondary       Married/Cohabitating   Other
3   2    51   1800    Other           Married/Cohabitating   Secondary

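The only idea I have come up with is to convert the categorical variables to numeric, apply my method, and then convert them back, but I am sure that is much more complicated than it needs to be.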

Can anyone help me?

Answer 1

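Consider a merge solution: self-join the data on id so that every household member is paired with every other member, then keep only the pairs that form a couple. A final left-join merge brings back the non-coupled observations from the original data.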

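# Self-join on id; columns from the second copy get the "_p" (partner) suffix
spouse_merge <- subset(merge(data, data, by="id", suffixes=c("", "_p")),
                       num != num_p &          # drop rows pairing a person with themselves
                       marital_status != "Not applicable" &
                       marital_status_p != "Not applicable")

# Left join: keep every original row, attach only id, num and the partner columns
final_df <- merge(data, spouse_merge[c(1,2, grep("_p$", names(spouse_merge)))], 
                  by=c("id", "num"), all.x=TRUE)
final_df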
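To make the approach easy to try, here is a self-contained sketch that rebuilds the sample data from the question and applies the same self-join. The data frame is typed in by hand from the table above, the intermediate name pairs is my own, and only edu_p and wage_p are carried over to keep the output narrow.

```r
# Sample data copied from the table in the question
data <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3),
  num  = c(1, 2, 3, 1, 1, 2),
  age  = c(33, 35, 12, 27, 59, 51),
  wage = c(1200, 1100, -1, 1600, 2000, 1800),
  edu  = c("Secondary", "College", "Not applicable",
           "College", "Secondary", "Other"),
  marital_status = c("Married/Cohabitating", "Married/Cohabitating",
                     "Not applicable", "Single",
                     "Married/Cohabitating", "Married/Cohabitating"),
  stringsAsFactors = FALSE
)

# Self-join on id: every member is paired with every member of the same household
pairs <- merge(data, data, by = "id", suffixes = c("", "_p"))

# Keep pairs of two different people, neither of whom is "Not applicable"
pairs <- subset(pairs, num != num_p &
                  marital_status != "Not applicable" &
                  marital_status_p != "Not applicable")

# Left join back onto the original rows; people without a partner get NA
final_df <- merge(data, pairs[c("id", "num", "edu_p", "wage_p")],
                  by = c("id", "num"), all.x = TRUE)
final_df[c("id", "num", "edu", "edu_p", "wage_p")]
```

On this sample the edu_p and wage_p columns should match the values the question asks for, with NA for the child in household 1 and the single person in household 2. Because the join copies values rather than doing arithmetic, it handles categorical and numeric columns alike. Note that the filter only excludes "Not applicable" rows, so a household containing two unrelated adults would need an additional condition on marital_status.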

