How to classify groups based on the individuals who compose them?

Question

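Here is my problem: I have a database of individuals (one row per individual). Each individual belongs to a household (identified by the variable ID_household) and has an age (variable age). I want to create a new column, type, that defines the household type from the composition of the individuals in the same household: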

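If there are 2 adults (two people over 18), type takes the value "couple";
If there is 1 adult and at least 1 minor, with an age difference of at least 15 years, type = "single parent family";
If there are 2 adults and at least 1 minor, with an age difference of at least 15 years, type = "couple with children";
If there is only one person, type = "single person".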

Here is the script that builds the data. ID_household and age are the original columns. type is the column I want to create, but I don't know how to do it:

data <- data.frame(ID_household = c(1, 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 8, 8, 8, 8, 9, 9, 10, 11, 11, 11, 11),
           age = c(31, 29, 36, 24, 34, 42, 19, 39, 6, 9, 42, 4, 6, 29, 34, 41, 12, 51, 26, 27, 1, 3),
           type = c("couple", "couple", "single person", "couple", "couple", "single person", "single person",
                    "single parent family", "single parent family", "single parent family", "single person",
                    "couple with children", "couple with children", "couple with children", "couple with children", 
                    "single parent family", "single parent family", "single person", "couple with children",
                    "couple with children", "couple with children", "couple with children"))

data
   ID_household age                 type
1             1  31               couple
2             1  29               couple
3             2  36        single person
4             3  24               couple
5             3  34               couple
6             4  42        single person
7             5  19        single person
8             6  39 single parent family
9             6   6 single parent family
10            6   9 single parent family
11            7  42        single person
12            8   4 couple with children
13            8   6 couple with children
14            8  29 couple with children
15            8  34 couple with children
16            9  41 single parent family
17            9  12 single parent family
18           10  51        single person
19           11  26 couple with children
20           11  27 couple with children
21           11   1 couple with children
22           11   3 couple with children

Answer 1

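I would do this by creating variables for the number of children, the number of adults, and the age difference, and then using case_when(). In the code below, I compare type2 against the type variable already in the dataset: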

library(dplyr)

data <- data %>% 
  group_by(ID_household) %>% 
  mutate(n_adult = sum(age > 18),             # adults: strictly older than 18
         n_kids = sum(age <= 18),
         min_adult_age = min(age[age > 18]),  # youngest adult in the household
         # oldest minor, or 0 when the household has no minors
         max_kid_age = ifelse(n_kids > 0, max(age[age <= 18]), 0),
         age_diff = min_adult_age - max_kid_age, 
         type2 = case_when(
            n_adult == 2 & n_kids > 0 & age_diff >= 15 ~ "couple with children", 
            n_adult == 1 & n_kids > 0 & age_diff >= 15 ~ "single parent family", 
            n_adult == 2 & n_kids == 0 ~ "couple",
            n_adult == 1 & n_kids == 0 ~ "single person", 
            TRUE ~ NA_character_)) %>%        # households matching no rule get NA
  ungroup() %>% 
  select(-(n_adult:age_diff))                 # drop the helper columns

all(data$type == data$type2)           
#[1] TRUE
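
To see what the fallback branch of case_when() does, here is a small sketch with two made-up households that are not in the question's data: household 101 (an adult and a minor only 10 years apart) and household 102 (three adults). The IDs, the `edge` and `edge_typed` names, and the if/else guards around min() and max() are assumptions added for illustration; the guards simply return NA instead of Inf when a household has no adults or no minors.

```r
library(dplyr)

# Hypothetical households, not part of the question's data:
# 101 = one adult and one minor only 10 years apart, 102 = three adults.
edge <- data.frame(ID_household = c(101, 101, 102, 102, 102),
                   age = c(22, 12, 40, 45, 70))

edge_typed <- edge %>%
  group_by(ID_household) %>%
  mutate(n_adult = sum(age > 18),
         n_kids = sum(age <= 18),
         # Guarded subsets: NA instead of Inf/-Inf when a subset is empty
         min_adult_age = if (any(age > 18)) min(age[age > 18]) else NA_real_,
         max_kid_age = if (any(age <= 18)) max(age[age <= 18]) else NA_real_,
         age_diff = min_adult_age - max_kid_age,
         type2 = case_when(
           n_adult == 2 & n_kids > 0 & age_diff >= 15 ~ "couple with children",
           n_adult == 1 & n_kids > 0 & age_diff >= 15 ~ "single parent family",
           n_adult == 2 & n_kids == 0 ~ "couple",
           n_adult == 1 & n_kids == 0 ~ "single person",
           TRUE ~ NA_character_)) %>%
  ungroup()

# List the households that matched none of the four rules
edge_typed %>% filter(is.na(type2)) %>% distinct(ID_household)
```

Both hypothetical households end up with NA in type2, which makes unclassified cases easy to find with is.na() before deciding whether they need a rule of their own.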

Answer 2

Here is a base R way with ave().

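# ave() applies FUN to the ages of each household and repeats the
# result for every member of that household.
# Note: this version treats age >= 18 as adult and does not check
# the 15-year age gap from the question.
type <- with(data, ave(age, ID_household, FUN = \(x){
  if(length(x) < 2) {
    "single person"
  } else if(length(x) == 2L && all(x >= 18)) {
    "couple"
  } else if(sum(x >= 18) == 1){
    "single parent family"
  } else "couple with children"
}))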

identical(data$type, type)
#[1] TRUE
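
If the 15-year gap from the question also needs to be enforced in base R, one possible variant is sketched below. The names `classify_household`, `min_gap`, `people`, and `type3` are illustrative, and the sketch assumes "adult" means age >= 18 as in the ave() code above. It classifies each household once with split() and vapply(), then maps the labels back to the rows by household ID. Households that fit no rule get NA.

```r
# Hypothetical helper: classify one household from its vector of ages.
# Assumes "adult" means age >= 18.
classify_household <- function(x, min_gap = 15) {
  adults <- x[x >= 18]
  minors <- x[x < 18]
  # The gap rule only applies when both adults and minors are present
  gap_ok <- length(adults) > 0 && length(minors) > 0 &&
    min(adults) - max(minors) >= min_gap
  if (length(x) == 1 && length(adults) == 1) {
    "single person"
  } else if (length(adults) == 2 && length(minors) == 0) {
    "couple"
  } else if (length(adults) == 1 && gap_ok) {
    "single parent family"
  } else if (length(adults) == 2 && gap_ok) {
    "couple with children"
  } else {
    NA_character_  # no rule matched
  }
}

# Same ID_household and age values as in the question
people <- data.frame(
  ID_household = c(1, 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 8, 8, 8, 8,
                   9, 9, 10, 11, 11, 11, 11),
  age = c(31, 29, 36, 24, 34, 42, 19, 39, 6, 9, 42, 4, 6, 29, 34,
          41, 12, 51, 26, 27, 1, 3))

# One label per household, named by household ID
labels <- vapply(split(people$age, people$ID_household),
                 classify_household, character(1))

# Map the household labels back onto the individual rows
people$type3 <- unname(labels[as.character(people$ID_household)])
```

Keeping the rules in a standalone function makes it possible to test a single household directly, for example classify_household(c(22, 12)), which returns NA because the gap is only 10 years.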

r

classification