How many values in one column appear in another (R)

Question

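I am working with survey data and want to count how many children each person in a household has. The dataset is available here: https://github.com/aquijanoruiz/disability_benefits_EC/raw/master/rds_files/survey_data.rds

The "person" column enumerates the people in the household. The "mother" and "father" columns indicate who a person's mother and father are. For example, in the first household (identified by the household_id value 010150000201011), persons 1 and 2 are the mother and father, respectively, of persons 3, 4, 5, and 6.

I want to generate a variable that gives the number of children for each household member. For the first household, it should be 4, 4, 0, 0, 0, 0.

I was thinking of children %>% group_by(household_id) %>% mutate(n_chid = sum(person %in% mother, na.rm = TRUE)) but it doesn't work. Any ideas? Thanks!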

Answer 1

We may need to loop over each 'person' within the household:

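library(dplyr)
library(purrr)
children %>% 
     group_by(household_id) %>%
     # for each person, count the rows in the same household that name them as mother
     # (this uses the 'mother' column only; fathers need the same count on 'father')
     mutate(n_chid = map_dbl(person, ~ sum(mother %in% .x, na.rm = TRUE)))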

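The OP's code applies person %in% mother to the whole group at once, so sum() collapses it to a single number per household (how many persons appear in the mother column), and that one value is recycled to every row. Instead, we need to restrict the comparison to one observation at a time: take each person in turn and count how many rows list that person as a parent.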
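To make the difference concrete, here is a minimal sketch on toy data. The tibble below is hypothetical: it only mimics the first household described in the question and is not read from the actual survey file. The "father" column name and the new columns n_op and n_child are assumptions for illustration. Adding the count from the father column is an extension of the answer's code, intended to reproduce the 4, 4, 0, 0, 0, 0 result the question asks for.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data shaped like the first household in the question:
# persons 1 and 2 are the mother and father of persons 3 to 6.
children <- tibble(
  household_id = rep("010150000201011", 6),
  person = 1:6,
  mother = c(NA, NA, 1, 1, 1, 1),
  father = c(NA, NA, 2, 2, 2, 2)
)

result <- children %>%
  group_by(household_id) %>%
  mutate(
    # OP's approach: one number per household, recycled to every row
    n_op = sum(person %in% mother),
    # Per-person approach: count rows naming this person as mother or father
    n_child = map_int(person, ~ sum(mother %in% .x) + sum(father %in% .x))
  ) %>%
  ungroup()

result$n_op     # 1 1 1 1 1 1
result$n_child  # 4 4 0 0 0 0
```

Note that %in% never returns NA (a missing mother or father simply fails to match), so na.rm = TRUE is harmless but not required here.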

r

survey

dplyr