Create household head based on age and member id
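I have a data frame of household members with 3 integer columns, 'hid', 'sub' and 'age'. I want to create a new logical variable in the data frame, called 'hh', that marks the household head, defined as follows:
- If there is only 1 member in the household, the value is TRUE.
- If there are 2 or more members in the household, the household head is the person with the smallest subject id ('sub') among those aged between 18 and 65 (inclusive).
- If no member of the household is aged 18-65, the household head is the person with the smallest subject id.
Every household must have exactly 1 household head.
My data looks like this: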
# A tibble: 10 x 3
hid sub age
<dbl> <dbl> <dbl>
1 1 1 75
2 1 2 55
3 2 1 35
4 3 1 69
5 3 2 72
6 4 1 69
7 5 1 15
8 5 2 17
9 5 3 42
10 5 4 62
I would like the result to look like this:
> result
# A tibble: 10 x 4
hid sub age hh
<dbl> <dbl> <dbl> <lgl>
1 1 1 75 FALSE # Not 18-65 & there is another aged 18-65 within this household.
2 1 2 55 TRUE # Aged 18-65 and the smallest sub id within this household.
3 2 1 35 TRUE # Only 1 in this household.
4 3 1 69 TRUE # Not aged 18-65, but no other member is and smallest sub id.
5 3 2 72 FALSE # Not aged 18-65, and not the smallest sub id.
6 4 1 69 TRUE # Only 1 in this household.
7 5 1 15 FALSE # Not aged 18-65 and others in this household qualify.
8 5 2 17 FALSE # Not aged 18-65 and others in this household qualify.
9 5 3 42 TRUE # Aged 18-65 and the smallest sub id among those aged 18-65 within this household.
10 5 4 62 FALSE # Aged 18-65 but not the smallest sub id among those aged 18-65 within this household.
Thanks!
d <- structure(list(hid = c(1, 1, 2, 3, 3, 4, 5, 5, 5, 5),
sub = c(1, 2, 1, 1, 2, 1, 1, 2, 3, 4),
age = c(75, 55, 35, 69, 72, 69, 15, 17, 42, 62)),
row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
Here is one option
library(dplyr)
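d %>%
  group_by(hid) %>%
  # single member: always the head; otherwise take the smallest sub,
  # restricted to members aged 18-65 when the household has any
  mutate(hh = if (n() == 1) TRUE else if (!any(between(age, 18, 65)))
           sub == min(sub) else
           sub == min(sub[between(age, 18, 65)])) %>%
  ungroup()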
-output
# A tibble: 10 x 4
hid sub age hh
<dbl> <dbl> <dbl> <lgl>
1 1 1 75 FALSE
2 1 2 55 TRUE
3 2 1 35 TRUE
4 3 1 69 TRUE
5 3 2 72 FALSE
6 4 1 69 TRUE
7 5 1 15 FALSE
8 5 2 17 FALSE
9 5 3 42 TRUE
10 5 4 62 FALSE
Or another, simplified option is
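d %>%
  mutate(rn = row_number()) %>%   # remember the original row order
  arrange(hid, sub, age) %>%
  group_by(hid) %>%
  # first sub aged 18-65 if there is one (NA otherwise), else the first sub overall
  mutate(hh = sub == coalesce(sub[between(age, 18, 65)][1],
                              first(sub))) %>%
  ungroup() %>%
  arrange(rn) %>%
  select(-rn)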
-output
# A tibble: 10 x 4
hid sub age hh
<dbl> <dbl> <dbl> <lgl>
1 1 1 75 FALSE
2 1 2 55 TRUE
3 2 1 35 TRUE
4 3 1 69 TRUE
5 3 2 72 FALSE
6 4 1 69 TRUE
7 5 1 15 FALSE
8 5 2 17 FALSE
9 5 3 42 TRUE
10 5 4 62 FALSE
You can arrange the data so that the first row of each group is the hh value you are looking for.
library(dplyr)
d %>%
arrange(hid, !between(age, 18, 65), sub) %>%
mutate(hh = !duplicated(hid))
# hid sub age hh
# <dbl> <dbl> <dbl> <lgl>
# 1 1 2 55 TRUE
# 2 1 1 75 FALSE
# 3 2 1 35 TRUE
# 4 3 1 69 TRUE
# 5 3 2 72 FALSE
# 6 4 1 69 TRUE
# 7 5 3 42 TRUE
# 8 5 4 62 FALSE
# 9 5 1 15 FALSE
#10 5 2 17 FALSE
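!between(age, 18, 65) orders the data so that people aged 18-65 come first within each household, ahead of those outside the range, because FALSE sorts before TRUE.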
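As a rough base R sketch of the same idea (not part of the original answer), order() can play the role of arrange(): logical keys sort with FALSE before TRUE, so in-range members come first within each household. Writing the flags back through the ordering index keeps the original row order. The names out_of_range and ord are illustrative, and the data is rebuilt as a plain data.frame so the snippet runs without dplyr.

```r
# Same values as the question's `d`, as a plain data.frame
d <- data.frame(hid = c(1, 1, 2, 3, 3, 4, 5, 5, 5, 5),
                sub = c(1, 2, 1, 1, 2, 1, 1, 2, 3, 4),
                age = c(75, 55, 35, 69, 72, 69, 15, 17, 42, 62))

# TRUE for members outside 18-65 (inclusive); FALSE sorts before TRUE
out_of_range <- !(d$age >= 18 & d$age <= 65)

# Row indices sorted by household, then in-range first, then smallest sub
ord <- order(d$hid, out_of_range, d$sub)

# The first sorted row of each household is the head;
# assigning through `ord` puts each flag back on its original row
d$hh <- FALSE
d$hh[ord] <- !duplicated(d$hid[ord])

print(d)
```

Unlike the arrange() version, the rows stay in their original order, which may or may not matter for your use.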
An option with case_when. Each case_when branch translates your conditions 1 to 3 into code:
library(dplyr)
d %>%
  group_by(hid) %>%
  mutate(hh = case_when(n() == 1 ~ TRUE,                 # condition 1
                        between(age, 18, 65) &           # condition 2
                          sub == min(sub[between(age, 18, 65)]) ~ TRUE,
                        !any(between(age, 18, 65)) &     # condition 3
                          sub == min(sub) ~ TRUE,
                        TRUE ~ FALSE))
Output:
hid sub age hh
<dbl> <dbl> <dbl> <lgl>
1 1 1 75 FALSE
2 1 2 55 TRUE
3 2 1 35 TRUE
4 3 1 69 TRUE
5 3 2 72 FALSE
6 4 1 69 TRUE
7 5 1 15 FALSE
8 5 2 17 FALSE
9 5 3 42 TRUE
10 5 4 62 FALSE
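Whichever approach you pick, it may be worth checking the "exactly 1 household head" rule on the result. The following is a minimal base R sketch, not taken from the answers above: it derives hh with ave() and then counts heads per household with tapply(). The names in_range, min_in_range, min_overall and head_sub are illustrative.

```r
# Same values as the question's `d`, as a plain data.frame
d <- data.frame(hid = c(1, 1, 2, 3, 3, 4, 5, 5, 5, 5),
                sub = c(1, 2, 1, 1, 2, 1, 1, 2, 3, 4),
                age = c(75, 55, 35, 69, 72, 69, 15, 17, 42, 62))

in_range <- d$age >= 18 & d$age <= 65

# Smallest sub among in-range members of each household (Inf if there are none)
min_in_range <- ave(ifelse(in_range, d$sub, Inf), d$hid, FUN = min)

# Smallest sub in each household, regardless of age
min_overall <- ave(d$sub, d$hid, FUN = min)

# Fall back to the overall minimum when nobody is aged 18-65
head_sub <- ifelse(is.finite(min_in_range), min_in_range, min_overall)
d$hh <- d$sub == head_sub

# Sanity check: every household has exactly one head
heads_per_household <- tapply(d$hh, d$hid, sum)
stopifnot(all(heads_per_household == 1))

print(d)
```

The check relies on sub being unique within a household; if it is not, more than one row could be flagged and stopifnot() would raise an error.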