R - Dropped rows
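I have a dataset that records the relatives who help seniors in their daily life. I also have another part of the survey that covers the seniors themselves.
Some seniors have no relatives helping them at all, so my first dataset has fewer rows than my second.
I want to sum the hours of care given by each senior's relatives (one senior can be helped by several relatives); every senior who receives no help should show NA.
The rows run from 1 to 10628.
Here is my first dataset: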
head(relative)
id_senior id_relative hours_care
1 1 3
1 2 6
3 1 5
3 2 0
3 3 1
4 1 3
...
10628 1 34
Here is my second:
head(senior)
id_senior
1
2
3
4
...
10628
I would like something like this:
head(senior) #or whatever the name
id_senior nbr_relative sum_hours
1 2 9
2 0 NA
3 3 6
4 1 3
...
10628 1 34
I tried something like this:
library(dplyr)
#To count the number of relatives
nbr_relatives <- relatives %>%
group_by(id_senior = factor(id_senior, levels = min(id_senior):max(id_senior)), .drop = FALSE) %>%
summarise(relatives = n_distinct(id_relatives))
#The value 0 is given to every relatives which has no care hours value
subset_caregivers$hours_recoded[is.na(subset_caregivers$hours_recoded)] <- 0
nbr_relatives <- relative %>%
group_by(id_senior = factor(id_senior, levels = min(id_senior):max(id_senior))) %>%
count(hours = sum(hours_care), na.rm = TRUE)
But the number of rows in nbr_relatives becomes 4564, the number of seniors who receive help, not 10628!
Where did I go wrong?
In base R,
merge(senior, aggregate(hours_care ~ id_senior, relative, sum), by = "id_senior", all.x=T)
id_senior hours_care
1 1 9
2 2 NA
3 3 6
4 4 3
5 5 NA
...
20 20 NA
Edit:
To get the extra column:
merge(senior, merge(aggregate(id_relative ~ id_senior, relative, length),aggregate(hours_care ~ id_senior, relative, sum)), by = "id_senior", all.x=T)
id_senior id_relative hours_care
1 1 2 9
2 2 NA NA
3 3 3 6
4 4 1 3
5 5 NA NA
6 6 NA NA
...
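The merged output above leaves the relative count as NA for seniors without helpers, whereas the question asks for 0 there and NA only for the hours. A minimal base R sketch of that last step follows, assuming the same small example data as in this answer; the names `counts`, `hours` and `result` are introduced here for illustration only.

```r
# Example data, same shape as in the question (small sample, not the full survey)
relative <- data.frame(
  id_senior   = c(1L, 1L, 3L, 3L, 3L, 4L),
  id_relative = c(1L, 2L, 1L, 2L, 3L, 1L),
  hours_care  = c(3L, 6L, 5L, 0L, 1L, 3L)
)
senior <- data.frame(id_senior = 1:20)

# Per-senior summaries; seniors absent from `relative` do not appear here yet
counts <- aggregate(id_relative ~ id_senior, relative, length)
hours  <- aggregate(hours_care ~ id_senior, relative, sum)

# all.x = TRUE keeps every row of `senior`, filling unmatched rows with NA
result <- merge(senior, merge(counts, hours), by = "id_senior", all.x = TRUE)
names(result) <- c("id_senior", "nbr_relative", "sum_hours")

# A senior with no match has no relatives: count becomes 0, hours stay NA
result$nbr_relative[is.na(result$nbr_relative)] <- 0L

head(result, 4)
```

Only the count column is filled with 0, so a senior whose relatives reported 0 hours in total stays distinguishable from a senior with no relatives at all.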
Data:
relative <- structure(list(id_senior = c(1L, 1L, 3L, 3L, 3L, 4L), id_relative = c(1L,
2L, 1L, 2L, 3L, 1L), hours_care = c(3L, 6L, 5L, 0L, 1L, 3L)), class = "data.frame", row.names = c(NA,
-6L))
senior <- data.frame(id_senior = 1:20)
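Since the question uses dplyr, the same idea could be sketched there as a per-senior summary followed by a left join onto `senior`. This is an illustrative sketch rather than the asker's code, and the names `per_senior` and `result` are made up here. It avoids the factor-level approach altogether: `group_by()` drops empty factor levels unless `.drop = FALSE` is passed, and the second pipeline in the question omits that argument, which is one likely reason the row count shrank.

```r
library(dplyr)

# Example data, same shape as in the question (small sample, not the full survey)
relative <- data.frame(
  id_senior   = c(1L, 1L, 3L, 3L, 3L, 4L),
  id_relative = c(1L, 2L, 1L, 2L, 3L, 1L),
  hours_care  = c(3L, 6L, 5L, 0L, 1L, 3L)
)
senior <- data.frame(id_senior = 1:20)

# One row per senior that has at least one relative
per_senior <- relative %>%
  group_by(id_senior) %>%
  summarise(
    nbr_relative = n_distinct(id_relative),
    sum_hours    = sum(hours_care),
    .groups      = "drop"
  )

# left_join() keeps every senior; unmatched seniors get NA in both columns,
# then the count is turned into 0 while the hours stay NA
result <- senior %>%
  left_join(per_senior, by = "id_senior") %>%
  mutate(nbr_relative = coalesce(nbr_relative, 0L))

head(result, 4)
```

Joining onto the full `senior` table means the number of output rows follows the senior survey rather than the range of IDs that happen to appear in `relative`.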