How to merge rows based on conditions with character values? (Household data)
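I have a data frame in which the first column gives the job (manager, employee or worker), the second indicates whether the person works at night, and the last is a household code (two individuals with the same code share the same house).
# Here is the reproducible data: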
PCS <- c("worker", "manager","employee","employee","worker","worker","manager","employee","manager","employee")
work_night <- c("Yes","Yes","No", "No","No","Yes","No","Yes","No","Yes")
HHnum <- c(1,1,2,2,3,3,4,4,5,5)
df <- data.frame(PCS,work_night,HHnum)
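My problem is that I would like a new data frame with households instead of individuals. I want to group individuals by HHnum and then merge their answers.
For the variable "PCS", I have new categories based on the combination of answers: manager+worker = "I", manager+employee = "II", employee+employee = "VI", worker+worker = "III", etc.
For the variable "work_night", I want to apply a score: if both answered "Yes" the score is 2, if only one answered "Yes" the score is 1, and if both answered "No" the score is 0.
To be clear, I would like my data frame to look like this: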
HHnum PCS work_night
1 "I" 2
2 "VI" 0
3 "III" 1
4 "II" 1
5 "II" 1
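The work_night column in the table above is simply the number of "Yes" answers per household. A minimal base R sketch of that rule, assuming the answers are coded exactly as "Yes" and "No" (night_score is an illustrative name, not something from the original data):

```r
# Reproducible data from the question (only the columns needed here).
work_night <- c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes")
HHnum <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)

# TRUE counts as 1 when summed, so this yields 2, 1 or 0 per household.
night_score <- tapply(work_night == "Yes", HHnum, sum)
print(night_score)
```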
How can I do this in R with dplyr? I know I need group_by(), but I don't know what to use after that.
Best,
Victor
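Here is one approach (though I admit it is verbose). I create a reference data frame (combos), in case you have more than 3 categories, and then join it to the main data frame (df_new) to bring in the PCS Roman numerals.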
library(dplyr)
library(tidyr)
# Create a data frame with every unordered pair of PCS values.
# Numerals are assigned in row order, so edit the PCS column if you need a specific mapping.
combos <- expand.grid(unique(df$PCS), unique(df$PCS))
combos <- unique(t(apply(combos, 1, sort))) %>%
  as.data.frame() %>%
  dplyr::mutate(PCS = as.roman(row_number()))
# Create a copy with the columns reversed, so the join works whichever order the two people appear in.
combos2 <- data.frame(V1 = c(combos$V2), V2 = c(combos$V1), PCS = c(combos$PCS)) %>%
  dplyr::mutate(PCS = as.roman(PCS))
combos <- rbind(combos, combos2)
# Count the "Yes" answers within each HHnum group and number the household members.
# Then spread PCS into two columns (V1, V2) so it can be joined to combos.
df_new <- df %>%
  dplyr::group_by(HHnum) %>%
  dplyr::mutate(work_night = sum(work_night == "Yes"),
                grp = dplyr::row_number()) %>%
  dplyr::ungroup() %>%
  tidyr::pivot_wider(names_from = grp, values_from = PCS, names_prefix = "V") %>%
  dplyr::left_join(combos, by = c("V1", "V2")) %>%
  # Same-job pairs appear twice in combos, so drop the duplicated rows.
  dplyr::distinct() %>%
  dplyr::select(HHnum, PCS, work_night)
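If the categories have to follow a specific mapping such as the one listed in the question, a shorter alternative is to build an order-independent key per household and look it up in a named vector. This is only a sketch under a few assumptions: every household has exactly two members, "Manager+work" in the question means manager + worker, and only the four pairs spelled out in the question are mapped (any other pair comes back as NA). The names pcs_lookup, pair_key and households are illustrative.

```r
library(dplyr)

# Reproducible data from the question.
df <- data.frame(
  PCS = c("worker", "manager", "employee", "employee", "worker",
          "worker", "manager", "employee", "manager", "employee"),
  work_night = c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes"),
  HHnum = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
)

# Assumed mapping: the keys are the two jobs sorted alphabetically and joined with "+".
# Only the pairs named in the question are listed; extend this vector as needed.
pcs_lookup <- c(
  "manager+worker"    = "I",
  "employee+manager"  = "II",
  "worker+worker"     = "III",
  "employee+employee" = "VI"
)

households <- df %>%
  group_by(HHnum) %>%
  summarise(
    # Sorting first makes "worker, manager" and "manager, worker" the same key.
    pair_key   = paste(sort(PCS), collapse = "+"),
    # Number of "Yes" answers in the household: 2, 1 or 0.
    work_night = sum(work_night == "Yes"),
    .groups = "drop"
  ) %>%
  mutate(PCS = unname(pcs_lookup[pair_key])) %>%
  select(HHnum, PCS, work_night)

print(households)
```

Because the mapping lives in one named vector, adding a category such as manager+manager is a one-line change, and unmapped pairs show up as NA instead of silently receiving a numeral.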